In recent months we’ve been seeing a rise in the interest in predictive analytics from our clients, so we’re running a series of postings to explain what predictive analytics is and how it’s used, and to explore good practice. This posting is the sixth in the series.
The last posting was more technical than the previous ones and in it we outlined an industry-standard approach to predictive analytics, CRISP-DM. This time we’re finishing off that line of thought by highlighting what we see as a particular weakness and how we at Red Olive overcome it. In case you’re finding it hard going, next time we’ll be moving back to something a bit less involved!
The main problem: live models stop working
The main problem we see with the standard approach to predictive analytics is the lack of feedback to improve a predictive model once it is live. This is a big problem because the models degrade over time.
For example, the accuracy of models predicting customer response to an offer is affected by market changes such as competitor activity, new product development and broader economic conditions. The result is that the characteristics of the new data, which is being captured for scoring by the model, diverge from those of the original data that was used to develop the model. So how do you measure results when the goalposts keep moving?
The response profile or gains chart (see example) for the model on the new data should reflect that seen on the original model development data. If it doesn’t, then you should consider a model refresh (where the same attributes are kept but the model weights are updated) or a total model rebuild.
(The diagram to the left is an example of a gains chart. Along the bottom it shows the % of a population touched (e.g. through direct marketing) and up the left it shows the fraction of the people with a desired outcome (e.g. who respond and buy a product) that the model captures within that share of the population. The red line shows a random response and the blue line shows the model’s response as validated against a data set. If a model is valid against several different sets of data, the shape of the blue line should remain very similar in each case. The blue area shows the gain the model provides over random.)
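For readers who like to see the arithmetic, here is a minimal sketch in Python of how the points on a gains chart can be worked out. It is an illustration only, not how any particular product does it: the function name `cumulative_gains`, the `steps` parameter and the sample scores are all our own assumptions.

```python
def cumulative_gains(scores, outcomes, steps=10):
    """Return (fraction of population contacted, fraction of responders captured) pairs.

    scores   -- model score per customer (higher = more likely to respond)
    outcomes -- 1 if the customer responded, 0 otherwise
    steps    -- number of points along the bottom axis (hypothetical parameter)
    """
    # Rank customers from the highest model score to the lowest.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    total_responders = sum(outcome for _, outcome in ranked)
    if total_responders == 0:
        raise ValueError("no responders in the data, so gains are undefined")

    points = []
    for step in range(1, steps + 1):
        # Contact the top step/steps share of the ranked population.
        cutoff = round(len(ranked) * step / steps)
        captured = sum(outcome for _, outcome in ranked[:cutoff])
        points.append((step / steps, captured / total_responders))
    return points


# Made-up example data: ten customers, three of whom responded.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
example_outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
example_points = cumulative_gains(example_scores, example_outcomes, steps=5)
```

A random selection would capture responders in proportion to the share of the population contacted, so the further these points sit above that diagonal, the more the model is adding. Running the same function on the development data and on new data gives two curves to compare.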
The methodology root causes: Internet-paced market change and a new approach
This degradation isn’t new, so why would an industry-standard approach not allow for it? Well, we think there are two considerations, both related to when CRISP-DM was developed. Firstly, the Internet was in its infancy and so the ability and demand to interact with customers in such a quickly changing and personalised way was much less than it is now, so the demand for rapid changes to models was much lower. Secondly, because of limitations in the tools available at the time, many of the steps in the cycle were expected to be performed by hand, so the supply of scarce skills was too limited to support quicker changes. Neither of these situations is true today.
Internet-paced market change
We have said quite a lot about business applications of predictive analytics in previous postings, but as a general summary there is growing demand for quicker cycling through this process and the need to increase the frequency of model optimisation. Business drivers include more competition in the marketing environment driving the need for better customer targeting and personalisation, and the drive for real-time analytics to enable things such as ‘next best offer’ on internet sites. It’s no good ignoring this issue and targeting the wrong customers!
A new approach: Maturing from cottage industry to automated factory
If you are faced with rapidly increasing demand for accurate models, what can you do? We think predictive analytics is ripe to mature from a cottage industry to an ‘automated factory’ approach. This is akin to the move many organisations have made away from hand-crafted SQL code for data integration and it offers similar benefits: much quicker optimisation of models and a resultant increase in the ongoing accuracy of customer targeting, a decrease in running costs and the ability to manage and share work across distributed teams, reducing business risk caused by reliance on too few people.
One software company we’ve been working with that we think understands this change well is KXEN. They don’t try to compete across the board with bigger established companies like SAS and SPSS; instead they focus their efforts. The KXEN product automatically checks the effectiveness of each live model before it is deployed onto new data. It looks for ‘deviation’: it provides a comparison of the original and new data sets and a report that indicates how well the model will perform on the new data, where any differences in the characteristics of the data lie and so on. This gives some insight into what may have changed in the market. It also offers an updated gains curve that shows the original model performance and the performance on the new data set, enabling an analyst to judge whether to deploy, refresh or rebuild the model.
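To make the idea of ‘deviation’ a little more concrete, here is a minimal sketch in Python that compares the distribution of one attribute in the original and new data. We have used the population stability index (PSI), a widely used general-purpose drift measure. We are not suggesting this is what KXEN calculates internally; the function name, the number of bins and the sample data are our own assumptions.

```python
import math


def population_stability_index(original, new, bins=10):
    """Compare how one numeric attribute is distributed in two data sets.

    Returns 0.0 when the binned distributions match and grows as they diverge.
    The bin edges are taken from the original (model development) data.
    """
    low, high = min(original), max(original)
    width = (high - low) / bins or 1.0  # guard against a constant attribute

    def bin_shares(values):
        counts = [0] * bins
        for value in values:
            index = int((value - low) / width)
            # New data may fall outside the original range: clip to end bins.
            index = min(max(index, 0), bins - 1)
            counts[index] += 1
        # A small floor avoids taking the logarithm of zero for empty bins.
        return [max(count / len(values), 1e-6) for count in counts]

    original_shares = bin_shares(original)
    new_shares = bin_shares(new)
    return sum(
        (q - p) * math.log(q / p)
        for p, q in zip(original_shares, new_shares)
    )


# Made-up example: customer ages at development time, then an older intake.
development_ages = list(range(100))
unchanged_ages = list(range(100))
shifted_ages = [age + 30 for age in development_ages]

psi_unchanged = population_stability_index(development_ages, unchanged_ages)
psi_shifted = population_stability_index(development_ages, shifted_ages)
```

Calculating a figure like this for each attribute shows where the characteristics of the data have moved, which is often the first clue to what has changed in the market.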
KXEN has also recently released a product it calls ‘Modelling Factory’, which can be used to automate management of models. Loaded models are checked for deviation as described above before being scheduled for deployment. Refreshes and rebuilds can be scheduled to take place automatically when needed or after approval has been granted by someone notified via e-mail.
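The kind of rule such automation applies can be sketched in a few lines of Python. This is purely illustrative and is not the Modelling Factory interface: the function names, the thresholds of 0.1 and 0.25, and the choice to require approval only for rebuilds are all hypothetical and would need tuning for a real business.

```python
def triage_model(deviation, refresh_at=0.1, rebuild_at=0.25):
    """Choose an action from a deviation score (thresholds are assumptions)."""
    if deviation >= rebuild_at:
        return "rebuild"
    if deviation >= refresh_at:
        return "refresh"
    return "deploy"


def plan_actions(deviations, needs_approval=("rebuild",)):
    """Map each model name to (action, status).

    deviations     -- dict of model name to deviation score
    needs_approval -- actions that wait for a person to sign them off
    """
    plan = {}
    for name, deviation in deviations.items():
        action = triage_model(deviation)
        status = "awaiting approval" if action in needs_approval else "scheduled"
        plan[name] = (action, status)
    return plan


# Made-up deviation scores for three live models.
example_plan = plan_actions(
    {"offer_response": 0.02, "churn": 0.15, "cross_sell": 0.40}
)
```

Here the offer response model would simply be deployed, the churn model would be refreshed automatically, and the cross-sell model would wait for someone to approve a rebuild.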
One final reason why we at Red Olive like this approach so much is that it begins to make advanced analytics much more accessible throughout a whole organisation, rather than just the preserve of statisticians. The modelling process is only a means to an end, and that end is for a business to deliver the right service to the right customer at the right time. Advances in analytics such as those described help organisations become more agile. We will be looking at agility in more depth in a later posting, so check back next week.