Thursday, December 10, 2009

Customer Churn Analytics

I've had a great opportunity recently to work on a customer churn predictive analytics project. The goal is to predict which customers are likely to churn in the near future.

For tools I'm using SQL Server 2008 applications throughout: SSMS, SSAS, SSIS, and SSRS. What a great toolbox. After spending a little time learning about SAP, SPSS, and Statistica, I can honestly say that MS has a great stack for anyone interested in using predictive analytics to drive business decisions with a very high ROI.

I'm not able to share any details on the project, but I will share the high-level status. We are able to identify around 45% of the customers that will be considered lost in the next 2 months, and the false positive rate that comes with it is very manageable.
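For readers newer to these terms, here is a minimal Python sketch of how the two numbers above are usually defined. It is not the project's code (that lives in SSAS), and the labels are made up purely for illustration: "identify around 45% of lost customers" corresponds to recall, and the false positive rate is the share of retained customers that get flagged by mistake.

```python
def recall_and_fpr(actual, predicted):
    """Return (recall, false positive rate) for binary labels, 1 = churned."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # churners we flagged
    fn = sum(1 for a, p in pairs if a and not p)      # churners we missed
    fp = sum(1 for a, p in pairs if not a and p)      # loyal customers flagged
    tn = sum(1 for a, p in pairs if not a and not p)  # loyal customers left alone
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Made-up example data, not from the project.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
recall, fpr = recall_and_fpr(actual, predicted)
print(f"recall={recall:.2f}, false positive rate={fpr:.2f}")
```

Raising the score threshold for flagging a customer generally lowers both numbers, so the two have to be judged together.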

Below is the lift chart showing how the model is performing. The training data covered a 2-year period ending in June 2008. The chart shows how well the model predicted lost customers for a 2-year period ending in June 2009. The business being analyzed is seasonal, which led us to a monthly segmentation. The model has been verified in several ways, including cross-validation, lift analysis, classification, and decile performance.
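To make the lift and decile idea concrete, here is a rough Python sketch of the calculation behind that kind of chart. It is an illustration only, with hypothetical scores and labels, and not how SSAS computes its lift chart internally: customers are ranked by predicted churn score, split into ten equal bins, and we track what fraction of all churners has been captured after each bin.

```python
def cumulative_gains(scores, labels, n_bins=10):
    """Fraction of all churners captured after each bin, ranked by score (highest first)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_churners = sum(label for _, label in ranked)
    n = len(ranked)
    gains, captured, start = [], 0, 0
    for b in range(1, n_bins + 1):
        end = round(b * n / n_bins)  # right edge of this bin
        captured += sum(label for _, label in ranked[start:end])
        gains.append(captured / total_churners if total_churners else 0.0)
        start = end
    return gains


def cumulative_lift(gains):
    """Lift = churners captured so far divided by what random targeting would capture."""
    n_bins = len(gains)
    return [g / ((i + 1) / n_bins) for i, g in enumerate(gains)]


# Hypothetical data: 20 customers, 5 of whom churned (label 1).
scores = [i / 20 for i in range(20, 0, -1)]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
gains = cumulative_gains(scores, labels)
lifts = cumulative_lift(gains)
for decile, (g, l) in enumerate(zip(gains, lifts), start=1):
    print(f"decile {decile:2d}: captured {g:.0%} of churners, lift {l:.2f}")
```

A model no better than random would show a lift near 1.0 in every decile. The further the top deciles sit above 1.0, the more churners you reach by contacting only the highest-scored customers.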

There are plenty of ideas on the table for improving the model's accuracy, but the strides taken thus far have clear business value, which I hope to report on in upcoming posts. Model improvements could go on for quite some time. At the top of my list are more attribute grooming, decision tree bagging, ensembles, oversampling, and exploring a few other algorithms in more detail.
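Of the ideas on that list, oversampling is the easiest to show in a few lines. The Python sketch below is one simple variant (random duplication of the minority class), with a hypothetical row layout and helper name. It is not a description of what the project will end up using.

```python
import random


def oversample_minority(rows, label_of, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are the same size."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    churned = [r for r in rows if label_of(r) == 1]
    retained = [r for r in rows if label_of(r) == 0]
    minority, majority = (churned, retained) if len(churned) < len(retained) else (retained, churned)
    if not minority:
        return list(rows)  # nothing to duplicate
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


# Hypothetical training rows: (customer_id, churned_flag), 2 churners out of 10.
rows = [("a", 1), ("b", 1)] + [(f"c{i}", 0) for i in range(8)]
balanced = oversample_minority(rows, label_of=lambda r: r[1])
print(len(balanced), sum(r[1] for r in balanced))
```

Oversampling should only ever touch the training set. The holdout period has to keep its natural churn rate, otherwise the lift and false positive numbers stop meaning anything.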