Day 1: SAS Hands-On Workshop (Dean Abbott, President, Abbott Analytics)
On day one I attended a full-day workshop. The goal was to get hands-on experience building models, with an emphasis on SAS's Enterprise Miner. I had never worked with SAS before, so it was great to play around with the tool and see what has made SAS the hands-down leader in modeling software.
The workshop started with an overview of the data mining process, and we talked about the CRISP-DM process in depth. A SAS representative attended the session and gave us a brief introduction to Enterprise Miner. Then we got started digging through the data.
Most of our time was spent transforming data. SAS has some very nice wizards and built-in components that make data sampling, transformation, bootstrapping, and the like seamless to the end user. After preparing the data we reviewed descriptive statistics and began making modeling decisions. We then built several models using different algorithms; I chose neural networks and decision trees. In the end, an ensemble of decision trees was the most accurate model.
Key Takeaways:
- SAS and most other modeling software pulls data out of the warehouse, stores it on a client machine or server, and then performs transformations and modeling. This adds management overhead, and it started me thinking about the advantages of the in-database modeling that SQL Server and Oracle offer.
- SAS only considers case-level data, while SQL Server can also look at nested data for each case. For example, consider a model where the case is a customer and we want to evaluate that customer's transactions (nested data). In SAS we would need to summarize the transaction-level data up to the case level; in SQL Server we can simply include the nested transactions in the model structure.
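For context, here is a hedged sketch of what that looks like in SQL Server's DMX (model and column names here are hypothetical, not from any real project): the customer's transactions ride along as a nested TABLE column, with no pre-summarization required.

```sql
-- Hypothetical model: the case is a customer, and that customer's
-- transactions are attached as a nested TABLE column.
CREATE MINING MODEL CustomerChurn
(
    CustomerKey   LONG   KEY,
    Age           LONG   CONTINUOUS,
    Churned       TEXT   DISCRETE PREDICT,
    Purchases TABLE            -- nested, transaction-level data
    (
        ProductName  TEXT    KEY,
        Amount       DOUBLE  CONTINUOUS
    )
)
USING Microsoft_Decision_Trees
```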
Day 2 & 3: Lecture Sessions (Mostly case studies)
Keynote: Five Ways to Lower Costs with Predictive Analytics (Eric Siegel, Ph.D.)
This was a great introduction to the series of case studies that followed. Eric Siegel was the conference chair and a very nice guy. The keynote touched briefly on many topics and gave a good overview of data mining in general, with an emphasis on uplift modeling and the various ways predictive analytics can add value.

Case Study: National Rifle Association; How to Improve Customer Acquisition models with Ensembles
Several sessions and workshops emphasized the power of ensembles, and Dean Abbott (who also taught Monday's workshop) started the trend with this session. The concept is counterintuitive: we repeatedly run a model on a randomly selected subset of cases and then simply average the results. In many cases the ensemble score is higher than any of the individual model scores, and it nearly always provides a better result long term. Great presentation.
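To make the idea concrete, here is a minimal Python sketch of bagging (my own toy illustration, not the presenter's models): a trivial one-split "stump" is fit on repeated bootstrap samples of the cases, and the predictions are simply averaged.

```python
import random

def fit_stump(rows):
    """Fit a one-split 'decision stump' on (x, label) pairs: pick the
    threshold on x that best separates the 0/1 labels, and remember the
    mean label on each side."""
    best = None
    for t in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        if not left or not right:
            continue
        p_l = sum(left) / len(left)
        p_r = sum(right) / len(right)
        score = abs(p_l - p_r)          # crude purity measure
        if best is None or score > best[0]:
            best = (score, t, p_l, p_r)
    _, t, p_l, p_r = best
    return lambda x: p_l if x <= t else p_r

def bagged_predict(rows, x, n_models=50, seed=1):
    """Bagging: fit each stump on a bootstrap (random, with-replacement)
    sample of the cases, then simply average the predictions."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(rows) for _ in rows]
        preds.append(fit_stump(sample)(x))
    return sum(preds) / n_models
```

Any single stump is a weak model, but the average of many bootstrap-trained stumps is a much smoother, more stable score, which is exactly the counterintuitive win described above.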

Multiple Case Studies: Anheuser-Busch, Disney, HP, HSBC, Pfizer, and others; The High ROI of Data Mining for Innovative Organizations (John Elder, Ph.D.)
John Elder brings a great mixture of academic and business skills. In this session he walked through a series of data mining engagements and concluded with their results. Key point: not all of the examples were considered successful, and the reasons for failure were typically related to business process or politics.
Keynote: Predictive Analytics over On-line and Social Network Data (Usama Fayyad, Ph.D.)
There was an emphasis on using social network data as a behavioral or attitudinal input to mining models. The concept is very interesting: marketing departments realize that friends and family members have much more influence than advertising, so finding a way to leverage social networking is one way to impact behavior. In this presentation Usama (former Chief Data Officer of Yahoo!) talked about how Yahoo presents ads to users. Yahoo tracks individual search requests to identify trends, and it is also evaluating longevity sensitivity: how long does a search request stay relevant to the user? Very interesting stuff, though some of it again raises privacy issues that have yet to be hashed out.
Case Study: Target – Challenges of Incremental Sales Modeling in Direct Marketing (Andrew Pole)
Target has done a lot of modeling around customer uplift, and Andrew is focused on uplift at the customer level. The key problem is how to determine whether marketing has provided uplift for an individual. Typically, models look at groups of people or profiles and measure results at an aggregate level. At the individual level it is much more difficult to quantify results because we don't have a holdout set; we can't test the effect of a mailer because a single customer can't both receive and not receive it. Kind of a narrowly focused issue, but interesting.
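One common workaround (my own illustration, not necessarily the method presented) is the two-model approach: estimate response rates per segment separately from a treated group and a control group, and take the difference as the incremental (uplift) effect, which is then attributed to individuals in each segment.

```python
def response_rate(rows):
    """Share of responders in a group of (segment, responded) records."""
    return sum(r for _, r in rows) / len(rows) if rows else 0.0

def uplift_by_segment(treated, control):
    """Two-model uplift: compare each segment's response rate with and
    without treatment; the gap is the estimated incremental effect."""
    segments = {s for s, _ in treated} | {s for s, _ in control}
    uplift = {}
    for seg in segments:
        p_t = response_rate([(s, r) for s, r in treated if s == seg])
        p_c = response_rate([(s, r) for s, r in control if s == seg])
        uplift[seg] = p_t - p_c
    return uplift
```

The limitation noted above still holds: the uplift is only ever estimated at the segment level and then imputed to individuals, since no one customer can be in both groups.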
Case Study: Optus (Australian Telecom) – Know Your Customers by Knowing Who They Know and Who They Don't (Tim Manns)
Tim was a great speaker. He works for Optus, which is aggressively using relationships to improve customer retention and drive customer growth. In short, they look at whom customers are calling, and how often, to build relationship networks. Once the networks are created they can be used in marketing efforts to improve retention; for example, when a customer churns, related customers are at an increased risk of churning, and marketing can take action to prevent the loss.

KDDCup 2009 Competition Results: Orange Labs (France Telecom)
Again, there was an emphasis on decision tree ensembles. A graph was shown to illustrate the uplift the model could provide.
Case Study: Citizens Bank; Building an In-Database Predictive Scoring Model: Check Fraud Detection (Jay Zhou, Business Data Miners)
Dr. Zhou was not the best speaker, but his presentation was great. He focused on in-database modeling, which I am also very interested in: why pull the data out, transform it, model, and then try to find a way to get it all back into the database? There was not much in-database talk overall, and neither Oracle nor Microsoft was at the conference. Mining languages were not discussed in much depth either; I asked several people whether they had any exposure to DMX, and not one person even knew the language existed. I'm not sure if that is good or bad, or just a result of being at a conference where SAS and SPSS were the main sponsors.
Keynote: Opportunities and Pitfalls: What the World Does and Doesn’t Want from Predictive Analytics (Stephen Baker, Business Week Author)
The presentation is not available for this session, which is too bad because Stephen Baker did a great job of laying out some of the issues with predictive analytics. Stephen writes for BusinessWeek and is the author of The Numerati. He talked quite a bit about the risks associated with PA, as well as the creepiness factor that comes with being too accurate at prediction. I have not read his book yet, but I did order it from Amazon this week.
Case Study: The Financial Times, The New York Times, Sprint-Nextel – Predicting Future Subscriber Levels (Michael Berry, Data Miners, Inc.)
Michael Berry presented a subscriber-demand forecasting method that should be a little more accurate than traditional approaches. He calls the method hazard probability: essentially, he looks at the likelihood that a customer will churn at particular points in the future, and that likelihood is then applied to existing and new customers. It is a fairly simple method, but it clearly provides more insight into future demand, as well as into how marketing efforts impact it.
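Here is a rough Python sketch of the hazard idea as I understood it (my reconstruction, not the presenter's code): estimate the empirical hazard for each tenure month from historical customers, then apply the hazards cumulatively to project how many of a cohort of new subscribers survive each month.

```python
def hazard_rates(histories, horizon):
    """histories: (tenure_months, churned) per past customer.
    Hazard h[m] = P(churn during month m | still subscribed at its start)."""
    h = []
    for m in range(1, horizon + 1):
        at_risk = [(t, c) for t, c in histories if t >= m]
        events = sum(1 for t, c in at_risk if t == m and c)
        h.append(events / len(at_risk) if at_risk else 0.0)
    return h

def projected_survivors(n_new, hazards):
    """Apply the hazards cumulatively: expected subscribers remaining
    at the end of each future month, starting from n_new sign-ups."""
    alive, projection = float(n_new), []
    for h in hazards:
        alive *= (1.0 - h)
        projection.append(alive)
    return projection
```

Running different marketing scenarios through different hazard curves is what gives the method its extra insight into future demand.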
Case Study: Coke – A Predictive Approach to Marketing Mix Modeling (Ram Krishnamurthy, Coke)
Coke has spent a lot of time and effort on marketing channel optimization. Is it better to spend money on TV advertising, print, radio, or billboards? There are many variables at play, and with Coke's many brands the issue becomes quite complex. The presentation did a good job of laying out the problem as well as the way Coke has tackled it.

Case Study: Lifeline Screening – Segmented Modeling Application in the Health Care Industry (Ozgur Dogan, Merkle)
Merkle's Ozgur Dogan presented a case study based primarily on segmentation analysis. Instead of looking at all cases in a single analysis, prospects are segmented based on some logical grouping. Uplift for the individual segments can be much higher when segmenting prior to modeling.
Lessons that we Learned from the Netflix Prize (Istvan Pilaszy, Gravity R&D)
This guy was a pure genius. I really did not understand most of the presentation because the equations were over my head; it would take me a lot more time to work through each one to truly understand the details of the results. I do know, however, that ensembles were used along with a weighting strategy to more accurately predict the movies customers would like to view. My favorite equation of the conference came from this presentation.

Day 4: Full Day Workshop – The Best and the Worst of Predictive Analytics: Predictive Modeling Methods and Common Data Mining Mistakes (John Elder, Ph.D.)
I can't say enough about how good John Elder's workshop was; please do read through his presentation. A few of his slides stood out to me. One was a great pictorial describing how different algorithms attempt to fit a dataset, and we talked about the advantages and disadvantages of each of these models.

The next two slides showed the power of ensembles. First we looked at how the algorithms performed over several datasets: neural networks were probably the best overall performer, but none of the algorithms was the best for every dataset.

Next we looked at the impact of ensembles. Each algorithm was included in an ensemble (multiple algorithms, as opposed to bootstrapped samples of one), and different methods were used to combine the results. The improvement was huge; clearly, ensembles need to be considered in any modeling process.
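As a toy illustration of combining unlike algorithms (my own sketch, not Elder's): three very different one-feature learners are fit on the same data and their predictions are simply averaged.

```python
def mean_model(rows):
    """Baseline: always predict the overall mean of y."""
    mu = sum(y for _, y in rows) / len(rows)
    return lambda x: mu

def nearest_model(rows):
    """Nearest-neighbor: predict the y of the closest training x."""
    return lambda x: min(rows, key=lambda r: abs(r[0] - x))[1]

def linear_model(rows):
    """Least-squares line on the single feature."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    b = sum((x - mx) * (y - my) for x, y in rows) / \
        sum((x - mx) ** 2 for x, _ in rows)
    a = my - b * mx
    return lambda x: a + b * x

def ensemble_predict(rows, x):
    """Heterogeneous ensemble: average the unlike models' predictions."""
    models = [mean_model(rows), nearest_model(rows), linear_model(rows)]
    return sum(m(x) for m in models) / len(models)
```

Averaging is only the simplest combination method; weighted or stacked combinations are among the "different methods" the slides compared.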

Another slide showed an ensemble of trees producing higher uplift than any of the individual trees could offer.

John Elder is one of the authors of the Handbook of Statistical Analysis and Data Mining Applications. I have yet to finish the book, but it will be on my desk as a reference for a long time to come. He combines statistics, data mining, and applications to give a uniquely complete view of the modeling process. John has not lost sight of the ROI that data mining should deliver, and he is a great speaker.
Overall this workshop was great. We hit on many topics, some of which could take days to understand deeply. I'd recommend John's workshops to anyone interested in learning the nuts and bolts of data mining, and I hope to attend more of his sessions in the future.
Conclusion
Predictive Analytics World was a great success. I learned a lot and feel that I at least know what the industry is doing in the data mining space. LiveLogic has some key advantages: we already have a high level of expertise in handling large amounts of data and in-database transformations, and we have a very nice platform to work with in SQL Server. Our focus should be on improving our statistical knowledge, algorithm usage, and modeling experience, and on building our own case studies that we can leverage.