Conversation with Eric Siegel on Predictive Analytics World

The Predictive Analytics World Conference is taking place Feb 18-19, 2009 in San Francisco, CA, and seems to have an interesting lineup of speakers (including one of the winners of this year’s Netflix Progress Prize). I’m going to be in the Bay Area during the week of Feb 15th, so I’m planning on checking out some of the talks. Data Wrangling readers can register using this code: datawranglingpaw09 and get 15% off the conference registration fee. Drop me a line if you are attending and want to meet up.

It also might be worth stopping by if you are an R user, as Mike E. Driscoll at Data Evolution mentioned:

The Bay Area R UseRs group is doing a free, co-located event on Wed evening of the conference — so if you’re interested in mingling with some PAW folks as well as some R users — you can sign up at: http://ia.meetup.com/67/calendar/9573566/

The organizers of the conference are coordinating a nice media blitz across several machine learning blogs; check out the post by Brendan O’Connor and John Langford’s interview at Machine Learning (Theory). I thought I would join in the fun by interviewing Eric about a few topics related to the conference, mostly focusing on customer modeling and machine learning in the business world.

Read on for the transcript of our email interview:

Eric Siegel, Ph.D., is the conference chair of Predictive Analytics World, coming to San Francisco Feb 18-19 – the event for predictive analytics professionals, managers and commercial practitioners. The conference delivers case studies, expertise and resources to strengthen the business impact delivered by predictive analytics.

Pete: Can you give readers an overview of your background?

Eric: I’ve been in data mining for 16 years and commercially applying predictive analytics with Prediction Impact since 2003. As a professor at Columbia University, I taught graduate courses in predictive modeling (referred to as “machine learning” at universities), and have continued to lead training seminars in predictive analytics as part of my consulting career.

Pete: There appear to be a few talks on the schedule related to the Netflix Prize. During the first year of the competition, I found it surprising how effective ensemble approaches turned out to be, with multiple teams often pooling a number of algorithms together to improve accuracy. Are you seeing similar ensemble methods used in practice more often now?

Eric: Netflix is addressed during at least three sessions of PAW, including a presentation by the current Netflix Prize leader (and winner of the 2008 Progress Prize).

I do see ensemble models as key to many successful deployments of predictive analytics, both in my own work at Prediction Impact and elsewhere. In fact, the Netflix leader uses a “mini-” ensemble, combining two competing methods with a meta-model, which turned out to be key to their success. The only reason team “BellKor in BigChaos” did well enough to qualify for the 2008 Progress Prize was by combining the two approaches developed by the separate teams “BigChaos” and “BellKor”, who had operated completely separately and developed different methods (in Austria and the U.S., respectively). The method of combining models is called “meta-learning” or “ensemble modeling”. The meta-model learned which of the two methods was better on which kinds of cases, and weighted their outputs accordingly.
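
(A quick editorial aside: the meta-learning idea Eric describes can be sketched in a few lines of Python. The toy data, the two hand-rolled base models, and the least-squares blend below are illustrative assumptions on my part, not the BellKor/BigChaos method.)

```python
import numpy as np

# Toy stacking/blending sketch: two base models are combined by a
# meta-model that learns how to weight their outputs on a held-out set.

rng = np.random.default_rng(0)

# Synthetic prediction problem: linear signal plus noise.
X = rng.normal(size=(1000, 5))
y = X @ np.array([0.5, -1.0, 0.3, 0.0, 2.0]) + rng.normal(scale=0.5, size=1000)

# Split: train the base models on one half, the meta-model on the other.
X_tr, y_tr = X[:500], y[:500]
X_bl, y_bl = X[500:], y[500:]

def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

# Two deliberately different base models: one sees only the first
# three features, the other only the last three.
m1 = fit_linear(X_tr[:, :3], y_tr)
m2 = fit_linear(X_tr[:, 2:], y_tr)

# Meta-model: regress the target on the base models' predictions over
# the blend set, learning how much to trust each model.
P_bl = np.column_stack([predict_linear(m1, X_bl[:, :3]),
                        predict_linear(m2, X_bl[:, 2:])])
meta = fit_linear(P_bl, y_bl)
print("blend weights (intercept, model 1, model 2):", meta)
```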

Ensembles are also covered at PAW-09 outside of the Netflix leader’s session. John Elder, who published a very interesting article (now a book chapter), “The Generalization Paradox of Ensembles”, will speak on the topic during his workshop, “The Best and the Worst of Predictive Analytics: Predictive Modeling Methods and Common Data Mining Mistakes”. Dr. Elder is also speaking on case studies during a regular conference session. And Dean Abbott’s session covers a case study of his work for the NRA employing ensemble models.

Pete: Peter Norvig gave a talk a couple of years ago, “Theorizing from Data: Avoiding the Capital Mistake”, which discussed how relatively simple statistical approaches can outperform more complex algorithms given large enough datasets. The MapReduce approach is often pointed to as a key factor enabling companies like Google and Yahoo to tackle problems at this scale. Lately, I’m hearing more interest from enterprise clients in using Hadoop and cloud computing to analyze large internal datasets in a similar manner. Do you think this kind of large-scale parallel data mining will see increased adoption in traditional enterprises? Or will this approach be limited to a handful of large organizations?

Eric: Well, even with simple models, there can be a benefit to parallelizing the learning process. On the other hand, even with a million training cases, one reasonably modern desktop can be enough. Even if most organizations don’t need to go parallel, this is one of those “raise the floor by raising the ceiling” situations – it’s important to keep this moving and push to new heights.
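
(To make the parallel-learning point concrete, here is a minimal MapReduce-style sketch in Python: each worker computes the sufficient statistics for a linear model on its own shard of the data, and the results are summed before a single solve. The sharding, toy data, and multiprocessing setup are my own illustrative assumptions.)

```python
import numpy as np
from multiprocessing import Pool

def shard_stats(shard):
    """Map step: per-shard sufficient statistics for least squares."""
    X, y = shard
    return X.T @ X, X.T @ y

def parallel_linear_fit(shards, processes=4):
    """Reduce step: sum the per-shard statistics, then solve once."""
    with Pool(processes) as pool:
        stats = pool.map(shard_stats, shards)
    XtX = sum(s[0] for s in stats)
    Xty = sum(s[1] for s in stats)
    return np.linalg.solve(XtX, Xty)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true_w = np.array([2.0, -3.0, 0.5])
    # Simulate "a million training cases" split across 8 shards.
    shards = []
    for _ in range(8):
        X = rng.normal(size=(125_000, 3))
        shards.append((X, X @ true_w + rng.normal(scale=0.1, size=125_000)))
    print(parallel_linear_fit(shards))  # recovers ~ [2.0, -3.0, 0.5]
```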

By the way, the advantage of simpler approaches relates to the article by Dr. Elder I mentioned above. An ensemble is by definition a more complex model, yet it is less inclined to succumb to the central risk of higher complexity: over-adapting/overfitting to the data.
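
(One way to see this “generalization paradox” in action: compare a single fully-grown decision tree against a bagged ensemble of such trees on held-out data. The ensemble is a far more complex model, yet typically generalizes better. The sketch below uses scikit-learn; the dataset and parameters are made up for illustration.)

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(600, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=600)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One fully-grown tree: near-zero training error, overfits the noise.
tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)

# A bagged ensemble of 100 such trees: a far more complex model, yet
# averaging across bootstrap samples damps the overfitting.
bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100,
                       random_state=0).fit(X_tr, y_tr)

print("single tree test MSE:", mean_squared_error(y_te, tree.predict(X_te)))
print("bagged ensemble MSE: ", mean_squared_error(y_te, bag.predict(X_te)))
```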

Pete: In the past, I worked with offline retailers on predictive modeling of price elasticity and promotional effectiveness. Back then, we were often limited to analyzing historical price and sales POS data, without the capability to conduct live A/B testing in physical store locations. Over the last few years, sites like Amazon and Google have taken advantage of live A/B testing for price and advertising optimization on the web. In your experience, is this experimental approach to decision making becoming more common in traditional enterprises, or is it still limited to the web?

Eric: You know, I know of few to no public case studies of offline (non-web) businesses employing online/realtime learning. Generally, it isn’t worth the integration requirements and increased analytical challenges – companies usually just refresh the predictive model periodically with the usual offline process of learning/modeling over updated data.
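
(For readers curious about the web-side A/B testing mentioned in the question, deciding whether a price or ad variant really converts better often comes down to a two-proportion z-test. A minimal sketch; the conversion counts below are hypothetical.)

```python
from math import erfc, sqrt

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)       # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))               # two-sided p-value
    return z, p_value

# Hypothetical test: variant B converts at 2.4% vs. 2.0% for variant A.
z, p = two_proportion_z(conv_a=200, n_a=10_000, conv_b=240, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```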

Pete: Earlier this month, the O’Reilly Money:Tech Conference was postponed in light of the financial crisis. Have you seen much fallout in the predictive analytics community from the economic downturn?

Eric: It is hard to generalize. People are more risk averse about adopting a technology that is new to them, but, at the same time, there’s more attention on improving efficiency with better decisions and more precisely targeted marketing. My keynote at PAW-09 is entitled “Five Ways to Lower Costs with Predictive Analytics”. And I would say my consulting colleagues and I are as busy as usual.

You may think conferences are another matter, since attendance and travel are often the first costs cut. Well, I’ve got a bias, but my opinion that predictive analytics is too important to pass up is well-founded: registration for PAW-09 has surpassed the goals we set last spring. I credit this in large part to predictive analytics’ increase in buzz and attention; it’s starting to get its due.

Pete: What problem areas appear to be most in demand by enterprise customers right now? Credit scoring, recommendation systems, predictive text mining?

Eric: Credit scoring is established – an application of predictive analytics that has crossed the chasm and achieved wide adoption. It is fairly specialized, almost a field unto itself. Product recommendation is certainly on its way up, as vendors render it more accessible to companies that aren’t an Amazon or a Netflix. Augmenting predictive analytics with text mining has great potential, but I haven’t seen enough case studies to get a feel for how pervasive it’s become – at least not yet.

Human resource applications, including human capital retention, are an up-and-coming contrast to marketing applications — predict which employee will quit rather than the more standard prediction of which customer will defect.

Beyond those, I consider the following the remaining hot areas (all represented by named case studies at PAW-09, by the way):

  • Marketing and CRM (both offline and online)
    • Response modeling
    • Customer retention with churn modeling
  • Online marketing optimization
    • Behavior-based advertising
    • Email targeting
    • Website content optimization
  • Insurance pricing