Analytics Ascendant, Part 2: The Limits of Predictive Modeling

David Bakken
In my last post on predictive modeling (4 August 2009) I used the recent announcement that the Netflix Prize appears to have been won to make two points. First, predictive modeling based on huge amounts of consumer/customer data is becoming more important and more prevalent throughout business (and other aspects of life as well). Second, the power of predictive modeling to deliver improved results may seduce us into believing that just because we can predict something, we understand it.

Perhaps because it fuses popular culture with predictive modeling, Cinematch (Netflix’s recommendation engine) seemed like a good example to use in making these points. For one thing, if predicting movie viewers’ preferences were easy, the motion picture industry would probably have figured out how to do it at some early stage in the production process–not that they haven’t tried. A recent approach uses neural network modeling to predict box office success from the characteristics of the screenplay (you can read Malcolm Gladwell’s article in The New Yorker titled “The Formula” for a narrative account of this effort). The market is segmented by product differentiation (e.g., genres) as well as preferences. At the same time, moviegoers’ preferences are somewhat fluid, and there is a lot of “cross-over,” with fans of foreign and independent films also flocking to the most Hollywood of blockbuster films.

This brings to mind a paradox of predictive modeling (PM). PM can work pretty well in the aggregate (perhaps allowing Netflix to do a good job of estimating demand for different titles in the backlist) but not so well when it comes to predicting a given individual’s preferences. I tend to be reminded of this every time I look at the list of movies that Cinematch predicts I’ll love. For each recommended film, there’s a list of one or more other films that form the basis for the recommendation. I’m struck by the often wide disparities between the recommended film and the films that led to the recommendation. One example: Cinematch recommended “Little Miss Sunshine” (my predicted rating is 4.9, compared to an average of 3.9) because I also liked “There Will Be Blood,” “Fargo,” and “Syriana.” It would be hard to find three films more different from “Little Miss Sunshine” than these. “Mostly Martha” is another example. This is a German film in the “foreign romance” genre that was remade as “No Reservations” in the U.S. with Catherine Zeta-Jones. Cinematch based its recommendation on the fact that I liked “The Station Agent.” These two films have almost no objective elements in common. They are in different languages, set in different countries, with very different story lines, casts, and so forth. But they share many subjective elements (great acting, characters you care about, and humor, among others), and it’s easy to imagine that someone who likes one of these will enjoy the other. On the other hand, Cinematch made a lot of strange recommendations (such as “Amelie,” a French romantic comedy) based on the fact that I enjoyed “Gandhi,” the Oscar-winning 1982 biopic that starred Ben Kingsley.

Like all predictive models, Cinematch takes the available information and makes a best guess. According to the Netflix Prize FAQs, Cinematch does it with “straightforward statistical models with a lot of data conditioning.” Netflix also uses some additional data sources (beyond the simple ratings).

What the recommendations reveal is that there are enough people in the database who rated pairs of movies that the algorithm can make a reasonable guess about how much I’ll enjoy one movie given that I’ve enjoyed the other. The result is, conceptually at least, a huge hyper-dimensional matrix of conditional probabilities. Cinematch does not provide me with a measure of its confidence in the prediction, but I can infer it by comparing my predicted rating with the average member rating. I’m guessing that the bigger the difference, the more information Cinematch had available.
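To make that idea concrete, here is a minimal sketch in Python. The ratings matrix is a made-up toy example (nothing resembling the actual Netflix data), and the thresholds are arbitrary; it simply estimates pairwise conditional probabilities of the form P(enjoy title B | enjoyed title A) from members who rated both titles.

```python
import numpy as np

# Toy ratings matrix: rows are members, columns are titles; 0 means "not rated".
# All values here are illustrative, not Netflix data.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 0, 2],
    [0, 5, 4, 0],
    [2, 0, 5, 5],
    [1, 2, 4, 5],
])
liked = ratings >= 4          # treat a 4- or 5-star rating as "enjoyed"
rated = ratings > 0           # which cells contain an actual rating

n_titles = ratings.shape[1]
p_like_given_like = np.full((n_titles, n_titles), np.nan)
for a in range(n_titles):
    for b in range(n_titles):
        # Members who rated both titles and enjoyed title a
        both = rated[:, a] & rated[:, b] & liked[:, a]
        if both.sum() > 0:
            # Estimated P(enjoy b | enjoyed a) among those members
            p_like_given_like[a, b] = liked[both, b].mean()

print(np.round(p_like_given_like, 2))
```

With millions of members and tens of thousands of titles, that matrix becomes the “huge hyper-dimensional” object described above, and most of its cells rest on very thin evidence.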

The point of collaborative filtering systems is to match people (especially online shoppers) who appear to be similar in their preferences. Amazon can do it by looking at wish-lists, browsing, and purchasing behavior. Cinematch does it a little more systematically by collecting ratings. It’s sort of like automated word-of-mouth, and as I noted in my previous post, this is crucial to driving a long-tail business based on experience goods.
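The matching step itself can be sketched very simply. The helper below is hypothetical (not an API that Netflix or Amazon exposes); it just finds the member whose ratings most resemble mine, using cosine similarity over the titles we have both rated.

```python
import numpy as np

def most_similar_member(ratings, member):
    """Return the index of the member whose ratings are most similar to `member`.

    `ratings` is a toy members-by-titles matrix where 0 means "not rated".
    This is only a sketch of the matching step in collaborative filtering.
    """
    sims = []
    for other in range(len(ratings)):
        mask = (ratings[member] > 0) & (ratings[other] > 0)   # titles both rated
        if other == member or mask.sum() < 2:
            sims.append(-np.inf)
            continue
        a = ratings[member, mask].astype(float)
        b = ratings[other, mask].astype(float)
        sims.append(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return int(np.argmax(sims))

# Toy usage: member 0's nearest "taste neighbor" in a small illustrative matrix.
toy = np.array([[5, 4, 0, 1], [4, 5, 0, 2], [1, 2, 4, 5]])
print(most_similar_member(toy, 0))
```

Titles that the nearest neighbors enjoyed but the target member has not yet rated become candidate recommendations.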

Back to the Netflix Prize for a moment. Clive Thompson, in The New York Times Sunday Magazine (“If You Liked This, You’re Sure to Love That,” 23 November 2008), reports that a major breakthrough occurred very early in the competition. A team called “Simon Funk” jumped nearly 40% of the way toward the 10% improvement goal by using singular value decomposition. SVD is a data reduction method similar to (or one of) the factor-analytic methods commonly used in marketing research. In effect, SVD creates clusters of films based on the correlations in the ratings given to those films. Depending on the components of the clustering, SVD (like factor analysis) might reveal reasons for the clustering (if, for example, romantic comedies have strong positive loadings on one component, and action adventure films have strong negative loadings on the same component). Using something like SVD addresses some of the challenges presented by the Cinematch data. With SVD, the predictive algorithm can work with a correlation matrix rather than millions of individual comparisons yet keep all of the information contained in those comparisons. That makes it easier to “borrow” information to fill in the gaps in ratings data, which is particularly useful in cases where you have wide variation in the object sets rated by each Netflix member.
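As a rough illustration of the idea (not the Prize teams’ actual methods), the Python sketch below runs a singular value decomposition on the same kind of toy members-by-titles matrix and uses a low-rank reconstruction to supply estimates for the missing ratings. The mean-filling of blank cells is a crude placeholder chosen only to keep the example short.

```python
import numpy as np

# Toy members-by-titles matrix; zeros are missing ratings.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 0, 2],
    [0, 5, 4, 0],
    [2, 0, 5, 5],
    [1, 2, 4, 5],
], dtype=float)

# Crude gap handling for the sketch: replace missing cells with each title's mean rating.
observed = np.where(ratings > 0, ratings, np.nan)
col_means = np.nanmean(observed, axis=0)
filled = np.where(ratings > 0, ratings, col_means)

# Singular value decomposition and a rank-2 reconstruction.
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Rows of Vt describe the components (clusters of titles); `approx` supplies
# estimated ratings for the cells that were originally blank.
print(np.round(Vt[:k], 2))
print(np.round(approx, 1))
```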

What’s missing from Netflix right now–and maybe the winning solutions incorporate this–is a way to account for differences in individuals. When I contemplated using hierarchical Bayesian modeling for this exercise, I had in mind inferring relevant characteristics about each individual from patterns in their ratings. A Bayesian approach has several potential advantages, such as taking into account the limited (and potentially biased) set of observations that Cinematch collects from each member. Of course, Netflix could collect additional information from members in the form of a survey and incorporate that additional information (perhaps via segmentation analysis) to help make recommendations.
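To give a flavor of the shrinkage idea behind hierarchical models, here is a deliberately simple sketch (illustrative only; not Cinematch’s method, and the prior strength `tau` is an arbitrary value): a member with only a handful of ratings gets pulled toward the population average, while a prolific rater’s own history dominates.

```python
import numpy as np

def shrunk_member_mean(member_ratings, population_mean, tau=5.0):
    """Estimate a member's typical rating, shrunk toward the population mean.

    `tau` plays the role of a prior strength: the fewer ratings a member has,
    the more the estimate leans on the population mean. Values are illustrative.
    """
    n = len(member_ratings)
    if n == 0:
        return population_mean
    weight = n / (n + tau)          # more ratings -> trust the member's own mean more
    return weight * np.mean(member_ratings) + (1 - weight) * population_mean

print(shrunk_member_mean([5.0, 5.0], 3.6))      # sparse rater: pulled toward 3.6
print(shrunk_member_mean([5.0] * 40, 3.6))      # prolific rater: stays near 5.0
```

A full hierarchical Bayesian treatment would estimate richer member-level parameters (genre preferences, rating scale usage, and so on) rather than a single mean, but the borrowing-of-strength logic is the same.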

For me, the ideal predictive model is one that makes the underlying behavioral model explicit, meaning that the predictive model incorporates variables that have a causal relationship to the outcome of interest. In addition, an ideal model will account for heterogeneity across consumers. The starting point in developing an ideal predictive model is mapping out the underlying behavioral model. Then you can hang data on the proposed model and start crunching.

Copyright 2009 by David G. Bakken.  All rights reserved.
