Legendary Hollywood screenwriter William Goldman said: “Nobody, nobody – not now, not ever – knows the least goddam thing about what is or isn’t going to work at the box office.” He was speaking before the arrival of the internet and Big Data. Since then, the streaming movie and TV service Netflix has based its business model on attempting to prove him wrong.
Netflix is said to account for one third of peak-time internet traffic in the US. Last year it announced that it had signed up 50 million subscribers around the world. Data from all of them is collected and monitored in an attempt to understand our viewing habits. But its data isn’t just “big” in the literal sense. It is the combination of this data with cutting-edge analytical techniques that makes Netflix a true Big Data company.
A quick glance at its jobs page is enough to give you an idea of how seriously data and analytics are taken. Specialists are recruited to join teams that apply analytical skills to particular business areas – personalization analytics, messaging analytics, content delivery analytics, device analytics… the list goes on.
However, although Big Data is used across every aspect of the Netflix business, its Holy Grail has always been to predict what its customers will enjoy watching. Big Data analytics is the fuel that fires the “recommendation engines” designed to serve this purpose.
Predicting viewing habits
Efforts here began back in 2006, when the company was still primarily a DVD-mailing business (streaming began a year later). It launched the Netflix Prize, offering $1 million to the group that could come up with the best algorithm for predicting how its customers would rate a movie based on their previous ratings. The winning entry was finally announced in 2009 and, although the algorithms are constantly revised and added to, the principles are still a key element of the recommendation engine.
At first, analysts were limited by the lack of information they had on their customers – only four data points (customer ID, movie ID, rating and the date that the movie was watched) were available for analysis. As soon as streaming became the primary delivery method, many new data points on customers became accessible. Data such as the time of day that movies are watched, the time spent selecting movies and how often playback was stopped (either by the user or due to network limitations) all became measurable. The effects these factors had on viewers’ enjoyment (based on the ratings given to movies) could be observed, and models built to predict the “perfect storm” situation of customers consistently being served with movies they will enjoy. Happy customers, after all, are far more likely to continue their subscriptions.
Another central element of Netflix’s attempt to give us films we will enjoy is tagging. It pays people to watch movies and then tag them with the elements that the movies contain. It will then suggest you watch other productions that were tagged similarly to those you enjoyed. This is where the sometimes unusual (and slightly robotic-sounding) “suggestions” come from, such as “In the mood for wacky teen comedy featuring a strong female lead?” It’s also the reason that the service will sometimes (in fact, in my experience, often!) recommend I watch films which have been rated with only one or two stars.
This may seem to run counter to its objective of showing me films I will enjoy. But what has happened is that the weighting of these ratings has been outweighed by the prediction that the content of the movie will appeal. This article explains the science behind that process in a bit more depth, and how Netflix has effectively defined nearly 80,000 new “microgenres” of movie based on our viewing habits!
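To make that trade-off concrete, here is a minimal Python sketch of how a tag match can outweigh a poor star rating. It is an illustration only, not Netflix’s actual algorithm: the Jaccard overlap measure, the `content_weight` knob, the function names and the sample titles and tags are all assumptions made up for this example.

```python
def tag_similarity(tags_a, tags_b):
    """Jaccard overlap between two tag sets: 0.0 (nothing shared) to 1.0 (identical)."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommendation_score(candidate, liked_titles, content_weight=0.7):
    """Blend tag overlap with the average star rating.

    content_weight is a hypothetical knob: the closer it is to 1.0,
    the more the tag match outweighs the star rating.
    """
    # Best tag match between the candidate and anything the viewer enjoyed.
    content_match = max(
        (tag_similarity(candidate["tags"], liked["tags"]) for liked in liked_titles),
        default=0.0,
    )
    # Rescale a 1-5 star average to the 0-1 range.
    rating_signal = (candidate["avg_stars"] - 1) / 4
    return content_weight * content_match + (1 - content_weight) * rating_signal


# Invented sample data: one title the viewer enjoyed, two candidates.
liked = [
    {"title": "Teen Comedy A",
     "tags": {"wacky", "teen", "comedy", "strong female lead"}},
]
candidates = [
    {"title": "Low-rated lookalike",
     "tags": {"wacky", "teen", "comedy", "strong female lead"},
     "avg_stars": 2.0},
    {"title": "Acclaimed war drama",
     "tags": {"war", "drama", "gritty"},
     "avg_stars": 4.5},
]

# Highest blended score first.
ranked = sorted(
    candidates,
    key=lambda c: recommendation_score(c, liked),
    reverse=True,
)
for title in ranked:
    print(title["title"], round(recommendation_score(title, liked), 3))
```

With these made-up numbers the two-star lookalike scores roughly 0.78 against 0.26 for the well-reviewed drama, so it is suggested first. Lowering `content_weight` towards zero would let the star ratings win instead.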
Finding the next smash-hit series
More recently, Netflix has moved towards positioning itself as a content creator, not just a distribution method for movie studios and other networks. Its strategy here has also been firmly driven by its data – which showed that its subscribers had a voracious appetite for content directed by David Fincher and starring Kevin Spacey. After outbidding networks including HBO and ABC for the rights to House of Cards, it was so confident that the show fitted its predictive model for the “perfect TV show” that it bucked the convention of producing a pilot and immediately commissioned two seasons comprising 26 episodes. Every aspect of the production under the control of Netflix was informed by data – this Wired article explains how even the range of colors used on the cover image for the series was selected to draw viewers in.
The ultimate metric that Netflix hopes to improve is the number of hours that customers spend using its service. You don’t really need statistics to tell you that viewers who don’t spend much time using the service are likely to feel they aren’t getting value for money, and there is a danger they will cancel their subscriptions.
Quality of experience
To this end, the way that various factors affect the “quality of experience” is closely monitored, and models are built to explore how this affects user behavior. Although its vast database of movies and TV shows is hosted internally on its own distributed network of servers, it is also mirrored around the world by ISPs and other hosts. As well as improving user experience by reducing lag when streaming content around the globe, this reduces costs for the ISPs – saving them from the cost of downloading the data from Netflix’s servers before passing it on to viewers at home.
By collecting end-user data on how the physical location of the content affects the viewer’s experience, calculations about the placement of data can be made to ensure an optimal service to as many homes as possible. Data points such as delays due to buffering (rebuffer rate) and bitrate (which affects the picture quality – if you’re watching a film on Netflix that suddenly seems to switch from razor-sharp HD to a blurry mess, you’ve experienced a bitrate drop) are collected to inform this analysis.
Netflix has used Big Data and analytics to position itself as the clear leader of the pack. It has done this by taking on other distribution and production networks at their own game, and trumping them through innovative and constantly evolving use of data. It faces competition now and into the future, of course – most notably from Amazon, which purchased UK-based Netflix rival Lovefilm in 2011, acquiring its user data and melding it with its own advanced data infrastructure and analysis platforms. Will Amazon – which pioneered the art of recommendations before Netflix even existed – unseat Netflix from its perch as king of the streaming content providers? And that is before mentioning Apple, which is about to launch its new Apple TV service to compete in this space. Time will tell – but the race to develop more accurate and insightful analytic strategies will be a key decider.