Big Data Analytics Versus Your Own Lying Eyes
If making it in Hollywood means you have arrived, then data analytics has arrived. Only a few months after social networking hit the big screen, analytics made its debut with Moneyball, which posted an opening-weekend take of $20 million.
A subject as abstract and mathematical as analytics might not seem like the stuff of a major motion picture. But Moneyball is also a baseball movie, and therein lies a tale, because baseball and statistics go together ball in glove.
A Game of Stats
As noted by Grant Brisbee at Baseball Nation, baseball statistics have been around for a long, long time. Generations of kids have learned how to compute batting averages and earned run averages. Depending on how you count some short-lived 19th-century leagues, the majors have played about 200,000 games, with nearly 14 million official at-bats. These have yielded some 3.5 million hits, more than a quarter-million home runs – and exactly 20 perfect games.
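For readers who never worked these out as kids, here is a minimal sketch of the two traditional stats using their standard definitions: batting average is hits divided by official at-bats, and earned run average is earned runs allowed per nine innings pitched. The function names are my own, and the last line simply plugs in the rough historical totals quoted above, so treat the result as a ballpark figure rather than an official number.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by official at-bats (0.0 if there are no at-bats)."""
    if at_bats == 0:
        return 0.0
    return hits / at_bats


def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs allowed per nine innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("ERA is undefined without innings pitched")
    return 9 * earned_runs / innings_pitched


# Rough all-time figure implied by the approximate totals in the text:
# about 3.5 million hits in about 14 million official at-bats.
historical_average = batting_average(3_500_000, 14_000_000)
print(f"{historical_average:.3f}")
```

The simplicity is the point: each stat is a single division, which is part of why these two numbers became so familiar.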
Even by modern standards this is Big Data. Yet it took many decades for the full power of all this information to be discovered and put to work. Moneyball tells the story of how it finally happened. Bill James and other statistics-minded fans asked a deceptively simple question: What – and who – really contributes the most to a winning baseball season?
“Baseball people” – managers, scouts, and others closely involved in the game – thought they knew, based on years of observation. But when analytics was applied to baseball stats, allowing a deeper look into the data, much of the conventional wisdom turned out to be not quite right.
The Math of Subtle Distinctions
We humans aren’t very good at statistics. We notice the dramatic – home runs, double plays, strikeouts. We are not so good at observing fine distinctions over time.
Is it better to have a high batting average, hitting mostly singles, or to have a slightly lower average that includes a sprinkling of doubles? Even experienced baseball observers could not be quite sure. The goal of sabermetrics – as the analytics of baseball was dubbed – was and is to tease out the subtleties that are hard to notice. As a key line in Moneyball puts it, “we aren’t allowing ourselves to be victimized by what we see.”
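One way to make the singles-versus-doubles question concrete is slugging percentage, a long-standing stat defined as total bases divided by at-bats. The sketch below compares two hypothetical hitters; their stat lines are invented purely for illustration, and slugging is only one of several measures an analyst might reach for, not necessarily the one that settles the question.

```python
from dataclasses import dataclass


@dataclass
class HitterLine:
    """A season stat line for a hypothetical hitter (illustrative only)."""
    at_bats: int
    singles: int
    doubles: int = 0
    triples: int = 0
    home_runs: int = 0

    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs

    def batting_average(self) -> float:
        return self.hits() / self.at_bats

    def slugging(self) -> float:
        # Total bases: 1 per single, 2 per double, 3 per triple, 4 per homer.
        total_bases = (self.singles + 2 * self.doubles
                       + 3 * self.triples + 4 * self.home_runs)
        return total_bases / self.at_bats


# Made-up lines: more hits, all singles, versus fewer hits with some doubles.
singles_hitter = HitterLine(at_bats=500, singles=160)
doubles_hitter = HitterLine(at_bats=500, singles=110, doubles=40)

for name, h in [("singles", singles_hitter), ("doubles", doubles_hitter)]:
    print(f"{name}: AVG {h.batting_average():.3f}, SLG {h.slugging():.3f}")
```

With these invented numbers, the singles hitter leads in batting average while the doubles hitter leads in slugging, which is exactly the kind of gap that is hard to see from the stands.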
Moreover, the time-honored traditional stats could trick observers into taking their eyes off the statistical ball. The traditional batting average, for example, is so familiar and so straightforward that it generates a bias. A .370 hitter must be better than a .350 hitter – the difference is right there in front of your eyes. Except that it isn’t.
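A quick sketch of how that can happen, using on-base percentage, which counts walks and hit-by-pitches that batting average ignores. The formula is the standard one; the two players and their walk totals are hypothetical, chosen only to show that the ordering can flip.

```python
def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sacrifice_flies: int) -> float:
    """Standard OBP: times on base divided by plate appearances that count."""
    reached = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sacrifice_flies
    return reached / chances


# Hypothetical .370 hitter who rarely walks.
free_swinger_avg = 185 / 500
free_swinger_obp = on_base_percentage(185, 20, 0, 500, 0)

# Hypothetical .350 hitter who walks often.
patient_avg = 175 / 500
patient_obp = on_base_percentage(175, 80, 0, 500, 0)

print(f"free swinger: AVG {free_swinger_avg:.3f}, OBP {free_swinger_obp:.3f}")
print(f"patient:      AVG {patient_avg:.3f}, OBP {patient_obp:.3f}")
```

In this made-up case the .350 hitter reaches base more often than the .370 hitter, even though the batting averages point the other way.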
According to Gary Cokins, writing at SmartDataCollective, the weight of tradition can be hard to break through. The first major league team to put sabermetrics to work was the Oakland A’s. Working within a limited budget, the team went looking for players who were undervalued – whose contributions to winning games were missed by baseball traditionalists, but revealed by sabermetric analytics.
And it worked. The Oakland A’s didn’t shoot straight to World Series glory, but the team moved solidly up in the standings – and other teams began paying attention. Analytics has now become a familiar part of baseball lore, its new stats reported on TV and debated among fans. The game itself has been changed. The changes may be subtle, but subtlety is what analytics is all about.
What subtleties might business intelligence and predictive analytics uncover to send your team into the postseason? If you are attending tomorrow’s TUCON 2011 conference, you just may find out. If not, you can follow the conference hashtag #TUCON or subscribe to this blog to stay informed.
Other Posts by Brett Stupakevich