Predictive analytics is a method to study the past and, using a combination of sophisticated math and creative visual communication, predict the future. The goal of predictive analytics strategies in a business is to predict future trends that will affect the company’s bottom line, and to use that information to make better decisions. The business that can do this better than its competitors wins cost reductions, revenue increases and happy stockholders. So, in this race for knowledge, how does one company get the checkered flag?
With all the talk about big data, you’d think the answer would be, “analyze huge amounts of data,” but no. You can analyze Twitter from the beginning, all the sensor data from every meter in the world, and the genome of every human being in Europe, and not get ahead in business. You might make a lot of data geeks go, “Wow!” but big data is, in the end, just data. No matter how big or small it is, analysis of data is only important if it’s relevant to the decisions you need to make.
One thing that gets lost in the excitement over the huge deposits of untapped data, “the new oil,” is that many companies aren’t getting full value out of the data they already have. Still, these new data sources do have hidden value for businesses trying to get ahead. And because of that, a massive technology effort has gone into creating new ways to process data. Strategies developed to tap into massive data sources can also be used to improve predictive analytics success on data sets that we don’t generally think of as “big data.” This technology is like a turbo-charger or a nitrous oxide boost for predictive analytics processes.
In many ways, data science is no different from any other science. There are a few ways to make leaps forward. Lucky accidents happen. If a sharp mind interprets anomalies accurately, that can give us some amazing advances. Sometimes, new insights are simply a new way of thinking about something that is already known. But most advances in any knowledge area come from focused experimentation and iteration, repeated many times.
“I have not failed. I’ve just found 10,000 ways that won’t work.”
– Thomas Edison
The key to better predictive analytics lies in facilitating that natural course of knowledge advancement. Any strategy that lets data analysts design predictive analytic models faster, test them faster, tweak them faster, iterate and refine them faster, will give them an edge over their slower moving colleagues. The data scientists who can test 100 or 1000 predictive models in the time that it takes a competitor to test 10 will win clearer, more accurate forecasts for their company.
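To make the idea of fast iteration concrete, here is a minimal sketch of testing several candidate models against the same held-out data and keeping the one with the lowest error. Everything in it is an assumption made for illustration: the sales figures are invented, the "model" is a simple moving-average forecast, and the function names (`moving_average_forecast`, `backtest`, `best_window`) are hypothetical. A real project would swap in its own models and data; the point is only that the loop over candidates is what needs to run fast.

```python
from statistics import mean


def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` observations."""
    return mean(history[-window:])


def backtest(series, window, holdout):
    """Mean absolute error of one-step-ahead forecasts over the last `holdout` points."""
    errors = []
    for t in range(len(series) - holdout, len(series)):
        # Only data before time t is visible to the model.
        forecast = moving_average_forecast(series[:t], window)
        errors.append(abs(series[t] - forecast))
    return mean(errors)


def best_window(series, candidate_windows, holdout):
    """Score every candidate model and return the best one with all scores."""
    scores = {w: backtest(series, w, holdout) for w in candidate_windows}
    return min(scores, key=scores.get), scores


# Hypothetical monthly sales figures, invented for illustration only.
sales = [100, 102, 101, 105, 107, 106, 110, 112, 111, 115, 117, 116]

# Six candidate models here; the same loop works for 100 or 1000.
window, scores = best_window(sales, range(1, 7), holdout=4)
print(f"best window: {window}, error: {scores[window]:.3f}")
```

The faster each call to `backtest` runs, the more candidates fit into the same amount of analyst time, which is the edge the paragraph above describes.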
People forget that standard analytic data sets are far from “small.” For years, data analysts have been making compromises to keep getting as much business value as they can with their limited technology resources. Most analytic algorithms are applied to a sample of an aggregate of a sample, a tiny percentage of the actual data available. Instead of working with tiny samples, and having to figure out which columns or aggregations of data they might need before even beginning to work, analysts can use the processing power developed for big data to look at the entire data set and truly see the big picture, even in normal data sets that may not qualify as “big data.” Combining multiple data sets can give even more valuable business insights. The ensemble modeling that many data analysts dream of could now be within reach, thanks to advances in processing motivated by the big data revolution.
The mental shift that needs to be made is to realize that big data analytics strategies don’t apply only to Google- or Facebook-sized data sets. With the same technology, data sets that used to seem huge become far more manageable. Technology that allows you to crunch through a billion rows in 10 seconds can make analytics algorithms scream on your mere 100 million row data set. Big data technologies applied to normal analytics workloads become predictive analytics accelerators that can zoom a business right past competitors.
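As a rough back-of-the-envelope check on the numbers above, the sketch below scales the "billion rows in 10 seconds" figure down to a 100 million row data set. It assumes processing time grows linearly with row count, which is a simplification; real workloads rarely scale that cleanly. The function name `estimated_seconds` and its parameters are hypothetical.

```python
def estimated_seconds(rows, benchmark_rows=1_000_000_000, benchmark_seconds=10.0):
    """Estimate run time for `rows`, assuming throughput matches the benchmark.

    Assumption: time scales linearly with the number of rows.
    """
    rows_per_second = benchmark_rows / benchmark_seconds
    return rows / rows_per_second


# A 100 million row data set at the benchmark throughput.
print(estimated_seconds(100_000_000))  # 1.0
```

Under that assumption, a pass over 100 million rows takes about one second, so an analyst could run dozens of passes in the time a single pass used to take.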
Mike Hoskins, the GM of the Pervasive Big Data & Analytics division, gave a quick lightning talk on strategies for out-predicting the competition. Here’s what he had to say about it:
(Originally posted on Pervasive Big Data Blog)