Driving Analytic Value From New Data
One of the best ways to improve the power of your analytics is to include totally new information. New information can enable huge leaps in the effectiveness, predictive power, and accuracy of your analytics. Yet most effort goes into incrementally improving results by using existing data and information more effectively. This isn’t so much because analytic professionals don’t realize that new data can be powerful; it’s because new data becomes available only occasionally. As soon as a new and different data source is available, however, you’ll be much better off shifting your focus to it immediately.
To me, this gets to the heart of why big data is so powerful and is getting so much attention. I believe that the volume, variety, and velocity aspects of big data, which dominate most discussions, are secondary. As I have discussed in prior blogs and articles, the most important ‘V’ associated with big data is value. The other ‘V’s are relevant only in the presence of value. So what drives that value for big data? Keep reading.
The fact is that many big data sources contain information that was either not available in the past, or was available only to a much lesser extent through means requiring much more effort. For example, information from your web browsing activity is easy to capture and analyze today. In the past, the only way to get similar data was through very expensive research projects executed on a very small scale. In practice, the information just wasn’t available because it was too expensive.
Let’s fast forward to an analytic professional addressing a common business problem today, such as churn or next best offer. When the available data sources are fixed, most effort goes into trying new modeling methods, new variable definitions, and new ways to handle sparse or missing data. These efforts can increase power, but they typically provide only small, incremental gains. When a lot of money is on the line, such gains aren’t anything to sneeze at. However, the likelihood of blowing your last results out of the water is pretty low.
Now let’s imagine that the same analytic professional uses the exact same modeling methods, variable definitions, and data preparation today as he or she used yesterday. However, added into the analysis are new variables from a new data source that contains totally new information. Let’s assume, for example, that browsing history is now available to help identify customers’ next best offer. Because browsing history provides information on preferences and future purchase intent that traditional data sources lack, the analytic professional can achieve tremendous gains in analytic power. This is true even though the methods are the same old ones; only the data is new.
My point is that for all the fuss about which analysis methods are best and how to handle missing and dirty data, the really big gains come from finding new information to include. Think back to Statistics 101 and the ideas of Principal Component Analysis and orthogonal vectors. While dozens of variables may be available to an analysis, they often contain widely overlapping information. A new variable that carries substantially the same information as the existing variables won’t add much value. However, anytime you can add variables whose information is completely or mostly distinct from what you already have, there is the potential for a lot of value.
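To make the overlapping-versus-distinct idea concrete, here is a minimal sketch on simulated data, not real customer data. It uses plain least-squares regression as the unchanged "same old method" rather than PCA itself, and every variable name (existing, overlapping, distinct) is hypothetical. The outcome is built to depend on the existing variable and on one independent new signal; the sketch compares the in-sample R² gained by adding a near-duplicate of the existing variable with the gain from adding the independent one.

```python
import numpy as np

def r_squared(X, y):
    """Fit ordinary least squares with an intercept; return in-sample R^2."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 5000

# Hypothetical "existing" variable, standing in for traditional data.
existing = rng.normal(size=n)
# Hypothetical new variable that mostly repeats what `existing` already says.
overlapping = existing + 0.1 * rng.normal(size=n)
# Hypothetical new variable carrying independent information (e.g., a browsing signal).
distinct = rng.normal(size=n)

# Simulated outcome that depends on both the existing and the distinct information.
y = existing + distinct + 0.5 * rng.normal(size=n)

base = r_squared(existing, y)
with_overlap = r_squared(np.column_stack([existing, overlapping]), y)
with_distinct = r_squared(np.column_stack([existing, distinct]), y)

print(f"corr(existing, overlapping) = {np.corrcoef(existing, overlapping)[0, 1]:.3f}")
print(f"R^2 existing only:          {base:.3f}")
print(f"R^2 + overlapping variable: {with_overlap:.3f}")
print(f"R^2 + distinct variable:    {with_distinct:.3f}")
```

Under these assumed settings, the outcome’s variance is about 2.25, so in expectation the existing variable alone explains roughly 0.44 of it and adding the distinct variable brings that to roughly 0.89. The highly correlated overlapping variable adds almost nothing. In-sample R² can never fall when a variable is added, so any small positive gain from the overlapping variable is just noise, not new information.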
The action I recommend for readers is to constantly seek out new data sources. Instead of putting all your effort into tuning your existing modeling methods with existing data, focus effort on a new data source every chance you get. That’s where you’ll find the big gains. After you realize your initial gains from the new data, you can go back to tuning, but I believe that makes sense only when you’ve exhausted your ability to include additional data sources.
This is the core of the value proposition for big data. Many organizations suddenly have multiple new, untested sets of data available for incorporation into their analytic processes. Used correctly, this data can provide a huge competitive advantage and a veritable gold mine of value. Don’t miss your chance to get ahead.
Let’s close with a thought experiment. Assume I offer you a world-class analytic professional with access to every tool available, but who will be limited to using only existing data. Your other option is a solid, but not world-class, analytic professional with access to just standard tools. This person, however, will be allowed to incorporate some new data sources that appear to hold value.
I hope you’ll take the second option over the first. Ideally, of course, you’ll have a world-class analytic professional working with the new data, but the thought experiment illustrates the point. No matter how good an analytic professional is and how fancy the tools, the inherent value in new and different data will win in most cases.
To see a video version of this blog, visit my YouTube channel.
Originally published by the International Institute for Analytics
Bill Franks is Chief Analytics Officer for Teradata, providing insight on trends in the analytics & big data space and helping clients understand how Teradata and its analytic partners can support their efforts. In addition, Bill is a faculty member of the International Institute for Analytics and the author of the books Taming The Big Data Tidal Wave (John Wiley & Sons, Inc., April, 2012) ...