A Different Strategy for Solvable Problems in Big Data Predictive Analytics

September 10, 2012

I had a long conversation with James Taylor of Decision Management Solutions the other day. One of the things that he talked about struck me as a new way of thinking. In the field of big data predictive analytics, there are really two different types of problems that businesses face.

One is a problem that simply has no solution yet. Algorithms won’t scale up sufficiently, access to the right data just isn’t there, or the level of complexity is still too high to tackle. Maybe it requires correlating data from too many different streams, or maybe the data types vary widely and can’t be reconciled.

These problems are currently not solvable. As technology progresses, and human ingenuity finds ways, they may eventually become solvable, but for the moment, they’re not ones a company can do anything about.

Then, there’s the other kind of predictive analytics problem: A solution is known, but financially unattainable. The business knows how to get the answers it needs. It has access to the data, and it knows which essential questions need to be answered first. It even sees the ROI inherent in the project. However, the initial investment, particularly in hardware, is much too high for the project to be feasible.

If someone walks into a meeting with a CIO with a brilliant big data predictive analytics solution, including a convincing presentation of significant ROI, and says, “All you have to do is buy a million dollars’ worth of hardware,” chances are high it’s not going to happen.  That kind of project just isn’t practical in today’s economy.

I hadn’t really thought of dividing the challenges in this way before. A later conversation brought up an additional problem type. There’s a subset of the solvable kind of problem, where the analytics model exists. The answers are available, but they’re highly time sensitive. Currently available technology can’t get the answers to the people who need them fast enough to be useful.

A Web search engine that returned answers in five minutes would be worthless. A coupon or recommender system has to work within seconds, or forget it. Again, problems of this type could be solved by throwing tons more hardware and money at them, but that’s not realistic in many cases.

Therefore, businesses think of these solvable problems as hardware problems, or economic problems.

There’s one aspect of the challenge that is so much a part of life that we rarely think about it. The standard utilization rates of hardware are around 15% worldwide. The top-rated IT teams in the world push that to 20%. I did a blog post about data center utilization a few weeks back that covers this in a lot more detail. Essentially, that “15% utilization” means that in most businesses in the world, about 85% of hardware capacity sits idle almost all the time.  In even the best, most efficient data centers in the world, 80% of hardware capacity sits idle.

Imagine that hardware were people. If your business had a manager who hired 100 people, but on average, only had 15 of them working at a time while the other 85 snoozed on the job, you’d fire that guy in a heartbeat. No business would ever tolerate that level of waste.

Big Data Predictive Analytics Snoozing on the Job

Existing hardware, by and large, is already providing businesses with the capacity needed to get the job done. The hardware is giving us the 100 people (CPUs) to do the job. Software is the manager that puts the hardware resources to work. If the software is only using an average of 15% of the available resources, maybe it’s time to fire the software.

Many of the big data predictive analytics problems that seem unreachable because the hardware investment is too high would become solvable if we made better use of the hardware available. This requires a mental shift on the part of businesses to realize that these aren’t really hardware problems, or financial problems. They’re software problems.

What if predictive analytics software could bring its hardware usage percentage up to 80%? Or 60%? What if software could improve hardware utilization on analytics jobs just to 50%? If software could use even half of the existing compute power to its fullest, most of these problems would be solved.

If utilization rose from 15% to 60%, that project that would have required a million dollars’ worth of hardware could be done with about a quarter million dollars’ worth. Or better yet, it could be done with the hardware a company already has but isn’t fully utilizing now.
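
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It assumes, as a simplification of my own, that the hardware needed for a fixed workload shrinks in direct proportion to how much better the software utilizes it, starting from the 15% baseline and the million-dollar figure above. The function name and the linear scaling are illustrative assumptions, not a sizing model.

```python
def hardware_cost_at(utilization, baseline_cost=1_000_000, baseline_utilization=0.15):
    """Estimate the hardware spend needed to do the same useful work
    at a different utilization rate.

    Assumption (hypothetical, for illustration only): useful work scales
    linearly with utilization, so required hardware scales inversely.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    # Dollars of hardware that are actually doing work at the baseline rate.
    useful_work = baseline_cost * baseline_utilization
    # Hardware needed to deliver that same useful work at the new rate.
    return useful_work / utilization


if __name__ == "__main__":
    for rate in (0.15, 0.50, 0.60, 0.80):
        print(f"{rate:.0%} utilization: ${hardware_cost_at(rate):,.0f} of hardware")
```

Under that assumption, the million-dollar project needs roughly $300,000 of hardware at 50% utilization, $250,000 at 60%, and $187,500 at 80%. Real workloads rarely scale this cleanly, so treat these numbers as an upper bound on the savings rather than a forecast.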

Investing in more hardware has always seemed like the only way to tackle these problems.  Investing in better utilization of existing hardware is a far better, more sustainable, and cost-effective solution.

You don’t need more hardware to solve many big data predictive analytics problems. You need software that can put to work what you already have.