A few months back, I was presenting with a friend at a Chief Data Officer summit in Dallas, and my co-presenter put up a slide that said, “60% of all big data analytics projects fail.” Someone in the audience asked, “Why do they fail?” My friend said, “I think Paige could answer that better than I could.”
Put on the spot, three reasons that have been confirmed from multiple sources jumped immediately into my head. I used those three to answer the question. But later, when I had time to think, I realized there was one other reason that shows up repeatedly, but often gets downplayed or written off as not the REAL problem, when in my opinion, it very much is.
Lack of a Business Use Case
The one problem that especially plagued a lot of the early Hadoop projects, and still pops up now and then, is that setting up a big data analytics platform is seen as an end in itself. Archery is one of my hobbies, but I have yet to hit a bull’s-eye when there’s no target to shoot at. Without any clear understanding of pain in the organization that Hadoop might address, a cluster and its associated technologies are set up as a sort of science experiment. Without a problem in mind, the choice of software is vague, the project is directionless, and the criteria for success are undefined. The likelihood that the project will be declared a success under those conditions is virtually nil. Unless a particularly savvy project manager stumbles on a business use case, there just isn’t any way to call that kind of project a win.
This problem is the opposite of lacking an end goal: it’s shooting for the moon. According to the big data hype machine, big data analytics will double your profits, save your company millions, and get that stubborn stain out of your shirt collar. Big data will save the whales and cure cancer and make everyone rich. With the bar set that high, nothing short of a magic wand would actually hit the mark. Even projects that would be considered wildly successful elsewhere are counted as failures in this situation.
There’s another variation on this when every person involved in a project has a different expectation of what value will come from the project. In this case, there isn’t one target, there are a dozen, all in different directions. Maybe the CMO is expecting big increases in marketing campaign response. Maybe the CFO is expecting a big boost in revenue. Maybe the CIO is expecting a big cut in IT infrastructure costs. Everyone expects the project to impact their particular aspect of the business. Expectations like that can lead to scope creep on an epic scale. In the end, trying to do everything at once can end up accomplishing nothing.
Shortage of Skills
Even if you have a single, sensible target to shoot at, you can’t hit the target if you don’t know how to string the bow. It’s practically part of the definition of “big data analytics” that this sort of project can’t be done economically with traditional, mature software stacks. Hadoop and its ecosystem of software are maturing rapidly, filling in gaps in functionality, but graphical interfaces are still a rare treat, not the norm. Accomplishing a full end-to-end project often requires a patchwork quilt of 20 different pieces of software, each with its own idiosyncratic programming, design, and integration requirements.
And we’re not even talking about the intricacies of simply getting a bunch of servers to work together, dealing with inevitable hardware failures, and integrating all this strange new software into the larger enterprise of traditional old stodgy software. Assembling that jigsaw of software and hardware pieces into a beautiful picture of analytic usefulness requires skill and hard work. While hard workers are not in short supply, workers with the skills needed for this level of complexity are. Impactful big data analytics projects are not easy to implement. Anyone who tells you different is selling something.
Building and integrating a valuable, useful big data analytics project requires not just skills and a reasonable goal, but time. While projects like this don’t get launched without support from someone in the business, people change jobs, get promoted, move around, and over time, attitudes can change. If one person was perceived as having ownership of the project, and that person leaves the company before it’s complete, what happens to the project? If one department wants the project desperately, but another sees it as a huge waste of resources, how quickly will that second department pounce when the project hits a snag? With people scrambling to protect their jobs, no one wants to be associated with a “failed” project.
This is the problem that didn’t occur to me when I was asked this question with a room full of Chief Data Officers staring me down, but it should have. It’s the problem that I have seen stymie good projects time and time again. I personally believe it is the biggest stumbling block to any big data analytics project, and possibly to any large business project of any kind.
But Did They Really Fail?
Here’s the thing that bothered me at the time, and that has left this nagging at the back of my mind for months. Did all of those projects really fail? I don’t know where my co-presenter got that slide, but I assume it was a result from a survey probably conducted by a prestigious industry analyst firm. So, without doubt, 60% of firms that answered that survey considered their big data analytics projects to be failures.
The thing about shooting a bow is you fire hundreds of arrows, but only a small percentage hit exactly what you’re aiming at, especially when you’re just learning. The ones that hit a few inches to the left or right, or even the ones that hit in an outer ring on the target, aren’t considered failures, just less-than-perfect hits, and something to learn from to do better on the next shot. (I only consider it failure if I spend a bunch of time hunting arrows in the grass behind the target instead of shooting.)
I’ve been helping the Hortonworks professional services team for the last several months, doing documentation and unit testing on a huge project they’re smack in the middle of implementing. I mentioned my thoughts on this to one of the guys who has been implementing Hadoop projects for a living for years now, Ryan Merriman. He said that he’d never seen a project truly fail. I realized as he explained what he meant that his definition of failure was a lot like mine for archery.
He said that even if the crazy, unreasonable, or non-existent expectations for a project weren’t met, the business always got some benefit. If they couldn’t find skilled people, then their own internal folks would learn as they went and gain proficiency over time. If they didn’t have a single clear goal, they would experiment until they found something useful to the company, or scale back and focus on a single core benefit if they were trying to do too much. If the project lost support, eventually a new champion would see the untapped potential and step up. In the end, he’d never seen or heard of a company that didn’t get value out of its Hadoop implementation and wasn’t still using it years later.
I asked the other experienced folks on the project and got variations on the same answer. I am aware that I had a very small and thoroughly biased sample for my little informal survey, but it does make me wonder, how many of those “failed” projects are still being put to good use in those businesses that declared them a bust? How many big data projects fail to succeed by the original criteria set, but succeed in providing far more value to the business than their cost?
It’s not a question I have an answer for, but it might be one worth thinking about if you work at a business that “failed” at implementing a big data analytics project.
It might also be interesting to put a new question or two in the next survey that goes out to any companies who declared their project a failure a year or two ago: Are you still using the infrastructure or applications developed as part of that project? If so, why? How much value is your business getting from that? Do you still consider it a failure?
I've spent 20 years in the data management industry in a wide variety of roles – programmer, analyst, trainer, technician, content creator, consultant, and evangelist. Now, I'm the product manager for Syncsort's big data software. I keep a close eye on market and technology trends in big data integration, and try to use what I learn as a crystal ball to predict where our software needs to ...