A Day Late and Big Data Architecture Short

As humans, we have no problem staying busy. In fact, most of us are wired to “do something first” and ask questions later. In other words, our busyness often comes at the expense of a well-thought-out plan for tackling projects, from the mundane to the complex. In the world of big data, this can be a dreadful strategy, especially in light of investments that generally start in the hundreds of thousands of dollars and can easily eclipse $2-3 million for a decent deployment.

In Lewis Carroll’s Alice’s Adventures in Wonderland, the Gryphon tells Alice, “No, no! The adventures first, explanations take such a dreadful time.” Isn’t this human nature, to want to start something and worry about how things came to be, or should be, after the fact?

Last week I was in San Francisco at Spark Summit. As is customary, I talked to customers about their big data plans. One particular person (who shall remain nameless) regaled me with stories of his Spark implementation (now in production) and how he had plans for a broader roll-out. When I asked him how Spark fit into his broader ecosystem, he smiled sheepishly and said that he really didn’t know. In fact, as I quizzed the gentleman a bit more, he admitted that his organization had not worked up so much as a high-level architecture diagram of its big data ecosystem. It appeared to us both that he was making up his big data plans as he went.

In another conversation this week, I talked to a fellow who designs airports, towering high-rises and large habitations for the global elite. He mentioned how stunned he was that clients would fuss over details such as the color of the carpets before having broader discussions about the design, architecture and functionality of the overall multibillion-dollar development.

Indeed, time and again in customer conversations around big data, there are rarely questions about whether Hadoop and its ecosystem are the right investment. The challenge that keeps coming up revolves around “where to begin” and, more importantly, “why.”

To answer the “why” in big data, you need to understand your use cases. And you need to understand your business case or cases. A compelling big data strategy should, at the very least, explain how Hadoop will be used to drive measurable business value, have a prioritized and sequenced roadmap signed off by business and technology stakeholders, and include an architecture definition (preferably beyond the back of a napkin) that supports use cases to drive the business forward.

However, in the mad dash to “do something” in big data, IT and business managers alike seem to continually jump on the latest technology (today it’s Spark) and thus become inevitable case studies for Gartner’s well-documented hype cycle.

Please don’t mistake the intent of this column. I have nothing against Apache Spark, and I believe it is a wonderful technology for now and the future, as are all the engines and YARN applications in the Hadoop ecosystem. But a technology without a plan for user acceptance, adoption and business value is, well, just a technology.

Want to do big data right? You will need to come to sound conclusions on your analytic priorities, architecture, technologies, skill sets and support model: conclusions that make sense for your particular business, not something you picked up at a conference about what companies X, Y and Z are doing.

Because short of understanding exactly how the Hadoop ecosystem is right for you, to paraphrase Alice in Wonderland’s Cheshire Cat: “If you don’t know where you’re going, any road will take you there.”