The Data Lake Debate: Conclusion (With Apologies to the Rolling Stones)




When I wrote the introduction to the now-celebrated Data Lake Debate, I couldn’t have anticipated the buzz the exchange would get! Neither Anne Buff nor Tamara Dull pulled any punches in defending her perspective on the value and the perils of the data lake.

Like you, I agree with both of them. From Tamara’s perspective, the data lake presents a fresh and practical solution for easier data access, loading, cleansing, provisioning, and archiving, freeing companies from the yoke of traditional relational database systems and their accompanying processing and labor-intensive infrastructures.

But Anne also has a point, which is that the data lake—significant development that it is—is still only a component in an overall data ecosystem that includes data management and governance, quality and master data management solutions, and loading and provisioning standards. And, Anne insists, it need not include Hadoop.

As Anne and Tamara would both agree, you can’t simply hold your nose and jump headfirst into the data lake without risking injury. Business and IT managers should be clear-headed about incumbent capabilities and tools for data management. They should understand existing skills and skill gaps. And they should be able to articulate the problems that a data lake can solve, as well as the risks and costs it will incur.

As I said in my introduction, formalizing policies around business data is a more significant factor in information management maturity than any single platform or architecture. Many companies with putatively mature data warehouses have nevertheless slacked off when it comes to data integration standards and data security policies. Adopting a data lake into your existing ecosystem might be a pretext for introducing this rigor. Or it might simply be yet another hammer looking for a nail.

In an homage to the Rolling Stones, I blithely suggested that if you try sometimes, you get what you need—be it more funding, access to third-party data, a more effective executive sponsor, or a Hadoop distribution provider. It’s not an easy decision, but I’d call the data lake debate a draw. After all, when it comes to the verdict on whether or not a data lake is a worthwhile investment, the success stories will start to emerge. In the meantime, I’m happy to watch those stories unfold. Time, after all, is on my side. Yes it is.

(See what I did there?)

But wait! There’s more! Join Tamara Dull and Anne Buff on May 27, 2015, for a live webcast version of this debate. The gloves will be off. See you there!