The Data Lake Debate: Pro Delivers Final Rebuttal and Summary

April 28, 2015

Okay, this is where the rubber meets the road. I have three minutes (or ~450 words) to respond to Anne’s final statement and summarize why I still believe a data lake is essential for any organization to take full advantage of its data. Let’s get started!

Timer: START!

Where We Stand

I put together this simple data graphic to help summarize the core arguments brought up during this debate. It focuses on data variety and purpose:

Image

And here are our positions for each quadrant:

Image

My Final Rebuttal

Whereas Anne is focused on data in quadrants 1 and 2, my focus is on all four quadrants. A centralized storage repository, like a data lake, is the first step in bringing all this data together in its raw, native format, without the limitations and biases of existing relational systems.

Where data is stored is important. None of the data in these four quadrants is new. We’ve had access to all this digital data for several decades—in databases, data warehouses, file systems, applications, etc. What is new, however, is that we now have the technologies—the most popular right now being Hadoop—to bring the data from any quadrant all together, process it any way we want, and then store the processed results anywhere we want. And if we don’t like the results, or we have new data, or we have different questions, it’s no big deal to go back to the original, raw data and start over. You cannot do this in Anne’s world.
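To make the "go back to the raw data and start over" point concrete, here is a minimal sketch in plain Python. It is only an illustration, not Hadoop and not a real data lake: the in-memory `RAW_LAKE` dictionary, the file names, and the helper functions are all hypothetical stand-ins. The raw records are stored untouched in their native formats, and each question parses them on read.

```python
import csv
import io
import json

# Hypothetical stand-in for a data lake: raw records kept untouched,
# each in its native format (CSV from one system, JSON lines from another).
RAW_LAKE = {
    "orders.csv": "order_id,amount\n1,20.50\n2,35.00\n",
    "clicks.jsonl": (
        '{"order_id": 1, "page": "checkout"}\n'
        '{"order_id": 2, "page": "home"}\n'
    ),
}


def read_orders(raw):
    """Parse the raw CSV text at read time (schema applied on read)."""
    rows = csv.DictReader(io.StringIO(raw))
    return [
        {"order_id": int(r["order_id"]), "amount": float(r["amount"])}
        for r in rows
    ]


def read_clicks(raw):
    """Parse the raw JSON-lines text at read time."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def total_revenue(lake):
    """First question: how much revenue is there overall?"""
    return sum(o["amount"] for o in read_orders(lake["orders.csv"]))


def revenue_by_page(lake):
    """A later, different question, answered from the same raw data."""
    amounts = {
        o["order_id"]: o["amount"] for o in read_orders(lake["orders.csv"])
    }
    totals = {}
    for click in read_clicks(lake["clicks.jsonl"]):
        page = click["page"]
        totals[page] = totals.get(page, 0.0) + amounts.get(click["order_id"], 0.0)
    return totals


if __name__ == "__main__":
    print(total_revenue(RAW_LAKE))
    print(revenue_by_page(RAW_LAKE))
```

The second question did not require reloading or remodeling anything. Because the raw text was never altered, a new question is just a new function over the same data.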

Different skills? That’s good! Anne also talks about the skills required for the data lake. Yes, these big data technologies are new, they’re evolving, and there’s a lot of experimentation going on to figure out what’s needed, what’s not, what should stick, what shouldn’t, etc. Thus, it should be no surprise that as our technologies evolve, so will the skills required. So a lack of skills for these newer technologies should not be seen as a negative. It’s an opportunity to take what we have and know to a new level and help prepare the next generation to excel in our data-saturated society.

My Final Summary

What a data lake is not. A data lake is not a panacea or a geographic cure or another version of the data warehouse…or even a data swamp. If an organization is already bad at governing and managing its existing data, then adding a data lake will only make matters worse. I will be the first to say: Don’t go there.

What a data lake is. It’s a newer storage alternative for organizations that want to mix and match their data (from quadrants 1-4 above) so that they can analyze it and discover insights that they would never be able to find with existing relational technologies.

An organization will be able to take full advantage of its data if there’s a way for it to bring that data all together without breaking the bank. The data lake provides that opportunity.

Timer: STOP! Word count: 575 (oops!)

A note to Anne: While the boss is putting together her summation of this debate, want to meet up at the bar for a drink or three? I’m buying.


