The Data Lake Debate: Pro Delivers First Rebuttal

Tamara Dull
Last updated: 2015/04/11 at 2:39 AM

In keeping with the spirit of this Lincoln-Douglas debate format, it looks like I only have 4 minutes (or approximately 600 words) to rebut the anti-data lake arguments Anne presented in this post and this one. Let’s do it!

Timer: START!

One of the challenges in this debate, at least for me, is that Anne and I seem to be operating on different definitions of two key terms in this discussion: data lake and Hadoop. I bring this up because you see this same confusion, or lack of clarity, elsewhere. So that’s where I’d like to start.

Revisiting Definitions (Again!)

About the data lake. In my opening argument, I defined the data lake as a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. I also mentioned that a data lake can take on different shapes and sizes, and provided these examples:

  • A single data lake; or
  • A data lake with multiple data ponds—similar in concept to a data warehouse/data mart model; or
  • Multiple, decentralized data lakes; or
  • A virtual data lake to reduce data movement.

Whereas I’ve been operating under a more logical definition of a data lake during this debate, Anne has focused on a single, physical storage repository in her arguments.
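
To make the "raw data in its native format" part of that definition concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the local directory standing in for the lake, the hypothetical helpers `land_raw` and `read_orders_total`, and the sample file names and contents. It is not how Hadoop or any particular product does it; it only shows files landing untouched, with structure applied later, when someone finally has a question to ask.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, name: str, payload: bytes) -> Path:
    """Hypothetical helper: store a payload byte-for-byte, with no parsing or schema."""
    target = lake_root / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def read_orders_total(lake_root: Path) -> float:
    """Hypothetical helper: apply structure only at read time by summing a CSV column."""
    lines = (lake_root / "sales" / "orders.csv").read_text().splitlines()
    header, rows = lines[0].split(","), lines[1:]
    amount_idx = header.index("amount")
    return sum(float(row.split(",")[amount_idx]) for row in rows)


# A temporary directory stands in for the lake (an assumption, not a real cluster).
with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)

    # Structured, semi-structured, and unstructured data all land as-is.
    land_raw(lake, "sales", "orders.csv", b"order_id,amount\n1,19.99\n2,5.01\n")
    land_raw(lake, "clickstream", "events.json",
             json.dumps([{"user": "u1", "page": "/home"}]).encode())
    land_raw(lake, "support", "ticket-001.txt", b"Customer reports a late delivery.")

    # List what the lake holds, then interpret one file once a purpose exists.
    landed = sorted(p.relative_to(lake).as_posix() for p in lake.rglob("*") if p.is_file())
    total = read_orders_total(lake)
```

Whether the files sit in one directory, several "ponds," or behind a virtual layer is exactly the logical-versus-physical distinction above; the sketch picks a single physical location only to keep it short.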

About Hadoop. Hadoop has two primary meanings: it’s both an open source project and an ecosystem of related projects and technologies. Here’s how they differ:

  • Open source project. When Hadoop made its commercial debut, much of the discussion was around Apache Hadoop, an open source project released by the Apache Software Foundation. Apache Hadoop was built to do two things: store and process any and all kinds of data.
  • Ecosystem. Today, when you hear discussions of Hadoop, it’s more likely about the ecosystem of projects – both open source and proprietary – that work with Apache Hadoop to make it a more robust data-everything platform. Apache Hadoop was never intended to do it all. The Hadoop ecosystem, however, is hell-bent on doing it all – and then some.

During this debate, when I’ve mentioned Hadoop, I’ve been referring to the Hadoop ecosystem. From what I can tell from Anne’s arguments, she’s been talking about Apache Hadoop. Again, same word, different uses.

And the Alternative is…

Throughout Anne’s argument, she points out the shortcomings of using Apache Hadoop (not the ecosystem) as a data lake. Point taken. But when I asked what organizations are supposed to do when the majority of their data (80-90%) is not sitting in pristine data structures, Anne replied, “It is not the storage and access [of Apache Hadoop] that brings the advantage. The advantage is in the insights derived from the analysis of the data.” What’s still not clear is how and where this analysis is taking place. If a Hadoop-based data lake is not the answer, then what is? 

Without Purpose is Okay

You can see Anne squirming – just like fingernails on a chalkboard – anytime someone mentions collecting and storing data without a purpose or business context. She retaliates with “There’s no value to the organization!” Au contraire, mon amie! Tell Amazon that. They haven’t thrown any data away since day 1. Do you think they knew they’d be getting a patent for anticipatory shipping – i.e., shipping your package before you buy it – when they first started out over 20 years ago?

Today, we have big data technologies, like the Hadoop ecosystem, that allow organizations to collect and store any and all data at a fraction of the cost. I fully agree with Anne that “just because you can doesn’t mean you should,” but I would also contend that just because you can’t define the purpose now doesn’t mean you shouldn’t collect and store it. Don’t be afraid to embrace the unknown unknowns in your data.

Timer: STOP! Total word count: 598


Previously in the Data Lake Debate:

  • The Introduction – by Jill Dyche
  • Pro’s Up First – by Tamara Dull
  • Questioning the Pro – by Anne Buff and Tamara Dull
  • Negative Puts a Stake in the Ground – by Anne Buff
  • Pro Cross-Examines Con – by Tamara Dull and Anne Buff

