
The Data Lake Debate: The Final Word from Negative

cnuwvu
Last updated: 2015/04/23 at 3:06 AM

Anne Buff



Well, it seems you took the gloves off this time, Tamara. I appreciate the valiant effort and your passionate belief in the Hadoop ecosystem. However, given that you revisited the definition of the data lake and offered clarifications about Hadoop, I find it important to repeat the resolution we are debating: “a data lake is essential for any organization to take full advantage of its data.” We are not debating whether a data ecosystem is essential – just the data lake. While I will stand strong with you that a well-designed data ecosystem (open source or proprietary) of many interdependent systems is imperative for businesses to succeed in today’s digital world, there are still ample concerns and cautions to consider before declaring the data lake essential. As I reflect on our debate, the following are the key issues keeping the data lake from the prestige and splendor with which you have presented it.

Physical attributes do not determine business value. Regardless of the shape, size, or other terms used to describe it, a data lake remains a storage repository. Until the data is processed and consumed, it does not provide business value. No storage repository proves itself essential to the organization on its own; it must be part of a larger, well-designed data infrastructure. The options for data storage architectures are numerous, and the implementation choice should be contingent upon business need and technical requirements. A data lake is not the catchall answer.

The talent gap is real. If we were to accept your argument that the Hadoop Ecosystem is what organizations should be considering, the technical skills needed to support the environment would be even greater than those needed for Hadoop, the open source project, alone. As I mentioned before, finding individuals with the skills to access, query and manage just Apache Hadoop is difficult. If you add in the need for skills in Hive, Spark, Ambari, Pig, HBase, etc., and the wide variety of vendor distributions, the talent pool is significantly smaller. In the event an organization is able to hire the talent (or grow it in-house), the cost and paranoia of turnover rise dramatically.

The risk is greater than the reward. It does sound idyllic to have any and all of the organization’s data in a central location to serve the needs of the entire enterprise. But at what cost? As I mentioned before, copying existing structured data to a data lake (especially transactional data) would be a duplication of effort and storage and would create additional risk for the organization. How many copies of the data do we need anyway? The source system, the data mart/store, the data warehouse and now the data lake? Data integration is far more important than data co-habitation. Data governance and security are not inherent to the data lake environment (regardless of form). Without policies, procedures and additional technology to secure and protect this massive collection of data, the organization is at enormous risk. No executive in his or her right mind will jump on board for this. There is a reason Capgemini Consulting found “only 13 percent of organizations have achieved full-scale production for their big data implementations” and “only 27 percent of the executives surveyed described their big data initiatives as successful.” The data lake is no exception.

Collection without purpose is hoarding. Like you said, not everyone is a Google or a Facebook. Well, not everyone is Amazon either. Storing everything is just not an option for most organizations. So the question becomes, “What should be stored?” Answering this question without consideration of strategic business initiatives or goals is futile.

The organizations with which I have worked that have implemented a data lake or a data-lake-like environment for technical initiatives have all had the same concern – “Now that it is built, we need to convince the business to use it.” To establish value and ensure use, the business needs to be involved in the data lake development from the outset. Business stakeholders care about what is stored – not how it is stored. Value will not magically appear without purpose.

All of that being said, there is one scenario where “without purpose” becomes the purpose (I mentioned this before as well). In the world of analytics and data science, the data lake becomes a gold mine. The volume and variety of big data combined with the accuracy and structure of operational data provide a rich and fruitful environment for data wizards to develop and refine models that generate insights we never thought possible. Even in this situation though, I would argue that while the data lake is definitely valuable, the essential component is the brilliant analytical minds.

The alternative is…

You asked, “If a Hadoop-based data lake is not the answer, then what is?” Organizations should absolutely begin to consider new ways of collecting, packaging and delivering data both internally and externally. Ultimately, it doesn’t matter how or where the data is stored but instead how it is integrated and accessed for purpose. An organization’s data infrastructure and strategy will be an evolution based on business needs and initiatives, budgets, technical skills and available technologies. In time, a data lake may in fact be a valuable asset in the essential and indispensable well-designed, purpose-built data ecosystem. But by then maybe it will be a data river (ever flowing), or a data mountain (peaks and valleys), or whatever trendy industry term comes to be. Any which way, it will only be a part, never the essential component.

And for the record…

Not all data lakes are Hadoop-based.

Previously in the Data Lake Debate:

  • The Introduction – by Jill Dyche
  • Pro’s Up First – by Tamara Dull
  • Questioning the Pro – by Anne Buff and Tamara Dull
  • Negative Puts a Stake in the Ground – by Anne Buff
  • Pro Cross-Examines Con – by Tamara Dull and Anne Buff
  • Pro Delivers First Rebuttal – by Tamara Dull

