By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData Collective
  • Analytics
    AnalyticsShow More
    predictive analytics in dropshipping
    Predictive Analytics Helps New Dropshipping Businesses Thrive
    12 Min Read
    data-driven approach in healthcare
    The Importance of Data-Driven Approaches to Improving Healthcare in Rural Areas
    6 Min Read
    analytics for tax compliance
    Analytics Changes the Calculus of Business Tax Compliance
    8 Min Read
    big data analytics in gaming
    The Role of Big Data Analytics in Gaming
    10 Min Read
    analyst,women,looking,at,kpi,data,on,computer,screen
    Promising Benefits of Predictive Analytics in Asset Management
    11 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-23 SmartData Collective. All Rights Reserved.
Reading: The Data Lake Debate: The Final Word from Negative
Share
Notification Show More
Latest News
ai digital marketing tools
Top Five AI-Driven Digital Marketing Tools in 2023
Artificial Intelligence
ai-generated content
Is AI-Generated Content a Net Positive for Businesses?
Artificial Intelligence
predictive analytics in dropshipping
Predictive Analytics Helps New Dropshipping Businesses Thrive
Predictive Analytics
cloud data security in 2023
Top Tools for Your Cloud Data Security Stack in 2023
Cloud Computing
become a data scientist
Boosting Your Chances for Landing a Job as a Data Scientist
Jobs
Aa
SmartData Collective
Aa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Software > Hadoop > The Data Lake Debate: The Final Word from Negative
Big DataData ManagementHadoopOpen SourcePolicy and Governance

The Data Lake Debate: The Final Word from Negative

cnuwvu
Last updated: 2015/04/23 at 3:06 AM
cnuwvu
8 Min Read
Data Lake Debate
SHARE

Data Lake DebateAnne Buff


Data Lake DebateAnne Buff


More Read

Image

The Data Lake Debate: Conclusion (With Apologies to the Rolling Stones)

The Data Lake Debate: Pro Delivers Final Rebuttal and Summary
The Data Lake Debate: Pro Delivers First Rebuttal
The Data Lake Debate: Pro Cross-Examines Con
The Data Lake Debate: Negative Puts a Stake in the Ground

Well, it seems you took the gloves off this time, Tamara. I appreciate the valiant effort and your passionate belief in the Hadoop ecosystem. However, given your revisit to the definition of the data lake and clarifications about Hadoop, I find it important to repeat the resolution we are debating: “a data lake is essential for any organization to take full advantage of its data”. We are not debating whether a data ecosystem is essential – just the data lake. While I will stand strong with you that a well-designed data ecosystem (open source or proprietary) of many interdependent systems is critically imperative for businesses to succeed in today’s digital world, there are still ample concerns and cautions to consider before declaring the data lake essential. As I reflect on our debate, the following are the key issues keeping the data lake from the prestige and splendor in which you have presented it.

Physical attributes do not determine business value. Regardless of shape, size, or other linguistic expression used to define the qualities of a data lake, the data lake still remains a storage repository. Until the data is processed and consumed, it does not provide business value. Any storage repository on its own does not prove itself essential for the organization; it must be part of a larger, well-designed data infrastructure. The options for data storage architectures are numerous and the implementation choice should be contingent upon business need and technical requirements. A data lake is not the catchall answer.

The talent gap is real. If we were to accept your argument that the Hadoop Ecosystem is what organizations should be considering, the technical skills to support the environment would become even greater than just considering Hadoop, the open source project. As I mentioned before, finding individuals with the skills to access, query and manage just Apache Hadoop is difficult. If you add in the need for skills using Hive, Spark, Ambari, Pig, HBase, etc. and the wide variety of vendor distributions the talent pool is significantly smaller. In the event an organization is able to hire the talent (or grow it in-house) the cost and paranoia of turnover dramatically rises.

The risk is greater than the reward. It does sound idyllic to have any and all of the organization’s data in a central location to serve the needs of the entire enterprise. But, at what cost? As I mentioned before, copying existing structured data to a data lake (especially transactional data) would be a duplication of effort and storage and would create additional risk for the organization. How many copies of the data do we need anyways? The source system, the data mart/store, the data warehouse and now the data lake? Data integration is far more important than data co-habitation. Data governance and security are not inherent to the data lake environment (regardless of form). Without policies, procedures and additional technology to secure and protect this massive collection of data, the organization is at enormous risk. No executive in his or her right mind will jump on board for this. There is a reason Capgemini Consulting found “only 13 percent of organizations have achieved full-scale production for their big data implementations” and “only 27 percent of the executives surveyed described their big data initiatives as successful.” The data lake is no exception.

Collection without purpose is hoarding. Like you said, not everyone is a Google or a Facebook. Well, not everyone is Amazon either. Storing everything is just not an option for most organizations. So the question becomes, “What should be stored?” Answering this question without consideration of strategic business initiatives or goals is futile.

The organizations with which I have worked that have implemented a data lake or a data-lake-like environment for technical initiatives have all had the same concern – “Now that it is built, we need to convince the business to use it.” To establish value and ensure use, the business needs to be involved in the data lake development from the onset. Business stakeholders care about what is stored – not how it is stored. Value will not magically appear without purpose.

All of that being said, there is one scenario where “without purpose” becomes the purpose (I mentioned this before as well.) In the world of analytics and data science, the data lake becomes a gold mine. The volume and variety of big data combined with the accuracy and structure of operational data provides a rich and fruitful environment for data wizards to develop and refine models that generate insights we never thought possible. Even in this situation though, I would argue that while the data lake is definitely valuable, the essential component is the brilliant analytical minds.

The alternative is…

You asked, “If a Hadoop-based data lake is not the answer, then what is?” Organizations should absolutely begin to consider new ways of collecting, packaging and delivering data both internally and externally. Ultimately, it doesn’t matter how or where the data is stored but instead how it is integrated and accessed for purpose. An organization’s data infrastructure and strategy will be an evolution based on business needs and initiatives, budgets, technical skills and available technologies. In time, a data lake may in fact be a valuable asset in the essential and indispensable well-designed, purpose-built data ecosystem. But, by then maybe it will be a data river (ever flowing), or a data mountain (peaks and valleys), or whatever trendy industry term comes to be. Any which way, it will only a be a part, never the essential component.

And for the record…

Not all data lakes are Hadoop-based.


 

Previously in the Data Lake Debate:

 

  • The Introduction – by Jill Dyche
  • Pro’s Up First – by Tamara Dull
  • Questioning the Pro – by Anne Buff and Tamara Dull
  • Negative Puts a Stake in the Ground – by Anne Buff
  • Pro Cross-Examines Con – by Tamara Dull and Anne Buff
  • Pro Delivers First Rebuttal – by Tamara Dull


TAGGED: Data Lake Debate
cnuwvu April 23, 2015
Share this Article
Facebook Twitter Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

ai digital marketing tools
Top Five AI-Driven Digital Marketing Tools in 2023
Artificial Intelligence
ai-generated content
Is AI-Generated Content a Net Positive for Businesses?
Artificial Intelligence
predictive analytics in dropshipping
Predictive Analytics Helps New Dropshipping Businesses Thrive
Predictive Analytics
cloud data security in 2023
Top Tools for Your Cloud Data Security Stack in 2023
Cloud Computing

Stay Connected

1.2k Followers Like
33.7k Followers Follow
222 Followers Pin

Sign Up for Our Newsletter

Subscribe to our newsletter to get our newest articles instantly!

[mc4wp_form id=”1616″]

You Might also Like

Image
Big DataData ManagementHadoopPolicy and Governance

The Data Lake Debate: Conclusion (With Apologies to the Rolling Stones)

4 Min Read
Image
Best PracticesBig DataHadoopUnstructured Data

The Data Lake Debate: Pro Delivers Final Rebuttal and Summary

5 Min Read
Image
Big DataData ManagementHadoopOpen SourcePolicy and Governance

The Data Lake Debate: Pro Delivers First Rebuttal

5 Min Read
Image
Data ManagementHadoopKnowledge ManagementOpen SourceUnstructured Data

The Data Lake Debate: Pro Cross-Examines Con

7 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

ai in ecommerce
Artificial Intelligence for eCommerce: A Closer Look
Artificial Intelligence
AI chatbots
AI Chatbots Can Help Retailers Convert Live Broadcast Viewers into Sales!
Chatbots

Quick Link

  • About
  • Contact
  • Privacy
Follow US

© 2008-23 SmartData Collective. All Rights Reserved.

Removed from reading list

Undo
Go to mobile version
Welcome Back!

Sign in to your account

Lost your password?