SmartData Collective

The Data Lake Debate: The Final Word from Negative

Anne Buff



Well, it seems you took the gloves off this time, Tamara. I appreciate the valiant effort and your passionate belief in the Hadoop ecosystem. However, given that you revisited the definition of the data lake and clarified points about Hadoop, I find it important to repeat the resolution we are debating: “a data lake is essential for any organization to take full advantage of its data.” We are not debating whether a data ecosystem is essential – just the data lake. While I will stand strong with you that a well-designed data ecosystem (open source or proprietary) of many interdependent systems is imperative for businesses to succeed in today’s digital world, there are still ample concerns and cautions to consider before declaring the data lake essential. As I reflect on our debate, the following are the key issues keeping the data lake from the prestige and splendor in which you have presented it.

Physical attributes do not determine business value. Regardless of the shape, size, or other descriptive terms used to define its qualities, a data lake remains a storage repository. Until the data is processed and consumed, it does not provide business value. No storage repository proves itself essential to the organization on its own; it must be part of a larger, well-designed data infrastructure. The options for data storage architectures are numerous, and the implementation choice should be contingent upon business need and technical requirements. A data lake is not the catchall answer.

The talent gap is real. If we were to accept your argument that the Hadoop ecosystem is what organizations should be considering, the technical skills needed to support the environment would be even greater than those needed for Hadoop, the open source project, alone. As I mentioned before, finding individuals with the skills to access, query and manage just Apache Hadoop is difficult. If you add in the need for skills using Hive, Spark, Ambari, Pig, HBase, etc. and the wide variety of vendor distributions, the talent pool is significantly smaller. In the event an organization is able to hire the talent (or grow it in-house), the cost and paranoia of turnover rise dramatically.

The risk is greater than the reward. It does sound idyllic to have any and all of the organization’s data in a central location to serve the needs of the entire enterprise. But at what cost? As I mentioned before, copying existing structured data to a data lake (especially transactional data) would be a duplication of effort and storage and would create additional risk for the organization. How many copies of the data do we need anyway? The source system, the data mart/store, the data warehouse and now the data lake? Data integration is far more important than data co-habitation. Data governance and security are not inherent to the data lake environment (regardless of form). Without policies, procedures and additional technology to secure and protect this massive collection of data, the organization is at enormous risk. No executive in his or her right mind will jump on board for this. There is a reason Capgemini Consulting found “only 13 percent of organizations have achieved full-scale production for their big data implementations” and “only 27 percent of the executives surveyed described their big data initiatives as successful.” The data lake is no exception.

Collection without purpose is hoarding. Like you said, not everyone is a Google or a Facebook. Well, not everyone is Amazon either. Storing everything is just not an option for most organizations. So the question becomes, “What should be stored?” Answering this question without consideration of strategic business initiatives or goals is futile.

The organizations with which I have worked that have implemented a data lake or a data-lake-like environment for technical initiatives have all had the same concern – “Now that it is built, we need to convince the business to use it.” To establish value and ensure use, the business needs to be involved in the data lake development from the outset. Business stakeholders care about what is stored – not how it is stored. Value will not magically appear without purpose.

All of that being said, there is one scenario where “without purpose” becomes the purpose (I mentioned this before as well). In the world of analytics and data science, the data lake becomes a gold mine. The volume and variety of big data combined with the accuracy and structure of operational data provide a rich and fruitful environment for data wizards to develop and refine models that generate insights we never thought possible. Even in this situation though, I would argue that while the data lake is definitely valuable, the essential component is the brilliant analytical minds.

The alternative is…

You asked, “If a Hadoop-based data lake is not the answer, then what is?” Organizations should absolutely begin to consider new ways of collecting, packaging and delivering data both internally and externally. Ultimately, it doesn’t matter how or where the data is stored but instead how it is integrated and accessed for purpose. An organization’s data infrastructure and strategy will be an evolution based on business needs and initiatives, budgets, technical skills and available technologies. In time, a data lake may in fact be a valuable asset in the essential and indispensable well-designed, purpose-built data ecosystem. But by then maybe it will be a data river (ever flowing), or a data mountain (peaks and valleys), or whatever trendy industry term comes to be. Any which way, it will only be a part, never the essential component.

And for the record…

Not all data lakes are Hadoop-based.


 

Previously in the Data Lake Debate:

 

  • The Introduction – by Jill Dyche
  • Pro’s Up First – by Tamara Dull
  • Questioning the Pro – by Anne Buff and Tamara Dull
  • Negative Puts a Stake in the Ground – by Anne Buff
  • Pro Cross-Examines Con – by Tamara Dull and Anne Buff
  • Pro Delivers First Rebuttal – by Tamara Dull

