SmartData Collective
© 2008-23 SmartData Collective. All Rights Reserved.
Data Management, Hadoop, Knowledge Management, Open Source, Unstructured Data

The Data Lake Debate: Pro Cross-Examines Con

Tamara Dull
Last updated: 2015/04/06 at 5:18 AM

As expected, Anne, your arguments against building a data lake are both persuasive and passionate. You’ve made some great points, my friend, but you’re making this way too easy for me. Before I jump into my rebuttal [my next post], I’d like to clarify a few things that you brought up. I’ve boiled it down to three questions. What say you?

Tamara: 1. In your arguments, you focus on data volumes and the ancillary costs of open source software (OSS) to support these large volumes. Yet more recent studies show that organizations are less concerned about their data volumes (not everyone is a Google or Facebook) than about the variety of data and the ability to integrate it all. How do you address these concerns?

Anne: I cannot stress enough that data brought into a data lake is co-located, not integrated. Even with schema on read, the integration happens outside of the storage environment – on the banks of this beautiful data lake. Every query that requires a new structure or schema for the data will need to be written from scratch. For most organizations, the return on the time and talent required for this extensive coding (for a still-novel technology) is limited, if not nonexistent. The skills required to access and integrate data from Hadoop are scarce. You are right, not everyone is a Google or a Facebook. Organizations do not have these skills on staff, nor do they have the budget to bring them on.

Hadoop does provide a fantastic data storage opportunity, but it does not require us to abandon all of our existing structured data environments. Copying existing structured data to a data lake (especially transactional data) would duplicate effort and storage and would create additional risk for the organization. Moving operational data would be an enormous event, as it would require applications throughout the organization to undergo a significant coding and design overhaul, which is not going to be a popular idea in any business unit.

The ideal scenario is to leave existing data where it lives today and use Hadoop as the storage repository for the data that previously could not be stored because of constraints presented by volume, variety or velocity. Organizations can take advantage of data virtualization tools, which not only eliminate the integration coding challenge but also bring other advantages such as centralized security and governance. The data is queried, transformed and structured as needed and provisioned to business users through virtual views. No dumping of data – just purposeful access, integration and use.
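To make the "schema on read" point concrete, here is a minimal Python sketch of what co-located but not integrated data can look like in practice. It is an illustration only, not how Hadoop or any particular tool works: the records, field names, and the helpers `read_with_schema` and `total_spend_by_customer` are all hypothetical. The raw records are stored exactly as they arrived, and each query has to bring its own mapping from raw keys to the fields it needs.

```python
import json

# Hypothetical raw records, stored as they arrived: co-located, not integrated.
# Note that the two sources use different key names for the same concepts.
RAW_EVENTS = [
    '{"cust": "A17", "amt": "19.99", "ts": "2015-04-01"}',
    '{"customer_id": "A17", "amount": 5.00, "channel": "web"}',
    '{"cust": "B02", "amt": "42.50", "ts": "2015-04-02"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    `schema` maps each target field to (candidate source keys, type cast).
    Fields that cannot be found in a record are set to None.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (source_keys, cast) in schema.items():
            value = next((record[k] for k in source_keys if k in record), None)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# The schema belongs to this one query; a different question about the
# same raw data would need its own mapping written from scratch.
SPEND_SCHEMA = {
    "customer_id": (("customer_id", "cust"), str),
    "amount": (("amount", "amt"), float),
}


def total_spend_by_customer(raw_lines):
    """Sum amounts per customer, skipping rows missing either field."""
    totals = {}
    for row in read_with_schema(raw_lines, SPEND_SCHEMA):
        if row["customer_id"] is None or row["amount"] is None:
            continue
        totals[row["customer_id"]] = totals.get(row["customer_id"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    print(total_spend_by_customer(RAW_EVENTS))
```

The integration logic here lives in `SPEND_SCHEMA` and the query function, not in the stored data, which is the cost being described: every new question needs its own mapping.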

Tamara: 2. Related to the first question, you state: “Before organizations start down the path of discovering capabilities within a data lake, they should first turn to taking full advantage of their current data.” What if most of their current data is semi-structured or unstructured data (often cited as being as much as 80-90%)? How do they take full advantage of that data?

Anne: Who’s the one making this easy? Careful throwing those stones, Ms. Dull. Your glass house is exquisite.

Historically, in business, unstructured data sources were managed within the scope of knowledge management or content management. The vast storage capabilities that Hadoop presents allow documents, emails and other unstructured sources to be centrally stored, and their content is now considered accessible data. While it is true that these sources can now be accessed through Hadoop to glean the content as ingestible data, it is not the storage and access that bring the advantage. The advantage is in the insights derived from the analysis of the data. Regardless of the type of data (structured, semi-structured or unstructured) or how and where the data is stored, organizations can take full advantage of any and all data by generating value when processing or analyzing it within a specific business context.

Tamara: 3. You seem to suggest a top-down data management approach to big data; for example, “…the real success factor is found in strong data management capabilities under the umbrella of a mature data governance program.” Are you implying a top-down approach to big data? When does a bottom-up approach make sense?

Anne: There is a time and a place for both data science and data governance. They do not need to be mutually exclusive. The rigor of data governance is not meant to create obstacles but to create an environment that fosters data management autonomy at the lowest level within the framework of the enterprise data governance program. When it comes to data discovery, governance still has value in protecting the organization from compliance and security risks that arise not from the data itself but from how the data is used. I emphatically support innovation labs and data science programs – they are ideal examples of bottom-up approaches. However, just because they play in the sandbox doesn’t mean they don’t follow playground rules.

Tamara: Thanks, Anne! I’ll get started on my first rebuttal to what you’ve presented. Stay tuned!

Previously in the Data Lake Debate:

  • The Introduction – by Jill Dyche
  • Pro’s Up First – by Tamara Dull
  • Questioning the Pro – by Anne Buff and Tamara Dull
  • Negative Puts a Stake in the Ground – by Anne Buff
