SmartData Collective
The Data Lake Debate: Questioning the Pro

cnuwvu

Tamara, Tamara, Tamara…We have known each other for quite a while and I cannot believe we are having the same conversation AGAIN! Technology is not the answer for every data issue. I get it – Hadoop and the concept of data lakes are hot topics. However, just because they are trending in the world of technology does not mean that they will solve critical business issues such as taking full advantage of an organization’s data. That being said, I have a few questions for you about the definitions and your arguments.


Anne Buff: 1. You define data as “information produced or stored by a computer that can be digitally transmitted or processed.” Given that data is not information until meaning is derived from processing within a specific business context or purpose, how can a storage repository which stores data (as you define the data lake) be essential to an organization without purpose?


Tamara Dull: Anne, Anne, Anne, as always, you look smashing in your rose-colored data management glasses! But this ain’t your grandfather’s rodeo anymore and it’s time to consider new technological pastures.

Okay, so you weren’t fond of my use of the term information in my data definition. That’s fair. It’s confusing and somewhat circular. My point was that the data in a data lake is digital in nature. Can we agree on that?

As for your question, do you think I’m suggesting that an organization create a big black box, slap a label of “data lake” on it, and then start filling it up with any and all data—without any context or purpose? As crazy as that sounds (and there are some who are saying this), it is not what I’m suggesting. What I am saying is now that we have the technology to build a proper data lake, it’s time to consider it—not in a “build it and they will come” haphazard fashion, but in a strategic, methodical manner.

Will all the data that comes into the data lake have context and purpose? Absolutely not. Even though that’s the ideal, it’s not realistic. Context and purpose will need to be added as the data is processed and pushed/pulled downstream to other repositories and applications.


Anne Buff: 2. Just because you can capture and store “any and all data” in a data lake as you state in your first argument, it doesn’t mean you should. Governance is not inherent to big data environments. Data is neutral. What you do with it is not. If collection and discovery are not governed, enormous risk is created for the organization. How do you resolve this?


Tamara Dull: Yes, I totally agree: Just because you can doesn’t mean you should. If we look back over the years, we’ve learned to live with: Just because you want to (store and process any and all data) doesn’t mean you can (due to technology limitations, costs, etc.).

Now that we can—with big data technologies like Hadoop—the question is now shifting to “Should we?” Some are saying, “Sure! Grab it all and throw it in the data lake!” while others are convinced that grabbing it all will only result in a big ol’ smelly data swamp. The correct answer lies somewhere in between these two extremes for an organization.

But make no mistake: The data lake is not a geographical cure. If your organization is already doing a crummy job of governing and managing the data in your current systems, then moving any data, existing or new, to a data lake is not going to solve this core shortcoming. Your bad data and data practices will follow you.


Anne Buff: 3. In your game-changing value proposition you contend, “With today’s big data technologies, organizations now have an economically attractive option to bring any and all data into a single, scalable infrastructure model.” While that sounds ideal, co-located data is not integrated data, and integration is necessary for reporting and analytics. At what point do you consider actually integrating the data?


Tamara Dull: The short answer is schema-on-read. What this means is: Instead of structuring the data before it goes into a repository (which is called schema-on-write, and it’s what we’re currently doing in our relational systems), the data freely flows from lots of different sources into the data lake in its raw, native form.

The data lake inquirer can now apply her own lens to the data as she sees fit—as she’s “reading” and integrating the data from this complex, ever-evolving data lake. Why is this important? First, this allows the inquirer to be extremely agile and go with the flow, if you will. And second, she can start getting value from her data “now”—instead of waiting for it to go through the more traditional schema-on-write process. 
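
To make the schema-on-read idea a bit more concrete, here is a minimal Python sketch. It is an illustration only, not anything from the debate itself: the sample records, the field names, and the helpers `read_with_schema` and `spend_by_customer` are all hypothetical, and a real data lake would sit on something like Hadoop rather than an in-memory list. The point it tries to show is that records land in whatever shape each source produced, and the reader decides at query time which fields matter and how to interpret them.

```python
import json

# Hypothetical raw events, stored exactly as each source emitted them.
# Nothing was validated or reshaped on the way in.
RAW_LAKE = [
    '{"cust": "A17", "amount": "19.99", "ts": "2015-03-01"}',
    '{"customer_id": "B02", "total": 5, "channel": "web"}',
    '{"cust": "A17", "amount": "3.50"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a reader-chosen schema at query time (schema-on-read).

    `schema` maps an output field to (candidate source keys, converter).
    A record that lacks a field yields None for it instead of being
    rejected at load time, as a schema-on-write pipeline would do.
    """
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (source_keys, convert) in schema.items():
            value = next((record[k] for k in source_keys if k in record), None)
            row[field] = convert(value) if value is not None else None
        yield row


# One inquirer's "lens" on the data: spend per customer.
SPEND_SCHEMA = {
    "customer": (("cust", "customer_id"), str),
    "spend": (("amount", "total"), float),
}


def spend_by_customer(raw_lines):
    """Total spend per customer, integrating the sources while reading."""
    totals = {}
    for row in read_with_schema(raw_lines, SPEND_SCHEMA):
        key = row["customer"]
        totals[key] = totals.get(key, 0.0) + (row["spend"] or 0.0)
    return totals


if __name__ == "__main__":
    print(spend_by_customer(RAW_LAKE))
```

A second inquirer could pass a different schema (say, one that only looks at `channel`) over the same raw records without anything being reloaded, which is the agility being argued for here. The trade-off, as the governance question earlier suggests, is that every reader now carries the burden of knowing which source keys mean the same thing.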


Anne Buff: 4. In your argument regarding more questions and better answers, you state “A business user can now ask the data lake any question based on the known data in the lake.” Given that the technical skills to access and analyze data from a Hadoop or other big data environment are significantly specialized and not abundant, how do you suggest a business user ask the data lake any question based on the known data in the lake?


Tamara Dull: Yes, I will maintain that any business user should be able to ask the data lake any question, but I don’t believe that any business user should have direct access to the data lake. As I was discussing earlier, a data lake gives its inquirers a lot of flexibility and agility; however, you only want to give that sort of freedom to those who are trained, equipped and empowered to deal with complex, evolving data repositories – yes, those with the technical chops like data scientists and engineers.

As for the rest of the business users: Give ‘em an app! Just kidding—sort of. Since the data lake opens the door to “more questions and better answers,” provide better solutions for business users—whether it be employees, customers or partners—to ask these questions, and maybe even explore some of the answers themselves (where it’s safe to swim). Some of your best questions (and answers) may be resting with this crowd.


Previously in the Data Lake Debate:

  • The Introduction – by Jill Dyche
  • Pro’s Up First – by Tamara Dull