SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.

Schema on Read vs Schema on Write and Why Shakespeare Hates Me

Paige Roberts

A couple of months ago, I found myself without a full-time gig for the first time in decades, and I did a little freelance blogging. Being an overachiever, I wrote a post for Adaptive Systems Inc. so long that I broke it into two parts. The first part was published before I dove headfirst into documenting and unit testing a big Hadoop implementation. The second part was published last week.

A couple of months ago, I found myself without a full time gig for the first time in decades, and I did a little freelance blogging. Being an overachiever, I wrote such a long post for Adaptive Systems Inc. that I broke it into two parts. The first part got published before I dove head first into documenting and unit testing a big Hadoop implementation. The second part got published last week.

It was interesting to reread my opinions from a few months ago on the nature and comparative strengths of the various strategies and technologies. It had been long enough that I didn’t remember what I’d written. I got a kick out of comparing that perspective with my current one, now that I have some recent hands-on experience digging through Hive code and comparing query speeds with and without ORC, and on MapReduce versus Tez.

In part 1, I made Shakespeare roll in his grave by repeatedly misquoting Hamlet while discussing the merits of schema on read versus schema on write in Hadoop data lake projects. This time, I butchered Romeo and Juliet while looking at the SQL-in-Hadoop technologies that use each of these two strategies, and offered recommendations on how to choose among them for your next Hadoop data lake project. Main point: it depends on which balcony you want to climb. And either way, SQL is not off the menu just because you’re using Hadoop.
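For anyone who hasn’t wrestled with the distinction yet, here’s a minimal Python sketch of the two strategies. It isn’t taken from either post and has nothing to do with how Hive or any other SQL-in-Hadoop engine is actually implemented; the SCHEMA dict, the sample CSV, and both function names are hypothetical, chosen just for illustration. Schema on write checks and converts the data as it’s loaded and turns away whatever doesn’t fit. Schema on read stores the raw text untouched and imposes the schema only when a query reads it.

```python
import csv
import io

# Hypothetical schema for illustration: column name -> type converter.
SCHEMA = {"id": int, "name": str, "price": float}

# Hypothetical raw input: the second row has a price that is not a number.
RAW = "id,name,price\n1,widget,9.99\n2,gadget,not-a-number\n"


def load_schema_on_write(raw_text):
    """Schema on write: convert every row to the schema while loading.
    Rows that don't fit are rejected before they ever reach the table."""
    table, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        try:
            table.append({col: conv(row[col]) for col, conv in SCHEMA.items()})
        except (KeyError, TypeError, ValueError):
            rejected.append(row)
    return table, rejected


def query_schema_on_read(raw_text):
    """Schema on read: the raw text is stored as-is, and the schema is
    applied only when a query reads it. Values that don't fit become None."""
    results = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        parsed = {}
        for col, conv in SCHEMA.items():
            try:
                parsed[col] = conv(row[col])
            except (KeyError, TypeError, ValueError):
                parsed[col] = None
        results.append(parsed)
    return results


table, rejected = load_schema_on_write(RAW)
print(table)          # [{'id': 1, 'name': 'widget', 'price': 9.99}]
print(len(rejected))  # 1
print(query_schema_on_read(RAW))
# [{'id': 1, 'name': 'widget', 'price': 9.99}, {'id': 2, 'name': 'gadget', 'price': None}]
```

In this sketch the write-side version pays the parsing cost once, up front, and every later query gets clean, typed rows. The read-side version loads instantly and keeps every raw byte, but each query repeats the parsing and has to cope with values that don’t fit.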

[Image: Hadoop Data Lake Balcony]

Yup, still feel pretty much the same on the subject. Nothing new there. Glad to see the post get released into the wild at last.

Life has taken a radical shift, though. This is the first time in ages that I’ve had a job where social media activity wasn’t part of the job description. My blog has been neglected. My Twitter followers probably think I fell off the face of the earth. But no. Still here. Just heads down a lot of the time, digging through code and quizzing folks about why this strategy was picked, or how this problem was solved.

I go through phases in my life where I learn and practice, and other phases where I assimilate and share. Of course, all of my life has been a mishmash of both, but the pendulum swings more toward one end of the spectrum or the other sometimes. Right now, I’m swimming in information and splashing around like a kid in summer.

[Image: cover of The Content Pool by Alan J. Porter]

My friend Alan J. Porter wrote a great book about content management called The Content Pool, and another friend, Doug Potter, did its awesome cover illustration. I feel like that guy on the cover. (Buy that book, btw, if you do anything content management related. It’s a must-have.)

I still owe everybody a Storm post, and it is coming, but I’ve also been learning a lot about Apache NiFi, which, ironically, Hortonworks recently packaged and released as Hortonworks DataFlow. Expect a post on that. A couple of weeks ago, I also wrote something up about reasons Hadoop implementations fail. That’s bound to show up somewhere soon.

Stay tuned … Same bat time, same bat channel.

On another note, I’m interested in comparing ETL workflow orchestration tools, especially open source ones, but also good commercial ones if they’re not priced out of the usual Hadoop market. I’m looking for things that are Oozie-like, but better than Oozie. (Oozie is NOT one of my favorite Hadoop ecosystem bits.)

Suggestions?
