SmartData Collective
Schema on Read vs Schema on Write and Why Shakespeare Hates Me

Paige Roberts
A couple of months ago, I found myself without a full-time gig for the first time in decades, so I did a little freelance blogging. Being an overachiever, I wrote such a long post for Adaptive Systems Inc. that I broke it into two parts. The first part was published before I dove headfirst into documenting and unit testing a big Hadoop implementation. The second part was published last week.

It was interesting to read the opinions I held a few months ago on the nature and comparative strengths of the various strategies and technologies. It had been long enough that I didn’t remember what I’d written. I got a kick out of comparing them with my perspective now that I have some recent hands-on experience digging through Hive code and comparing query speed with ORC vs. without, or with MapReduce vs. Tez.

In part 1, I made Shakespeare roll in his grave by misquoting Hamlet repeatedly while talking about the merits of schema on read versus schema on write strategies in Hadoop data lake projects. This time, I butchered Romeo and Juliet while looking at the SQL-in-Hadoop technologies that use these two strategies, and offered recommendations on how to decide which ones to use for your next Hadoop data lake project. Main point: it depends on which balcony you want to climb. And either way, SQL is not off the menu just because you’re using Hadoop.
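
If the two strategies are new to you, here is a minimal sketch of the difference in plain Python. It is only an illustration, not how Hive or any other Hadoop tool implements it: the two-column schema, the sample data, and every function name are made up for this example. Schema on write checks and types each row before storing it, while schema on read stores the raw text and applies the schema only when a query runs.

```python
import csv
import io

# Hypothetical schema for illustration: column name -> converter.
SCHEMA = {"user": str, "amount": float}

# Made-up sample data; the second row does not fit the schema.
RAW = "user,amount\nromeo,12.50\njuliet,not-a-number\n"


def apply_schema(row):
    """Convert one raw row (dict of strings) to typed values.

    Raises ValueError when a value does not fit the schema.
    """
    return {name: convert(row[name]) for name, convert in SCHEMA.items()}


def load_schema_on_write(raw_text):
    """Schema on write: validate and type every row before storing it.

    Rows that do not fit the schema are rejected at load time.
    """
    stored, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        try:
            stored.append(apply_schema(row))
        except ValueError:
            rejected.append(row)
    return stored, rejected


def load_schema_on_read(raw_text):
    """Schema on read: store the raw text untouched, with no checks at load time."""
    return raw_text


def query_schema_on_read(stored_text):
    """Apply the schema only at query time, skipping rows that do not fit."""
    total = 0.0
    for row in csv.DictReader(io.StringIO(stored_text)):
        try:
            total += apply_schema(row)["amount"]
        except ValueError:
            continue
    return total
```

Calling `load_schema_on_write(RAW)` pays the validation cost up front and hands back clean, typed rows plus a reject pile. Calling `query_schema_on_read(load_schema_on_read(RAW))` loads instantly and defers that cost, and the handling of bad rows, to every query.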

Hadoop Data Lake Balcony

Yup, still feel pretty much the same on the subject. Nothing new there. Glad to see the post get released into the wild at last.

Life has taken a radical shift, though. This is the first time in ages that I’ve had a job where social media activity wasn’t part of the job description. My blog has been neglected. My Twitter followers probably think I fell off the face of the earth. But no. Still here. Just heads down a lot of the time, digging through code and quizzing folks about why this strategy was picked, or how this problem was solved.

I go through phases in my life where I learn and practice, and other phases where I assimilate and share. Of course, all of my life has been a mishmash of both, but the pendulum swings more toward one end of the spectrum or the other sometimes. Right now, I’m swimming in information and splashing around like a kid in summer.

The Content Pool by Alan J Porter

My friend Alan J. Porter wrote a great book about content management called The Content Pool, and another friend, Doug Potter, did this awesome cover illustration. I feel like that guy on the cover. (Buy that book, btw, if you do anything content management related. It’s a must-have.)

I still owe everybody a Storm post, and it is coming, but I’ve also been learning a lot about Apache NiFi, ironically recently repackaged by Hortonworks as DataFlow. Expect a post on that. I wrote something up a couple of weeks ago about reasons Hadoop implementations fail. That’s bound to show up somewhere soon.

Stay tuned … Same bat time, same bat channel.

On another note, I’m interested in comparing ETL workflow orchestration tools, especially open-source ones, but also good commercial ones if they’re not priced out of the usual Hadoop market. I’m looking for things that are Oozie-like, but better than Oozie. (Oozie is NOT one of my favorite Hadoop ecosystem bits.)

Suggestions?
