Schema on Read vs Schema on Write and Why Shakespeare Hates Me

Paige Roberts
Last updated: 2015/09/22 at 6:30 AM

A couple of months ago, I found myself without a full-time gig for the first time in decades, and I did a little freelance blogging. Being an overachiever, I wrote such a long post for Adaptive Systems Inc. that I broke it into two parts. The first part got published before I dove headfirst into documenting and unit testing a big Hadoop implementation. The second part got published last week.

It was interesting to read my opinions from a few months ago on the nature and comparative strengths of the various strategies and technologies. It had been long enough that I didn’t remember what I’d written. I got a kick out of comparing that perspective with my current one, now that I have some recent hands-on experience digging through Hive code and comparing query speed with ORC vs. without, or with MapReduce vs. Tez.

In part 1, I made Shakespeare roll in his grave by misquoting Hamlet repeatedly while talking about the merits of schema-on-read versus schema-on-write strategies in Hadoop data lake projects. This time, I butchered Romeo and Juliet while looking at the SQL-in-Hadoop technologies that use these two strategies, and recommended how to decide which ones to use for your next Hadoop data lake project. Main point: it depends on which balcony you want to climb. And either way, SQL is not off the menu just because you’re using Hadoop.
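
For anyone who hasn’t read either part, here is a rough sketch of the difference between the two strategies. It is plain Python rather than anything Hadoop-specific, and the sample records, the SCHEMA dictionary, and every function name are made up for illustration; none of it comes from the original posts. Schema on write checks the data as it is loaded, while schema on read stores the raw data and only interprets it at query time.

```python
import json

# Hypothetical raw events, as they might land in a data lake.
RAW_LINES = [
    '{"user": "romeo", "visits": "3"}',
    '{"user": "juliet", "visits": 5, "balcony": true}',
    '{"user": "hamlet"}',
]

# Assumed target schema: field name -> type used to coerce the value.
SCHEMA = {"user": str, "visits": int}


def apply_schema(record, schema):
    """Coerce a dict to the schema; raises if a field is missing or invalid."""
    return {name: cast(record[name]) for name, cast in schema.items()}


def load_schema_on_write(lines, schema):
    """Schema on write: validate at load time and store only conforming rows."""
    stored, rejected = [], []
    for line in lines:
        try:
            stored.append(apply_schema(json.loads(line), schema))
        except (KeyError, ValueError, TypeError):
            rejected.append(line)
    return stored, rejected


def load_schema_on_read(lines):
    """Schema on read: store everything untouched at load time."""
    return list(lines)


def query_schema_on_read(raw, schema):
    """Interpret the raw data at query time, skipping rows that do not fit."""
    for line in raw:
        try:
            yield apply_schema(json.loads(line), schema)
        except (KeyError, ValueError, TypeError):
            continue
```

The trade-off shows up when the question changes. With schema on write, the "balcony" field and the hamlet row are gone from the stored table once the load finishes. With schema on read, the raw lines are still there, so a later query can bring a different schema, such as {"user": str}, and get all three rows back.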

[Image: Hadoop Data Lake Balcony]

Yup, still feel pretty much the same on the subject. Nothing new there. Glad to see the post get released into the wild at last.

Life has taken a radical shift, though. This is the first time in ages that I’ve had a job where social media activity wasn’t part of the job description. My blog has been neglected. My Twitter followers probably think I fell off the face of the earth. But no. Still here. Just heads down a lot of the time, digging through code and quizzing folks about why this strategy was picked, or how this problem was solved.

I go through phases in my life where I learn and practice, and other phases where I assimilate and share. Of course, all of my life has been a mishmash of both, but the pendulum swings more toward one end of the spectrum or the other sometimes. Right now, I’m swimming in information and splashing around like a kid in summer.

[Image: The Content Pool by Alan J Porter]

My friend Alan J. Porter wrote a great book about content management called The Content Pool, and another friend, Doug Potter, did this awesome cover illustration. I feel like that guy on the cover. (Buy that book, btw, if you do anything content management related. It’s a must-have.)

I still owe everybody a Storm post, and it is coming, but I’ve also been learning a lot about Apache NiFi, ironically recently renamed DataFlow. Expect a post on that. I wrote something up a couple of weeks ago about reasons Hadoop implementations fail. That’s bound to show up somewhere soon.

Stay tuned … Same bat time, same bat channel.

On another note, I’m interested in comparing ETL workflow orchestration tools, especially open source ones, but also good commercial ones if they’re not priced out of the usual Hadoop market. I’m looking for things that are Oozie-like, but better than Oozie. (Oozie is NOT one of my favorite Hadoop ecosystem bits.)

Suggestions?

Paige Roberts September 22, 2015