SmartData Collective

Jeff Hammerbacher on Experiences Evolving a New Analytical Platform

Daniel Tunkelang
Last updated: 2011/11/19 at 4:51 PM

This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose.

The third speaker in the program was Cloudera co-founder and Chief Scientist Jeff Hammerbacher. Jeff, recently hailed by Tim O’Reilly as one of the world’s most powerful data scientists, built the Facebook Data Team, which is best known for open-source contributions that include Hive and Cassandra. Jeff’s talk was entitled “Experiences Evolving a New Analytical Platform: What Works and What’s Missing”. I am thankful to Jeff Dalton for live-blogging a summary.

Jeff’s talk was a whirlwind tour through the philosophy and technology for delivering large-scale analytics (aka “big data”) to the world:

1) Philosophy

The true challenges in the task of data mining are creating a data set with the relevant and accurate information and determining the appropriate analysis techniques. While in the past it made sense to plan data storage and structure around the intended use of the data, the economics of storage and the availability of open-source analytics platforms argue for the reverse: data first, ask questions later; store first, establish structure later. The goal is to enable everyone — developers, analysts, business users — to “party on the data”, providing infrastructure that keeps them from clobbering one another or starving each other of resources.
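To make “store first, establish structure later” concrete, here is a minimal sketch in Python. It is not taken from Jeff’s talk: the event records, field names, and helper functions are all hypothetical, and a real platform would read from distributed storage rather than an in-memory list. The point is only that raw records are kept as they arrive and each question applies its own structure at read time.

```python
import json

# Hypothetical raw event log, stored as-is with no schema enforced at write time.
RAW_EVENTS = [
    '{"user": "alice", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "view"}',
    '{"user": "alice", "action": "click", "ms": 80, "referrer": "home"}',
]


def read_with_schema(lines, fields):
    """Project each raw record onto the fields one particular question needs."""
    for line in lines:
        record = json.loads(line)
        # A missing field becomes None instead of rejecting the record at load time.
        yield {name: record.get(name) for name in fields}


def clicks_per_user(lines):
    """First question: how many clicks did each user make?"""
    counts = {}
    for row in read_with_schema(lines, ["user", "action"]):
        if row["action"] == "click":
            counts[row["user"]] = counts.get(row["user"], 0) + 1
    return counts


def mean_latency_ms(lines):
    """Second question, asked later of the same raw data: average latency."""
    values = [
        row["ms"]
        for row in read_with_schema(lines, ["ms"])
        if row["ms"] is not None
    ]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    print(clicks_per_user(RAW_EVENTS))
    print(mean_latency_ms(RAW_EVENTS))
```

Neither question had to be anticipated when the events were written, and records that lack a field (or carry an extra one, such as `referrer`) are still usable. The trade-off is that validation and parsing costs move from load time to every read.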

2) Defining the Platform

No one just uses a relational database anymore. For example, consider Microsoft SQL Server. It is actually part of a unified suite that includes SharePoint for collaboration, PowerPivot for OLAP, StreamInsight for complex event processing (CEP), etc. As with the LAMP stack, there is a coherent framework for analytical data management, which we can call an analytical data platform.

3) Cloudera’s Platform

Cloudera starts with a substrate architecture of Open Compute commodity Linux servers configured using Puppet and Chef and coordinated using ZooKeeper. Naturally this entire stack is open-source. They use HDFS and Ceph to provide distributed, schema-less storage. They offer append-only table storage and metadata using Avro, RCFile, and HCatalog; and mutable table storage and metadata using HBase. For computation, they offer YARN (inter-job scheduling, like Grid Engine, for data-intensive computing) and Mesos for cluster resource management; MapReduce, Hamster (MPI), Spark, Dryad / DryadLINQ, Pregel (Giraph), and Dremel as processing frameworks; and Crunch (like Google’s FlumeJava), PigLatin, HiveQL, and Oozie as high-level interfaces. Finally, Cloudera offers tool access through FUSE, JDBC, and ODBC; and data ingest through Sqoop and Flume.

4) What’s Next?

For the substrate, we can expect support for fat servers with fat pipes, operating system support for isolation, and improved local filesystems (e.g., btrfs). Storage improvements will give us a unified file format, compression, better performance and availability, richer metadata, distributed snapshots, replication across data centers, native client access, and separation of namespace and block management. We will see stabilization of our existing compute tools and better variety, as well as improved fault tolerance, isolation and workload management, low-latency job scheduling, and a unified execution backend for workflow. And we will see better integration through REST API access to all platform components, better document ingest, maintenance of source catalog and provenance information, and integration with analytics tools that goes beyond ODBC. We will also see tools that facilitate the transition from unstructured to structured data (e.g., RecordBreaker).

Jeff’s talk was as information-dense as this post suggests, and I hope the mostly-academic CIKM audience was not too shell-shocked. It’s fantastic to see practitioners not only building essential tools for research in information and knowledge management, but reaching out to the research community to build bridges. I saw lots of intense conversation after his talk, and I hope the results realize the two-fold mission of the Industry Event, which is to give researchers an opportunity to learn about the problems most relevant to industry practitioners, and to offer practitioners an opportunity to deepen their understanding of the field in which they are working.
