SmartData Collective
© 2008-23 SmartData Collective. All Rights Reserved.

Data Warehousing and Data Science

Barry Devlin
Last updated: 2011/01/12 at 3:32 PM

David Champagne has recently written a fascinating article for TDWI entitled “The Rise of Data Science”, where he reminds us of the scientific method: question, hypothesize, experiment, analyze data, draw conclusions regarding your hypothesis, and communicate your results, with an important loop back to rethink the hypothesis if the results don’t fully validate it.  I remember it well from my Ph.D. days way back in the late ’70s (in physical chemistry, in case you ask).

Champagne goes on to observe the situation today: “…thanks largely to all of the newer tools and techniques available for handling ever-larger sets of data, we often start with the data, build models around the data, run the models, and see what happens. This is less like science and more like panning for gold.”  Well said!  But, I’d go a little further.  It can sometimes be more like diving on a sunken Spanish galleon but discovering a dozen giant Moray eels rather than twelve gold doubloons!

A key point, in my view, is that science and business have rather different goals and visions.  Science, in theory, at least, seeks to discover real and eternal truths.  Of course, pride and politics can intrude and cause data to be selectively gathered, suppressed or misinterpreted.  The aim in business is basically to improve the bottom line.  Nothing wrong with that, of course, but organizational and personal aims and concerns often strongly drive the perceived best path to that goal.

Another, and more important, difference is in the data.  Scientific experiments are designed to gather particular data elements of relevance to the hypothesis.  Business data, especially big data, is a mishmash of data gathered for a variety of reasons, without a common purpose or design in mind.  The result is that it is often incomplete and inconsistent, and thus open to wildly varying analyses and interpretations.  Soft sciences like psychology and sociology may face a similar set of problems, as their experimental data is usually much more intermingled and inconsistent than that from physics experiments, leading to more widely diverging interpretations.

Now, please hear me clearly: there’s a lot of great and innovative analysis going on in this field; see Mike Loukides’ excellent summary, “What is data science?”, from six months ago for some examples of this.  But it is very much like diving on Spanish wrecks; given the right people with enthusiasm, relevant skills and subject matter expertise, you can find treasure.  But with the wrong people, you can suffer some terrible injuries.  The question is: how do you move from experimental science to production?  How do you safely scale from the test tube to the 1,000 litre reactor vessel?

Note that this is not a question of scaling data size, processing power or storage.  It is all about scaling the people and process aspects of innovative analysis into regular production.  This is where a data warehouse comes in.  Of course, only a small proportion of the data can (or should) go through the warehouse.  But the value of the warehouse is in the fact that the data it contains has already been reconciled and integrated to an accepted level of consistency and historical accuracy for the organization.  This requires a subtle rethinking of the role of the data warehouse: it is no longer seen as the sole source of all reporting or the single version of the truth.  Rather, it becomes the central set of core business information that ties together disparate analyses and insights from across a much larger information resource.  It can help discover gold rather than Moray eels.

This scaling and move to production remains a difficult and often expensive problem to solve.  In this, I have to disagree with Michael Driscoll, quoted in Champagne’s article, who says: “Data management is, increasingly, a solved problem”.  I wish it were so…  But the tools and techniques, skills and expertise that organizations have built around their data warehouses, and the investments they’ve made in the technology, are key to addressing the deep data management issues that remain.  It may not be as sexy as statistics has seemingly become, but, in my view, being able to solve the data management problems will be a better indicator of long-term success in this field.

I’ll be covering this at O’Reilly Media’s first Strata Conference, “Making Data Work”, 1-3 February in Santa Clara, California.  I’m giving a keynote, “The Heat Death of the Data Warehouse”, on Thursday, 3 February, at 9:25am and an Exec Summit session, “The Data-driven Business and Other Lessons from History”, on Tuesday, 1 February, at 9:45am.  O’Reilly Media are offering a 25% discount code on conference registration for readers, followers, and friends: str11fsd.

TAGGED: data integration tools, sampling techniques
Barry Devlin January 12, 2011