The Big Data Debate: Correlation vs. Causation

gilpress

In the first quarter of 2013, the stock of big data experienced sudden declines followed by sporadic bouts of enthusiasm. The volatility (a new big data “V”) continues, and Ted Cuzzillo summed up the recent negative sentiment in “Big data, big hype, big danger” on SmartDataCollective:

“A remarkable thing happened in Big Data last week. One of Big Data’s best friends poked fun at one of its cornerstones: the Three V’s. The well-networked and alert observer Shawn Rogers, vice president of research at Enterprise Management Associates, tweeted his eight V’s: ‘…Vast, Volumes of Vigorously, Verified, Vexingly Variable Verbose yet Valuable Visualized high Velocity Data.’ He was quick to explain to me that this is no comment on Gartner analyst Doug Laney’s three-V definition. Shawn’s just tired of people getting stuck on V’s.”

Indeed, all the people who “got stuck” on Laney’s “definition” conveniently forgot that he first used the “three-Vs” to describe data management challenges in 2001. Yes, 2001. If big data is a “revolution,” how come its widely used “definition” is based on a dozen-year-old analyst note?

Ranting about how “blogs and articles yammer on with the benefits of ‘big data,’” Cuzzillo correctly observes that they are simply “repeating promises made years ago about the benefits of small data and small analytics. This is old decision support super-sized and warmed over, the ‘new and improved’ that won’t satisfy any better than the original but which costs much, much more.”

Cuzzillo is joined by a growing chorus of critics who challenge some of the breathless pronouncements of big data enthusiasts. Specifically, it looks like the backlash theme-of-the-month is correlation vs. causation, possibly in reaction to the success of Viktor Mayer-Schönberger and Kenneth Cukier’s recent big data book in which they argued for dispensing “with a reliance on causation in favor of correlation” (see my discussion of the book and this argument).

In “Steamrolled by Big Data,” The New Yorker’s Gary Marcus declares that “Big Data isn’t nearly the boundless miracle that many people seem to think it is.” He concedes that “Big Data can be especially helpful in systems that are consistent over time, with straightforward and well-characterized properties, little unpredictable variation, and relatively little underlying complexity.” But Marcus warns that “not every problem fits those criteria; unpredictability, complexity, and abrupt shifts over time can lead even the largest data astray. Big Data is a powerful tool for inferring correlations, not a magic wand for inferring causality.” Calling for “a sensitivity to when humans should and should not remain in the loop,” Marcus quotes Alexei Efros, “one of the leaders in applying Big Data to machine vision,” who described big data as “a fickle, coy mistress.”

Matti Keltanen at The Guardian agrees, explaining “Why ‘lean data’ beats big data.” Writes Keltanen: “…the lightest, simplest way to achieve your data analysis goals is the best one…The dirty secret of big data is that no algorithm can tell you what’s significant, or what it means. Data then becomes another problem for you to solve. A lean data approach suggests starting with questions relevant to your business and finding ways to answer them through data, rather than sifting through countless data sets. Furthermore, purely algorithmic extraction of rules from data is prone to creating spurious connections, such as false correlations… today’s big data hype seems more concerned with indiscriminate hoarding than helping businesses make the right decisions.”

In “Data Skepticism,” O’Reilly Radar’s Mike Loukides adds this gem to the discussion: “The idea that there are limitations to data, even very big data, doesn’t contradict Google’s mantra that more data is better than smarter algorithms; it does mean that even when you have unlimited data, you have to be very careful about the conclusions you draw from that data. It is in conflict with the all-too-common idea that, if you have lots and lots of data, correlation is as good as causation.”

Isn’t more-data-is-better the same as correlation-is-as-good-as-causation? Or, in the words of Chris Anderson, “with enough data, the numbers speak for themselves.”

That’s much more than a mantra. It’s the big data religion, its core mystical experience: The data speak (how prescient was Larry Ellison when he re-named his company in 1982?).

“Can numbers actually speak for themselves?” non-believer Kate Crawford asks in “The Hidden Biases in Big Data” on the Harvard Business Review blog and answers: “Sadly, they can’t. Data and data sets are not objective; they are creations of human design. We give numbers their voice, draw inferences from them, and define their meaning through our interpretations. Hidden biases in both the collection and analysis stages present considerable risks, and are as important to the big-data equation as the numbers themselves. We get a much richer sense of the world when we ask people the why and the how not just the ‘how many.’”

An NPR blogger notes that “while Big Data can uncover correlations between data, it doesn’t reveal causation. Sometimes, that doesn’t really matter, but other times, it might, in ways we’re not always aware of.” He (or she) also quotes The New York Times’ Steve Lohr, who quotes Albert Einstein: “Not everything that counts can be counted, and not everything that can be counted counts.”

Speaking of Einstein (“imagination is more important than knowledge”), E. O. Wilson in The Wall Street Journal takes the discussion to a whole new level. While he doesn’t specifically mention big data, Wilson (in “great scientists don’t need math”) makes an important distinction between using only mathematics and using one’s imagination or intuition: “I have a professional secret to share: Many of the most successful scientists in the world today are mathematically no more than semiliterate… Fortunately, exceptional mathematical fluency is required in only a few disciplines, such as particle physics, astrophysics and information theory. Far more important throughout the rest of science is the ability to form concepts, during which the researcher conjures images and processes by intuition… The annals of theoretical biology are clogged with mathematical models that either can be safely ignored or, when tested, fail. Possibly no more than 10% have any lasting value. Only those linked solidly to knowledge of real living systems have much chance of being used.”

And David Brooks in The New York Times, while probing the limits of “the big data revolution,” takes the discussion to yet another level: “One limit is that correlations are actually not all that clear. A zillion things can correlate with each other, depending on how you structure the data and what you compare. To discern meaningful correlations from meaningless ones, you often have to rely on some causal hypothesis about what is leading to what. You wind up back in the land of human theorizing… Most of the advocates understand data is a tool, not a worldview. My worries mostly concentrate on the cultural impact of the big data vogue. If you adopt a mind-set that replaces the narrative with the empirical, you have problems thinking about personal responsibility and morality, which are based on causation. You wind up with a demoralized society.”

I don’t think that the big data mind-set replaces “the narrative” with the empirical. It replaces it with numbers and correlations. There is nothing wrong with a scientific mind-set, based on empirical observations, as long as people don’t mistake number-crunching for scientific inquiry or see cause-and-effect in correlations.

Kaiser Fung concludes in his summary of the recent Reinhart-Rogoff kerfuffle (“Occupational hazards in data science”) that the problem of seeing (or implying) causation in correlations is found not just in economics but also in medical research and other fields using observational data: “The usual ploy is first acknowledge that the data could not prove causality (‘we found an association between sleeping less and snoring; our data does not allow us to prove causation.’), then quietly assume that the causal link is there, and wax on the implications (‘if you want to snore less, sleep less.’)” Or, as The Atlantic’s Matthew O’Brien puts it: “R-R whisper ‘correlation’ to other economists, but say ‘causation’ to everyone else.”

Whether you use small or big data, your imagination (developing theories) and integrity (following the scientific method) are what count. Correlations can count, too, in certain situations. Just don’t expect them to explain anything.

[Originally published on Forbes.com]

(image: big data debate / shutterstock)
