© 2008-25 SmartData Collective. All Rights Reserved.

Statistical Rules of Thumb, Part III: Always Visualize the Data

DeanAbbott

As I perused Statistical Rules of Thumb again, as I do from time to time, I came across this gem (note: I live in CA, so I get no money from these Amazon links).

Van Belle uses the term “Graph” rather than “Visualize”, but it is the same idea: visualize the data in addition to computing summary statistics. Summaries are useful but can be deceiving, because any time you summarize data you lose some information, and unless the distributions are well behaved, what you lose can matter. Scatterplots, histograms, box-and-whisker plots, and the like can reveal the ways summaries fool you. I’ve seen this happen too, especially with variables that have outliers or that are bimodal or trimodal.
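To make the point about bimodal variables concrete, here is a minimal sketch using only the Python standard library. The two datasets are made up for illustration: one has a single peak, the other has two well-separated clusters, yet both have a mean of 0 and the same standard deviation. The helper names `summary` and `text_histogram` are hypothetical, and a crude text histogram stands in for a real plot.

```python
import math
import statistics
from collections import Counter


def summary(values):
    """Mean and population standard deviation: the kind of summary that hides shape."""
    return statistics.fmean(values), statistics.pstdev(values)


def text_histogram(values, bin_width=1.0):
    """Crude text histogram: one line per bin, one '#' per observation."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    lines = []
    for b in range(min(counts), max(counts) + 1):
        lines.append(f"{b * bin_width:6.1f} | {'#' * counts.get(b, 0)}")
    return "\n".join(lines)


# Hypothetical data: one peak vs. two well-separated peaks.
unimodal = [-2, -1, -1, 0, 0, 0, 0, 1, 1, 2]
a = math.sqrt(1.2)  # chosen so both datasets have population variance 1.2
bimodal = [-a] * 5 + [a] * 5

if __name__ == "__main__":
    for name, data in [("unimodal", unimodal), ("bimodal", bimodal)]:
        mean, sd = summary(data)
        print(f"{name}: mean={mean:.2f}, sd={sd:.2f}")
        print(text_histogram(data))
```

Both datasets report the same mean and standard deviation, but the histogram of the second one has an empty middle: the "typical" value of 0 never actually occurs.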

One of the most famous examples of this effect is Anscombe’s Quartet. I’m including the Wikipedia image of the plots here:

[Figure: Anscombe’s Quartet, scatterplots of the four datasets (image from Wikipedia)]

All four datasets have the same mean of x, mean of y, standard deviation of x, standard deviation of y, Pearson correlation between x and y, and linear regression line of y on x (exactly, or to two or three decimal places), so the summaries don’t reveal the differences in the data.

I use correlations a lot to get the gist of the relationships in the data, and I’ve seen how correlations can deceive. In one project, we had 30K data points with a correlation above 0.9. When we removed just 100 of them (the ones with the largest magnitudes of x and y), the correlation shrank to 0.23.

Most data mining software now makes it easy to visualize data. Avail yourself of these tools to avoid later surprises in your data.
