© 2008-25 SmartData Collective. All Rights Reserved.

Keeping Your Big Data Analysis Clean

Rick Delgado

‘Outlier’ is a term that comes from statistics and data analytics. Math.com defines an outlier as “a value that lies outside (is much smaller or larger than) most of the other values in a set of data” and gives a sample set of values as an example: in the values 25, 29, 3, 32, 85, 33, 27, and 28, both 3 and 85 are outliers.
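
The definition above does not say how far from the rest a value must lie to count as an outlier. One common convention, used here purely as an assumption, is the 1.5 × IQR rule (Tukey's fences). The sketch below applies it to the sample values; the function name `iqr_outliers` and the multiplier `k` are illustrative choices, not something the definition prescribes.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR.

    The k=1.5 multiplier is a widely used convention, not a fixed rule;
    a stricter or looser threshold may suit a given data set better.
    """
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Preserve the input order of the flagged values.
    return [v for v in values if v < lower or v > upper]


sample = [25, 29, 3, 32, 85, 33, 27, 28]
print(iqr_outliers(sample))  # flags 3 and 85
```

On a data set this small the result is obvious by eye. The rule earns its keep on larger sets, where the threshold `k` should be chosen with the data's context in mind.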

Whether you want your business to become an outlier or you’re using a big data platform to optimize your business model, looking for outliers to either weed out or leverage, it’s important to understand where outliers come from and how instructive or beneficial they are to your particular data set. Above all, you must learn to recognize whether the outliers that crop up in your analysis result from flaws in your analytics model or are genuine anomalies particular to your business, and whether they should be eliminated or enhanced. That understanding depends on keeping your big data analysis clean. Here are a few tips for doing so.

  1. Investigate and identify the cause: Not all outliers are the result of errors. They may be exactly what you’re looking for. Sometimes, however, outliers come from a transcription error or from malfunctioning equipment that reports inaccurate values. Extreme outliers like these can reduce the accuracy of your analysis, so you’ll want to either remove those values from the data set or fix the flaws causing them.

  2. Use data visualization tools: Data visualization tools make trends and patterns in a large data set much easier to see than the raw numbers do. Seeing anomalies is the first step to understanding them.

  3. Know the factors that may skew your data: In a bar full of average people plus Bill Gates, a measurement of the average income in the room would be skewed by the presence of Gates. French census data taken the year Napoleon Bonaparte was born would show nothing out of the ordinary, and yet how much did he affect European census data throughout the nineteenth century? Both examples show how a single outlier can have an outsized influence on the numbers around it.

  4. Be agnostic about your outliers: In and of themselves, outliers are neither good nor bad. They are simply extreme values that may or may not be expected. Most of all, outliers are instructive. They represent risks, opportunities, mistakes, anomalies, or something else. Their usefulness depends on context and on how that context relates to a company’s goals.

  5. Check your assumptions at the door: Assumptions about your data will mislead you and create biases that affect the outcome of your analysis. It’s very common for people to overlook their underlying assumptions and biases. Try to keep an open mind about what the data tells you, and look for alternate interpretations where possible. Sometimes the idea that data analysis will reveal a problem and point toward an eventual solution is itself an assumption.
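
The Bill Gates example in the list above can be made concrete with a short sketch. The income figures are invented purely for illustration and do not come from any real data. The point is only that one extreme value drags the mean far more than it moves the median.

```python
from statistics import mean, median


def summarize(incomes):
    """Return the mean and median of a list of incomes."""
    return {"mean": mean(incomes), "median": median(incomes)}


# Hypothetical annual incomes for the bar's regular patrons (made-up numbers).
patrons = [40_000, 45_000, 50_000, 55_000, 60_000]

# The same room after one extremely high earner walks in (also made up).
with_outlier = patrons + [1_000_000_000]

before = summarize(patrons)
after = summarize(with_outlier)

print(f"mean:   {before['mean']:,.0f} -> {after['mean']:,.0f}")
print(f"median: {before['median']:,.0f} -> {after['median']:,.0f}")
```

Here the mean jumps by several orders of magnitude while the median barely moves. That is one reason to report a robust statistic alongside the average whenever outliers may be present.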

Ironically, most businesses, if not all, apply big data platforms toward the ultimate goal of becoming outliers in their industry. Whichever factors define an ‘outlier’ in a given business landscape, whether market share, gross revenue, stock price, longevity, or some combination, to be the absolute best is to be the outlier. Today’s big data platforms help businesses create powerful models for tracking and measuring trends, behaviors, and markets, but the results will only be as good as the analytical model. To become the outlier, you must first understand your own outliers.

By Rick Delgado
All things Big Data, Tech commentator, Enterprise Trends and every once in a while I write for @dell.
