SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.
Keeping Your Big Data Analysis Clean

Rick Delgado

‘Outlier’ is a term that comes from statistics and data analytics. Math.com defines an outlier as “a value that lies outside (is much smaller or larger than) most of the other values in a set of data,” and it gives a sample set of values as an example: given the values 25, 29, 3, 32, 85, 33, 27, and 28, both 3 and 85 are outliers.
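
To make the definition concrete, here is a minimal sketch that flags the outliers in the sample values above. It assumes the common interquartile range (IQR) rule with a 1.5 multiplier, which is one convention among several and not something Math.com prescribes; the function name `iqr_outliers` and the parameter `k` are hypothetical names chosen for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    Assumption: the IQR rule (Tukey's fences) stands in for "much smaller
    or larger than most of the other values". Other rules exist.
    """
    # Quartiles computed with the "inclusive" method from the standard library.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Keep the original order of the input when reporting outliers.
    return [v for v in values if v < low or v > high]

# The sample values quoted in the definition above.
scores = [25, 29, 3, 32, 85, 33, 27, 28]
print(iqr_outliers(scores))  # → [3, 85]
```

A rule like this only flags candidates. Whether a flagged value is an error or a genuine signal still has to be decided by looking at where it came from.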

Whether you want to become an outlier yourself or are using a big data platform to optimize your business model, looking for outliers to either weed out or leverage, it’s important to understand where outliers come from and how instructive or beneficial they are to your particular data set. Above all, you must learn to recognize whether the outliers that crop up in your analysis result from flaws in your analytics model or are genuine anomalies particular to your business, and whether they should be eliminated or enhanced. That level of understanding depends on keeping your big data analysis clean. Here are a few tips for that.

  1. Investigate and identify the cause: Not all outliers are the result of errors. They may be exactly what you’re looking for. Sometimes, however, outliers come from a transcription error or from malfunctioning equipment that reports inaccurate values. Extreme outliers like these can reduce the accuracy of your analysis, so you’ll want to either remove these values from the data set or fix the flaws causing them.

  2. Use data visualization tools: Data visualization tools make it much easier to spot trends and patterns in a large data set than looking at the raw numbers. Seeing anomalies is the first step to understanding them.

  3. Know the factors that may skew your data: In a bar full of average people plus Bill Gates, a measurement of the average income in the room would be skewed by the presence of Gates. French census data taken the year Napoleon Bonaparte was born would show nothing out of the ordinary, and yet how much did he affect European census data throughout the nineteenth century? These examples show how a single outlier can heavily influence averages and aggregate figures.

  4. Be agnostic about your outliers: In and of themselves, outliers are neither good nor bad. They are simply extreme values that may or may not be expected. Most of all, outliers are instructive. They represent risks, opportunities, mistakes, anomalies, or something else. Their usefulness depends on context and on how that context relates to a company’s goals.

  5. Check your assumptions at the door: Assumptions about your data will mislead you and create biases that affect the outcome of your analysis. It’s very common for people to overlook their underlying assumptions and biases. Try to keep an open mind about what the data tells you, and look for alternate interpretations where possible. Sometimes the idea that data analysis will reveal a problem and point toward an eventual solution is itself an assumption.
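
The bar example in the list above can be sketched in a few lines. The incomes below are hypothetical figures invented purely for illustration, including the stand-in value for the one very high earner; the point is only to compare how the mean and the median react when a single extreme value joins the data.

```python
import statistics

# Hypothetical annual incomes (USD) for nine bar patrons; illustrative only.
patrons = [38_000, 42_000, 45_000, 47_000, 50_000,
           52_000, 55_000, 61_000, 70_000]

# Hypothetical stand-in for one extremely high earner walking in.
with_outlier = patrons + [1_000_000_000]

def summarize(incomes):
    """Return (mean, median) for a list of incomes."""
    return statistics.mean(incomes), statistics.median(incomes)

mean_before, median_before = summarize(patrons)
mean_after, median_after = summarize(with_outlier)

print(f"before: mean={mean_before:,.0f}  median={median_before:,.0f}")
print(f"after:  mean={mean_after:,.0f}  median={median_after:,.0f}")
```

Under these assumed numbers the median barely moves while the mean jumps by several orders of magnitude, which is why it is worth comparing both before concluding anything from an average.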

Ironically, most if not all businesses apply big data platforms toward the ultimate goal of becoming outliers in their industry. Whatever factors define the term ‘outlier’ in a given business landscape, whether market share, gross revenue, stock price, longevity, or some combination, to be the absolute best is to be the outlier. Today’s big data platforms help businesses build powerful models for tracking and measuring trends, behaviors, and markets, but the results will only be as good as the analytical model. To become the outlier, you must first understand your own outliers.
