© 2008-25 SmartData Collective. All Rights Reserved.

Keeping Your Big Data Analysis Clean

Rick Delgado

‘Outlier’ is a term from statistics and data analytics. Math.com defines an outlier as “a value that lies outside (is much smaller or larger than) most of the other values in a set of data,” and gives a sample set as an example: given the values 25, 29, 3, 32, 85, 33, 27, and 28, both 3 and 85 are outliers.
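
One common way to make “much smaller or larger than most of the other values” concrete is the interquartile range (IQR) rule. The sketch below is only an illustration, not the method Math.com uses: the function name `iqr_outliers` and the conventional multiplier `k=1.5` are assumptions introduced here. Applied to the sample values above, it flags the same two numbers.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is a widely used convention, not a universal rule; a larger k
    flags only the more extreme values.
    """
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Keep the original order so flagged values are easy to trace back.
    return [v for v in values if v < low or v > high]


sample = [25, 29, 3, 32, 85, 33, 27, 28]
print(iqr_outliers(sample))  # [3, 85]
```

A rule like this only identifies candidates. Whether a flagged value is an error or a genuine signal still has to be judged in context.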

Whether you’re looking to become an outlier or using a big data platform to optimize your business model, looking for outliers to either weed out or leverage, it’s important to understand where outliers come from and how instructive or beneficial they are to your particular data set. Above all, you must learn to recognize whether the outliers that crop up in your analysis result from flaws in your analytics model or are anomalies particular to your business, and whether they should be eliminated or enhanced. That understanding begins and ends with keeping your big data analysis clean. Here are a few tips.

  1. Investigate and identify the cause: Not all outliers are the result of errors. They may be exactly what you’re looking for. Sometimes, however, outliers come from a transcription error or from malfunctioning equipment that reports inaccurate values. Extreme outliers like these can reduce the accuracy of your analysis, so you’ll want to either remove these values from the data set or fix the flaws causing them.

  2. Use data visualization tools: Data visualization tools make trends and patterns in a large data set much easier to see than the raw numbers do. Seeing anomalies is the first step to understanding them.

  3. Know the factors that may skew your data: In a bar full of average people plus Bill Gates, a measurement of the average income in the room would be skewed by the presence of Gates. French census data taken the year Napoleon Bonaparte was born would show nothing out of the ordinary, and yet, how much did he impact European census data throughout the nineteenth century? These examples show how outliers can heavily influence average values.

  4. Be agnostic about your outliers: In and of themselves, outliers are neither good nor bad. They are simply extreme values that may or may not be expected. Most of all, outliers are instructive. They represent risks, opportunities, mistakes, anomalies, or something else. Their usefulness depends on context and on how that context relates to a company’s goals.

  5. Check your assumptions at the door: Assumptions about your data will mislead you and create biases that affect the outcome of your analysis. It’s very common for people to overlook their underlying assumptions and biases. Try to keep an open mind about what the data tells you, and look for alternate interpretations where possible. Sometimes the idea that data analysis will reveal a problem and point toward an eventual solution is itself an assumption.
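
The Bill Gates example in the list above can be checked with a few lines of arithmetic. This is a minimal sketch with made-up numbers: the nine patron incomes and the newcomer’s income are hypothetical values chosen for illustration, and `summarize` is a helper name introduced here. It compares the mean, which a single extreme value drags upward, with the median, which barely moves.

```python
from statistics import mean, median

# Hypothetical annual incomes (in dollars) for nine patrons in the bar.
patrons = [38_000, 42_000, 45_000, 47_000, 50_000,
           52_000, 55_000, 60_000, 61_000]

# Hypothetical annual income for one extremely wealthy newcomer.
newcomer = 1_000_000_000


def summarize(incomes):
    """Return the mean and median of a list of incomes."""
    return {"mean": mean(incomes), "median": median(incomes)}


before = summarize(patrons)
after = summarize(patrons + [newcomer])

print(before)  # {'mean': 50000, 'median': 50000}
print(after)   # {'mean': 100045000, 'median': 51000}
```

With these numbers, one extra person moves the mean from 50,000 to roughly 100 million while the median shifts by only 1,000. This is one reason to report a median alongside an average when a data set may contain extreme values.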

Ironically, most businesses, if not all, apply big data platforms toward the ultimate goal of becoming outliers in their industry. Whatever factors define an ‘outlier’ in a given business landscape, whether market share, gross revenue, stock price, longevity, or some combination, to be the absolute best is to be the outlier. Today’s big data platforms help businesses build powerful models for tracking and measuring trends, behaviors, and markets, but the results will only be as good as the analytical model. To become the outlier, you must first understand your own outliers.
