When Big Data Can’t Predict

Bill Franks

Most people think that in the age of big data, we always have more than enough information to build robust analytics. Unfortunately, this isn’t always the case. In fact, there are situations where massive amounts of data still don’t allow even basic predictions to be made with confidence. In many cases, there isn’t much that can be done other than to recognize the facts and stick to the basics instead of getting fancy. Big data that can’t be used to predict seems like a paradox at first, but let’s explore why it isn’t.

Scenario 1: Big Data, Small Universe

One example where issues arise is when we have a ton of data on a very small population, which makes it tough to find meaningful patterns. Let’s think about an aircraft manufacturer. Today’s airplanes generate terabytes of data every hour of operation. There are a lot of benefits that can come out of analyzing that data, such as understanding how the engines operate under differing conditions. At the same time, some exciting analytics like predictive maintenance can be difficult. Why is that?

Realize that even the biggest aircraft manufacturers only put out a few hundred airplanes per year. Once the different models are taken into account, perhaps only a couple dozen of some models are produced in any given year. Even if the aircraft come fully loaded with sensors, it will be hard to develop meaningful predictive part failure models. Why? Because with only a few dozen or a few hundred aircraft, the sample is too small.

This is exacerbated by the low failure rate of things like an engine (or engine component), especially on a new aircraft. So, while petabytes of data might be collected over a couple of years of operation, there simply may not be enough aircraft to create a large enough pool of good and bad events from which to build predictive models that really work. Certainly, we can monitor the data to look for anomalous patterns that might support an investigation or intervention. But that’s not a predictive model.
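To make the small-universe problem concrete, here is a minimal sketch that counts how many failure events a fleet would be expected to produce. The fleet size, failure rate, and time span are hypothetical values chosen only for illustration, and the Poisson approximation is an assumption of this sketch, not something stated above.

```python
import math

def expected_failure_events(fleet_size: int, annual_failure_rate: float, years: float) -> float:
    """Expected failures across the fleet, assuming a constant per-aircraft rate."""
    return fleet_size * annual_failure_rate * years

def chance_of_no_failures(expected_events: float) -> float:
    """Probability of observing zero failures under a Poisson approximation."""
    return math.exp(-expected_events)

# Hypothetical inputs for illustration only; they are not from any real fleet.
events = expected_failure_events(fleet_size=50, annual_failure_rate=0.01, years=2)
print(f"Expected failure events to learn from: {events:.1f}")
print(f"Chance of seeing no failures at all: {chance_of_no_failures(events):.0%}")
```

Notice that the volume of sensor data never enters the calculation. Under these assumed numbers the fleet yields roughly one failure to learn from, no matter how many petabytes are recorded around it.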

Scenario 2: Big Data, Big Universe, Incredibly Rare Events

There are other situations where there is a large universe of people or things to analyze and lots of data about them all. However, when events are exceedingly rare, you can still end up without enough exceptions to build truly effective predictive models. Again, this isn’t to say that there isn’t a lot of value in analyzing the data and understanding various aspects of the behavior of the people or things. It simply means that it may not be possible to build effective predictive models.

Let’s consider computer chips. Many millions, if not billions, of chips are produced each year and the rate is ever increasing. Decades ago, defect rates on the order of one in 10,000 or one in 100,000 might have been acceptable. With today’s chip-infused products, defect rates need to be closer to one in several million. I’ve had clients mention that there is pressure from the auto industry to drive chip defect rates down to one in a billion or less. Why is that?

The answer is that if any given new car has 1,000 chips in it in a few years, even small defect rates start to translate into a lot of defective vehicles. With a defect rate of one in 1,000,000, about one of every thousand cars produced would have at least one defective chip. That translates to a lot of cost. It can also lead to lost lives if a chip fails in an autonomous vehicle and causes it to malfunction while in operation. Hence, the push for incredibly low defect rates.
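The arithmetic behind that claim can be checked with a short sketch. It assumes chip defects are independent of one another, which is a simplification introduced here; the function name is illustrative.

```python
import math

def prob_at_least_one_defect(chips_per_car: int, chip_defect_rate: float) -> float:
    """P(at least one defective chip) = 1 - (1 - p)^n, assuming independent chips."""
    # log1p and expm1 keep precision when the defect rate is tiny.
    return -math.expm1(chips_per_car * math.log1p(-chip_defect_rate))

# 1,000 chips per car, at one-in-a-million and one-in-a-billion defect rates.
for rate in (1e-6, 1e-9):
    p_car = prob_at_least_one_defect(1000, rate)
    print(f"chip defect rate {rate:.0e}: about 1 in {1 / p_car:,.0f} cars affected")
```

At one in a million, roughly one car in a thousand carries a defective chip. At one in a billion, that drops to roughly one car in a million, which is why the lower target matters so much to automakers.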

The issue is that if such low defect rates are achieved, and if we assume that a wide range of issues can lead to a defective chip, there will be very few instances of any given defect happening for any specific set of reasons. We may never have enough of a sample to build a good model that predicts when and where those failures might occur. Considering that chips become outdated and are replaced with newer models within just a few years, it is quite plausible that this will be an ongoing issue.
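A rough sketch shows how quickly the usable sample shrinks once defects are split across root causes. The production volume and the number of distinct causes are hypothetical, and spreading defects evenly across causes is a simplifying assumption.

```python
def examples_per_cause(chips_produced: int, defect_rate: float, distinct_causes: int) -> float:
    """Average defective chips available per root cause, assuming an even spread."""
    return chips_produced * defect_rate / distinct_causes

# Hypothetical volume and cause count, for illustration only.
for rate in (1e-6, 1e-9):
    n = examples_per_cause(chips_produced=1_000_000_000, defect_rate=rate, distinct_causes=50)
    print(f"defect rate {rate:.0e}: about {n:g} examples per cause")
```

Under these assumptions, a one-in-a-million defect rate leaves about 20 examples per cause across a billion chips, and a one-in-a-billion rate leaves far less than one. Neither is much to train a predictive model on.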

Don’t Despair, Prepare

Keep in mind that the issues I’ve raised here are not the rule, but the exception. However, as data is collected from more and more sources and we analyze more and more aspects of our businesses, these exceptions are almost certain to pop up within your organization now and then. The important thing to do is simply to be on the lookout for cases where you have a very small universe to analyze, an incredibly rare event to analyze, or, worst of all, a rare event within a small universe. I am assuming, naturally, that you are only considering situations where the data is relevant to your business problem. Data that isn’t relevant will never add value no matter how big or small.

When occasions arise where you’re uncertain your data is going to be effective for prediction, make sure you assess what will plausibly be possible before investing too much energy into developing sophisticated analytics on the data. You may have to settle for basic analytics in some cases. It is important to keep in mind, however, that you should still be better off than if you had no data at all to analyze. That’s the upside to keep in mind instead of letting frustration get the best of you.


By Bill Franks
Bill Franks is Chief Analytics Officer for The International Institute For Analytics (IIA). Franks is also the author of Taming The Big Data Tidal Wave and The Analytics Revolution. His work has spanned clients in a variety of industries for companies ranging in size from Fortune 100 companies to small non-profit organizations. You can learn more at http://www.bill-franks.com.
