Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    How a Specialized Marketing VA Improves Campaign Analytics
    How a Specialized Marketing VA Improves Campaign Analytics
    11 Min Read
    New Data Analytics Breakthroughs Give eCommerce Startups a Fighting Chance
    New Data Analytics Breakthroughs Give eCommerce Startups a Fighting Chance
    6 Min Read
    How Data Analytics Is Reshaping Patient Financing Decisions
    How Data Analytics Is Reshaping Patient Financing Decisions
    13 Min Read
    business using business intelligence
    How to Use a Competitive Intelligence Dashboard to Turn Market Data Into Smarter Marketing Decisions 
    9 Min Read
    unusual trading activity
    Signal Or Noise? A Decision Tree For Evaluating Unusual Trading Activity
    3 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: There Are 2 Ways To Make Large Datasets Useful…
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Big Data > Data Quality > There Are 2 Ways To Make Large Datasets Useful…
CommentaryData Quality

There Are 2 Ways To Make Large Datasets Useful…

ChrisDixon
ChrisDixon
3 Min Read
SHARE

I’ve spent the majority of my career building technologies that try to do useful things with large datasets.*

One of the most important lessons I’ve learned is that there are only two ways to make useful products out of large data sets. Algorithms that deal with large data sets tend to be accurate at best 80%-90% of the time (an old “joke” about machine learning is that it’s really good at partially solving any problem).

I’ve spent the majority of my career building technologies that try to do useful things with large datasets.*

More Read

A Taleo Tale: Oracle Reaches for Cloud Credibility
Because it’s Friday: Please don’t write like a scientist
How Human Centered Design and Big Data Are Merging in 2017
The Secrets to Big Data and Information Optimization Revealed in 2013 Research Agenda
How to Scale Your Big Data Project Effectively

One of the most important lessons I’ve learned is that there are only two ways to make useful products out of large data sets. Algorithms that deal with large data sets tend to be accurate at best 80%-90% of the time (an old “joke” about machine learning is that it’s really good at partially solving any problem).

Consequently, you either need to accept you’ll have some errors but deploy the system in a fault-tolerant context, or you need to figure out how to get the remaining accuracy through manual labor.

What do I mean by fault-tolerant context? If a search engine shows the most relevant result as the 2nd or 3rd result, users are still pretty happy. The same goes for recommendation systems that show multiple results (e.g. Netflix). Trading systems that hedge funds use are also often fault tolerant: if you make money 80% of the time and lose it 20% of the time, you can still usually have a profitable system.

For fault-intolerant contexts, you need to figure out how to scalably and cost-effectively produce the remaining accuracy through manual labor. When we were building SiteAdvisor, we knew that any inaccuracies would be a big problem: incorrectly rating a website as unsafe hurts the website, and incorrectly rating a website as safe hurts the user.

Because we knew automation would only get us 80-90% accuracy, we built 1) systems to estimate confidence levels in our ratings so we would know what to manually review, and 2) a workflow system so that our staff, an offshore team we hired, and users could flag or fix inaccuracies.

* My first job was as a programmer at a hedge fund, where we built systems that analyzed large data sets to trade stock options. Later, I cofounded SiteAdvisor where the goal was to build a system to assign security safety ratings to tens of millions of websites. Then I cofounded Hunch, which was acquired by eBay – we are now working on new recommendation technologies for ebay.com and other eBay websites.

TAGGED:datasets
Share This Article
Facebook Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

How a Specialized Marketing VA Improves Campaign Analytics
How a Specialized Marketing VA Improves Campaign Analytics
Analytics Exclusive
ai marketing tools
The 9 AI Tools Marketers Use to Create Images and Video in 2026
Artificial Intelligence Exclusive
ai chatbot
How AI Website Chatbots Improve Customer Support and Lead Generation
Chatbots Exclusive
Why Every Small Business Should Care About an AI Image Generator
Why Every Small Business Should Care About an AI Image Generator
Artificial Intelligence Exclusive

Stay Connected

1.2KFollowersLike
33.7KFollowersFollow
222FollowersPin

You Might also Like

data science and python
Big DataBusiness Intelligence

Why Choosing Python For Data Science Is An Important Move

11 Min Read
Diverse Research Datasets
Big DataExclusive

The 5 Best Platforms Offering the Most Diverse Research Datasets in 2026

10 Min Read
customer data protection
Data ManagementExclusivePolicy and GovernancePrivacyRisk Management

Here Are The Most Important Ways To Ensure Customer Data Protection

8 Min Read
decentralized ai training
Artificial IntelligenceExclusive

Decentralized AI Training: 4 Leading Dataset Solutions For Your Business

10 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

AI chatbots
AI Chatbots Can Help Retailers Convert Live Broadcast Viewers into Sales!
Chatbots
ai in ecommerce
Artificial Intelligence for eCommerce: A Closer Look
Artificial Intelligence

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-26 SmartData Collective. All Rights Reserved.
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?