By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData Collective
  • Analytics
    AnalyticsShow More
    data analytics in sports industry
    Here’s How Data Analytics In Sports Is Changing The Game
    6 Min Read
    data analytics on nursing career
    Advances in Data Analytics Are Rapidly Transforming Nursing
    8 Min Read
    data analytics reveals the benefits of MBA
    Data Analytics Technology Proves Benefits of an MBA
    9 Min Read
    data-driven image seo
    Data Analytics Helps Marketers Substantially Boost Image SEO
    8 Min Read
    construction analytics
    5 Benefits of Analytics to Manage Commercial Construction
    5 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-23 SmartData Collective. All Rights Reserved.
Reading: There Are 2 Ways To Make Large Datasets Useful…
Share
Notification Show More
Latest News
big data mac performance
Data-Driven Tips to Optimize the Speed of Macs
News
3 Ways AI Has Helped Marketers and Creative Professionals Streamline Workflows
3 Ways AI Has Helped Marketers and Creative Professionals Streamline Workflows
Artificial Intelligence
data analytics in sports industry
Here’s How Data Analytics In Sports Is Changing The Game
Big Data
data analytics on nursing career
Advances in Data Analytics Are Rapidly Transforming Nursing
Analytics
data analytics reveals the benefits of MBA
Data Analytics Technology Proves Benefits of an MBA
Analytics
Aa
SmartData Collective
Aa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Big Data > Data Quality > There Are 2 Ways To Make Large Datasets Useful…
CommentaryData Quality

There Are 2 Ways To Make Large Datasets Useful…

ChrisDixon
Last updated: 2012/04/15 at 6:14 AM
ChrisDixon
3 Min Read
SHARE

I’ve spent the majority of my career building technologies that try to do useful things with large datasets.*

One of the most important lessons I’ve learned is that there are only two ways to make useful products out of large data sets. Algorithms that deal with large data sets tend to be accurate at best 80%-90% of the time (an old “joke” about machine learning is that it’s really good at partially solving any problem).

I’ve spent the majority of my career building technologies that try to do useful things with large datasets.*

More Read

data science and python

Why Choosing Python For Data Science Is An Important Move

Here Are The Most Important Ways To Ensure Customer Data Protection
Big Data and the SME: Prepare to Succeed
Mitigating Bias in Machine Learning Datasets
What’s in Data.gov? A recent article by Tim Berners-Lee,…

One of the most important lessons I’ve learned is that there are only two ways to make useful products out of large data sets. Algorithms that deal with large data sets tend to be accurate at best 80%-90% of the time (an old “joke” about machine learning is that it’s really good at partially solving any problem).

Consequently, you either need to accept you’ll have some errors but deploy the system in a fault-tolerant context, or you need to figure out how to get the remaining accuracy through manual labor.

What do I mean by fault-tolerant context? If a search engine shows the most relevant result as the 2nd or 3rd result, users are still pretty happy. The same goes for recommendation systems that show multiple results (e.g. Netflix). Trading systems that hedge funds use are also often fault tolerant: if you make money 80% of the time and lose it 20% of the time, you can still usually have a profitable system.

For fault-intolerant contexts, you need to figure out how to scalably and cost-effectively produce the remaining accuracy through manual labor. When we were building SiteAdvisor, we knew that any inaccuracies would be a big problem: incorrectly rating a website as unsafe hurts the website, and incorrectly rating a website as safe hurts the user.

Because we knew automation would only get us 80-90% accuracy, we built 1) systems to estimate confidence levels in our ratings so we would know what to manually review, and 2) a workflow system so that our staff, an offshore team we hired, and users could flag or fix inaccuracies.

* My first job was as a programmer at a hedge fund, where we built systems that analyzed large data sets to trade stock options. Later, I cofounded SiteAdvisor where the goal was to build a system to assign security safety ratings to tens of millions of websites. Then I cofounded Hunch, which was acquired by eBay – we are now working on new recommendation technologies for ebay.com and other eBay websites.

TAGGED: datasets
ChrisDixon April 15, 2012
Share this Article
Facebook Twitter Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

big data mac performance
Data-Driven Tips to Optimize the Speed of Macs
News
3 Ways AI Has Helped Marketers and Creative Professionals Streamline Workflows
3 Ways AI Has Helped Marketers and Creative Professionals Streamline Workflows
Artificial Intelligence
data analytics in sports industry
Here’s How Data Analytics In Sports Is Changing The Game
Big Data
data analytics on nursing career
Advances in Data Analytics Are Rapidly Transforming Nursing
Analytics

Stay Connected

1.2k Followers Like
33.7k Followers Follow
222 Followers Pin

You Might also Like

data science and python
Big DataBusiness Intelligence

Why Choosing Python For Data Science Is An Important Move

11 Min Read
customer data protection
Data ManagementExclusivePolicy and GovernancePrivacyRisk Management

Here Are The Most Important Ways To Ensure Customer Data Protection

8 Min Read
big data and SMEs
Big DataBusiness IntelligenceExclusiveKnowledge Management

Big Data and the SME: Prepare to Succeed

5 Min Read
machine learning
Big DataExclusiveMachine Learning

Mitigating Bias in Machine Learning Datasets

7 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

ai is improving the safety of cars
From Bolts to Bots: How AI Is Fortifying the Automotive Industry
Artificial Intelligence
AI and chatbots
Chatbots and SEO: How Can Chatbots Improve Your SEO Ranking?
Artificial Intelligence Chatbots Exclusive

Quick Link

  • About
  • Contact
  • Privacy
Follow US

© 2008-23 SmartData Collective. All Rights Reserved.

Removed from reading list

Undo
Go to mobile version
Welcome Back!

Sign in to your account

Lost your password?