SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.

Dr Gates was right, or how I learned to stop worrying and love the spam

DavidMSmith

In 2004 Microsoft founder (and honorary doctorate recipient) Bill Gates confidently stated that "Spam will soon be a thing of the past." Five years later (Gates suggested the problem would be solved in two), spam makes up 95% of all emails sent. Nonetheless, I think Gates was mostly right in principle, even if the timeline was optimistic.

A decade ago, when email spam was a real problem, I took care not to let my email address be displayed in public. Spammers had a habit of scraping email addresses from websites, with automated robots crawling the web looking for any text containing the @ symbol. Despite my efforts, I had to abandon a couple of email addresses after they were added to the mailing lists traded between spammers and the noise overwhelmed the signal in my inbox.

That was before the advent of good spam filters, though, which have greatly improved in the last couple of years. I now use Google Mail for all my mail, which has excellent spam-filtering technology. Even my non-Google addresses are forwarded to a Gmail account, which I can rely on to filter the crap so that I can see the emails I actually care about.

I started my current job about nine months ago, and I made a conscious decision to stop worrying about spam and let my email address (david@revolution-computing.com) be free. It's linked directly on every page of this blog and on the REvolution Computing website, and I don't hesitate to include it in other public venues. It's been out there long enough to be picked up by robots and web searches, so it's probably time to evaluate the results. I'd say it's a success, and I'm very glad I took the plunge. I get maybe two spam emails a week in my Google Mail account (faithfully tucked away in my Spam folder), and better yet, I don't think I've lost any legitimate mail to the spam filter. (So if you've emailed me and I haven't replied, I have only myself to blame. My apologies: I do get a lot of legitimate email.) I don't use any other email services, so I can't speak to the performance of their spam filters, but I'm happy with my results.

So what changed between 2004 and now? My guess is that it's mainly been the transition to web-based email services. Statisticians have attempted to solve the spam problem before with predictive models, but the results were never that great at the time. The difficulty was likely twofold. First, it's a highly asymmetric problem: a false positive (legitimate mail flagged as spam) is much worse than a false negative (spam that reaches the inbox), yet too many false negatives mean the filter isn't really useful in practice. Second, I think the corpus was simply too small: a few hundred thousand emails, or even all the emails for all the employees of a largish company with a central email server, simply isn't going to result in a filter that gives a clean inbox while not trashing any legitimate mail sent to a broad community of users.
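
To make the asymmetry concrete, here's a minimal sketch in Python of picking a spam-score cutoff when a lost legitimate email costs far more than a spam email that slips through. Everything in it is an assumption for illustration: the scores, the labels, the candidate cutoffs, and the 100-to-1 cost ratio are made up, and this is not how any particular filter is tuned.

```python
# Hypothetical data: each score is a model's estimated probability that a
# message is spam; each label is True if the message really was spam.
SCORES = [0.05, 0.20, 0.52, 0.55, 0.60, 0.80, 0.95, 0.99]
LABELS = [False, False, True, True, False, True, True, True]
CANDIDATES = [0.5, 0.7, 0.9]


def total_cost(scores, labels, threshold, fp_cost, fn_cost):
    """Sum the cost of every mistake made when flagging scores >= threshold."""
    cost = 0.0
    for score, is_spam in zip(scores, labels):
        flagged = score >= threshold
        if flagged and not is_spam:
            cost += fp_cost  # legitimate mail sent to the spam folder
        elif not flagged and is_spam:
            cost += fn_cost  # spam delivered to the inbox
    return cost


def best_threshold(scores, labels, candidates, fp_cost=100.0, fn_cost=1.0):
    """Return the candidate cutoff with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, fp_cost, fn_cost),
    )


# Assumed cost ratio: losing one real email is 100x worse than seeing one spam.
asymmetric = best_threshold(SCORES, LABELS, CANDIDATES)
# Treating both kinds of mistake as equally bad, for comparison.
symmetric = best_threshold(SCORES, LABELS, CANDIDATES, fp_cost=1.0, fn_cost=1.0)
print(asymmetric, symmetric)
```

With these made-up numbers, the asymmetric costs push the cutoff higher than the symmetric ones do: the filter lets a little more spam through in exchange for not trashing the one legitimate message with a middling score.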

Web-based email certainly solves the corpus-size problem, but there's one additional detail that I expect makes it work. The defining feature of spam is that a spam email is sent to lots and lots of people, and a web-based email service like Google Mail can easily see when a duplicate email is sent to lots and lots of users at the same time. Spammers have attempted various tricks to make that process more difficult (converting text to images, or adding random text to each mail to make it harder to detect duplicates), but Google seems to have largely overcome these hurdles.
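
I have no inside knowledge of how Google does this, so the following is only a sketch of one well-known way to spot near-duplicates despite a bit of random padding: compare messages by their overlapping three-word sequences ("shingles") and group those whose Jaccard similarity is high. The messages, the recipient names, the 0.6 similarity cutoff, and the function names are all hypothetical.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_recipients(messages, similarity=0.6):
    """Greedily group (recipient, body) pairs whose bodies are near-duplicates.

    Each new message is compared with the first message of every existing
    group; it joins the first group that is similar enough, or starts a new one.
    Returns one set of recipients per group.
    """
    groups = []  # list of (shingles of first message, set of recipients)
    for recipient, body in messages:
        current = shingles(body)
        for representative, recipients in groups:
            if jaccard(representative, current) >= similarity:
                recipients.add(recipient)
                break
        else:
            groups.append((current, {recipient}))
    return [recipients for _, recipients in groups]


# Hypothetical mailbox traffic: three copies of one pitch, each padded with
# a different random token, plus one ordinary personal email.
MESSAGES = [
    ("alice", "Buy cheap watches now at our online store today xkq"),
    ("bob", "Buy cheap watches now at our online store today zzv"),
    ("carol", "Buy cheap watches now at our online store today pwm"),
    ("dave", "Are we still meeting for lunch on Friday at noon"),
]

groups = group_recipients(MESSAGES)
print(groups)
```

A message body that lands in the inboxes of many unrelated users at once is a strong hint of spam, and a service that sees everyone's mail is the only party in a position to count. A real system would need something far more scalable than pairwise comparison, and would have to cope with text rendered as images, which this sketch ignores.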

So then, is the spam problem solved? At a technical level, clearly not — spam still consumes a tremendous amount of bandwidth and costs billions of dollars to contain — but at the personal level it's hardly more than a minor irritant these days. (And if it's not for you, consider a new email service.) For individuals, the real spam problem these days lies in other venues: social networking spam, blog spam, link farms, and so on. Mr Gates, when can we expect solutions to those problems? 
