SmartData Collective

Dr Gates was right, or how I learned to stop worrying and love the spam

DavidMSmith

In 2004, Microsoft founder (and honorary doctorate recipient) Bill Gates confidently stated that "Spam will soon be a thing of the past." It's now five years later (Gates suggested the problem would be solved in two), and spam accounts for 95% of all emails sent. Nonetheless, I think Gates was mostly right in principle, even if the timeline was optimistic.

A decade ago, when email spam was a real problem, I took care not to let my email address be displayed in public. Spammers had a habit of scraping email addresses from websites, with automated robots crawling the web looking for any text containing the @-symbol. Despite my efforts, I had to abandon a couple of email addresses after they were added to the mailing lists traded between spammers and the noise overwhelmed the signal in my inbox.

That was before the advent of good spam filters, though, which have greatly improved in the last couple of years. I now use Google Mail for all my mail, which has excellent spam-filtering technology. Even my non-Google addresses are forwarded to a Gmail account, which I can rely on to filter the crap so that I can see the emails I actually care about.

I started my current job about nine months ago, and I made a conscious decision to stop worrying about spam and let my email address (david@revolution-computing.com) be free. It's linked directly on every page of this blog and on the REvolution Computing website, and I don't hesitate to include it in other public venues. It's been out there long enough to be picked up by robots and web searches, so it's probably time to evaluate the results. I'd say it's a success, and I'm very glad I took the plunge. I get maybe two spam emails a week in my Google Mail account (faithfully tucked away in my Spam folder), and better yet, I don't think I've lost any legitimate mail to the spam filter. (So if you've emailed me and I haven't replied, I have only myself to blame. My apologies – I do get a lot of legitimate email.) I don't use any other email services, so I can't speak to the performance of their spam filters, but I'm happy with my results.

So what changed between 2004 and now? My guess is that it's mainly been the transition to web-based email services. Statisticians have attempted to solve the spam problem before with predictive models, but the results were never that great at the time. The problem was likely twofold. First, it's highly asymmetrical: a false positive (legitimate mail thrown away) is a much bigger problem than a false negative (spam that gets through), yet too many false negatives mean the filter isn't really useful in practice. Second, I think the corpus was simply too small: a few hundred thousand emails, or even all the emails for all the employees of a largish company with a central email server, simply isn't going to result in a filter that gives a clean inbox while not trashing any legitimate mail sent to a broad community of users.
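
To make the asymmetry concrete, here is a minimal sketch of how unequal error costs push a filter's cutoff upward. It assumes a hypothetical classifier that has already given each message a spam score between 0 and 1; the function names, the scores, and the 50-to-1 cost ratio are all illustrative assumptions, not figures from any real filter.

```python
def total_cost(scores, labels, threshold, fp_cost, fn_cost):
    """Cost of flagging every message with score >= threshold as spam.

    labels[i] is True when message i really is spam.
    fp_cost is charged for each legitimate message flagged (false positive),
    fn_cost for each spam message let through (false negative).
    """
    cost = 0.0
    for score, is_spam in zip(scores, labels):
        flagged = score >= threshold
        if flagged and not is_spam:
            cost += fp_cost
        elif not flagged and is_spam:
            cost += fn_cost
    return cost


def best_threshold(scores, labels, fp_cost=50.0, fn_cost=1.0):
    """Pick the cutoff with the lowest total cost on the given messages.

    Candidates are the observed scores plus infinity ("flag nothing").
    Ties go to the lowest candidate.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, fp_cost, fn_cost),
    )


if __name__ == "__main__":
    # Made-up scores: one legitimate message (0.80) happens to look spammy.
    scores = [0.05, 0.20, 0.80, 0.60, 0.90, 0.99]
    labels = [False, False, False, True, True, True]
    print("equal costs:     ", best_threshold(scores, labels, 1.0, 1.0))
    print("costly false pos:", best_threshold(scores, labels, 50.0, 1.0))
```

With equal costs the cheapest cutoff sacrifices the one spammy-looking legitimate message; once a false positive is priced much higher, the cutoff rises and some spam is let through instead. That is the trade-off described above: protecting legitimate mail means tolerating more false negatives unless the scores themselves get better.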

Web-based email certainly solves the corpus-size problem, but there's one additional detail that I expect makes it work. The defining feature of spam is that the same email is sent to lots and lots of people, and a web-based email service like Google Mail can easily see when a duplicate email is sent to lots and lots of users at the same time. Spammers have attempted various tricks to make that process more difficult (converting text to images, or adding random text to each mail to make it harder to detect duplicates), but Google seems to have largely overcome these hurdles.
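
I don't know how Google actually does this, but one textbook way to spot near-duplicates despite a bit of random padding is to compare overlapping word sequences ("shingles") using Jaccard similarity. The sketch below illustrates that general idea only; the function names, the three-word shingle size, and the 0.5 threshold are assumptions chosen for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short to shingle: treat the whole message as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_like_same_campaign(msg_a, msg_b, threshold=0.5):
    """Guess whether two messages are copies of one mailing with minor edits."""
    return jaccard(shingles(msg_a), shingles(msg_b)) >= threshold


if __name__ == "__main__":
    # Two copies of a made-up spam message, each padded with a random token.
    spam_1 = "Buy cheap watches now at our online store today xqzv"
    spam_2 = "Buy cheap watches now at our online store today plmk"
    ham = "Are we still meeting for lunch on Thursday at noon"
    print(looks_like_same_campaign(spam_1, spam_2))
    print(looks_like_same_campaign(spam_1, ham))
```

A random token at the end of a message changes only the last shingle, so the two padded copies still share most of their shingles, while unrelated mail shares none. A service that sees millions of inboxes can apply this kind of comparison across users, which a single mailbox or a small corporate server cannot.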

So then, is the spam problem solved? At a technical level, clearly not — spam still consumes a tremendous amount of bandwidth and costs billions of dollars to contain — but at the personal level it's hardly more than a minor irritant these days. (And if it's not for you, consider a new email service.) For individuals, the real spam problem these days lies in other venues: social networking spam, blog spam, link farms, and so on. Mr Gates, when can we expect solutions to those problems? 
