SmartData Collective

A Million Monkeys Demonstrate the Power of Hadoop

AlexOlesker

Apache Hadoop is an open source framework for scalable, reliable, distributed computing on commodity hardware, built around the Hadoop Distributed File System (HDFS) and MapReduce. It has many great use cases, such as delivering search engine results, sequencing genomes, and indexing entire libraries of text, but the Million Monkeys Project by Jesse Anderson may be the easiest to understand and the most fun.

The project was inspired by the Infinite Monkey Theorem, which, in its simplest and most popular form, states that a million monkeys at a million typewriters will, by randomly hitting the keys, eventually recreate the works of Shakespeare. The idea is that although the chance of a monkey typing a sonnet on any given attempt is essentially zero, with infinitely many attempts it becomes almost certain. Anderson wanted to try this for himself, but he didn’t have a million monkeys, a million typewriters, or infinite time and resources, so instead he used his home computer, Amazon’s Elastic Compute Cloud (EC2), and Hadoop to run the experiment.
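The "essentially zero per attempt, almost certain in the limit" argument can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes a single fixed 9-character target (the string length used in the project) and a monkey that picks uniformly from a 26-letter alphabet. The function name `hit_probability` and both defaults are assumptions made for this example, not part of Anderson's code.

```python
import math


def hit_probability(attempts, alphabet_size=26, length=9):
    """Chance that at least one of `attempts` random strings equals a fixed target.

    Assumes each string is drawn uniformly and independently from
    alphabet_size ** length equally likely possibilities.
    """
    p_single = 1.0 / alphabet_size ** length
    # 1 - (1 - p)^n, computed with log1p/expm1 so a tiny p is not rounded away.
    return -math.expm1(attempts * math.log1p(-p_single))


if __name__ == "__main__":
    for attempts in (1, 10**9, 10**12, 10**15):
        print(f"{attempts:>20,d} attempts -> {hit_probability(attempts):.6f}")
```

Under these assumptions a single attempt succeeds with probability of roughly 2 in 10 trillion, while the probability of at least one success approaches 1 as the number of attempts grows far beyond 26^9.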

Anderson first generated a million virtual monkeys on Amazon’s EC2. These were really pseudorandom number generators that each produced strings of 9 random characters. Anderson had to find a very efficient and reliable pseudorandom number generator because, at that scale, creating the strings was one of the most computationally expensive steps in the process, and he eventually settled on Sean Luke’s Mersenne Twister.

Next, he compared each generated string to the entirety of Shakespeare’s work and, if the string appeared anywhere, he marked it in almost real time, creating what he calls “performance art with monkeys and computers.” Comparing a 9-character string with every contiguous run of 9 characters in all 38 of William Shakespeare’s works is no small task, and Anderson used a Bloom filter to reduce CPU usage by 20-30%.

A Bloom filter works by hashing the monkey’s string and checking the result against precomputed data built from the hashes and offsets of Shakespeare’s text. Since hashes are shorter and simpler than the strings, this goes much faster. However, because more than one string can produce a given hash, a matching hash doesn’t guarantee that the strings match (a false positive), whereas a miss reliably rules the string out. If the filter reports a match, the strings are then compared character by character.
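The generate, filter, then verify pipeline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Anderson's implementation: it uses Python's standard `random` module (which is itself a Mersenne Twister generator) rather than Sean Luke's version, a lowercase-only alphabet, and a small hand-rolled `BloomFilter` class whose size, hash count, and hashing scheme are arbitrary choices made for this example.

```python
import hashlib
import random
import string

GROUP_LENGTH = 9  # string length used in the project


def random_group(rng, length=GROUP_LENGTH):
    """One 'monkey' keystroke burst: `length` random lowercase letters (assumed alphabet)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


class BloomFilter:
    """Illustrative Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_bloom(text, n=GROUP_LENGTH):
    """Add every contiguous n-character window of `text` to a new filter."""
    bloom = BloomFilter()
    for i in range(len(text) - n + 1):
        bloom.add(text[i:i + n])
    return bloom


def is_hit(candidate, text, bloom):
    """Cheap filter check first; exact comparison only when the filter says 'maybe'."""
    if not bloom.might_contain(candidate):
        return False  # definitely absent, skip the expensive search
    return candidate in text  # exact check weeds out false positives
```

For example, with `text = "tobeornottobethatisthequestion"` and `bloom = build_bloom(text)`, calling `is_hit("tobeornot", text, bloom)` returns `True`, while `is_hit(random_group(random.Random(42)), text, bloom)` will almost always be rejected by the filter before any exact comparison runs. The saving comes from the fact that nearly every random string is a miss, and misses are resolved by a few bit lookups.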

The project took 1.5 months, generated 7.5 trillion character groups, and checked them against 5.5 trillion (5,429,503,678,976) possible combinations. It concluded on October 6, when the last work, The Taming of the Shrew, was completed. Normally, such a massive task would be out of reach for one man without a team of computer scientists and supercomputers, but because Hadoop was able to break the overwhelming job into small segments running in parallel on servers in Amazon’s cloud, Jesse Anderson managed to do it himself on commodity hardware.

Though the Million Monkeys Project was mostly for fun, it shares many similarities with serious use cases for Hadoop. DNA sequencing, for example, involves matching short reads of a few dozen base pairs to a full genome of millions or billions of pairs. Just like with the monkeys, the job gets much more manageable when broken down into smaller segments and, since commodity hardware and open source software spare research budgets, Hadoop has become a dominant tool in the sequencing community.
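The figure 5,429,503,678,976 is exactly 26^9, the number of distinct 9-letter strings over a 26-letter alphabet, which suggests (as an inference, not something the article states) that the monkeys typed only letters. The sketch below checks that arithmetic and illustrates the general map-and-reduce idea of splitting a job into independent chunks. It runs sequentially in plain Python and does not use the Hadoop API; `map_chunk`, `reduce_counts`, and `run_job` are hypothetical names chosen for this example.

```python
from collections import Counter
from functools import reduce

ALPHABET_SIZE = 26  # assumption: letters only
GROUP_LENGTH = 9


def count_possible_groups(alphabet_size=ALPHABET_SIZE, length=GROUP_LENGTH):
    """Number of distinct strings of `length` characters over the alphabet."""
    return alphabet_size ** length


def map_chunk(chunk, targets):
    """Map step: count the candidate strings in one chunk that appear in `targets`."""
    return Counter(s for s in chunk if s in targets)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


def run_job(candidates, targets, chunk_size=2):
    """Split the work into chunks, map each independently, then reduce the results.

    On a real cluster each chunk could be processed on a different machine;
    here the chunks are simply processed one after another.
    """
    chunks = [candidates[i:i + chunk_size] for i in range(0, len(candidates), chunk_size)]
    partials = (map_chunk(chunk, targets) for chunk in chunks)
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    print(f"{count_possible_groups():,}")
    print(run_job(["abc", "xyz", "abc", "def"], {"abc", "def"}))
```

Because each chunk is processed without reference to any other, the map step can be spread across as many machines as are available, which is the property that let a single person run the whole experiment.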
