Amazon Elastic MapReduce, and other stuff I don’t have time to grok yet

April 2, 2009

Lots of good stuff has been coming to my attention lately.

  • Amazon just announced their Amazon Elastic MapReduce service. It sounds like the main point of this service is to simplify setting up a Hadoop cluster in the cloud, and Amazon charges you a little extra on top of the normal EC2 and S3 costs for it. It's not clear to me yet why people would pay the extra cost instead of running their own instance of Hadoop on EC2. I mean, you can just read Chapter 4 of my book and do this all by yourself easily ;) I hope to look more into this service over the weekend. At the very least, this is a sign that a meaningful number of Amazon Web Services’ customers are using the EC2 cloud to run Hadoop, so Amazon has decided to focus on making it easier.
  • The March issue of the IEEE Data Engineering Bulletin is a special issue on data management on cloud computing platforms. It has papers by academics as well as by people from Yahoo and IBM. I haven’t had time to read it yet, but it looks like Hadoop and Amazon EC2 are mentioned a lot.
  • I just heard about the open source Sector-Sphere project, which is a system for distributed storage and computation using commodity computers. In other words, it’s an alternative framework to Hadoop, but with a lot of architectural differences. It seems to be the work of just a few academics so far. I hope to play around with it… when I can find time from work and writing the book…
