WikiDashboard: Visualizing Wikipedia Edits

Ed Chi, a senior research scientist at the Palo Alto Research Center (PARC), recently delivered a presentation at MIT about …

Daniel Tunkelang
3 Min Read

Google Paper on Parallel EM Algorithm using MapReduce

I hadn’t seen much discussion of this on the web, so I thought I would post the link to this…

Editor SDC
1 Min Read

MPI Cluster with Python and Amazon EC2 (part 2 of 3)

Today I posted a public AMI which can be used to run a small Beowulf cluster on Amazon EC2 and…

Editor SDC
1 Min Read

Amazon Web Services Public Datasets

Amazon announced their Hosted Public Data Sets service today, and I expect it to be a game changer. Finding and…

Editor SDC
1 Min Read

Hidden Video Courses in Math, Science, and Engineering

Over the last few years, a large number of open courseware directories and video lecture aggregators have popped up on…

Editor SDC
1 Min Read

Some Datasets Available on the Web

The Data Wrangling blog was put on the back burner last May while I focused on my startup. Now that I…

Editor SDC
1 Min Read

Python Montage Code for Displaying Arrays

This post will show how to replicate the MATLAB montage function using Python. The Data Wrangling blog seems to be…

Editor SDC
1 Min Read

Amazon EC2 Considered Harmful

“The TruckNumber is the size of the smallest set of people in a project such that, if all of them…

Editor SDC
1 Min Read

The Colbert Bump in Amazon Data

Last month, I took a position as Director of Advanced Analytics at Juice. I’m primarily a machine learning guy, so…

Editor SDC
1 Min Read

PyCon 2008 ElasticWulf Slides

Here are the ElasticWulf slides from my talk. The video will eventually be posted to the PyCon site. Elasticwulf Pycon…

Editor SDC
1 Min Read