Keeping count of people (and things)

June 15, 2010

I learned while researching The Numerati that the Chinese have 11 different spellings for Osama bin Laden. (Maybe it’s up to 12 or 13 by now.) So if the quants at the National Security Agency are attempting to monitor Chinese Web traffic about the Al Qaeda leader, their computers have to recognize all of these different spellings and group them.

At the same time, I share a name with a prominent author who wrote best-selling books such as How to Live with a Neurotic Dog. Smart systems have to figure out that we’re not the same person. (This, of course, is a huge issue for thousands of people whose names condemn them to no-fly lists.)

It sounds easy, but one of the toughest challenges in digging through unstructured data is to come up with accurate counts of people and entities. Jeff Jonas has a very thoughtful blog post and article on this. He writes:

it is essential to understand the difference between three transactions carried out by three people versus one person who carried out all three transactions... Without the ability to determine when entities are the same, it quickly becomes clear that sensemaking is all but impossible. I find most organizations have underestimated this principle: If a system cannot count, it cannot predict.
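
To make the counting problem concrete, here is a minimal Python sketch of grouping spelling variants before counting. It is only an illustration, not how Jonas's systems or the NSA's actually work. The alias table, the entity ID "person-001", the transaction labels, and the helper names (normalize, resolve, group_by_entity) are all assumptions made up for this example, and the spellings are English stand-ins for the Chinese variants.

```python
from collections import defaultdict

# Hypothetical alias table: each known spelling maps to one entity ID.
# Both the spellings and the ID are illustrative, not real data.
ALIASES = {
    "osama bin laden": "person-001",
    "usama bin ladin": "person-001",
    "osama bin ladin": "person-001",
}


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def resolve(name: str) -> str:
    """Return the entity ID for a name, or the normalized name if unknown."""
    key = normalize(name)
    return ALIASES.get(key, key)


def group_by_entity(transactions):
    """Group (name, transaction) pairs under their resolved entity."""
    grouped = defaultdict(list)
    for name, tx in transactions:
        grouped[resolve(name)].append(tx)
    return dict(grouped)


transactions = [
    ("Osama bin Laden", "tx-1"),
    ("Usama Bin Ladin", "tx-2"),
    ("OSAMA  BIN LADIN", "tx-3"),
]

# Counting raw spellings sees three people; counting entities sees one.
naive_count = len({name for name, _ in transactions})
grouped = group_by_entity(transactions)
```

A lookup table keyed on names alone handles only half the problem. It cannot tell apart two different people who share a name, which takes extra attributes such as a birth date or an address.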
