Keeping count of people (and things)

June 15, 2010

I learned while researching The Numerati that the Chinese have 11 different spellings for Osama Bin Laden. (Maybe it’s up to 12 or 13 by now.) So if the quants at the National Security Agency are attempting to monitor Chinese Web traffic about the Al Qaeda leader, their computers have to recognize all of these different spellings and group them.
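To make the grouping problem concrete, here is a minimal sketch in Python. It is not how the NSA or anyone else actually does it. The spellings are Latin-script variants I made up for illustration, and the matching rule (lowercase, drop vowels, collapse doubled letters) is a deliberately crude assumption; `name_key` and `group_variants` are hypothetical helper names.

```python
import re
from collections import defaultdict


def name_key(name: str) -> str:
    """Reduce a name to a crude matching key (an illustrative assumption)."""
    letters = re.sub(r"[^a-z]", "", name.lower())   # keep ASCII letters only
    consonants = re.sub(r"[aeiou]", "", letters)    # drop vowels
    return re.sub(r"(.)\1+", r"\1", consonants)     # collapse doubled letters


def group_variants(names):
    """Group names that share the same crude key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)


# Hypothetical mentions pulled from some text stream.
mentions = [
    "Osama bin Laden",
    "Usama bin Ladin",
    "OSAMA BIN-LADEN",
    "Ossama Ben Laden",
    "Jeff Jonas",
]

groups = group_variants(mentions)
for key, variants in groups.items():
    print(key, variants)
```

A rule this loose will also lump together names that merely look alike, so it only handles half the job: grouping spellings, not telling different people apart.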

At the same time, I share a name with a prominent author who wrote best-selling books such as How to Live with a Neurotic Dog. Smart systems have to figure out that we’re not the same person. (This, of course, is a huge issue for thousands of people whose names condemn them to no-fly lists.)

It sounds easy, but one of the toughest challenges in digging through unstructured data is to come up with accurate counts of people and entities. Jeff Jonas has a very thoughtful blog post and article on this. He writes:

it is essential to understand the difference between three transactions carried out by three people versus one person who carried out all three transactions... Without the ability to determine when entities are the same, it quickly becomes clear that sensemaking is all but impossible.

I find most organizations have underestimated this principle: If a system cannot count, it cannot predict.
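Here is a rough Python sketch of what counting means in practice. It is my own illustration, not Jonas's method. The records are invented, and the rule that two records belong to the same person when the cleaned-up name and the birth date both match is an assumption; `Transaction`, `normalize` and `count_people` are hypothetical names.

```python
from typing import NamedTuple


class Transaction(NamedTuple):
    name: str
    birth_date: str  # hypothetical second attribute, as an ISO date string


def normalize(name: str) -> str:
    """Lowercase the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def count_people(transactions) -> int:
    """Count distinct (normalized name, birth date) pairs."""
    return len({(normalize(t.name), t.birth_date) for t in transactions})


# Three invented transactions that all carry "the same" name.
transactions = [
    Transaction("Pat Lee", "1970-01-01"),
    Transaction("PAT LEE", "1970-01-01"),
    Transaction("Pat  Lee", "1985-06-30"),
]

raw_names = len({t.name for t in transactions})               # exact strings
by_name = len({normalize(t.name) for t in transactions})      # name only
by_name_and_birth = count_people(transactions)                # name + birth date

print(raw_names, by_name, by_name_and_birth)
```

Counting raw strings says three people, counting cleaned-up names says one, and adding the second attribute says two. Which answer is right depends entirely on how well the system decides when entities are the same.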

