Data Science: A Literature Review

September 29, 2011

Just what is Data Science, anyway? Here’s one take:

Data scientist california

Ever since the term “Data Scientist” was coined by DJ Patil and Jeff Hammerbacher in 2009, there’s been a vigorous debate on what the term actually means. More than 80% of statisticians consider themselves data scientists, but Data Science is more than just Statistics. (My own take is that Data Science is a valuable rebranding of computer science and applied statistics skills.)

To help bring clarity to the issue, Data Scientist and R user Harlan Harris has published a great presentation he gave at the Data Science DC meetup group, “What is Data Science anyway?”. The presentation recaps the key data science discussions over the last few years, from Hal Varian (“the sexy job in the next 10 years will be Statistics”), Mike Driscoll (“sexy skills of data geeks”), Nathan Yau (“data scientists: people who can do it all”), Mike Loukides (“Data science enables the creation of data products”), Hilary Mason (“Data science is clearly a blend of the hackers”), Drew Conway (“The Data Science Venn Diagram”) and many others.

In fact, the entire presentation serves as a literature review for the birth of “Data Science” as a concept, and would make excellent fodder for the “Data Science” page on Wikipedia which, sadly, is still a blank page.

Data Science wikipedia

One thing that seems certain: Data Science is here to stay. Companies are clamoring to hire people with data science skills, and the excitement at data science events like the recent Strata conference in New York is palpable. (This review of the conference even says that “Data Scientists are the new rock stars of the technology world.”) As with all new concepts, the definition of “data science” may seem a bit cloudy now, but I’d wager that in 5 years or less “Data Scientist” will be as natural a job title as “Product Manager” or “Engineer”. Harlan notes a most apposite analogy made by John Cook, who looked at the “computer programmer” role:

“Take an expert programmer back in time 100 years. What are his skills? Maybe he’s pretty good at math. He has good general problem solving skills, especially logic. He has dabbled a little in linguistics, physics, psychology, business, and art. He has an interesting assortment of knowledge, but he’s not a master of any recognized trade.”

Apply the same quote to “data scientists”, and I’d wager that in just 5 years or so we’ll look at all the data scientists working around us and wonder how companies ever survived without them.

Harlan shares further thoughts on his presentation at the link below.

Harlan Harris: Data Science, Moore’s Law, and Moneyball