The Data Science Venn Diagram

October 5, 2010

Whenever I’m asked, “Who uses R?”, I usually rattle off a long list of job titles: statistician, analyst, quant, researcher … and that’s before all the domain-specific titles. It would be nice if there were a simple, succinct phrase to describe the process of working with, analyzing, and communicating with real data.

At the new blog “dataists”, the inaugural post by Hilary Mason and Chris Wiggins describes a new term that seems to fit the bill: Data Scientist. Personally, I like the term, as it encompasses more than “mere” data analysis, which can sometimes imply a black-box approach. The dataists describe a process for data science: Obtain, Scrub, Explore, Model, and Interpret. Taken together, these steps allow the data scientist to tell a complete story about the discoveries they have made in the data. The post describes each step in detail and is well worth a read.

R blogger Drew Conway takes this concept a step further with his Venn Diagram of Data Science:

[Figure: Drew Conway's Data Science Venn Diagram]

Data Science is right there in the middle, combining the skills of Hacking, Expertise, and Math/Stats Knowledge. I especially like the way it highlights the danger of using statistical tools (including R) on an applied problem without a rigorous statistical background. Drew discusses this Danger Zone and other aspects of Data Science in his post, which you should also check out.

dataists: The Data Science Venn Diagram