“Data Science”: what’s in a name?

May 11, 2011

The terms “Data Science” and “Data Scientist” have only been in common usage for a little over a year, but they’ve really taken off since then: many companies are now hiring for “data scientists”, and entire conferences are run under the name of “data science”. But despite the widespread adoption, some have resisted the change from the more traditional terms like “statistician” or “quant” or “data analyst”.

Personally, I love the term. As a statistician, I was getting tired of explaining that no, I don’t spend my time writing down baseball or cricket scores. I think “Data Science” better describes what we actually do: a combination of computer hacking, data analysis, and problem solving. Pete Warden, initially resistant to the terminology, has since come around to the benefits of the phrase. (Pete, by the way, is the creator of the awesome Data Science Toolkit, an open-source server with APIs for handy data-related tasks, like identifying proper names in unstructured text, or converting street addresses to latitude/longitude.) In his post at O’Reilly Radar, he addresses the following objections to the use of the term “data science”:

  • Data Science is not a real science. (“Anything that needs science in the name is not a real science”)
  • It’s an unnecessary label (why not just stick with statistician, etc.?)
  • The name doesn’t even make sense (what science doesn’t involve data?)
  • There’s no definition (personally, I think Drew Conway’s Data Science Venn Diagram is an excellent definition, expanded in his paper in IQT Quarterly)

Check out Pete’s full post for his refutations of these points. Pete concludes by saying it’s time for the community to rally around “Data Science”:

I’m betting a lot on the persistence of the term. If I’m wrong the Data Science Toolkit will end up sounding as dated as “surfing the information super-highway.” I think data science, as a phrase, is here to stay though, whether we like it or not. That means we as a community can either step up and steer its future, or let others exploit its current name recognition and dilute it beyond usefulness. If we don’t rally around a workable definition to replace the current vagueness, we’ll have lost a powerful tool for explaining our work.

O’Reilly Radar: Why the term “data science” is flawed but useful