More Than Just a Title: How to Identify a Data Scientist

May 27, 2015



After our much-debated blog post, 4 Ways to Spot a Fake Data Scientist, many readers were curious to know what criteria Burtch Works uses to identify data scientists, since the title itself is not always an indicator. The following is adapted from our recently released data science salary study that goes into more detail about the academic background, skills, and day-to-day job responsibilities that we look for when identifying data scientists. To download the full report with complete compensation data and demographic information, as well as how data science salaries compare to predictive analytics, click here.

Data scientists apply sophisticated quantitative and computer science skills to structure and analyze massive unstructured data sets or continuously streaming data, derive insights from that data, and prescribe action. The depth of their coding skills distinguishes them from other predictive analytics professionals and allows them to exploit data regardless of its source, size, or format. By using one or more general-purpose coding languages, data scientists can tackle problems made very complex by the size and disorganization of the data.

To identify data scientists for our recruiting efforts and Burtch Works Studies, we use the following criteria:

1. Educational Background – Data scientists typically have an advanced degree, usually a Master’s or Ph.D., in a quantitative discipline such as Applied Mathematics, Statistics, Computer Science, Engineering, Economics, or Operations Research. As new data science degree programs, massive open online courses (MOOCs), and boot camps continue to take hold in the quantitative community, and as more professionals make career changes from other fields, data scientists’ educational backgrounds will likely diversify.

2. Skills – Data scientists are usually proficient users of tools in the Hadoop/MapReduce ecosystem such as Pig and Hive, as well as AWS. There has also been a lot of buzz around Apache Spark, which is becoming a vital tool in the data science toolbox. Data scientists may use languages such as Python and Java to write programs that automate data parsing, transformation, and analysis, and they typically have expert knowledge of statistical and machine learning methods using tools such as R and SAS. Many also use other methods to derive useful information from data, including pattern recognition, signal processing, and visualization.

3. Dataset Size – Data scientists typically work with datasets measured in gigabytes up to petabytes, and often work with continuously streaming data.

4. Job Responsibilities – Data scientists are equipped with the tools and skills to work on every stage of the analytics life cycle including:

  • Data Acquisition – This may involve scraping data, interfacing with APIs, querying relational and non-relational databases, or even defining strategy in relation to what data to pursue.
  • Data Cleaning/Transformation – This may involve parsing and aggregating messy, incomplete, and unstructured data sources to produce data sets that can be used in analytics/predictive modeling.
  • Analytics – This involves statistical and machine learning-based modeling in order to describe or predict patterns in the data.
  • Prescribing Actions – This involves interpreting analytical results, and using data-driven insights to inform business strategy. Strong technical chops alone do not make an exceptional data scientist, so when recruiting we look for a combination of technical and non-technical skills.
  • Programming/Automation – In many cases, data scientists are also responsible for creating libraries and utilities to operationalize or simplify various stages of this process. Often, they will contribute production-level code for a firm’s data products.
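For readers who think in code, the stages above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions, not how any particular team works: the data, the column names (`ad_spend`, `revenue`), the function names, and the decision threshold are all hypothetical, and real projects would involve far larger data sets and richer models.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data: a small, messy CSV with a missing value and stray spaces.
RAW = """ad_spend,revenue
10, 25
20, 45
30,
40, 85
"""


def acquire(raw_text):
    """Data acquisition: read rows from a source (here, an in-memory string)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean(rows):
    """Cleaning/transformation: drop incomplete rows and convert text to floats."""
    cleaned = []
    for row in rows:
        spend = (row.get("ad_spend") or "").strip()
        revenue = (row.get("revenue") or "").strip()
        if spend and revenue:
            cleaned.append((float(spend), float(revenue)))
    return cleaned


def fit_line(points):
    """Analytics: least-squares fit of revenue = slope * ad_spend + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def prescribe(slope, threshold=1.0):
    """Prescribing actions: turn the model result into a recommendation.

    The threshold is an arbitrary, hypothetical business rule.
    """
    if slope > threshold:
        return "increase ad spend"
    return "hold ad spend"


if __name__ == "__main__":
    points = clean(acquire(RAW))
    slope, intercept = fit_line(points)
    print(prescribe(slope))
```

Wrapping each stage in its own reusable function is a small nod to the Programming/Automation bullet: once the steps are functions, they can be rerun on new data or packaged into a shared utility.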

There’s been a lot of conversation around this developing field, and I have little doubt that as Big Data tools continue to evolve, our criteria will need to evolve as well. Whether you’re a data scientist, an analytics professional, a programmer, or a data engineer, it’s important that you continue to learn as new tools enter the market and keep up with new technology. It will be very interesting to see if this list changes by next year’s study. I’m sure there are some bleeding-edge tools that we’ve missed, so be sure to leave your thoughts in the comments below.

Please note: Predictive analytics professionals were purposely excluded from the data science salary study, and were the subject of their own study, The Burtch Works Study: Salaries of Predictive Analytics Professionals, released in September 2014, which can be downloaded here.