Variety Is the Spice of Data Science

In many ways, data scientists are like dogs. I don’t mean they are hairy (although beards feature a lot amongst the men). I mean that, as with dogs, there are scores of different types, with vastly different attributes and specialisations, all belonging to the same ‘family’. Just as a Terrier and a Rottweiler differ massively from each other, so too does the ‘Bayesian statistics’ data scientist differ from the ‘machine learning’ data scientist.

Does this matter? For businesses it could be incredibly important. The ‘type’ of data scientist a company uses could have a profound impact on how problems are approached and solved. This doesn’t mean that the answer a data scientist comes up with will be ‘wrong’, but it could mean that, from the vast spectrum of data science techniques, the approach that would yield the most informative solution for the business might not get picked because the data scientist is not familiar with that methodology. Creating the right balance of data scientists in a team creates an environment that allows problems to be approached from different angles and spurs healthy methodological debates that spark innovation and help find the ‘best’ solution.

Determining what makes a data scientist tick goes a long way to understanding what approach they will take to solve a problem. First, it’s important to remember that a data scientist is the sum of several different academic parts: part computer scientist, part mathematician and part specialist in a particular field, for example, heavy industry.

Second, the mix of disciplines that make up data science is not consistent from one practitioner to the next. A practitioner can be more of a mathematician than a computer scientist and vice versa. Whichever subject the data scientist leans towards will naturally have a huge bearing on the techniques they favour.

For example, a ‘statistician’ data scientist will tend to worry more about error terms and emphasise the use of statistical models to describe and predict. By contrast, a data scientist who comes from a computer science background tends to worry more about how to query and transform the data efficiently.

To add to the complication, these subjects can also be further sub-divided into a number of specialist areas, or arguably, subjects in their own right. Statistics, for instance, can be sub-divided into classical frequentist statistics, Bayesian statistics and non-parametric statistics. In practice, this can mean problems are approached in different ways. Bayesians state their assumptions explicitly, encoding what they expect to see in their data as a prior belief, and then update that belief once the data have arrived. Frequentists tend to make implicit assumptions about the nature of the data they are dealing with (e.g. that the data are normally distributed), and they are very focused on unbiased estimators. Non-parametric statisticians, by contrast, avoid assuming that their data follow any particular distribution.

In practical terms, it is enough to say that each type of data scientist may reach slightly different conclusions from the same information.
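To make this concrete, here is a minimal Python sketch in which each of the three schools estimates a ‘typical’ value from the same small sample. The order values are invented for illustration, and the Bayesian prior (mean 50, standard deviation 5) and the shortcut of treating the sample standard deviation as if it were known are assumptions, not a recommended analysis.

```python
import statistics

# Hypothetical sample of order values (invented for illustration), with one large outlier.
orders = [42, 47, 51, 49, 45, 53, 48, 120]


def frequentist_estimate(data):
    """Sample mean: the unbiased estimator of the population mean."""
    return statistics.mean(data)


def bayesian_estimate(data, prior_mean, prior_sd):
    """Posterior mean of the population mean under a normal prior and normal likelihood.

    Assumption: the data's standard deviation is treated as known; here the
    sample standard deviation is plugged in as a simplification.
    """
    n = len(data)
    sigma = statistics.stdev(data)
    prior_precision = 1 / prior_sd ** 2
    data_precision = n / sigma ** 2
    # Precision-weighted average of the prior belief and the sample mean.
    return (prior_precision * prior_mean + data_precision * statistics.mean(data)) / (
        prior_precision + data_precision
    )


def nonparametric_estimate(data):
    """Median: assumes no particular shape for the underlying distribution."""
    return statistics.median(data)


if __name__ == "__main__":
    print(f"Frequentist (mean):      {frequentist_estimate(orders):.2f}")
    print(f"Bayesian (posterior):    {bayesian_estimate(orders, prior_mean=50, prior_sd=5):.2f}")
    print(f"Non-parametric (median): {nonparametric_estimate(orders):.2f}")
```

On this sample the mean is pulled upwards by the single outlier, the median is essentially unaffected by it, and the Bayesian estimate lands between the prior belief and the sample mean. None of the three is ‘wrong’; each simply reflects a different set of assumptions.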

Data science is the quest to translate a question into something that can be answered using data, and then to apply a variety of techniques to see what happens and drive us forward.

Although it is great to have a team of data scientists with different backgrounds, so that different perspectives and approaches can flourish when tackling a business problem, the most important question is not what academic background your data scientist has, but whether he or she has the imagination to apply techniques from a variety of fields to novel contexts to answer a question.