The Big Data Scientist’s Skillset

May 7, 2013
The big data scientist is said to have the sexiest job of the 21st century. Successful big data scientists will be in high demand and will be able to earn very good salaries. But to be successful, big data scientists need a wide range of skills that until now did not even fit into one department.

They need statistical, mathematical and predictive modelling skills, as well as business strategy skills, to build the algorithms necessary to ask the right questions and find the right answers. They also need to be able to communicate their findings, orally and visually. They need to understand how the products are developed and, even more importantly, as big data touches the privacy of consumers, they need a strong sense of ethical responsibility.

Apart from the skills that big data scientists can learn at university, they also need a particular set of personality traits. They need to be very curious people who enjoy diving deep into the material to find an answer to a yet unknown question, with a natural desire to go beneath the surface of a problem. They need to be thinkers who can ask the right (business) questions. They need to be confident and self-assured, as more often than not they will have to deal with situations where a lot is unknown. They need to be patient, as finding the unknown in massive data sets takes a lot of time and developing the algorithm to uncover new insights often proceeds by trial and error. They need to be able to see examples in totally different industries and map them onto their current problem. For example, the Los Angeles Police Department uses an algorithm originally developed for earthquake prediction to predict where crimes are likely to happen.

A big data scientist understands how to integrate multiple systems and data sets. They need to be able to link and mash up distinct data sets to discover new insights. This often requires connecting different types of data sets in different forms, working with potentially incomplete data sources, and cleaning data sets before they can be used.
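As a rough illustration of linking and cleaning data sets, here is a minimal Python sketch. The two data sets, their field names and the `link_records` helper are hypothetical, invented purely for this example; in practice a dedicated data library would usually do this work.

```python
# Hypothetical example data: two small data sets that share a customer id.
customers = [
    {"id": 1, "name": " Alice "},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": None},               # incomplete record: name is missing
]
orders = [
    {"customer_id": 1, "amount": "19.99"},
    {"customer_id": 1, "amount": "5.00"},
    {"customer_id": 2, "amount": ""},      # incomplete record: amount is missing
    {"customer_id": 4, "amount": "7.50"},  # no matching customer
]


def link_records(customers, orders):
    """Join orders to customers by id, cleaning values along the way."""
    # Clean the customer names and index them by id for fast lookup.
    names = {
        c["id"]: c["name"].strip()
        for c in customers
        if c["name"]  # skip customers without a usable name
    }
    linked = []
    for order in orders:
        name = names.get(order["customer_id"])
        if name is None or not order["amount"]:
            continue  # drop rows that cannot be linked or have no amount
        linked.append({"name": name, "amount": float(order["amount"])})
    return linked


linked = link_records(customers, orders)
print(linked)
```

Dropping incomplete rows is only one possible choice; depending on the question being asked, filling in defaults or flagging the gaps may be more appropriate.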

Of course, the big data scientist needs to be able to program, preferably in several languages such as Python, R, Java, Ruby, Clojure, Matlab, Pig or SQL. They need to have an understanding of Hadoop, Hive and/or MapReduce. In addition, they need to be familiar with disciplines such as:

  • Natural Language Processing: the interactions between computers and human language;
  • Machine learning: using computers to develop algorithms that improve as they process more data;
  • Conceptual modelling: to be able to articulate and share models;
  • Statistical analysis: to understand and work around possible limitations in models;
  • Predictive modelling: most big data problems are aimed at predicting future outcomes;
  • Hypothesis testing: being able to develop hypotheses and test them with careful experiments.
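
To make the last item concrete, the following is a minimal sketch of one way to test a hypothesis: a permutation test on the difference between two group means, using only the Python standard library. The `permutation_test` function, the sample numbers and the scenario are assumptions made up for illustration, not something taken from a particular project.

```python
import random
import statistics


def permutation_test(group_a, group_b, rounds=5_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(rounds):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / rounds


# Hypothetical measurements, e.g. order values under two page designs.
design_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
design_b = [13.0, 13.4, 12.9, 13.2, 13.1, 13.5]

p_value = permutation_test(design_a, design_b)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests the observed difference would be unlikely if both groups came from the same distribution; it does not by itself say how large or how important the difference is.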

The exact background of a big data scientist is of less importance. Great big data scientists can come from fields such as econometrics, physics, biostatistics, computer science, applied mathematics or engineering. Most of the time they hold a master's degree or even a PhD. However, to be successful, big data scientists should have at least some of the following capabilities:

  • Strong written and verbal communication skills;
  • Being able to work in a fast-paced, multidisciplinary environment, because in a competitive landscape new data keeps flowing in rapidly and the world is constantly changing;
  • Having the ability to query databases and perform statistical analysis;
  • Being able to develop or program databases;
  • Being able to advise senior management in clear language about the implications of their work for the organisation;
  • Having at least a basic understanding of how business and strategy work;
  • Being able to create examples, prototypes and demonstrations to help management better understand the work;
  • Having a good understanding of design and architecture principles;
  • Being able to work autonomously.
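
As a small example of what querying a database and performing statistical analysis can look like in practice, here is a sketch using Python's built-in `sqlite3` and `statistics` modules. The `sales` table and its contents are hypothetical and exist only for this illustration.

```python
import sqlite3
import statistics

# In-memory database with a hypothetical "sales" table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [
        ("north", 120.0),
        ("north", 80.0),
        ("south", 200.0),
        ("south", 100.0),
        ("south", 150.0),
    ],
)

# Query: let the database aggregate the amounts per region.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))

# Statistical analysis: pull the raw values and summarise them in Python.
amounts = [row[0] for row in conn.execute("SELECT amount FROM sales ORDER BY amount")]
summary = {
    "mean": statistics.mean(amounts),
    "median": statistics.median(amounts),
    "stdev": statistics.stdev(amounts),  # sample standard deviation
}
conn.close()

print(totals)
print(summary)
```

Aggregation is pushed to the database where SQL handles it well, while the summary statistics are computed in Python; on larger data sets the split between the two would depend on the tools available.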

In short, the big data scientist needs to have an understanding of almost everything. Depending on the industry in which they want to work, big data scientists will need to specialize even further: a marine big data specialist, for example, requires a different set of skills than a historical big data scientist.

Of course, the perfect big data scientist who has all of the skills and capabilities described above is extremely rare. Perhaps only a handful of big data scientists have all the skills mentioned here. Therefore, organisations should pick and choose from this list what they deem most important in a big data scientist and what the particular requirements are for the job to be done.

Copyright Big Data Startups 2013. You may share using our article tools. Please don’t cut articles from BigData-Startups.com and redistribute by email or post to the web.