Advice for the Aspiring Data Scientist
It’s been well documented throughout the blogosphere over the past few weeks, but every time I read it, I’m still a little surprised--more by the grandiosity of it all than by the fact of the matter. Nevertheless, the fact remains: data scientist has been declared the “sexiest job of the 21st century” by Thomas Davenport and D.J. Patil in the Harvard Business Review.
But why all of a sudden? Why does Indeed.com report that the growth rate for the data scientist position has reached upwards of 15,000 percent this year? According to Davenport and Patil, these individuals are being asked to wrestle with Big Data and help businesses find direction within the noise.
Their sudden appearance on the business scene reflects the fact that companies are now wrestling with information that comes in varieties and volumes never encountered before. If your organization stores multiple petabytes of data, if the information most critical to your business resides in forms other than rows and columns of numbers, or if answering your biggest question would involve a “mashup” of several analytical efforts, you’ve got a big data opportunity. - Thomas Davenport and D.J. Patil, Harvard Business Review
But what does it take to become a data scientist? How is one to prepare for these responsibilities when the role didn’t even exist a few years ago? I caught up with a few experts who have seen wide success in the World of Data to glean what advice they would provide aspiring data scientists. I spoke with:
- Krishna Gopinathan, COO and founder of big data platform company Global Analytics Holdings;
- Michael Griffin, co-founder and CTO at retail search engine marketing company Adlucent; and
- Bruno Aziza, VP of Worldwide Marketing at business intelligence software company SiSense.
They suggested the following.
Focus on Obtaining a Well-Rounded Hard Science Education
Griffin, who is currently hiring a data scientist for his company, is looking for someone with a Ph.D. in computer science, machine learning, statistics, applied mathematics, physics, econometrics, or related disciplines. While any of these disciplines could prepare an individual for a career as a data scientist, it’s important to “fill in the gaps” of one’s education. In-depth knowledge of statistics may not be sufficient for solving a business’s unsolved data mysteries, Gopinathan notes.
“The solution to a problem may be hidden in a particular machine learning algorithm or a traditional statistical model,” says Gopinathan. “Individuals experienced in various domains and working with different problems will be the ones who succeed.”
For these same reasons, it’s beneficial to stay in touch with research even after one’s academic career ends. Gopinathan advises individuals to subscribe to and regularly read academic journals such as IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) or the Journal of Machine Learning Research.
Data scientists are often handed the problems that others have tried and failed to solve. Whether the data was too “dirty” or the database was too large, these individuals will be asked to solve the (seemingly) impossible to push organizations to the forefront of their respective industries.
But that doesn’t mean that individuals can wallow in their data without producing actionable insight. Griffin is quick to point out that “working in a commercial environment is just different than academia.” Rather than writing all the code themselves, he notes, these individuals have developers and programmers at their disposal to assist with projects.
“You have to be able to produce something that makes a difference very quickly,” Griffin says. Effective project management and “getting things done” will separate the good data scientists from the great ones.
Aziza says he has found that the most successful data scientists balance brains and brawn--they can manipulate databases like it’s nothing, work well with coworkers and present findings to the executive board. “Think of a data scientist more like the business analyst-plus,” says Aziza.
Reading books on personal development is just as important as reading about the latest computer algorithm. Likewise, reading about how other businesses are using data--a topic made famous by Davenport and Harris’ Competing on Analytics--can help you think creatively when researching new avenues of data manipulation.
Keep Adding Tools
Jeff Hammerbacher, who previously led the data science team at Facebook, recounted that on any given day his team would use Python, R, and Hadoop and then have to relay the findings to a non-technical team. The more you know, the better prepared you’ll be to solve the day’s problems.
Aziza points to SiSense’s recent data professional salary study to show that data scientists should learn as many applications as they can--60 percent of the data scientists polled use three or more Data Warehouse/Business Intelligence (DWBI) applications in their roles. In addition to these applications, an expansive knowledge of HBase, MySQL, Cassandra and others will be beneficial throughout your career.
A number of websites offer resources for building your knowledge and testing your data might. For example, Big Data University provides free resources to learn more advanced applications of JAQL, Hive, Pig and others, while Kaggle provides data science contests (with cash prizes) to build your portfolio and compete with others in the community.
I’m curious whether others in the community have advice they’d like to share--or advice they wish someone had given them at the start of their careers. If you have any other advice for aspiring data scientists or would like to read more on this subject, please leave a note in the comments below. You can also read more on this subject on the SoftwareAdvice.com blog.