
The Fallacy of the Data Scientist Shortage

nraden

 

There is no question that the USA (in fact, most of the world) would be well served by having more quantitatively capable people working in business and government. However, the current hysteria over the shortage of data scientists is overblown. To illustrate why, I am going to use an example from air travel.

 

On a recent trip from Santa Fe, NM to Phoenix, AZ, I tracked the various times:


 

 

| Step | Duration (min) | Cumulative (min) |
|---|---|---|
| Drive from Santa Fe to ABQ Airport | 65 | 65 |
| Park | 15 | 80 |
| Security | 25 | 105 |
| Wait to board | 20 | 125 |
| Boarding process | 30 | 155 |
| Taxiing | 15 | 170 |
| In flight | 60 | 230 |
| Taxiing | 12 | 242 |
| Deplane | 9 | 251 |
| Wait for valet bag | 7 | 258 |
| Travel to rental car | 21 | 279 |
| Arrive at destination in Tempe | 32 | 311 |

 

 

As you can see, the actual flying time of 60 minutes represents only 19% of the travel time (60/311). Because everything but the actual flight time is more or less constant for any domestic trip (disregarding common delays, connections and cancellations, which would skew this analysis even further), this low percentage of time in the air is a reality. For example, if the flight took two hours and fifteen minutes, it would still work out to only 135/386 = 35%. The most recent data I have, from 2005, shows that the average nonstop distance flown per departure was 607 miles, so we can add about 25 minutes to the first calculation and arrive at 85/336 = 25%.
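The percentages above are easy to check with a few lines of Python. This is only a minimal sketch: the names `flight_share` and `GROUND_MINUTES` are my own illustrative choices, and the figures are the ones recorded in the trip log, with everything except the flight treated as a fixed overhead.

```python
# Durations taken from the trip log, in minutes.
TOTAL_MINUTES = 311   # door-to-door time for the Santa Fe to Tempe trip
FLIGHT_MINUTES = 60   # time actually spent in the air

# Assumption: everything except the flight is a fixed overhead.
GROUND_MINUTES = TOTAL_MINUTES - FLIGHT_MINUTES  # 251 minutes


def flight_share(flight_minutes: float, ground_minutes: float = GROUND_MINUTES) -> float:
    """Return the fraction of total travel time spent in the air."""
    return flight_minutes / (flight_minutes + ground_minutes)


for label, minutes in [
    ("observed flight", 60),
    ("2 h 15 min flight", 135),
    ("average 607-mile flight", 85),
]:
    print(f"{label}: {flight_share(minutes):.0%}")
```

Holding the ground time constant is the simplification the whole argument rests on; delays and connections would only push the in-air share lower.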

 

Keep in mind, again, that these calculations do not account for late departures and arrivals, cancelled and re-booked flights, connections, flight attendants and pilots having nervous breakdowns, etc. It’s safe to say that on an average domestic trip, at most 25% of your travel time is spent in the air. Just for fun, let’s see how this would work out if we could take the (unfortunately retired) Concorde. Flying at about Mach 2 would reduce our travel time by roughly 40 minutes, trimming our journey from five hours and 11 minutes to four hours and 31 minutes, about a 13% improvement.
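The Concorde arithmetic can be sketched the same way. The helper names below (`door_to_door_gain`, `as_hours_minutes`) are hypothetical, and the 40 minutes saved is simply the estimate used in the text, not a measured figure.

```python
TOTAL_MINUTES = 311   # door-to-door time from the trip log
FLIGHT_MINUTES = 60   # time in the air on the subsonic flight


def door_to_door_gain(minutes_saved: float, total_minutes: float = TOTAL_MINUTES) -> float:
    """Return the fractional reduction in total travel time."""
    return minutes_saved / total_minutes


def as_hours_minutes(total_minutes: int) -> str:
    """Format a duration in minutes as hours and minutes."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} h {minutes} min"


concorde_saving = 40  # minutes saved in the air (the estimate used in the text)
print(as_hours_minutes(TOTAL_MINUTES - concorde_saving))  # new total trip time
print(f"{door_to_door_gain(concorde_saving):.0%}")        # share of the trip saved

# Upper bound: even a zero-minute flight removes only the 60 minutes in the air.
print(f"{door_to_door_gain(FLIGHT_MINUTES):.0%}")
```

The last line makes the limit explicit: no matter how fast the aircraft, the door-to-door gain cannot exceed the share of the trip that was spent flying in the first place.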

 

What’s the point of all of this and what does it have to do with the so-called data scientist shortage?

 

Based on our research at Constellation Research, we find that analysts who work with Hadoop or other big data technologies spend a significant amount of time on tasks that do NOT require any knowledge of advanced quantitative methods: configuring and maintaining clusters, writing programs to gather, move, cleanse and otherwise organize data for analysis, and many other common tasks in data analysis. In fact, even those who employ advanced quantitative techniques spend from 50% to 80% of their time gathering, cleansing and preparing data. This percentage has not budged in decades. Keep in mind that advanced analytics is not a new phenomenon; what is new is the volume (to some extent) and variety of the source data, along with new techniques to deal with it, especially, but not limited to, Hadoop.

 

The interest in analytics has risen dramatically in the past two or three years; that is not in dispute. But in most organizations, the adoption of enterprise-scale analytics with big data is not guaranteed beyond some isolated areas of expertise. Most of the activity is in predictable (commercial) industries, such as net-based businesses, financial services and telecommunications, but these businesses have employed very large-scale analytics at the bleeding edge of technology for decades. For most organizations, analytics will be provided by algorithms embedded in applications not developed in-house, and by third-party vendors of tools and services and consultants.

 

The good news is that 80% of the expertise you need for big data is readily available. The balance can be sourced and developed. The crème-de-la-crème of data scientists will fill roles in academia, at technology vendors, on Wall Street, and in research and government.

 

There are related and unrelated disciplines that are all combined under the term analytics. There is advanced analytics, descriptive analytics, predictive analytics and business analytics, all defined in a pretty murky way. It cries out for some precision. Here is how I characterize the many types of analytics by the quantitative techniques used and the level of skill of the practitioners who use these techniques.

 

 

| Type | Descriptive Title | Quantitative Sophistication/Numeracy | Sample Roles |
|---|---|---|---|
| Type I | Quantitative Research (True Data Scientist) | PhD or equivalent | Creation of theory, development of algorithms. Academic/research. Often employed in business or government for very specialized roles |
| Type II | (Current definition of) Data Scientist or Quantitative Analyst | Advanced Math/Stat, not necessarily PhD | Internal expert in statistical and mathematical modeling and development, with solid business domain knowledge |
| Type III | Operational Analytics | Good business domain, background in statistics optional | Running and managing analytical models. Strong skills in and/or project management of analytical systems implementation |
| Type IV | Business Intelligence/Discovery | Data and numbers oriented, but no special advanced statistical skills | Reporting, dashboard, OLAP and visualization use, possibly design. Performing posterior analysis of results driven by quantitative methods |

 

“Data Scientist” is a relatively new title for quantitatively adept people with accompanying business skills. The ability to formulate and apply tools for classification, prediction and even optimization, coupled with a fairly deep understanding of the business itself, is clearly in the realm of Type II efforts. However, it seems pretty likely that most so-called data scientists will lean more towards the quantitative and data-oriented subjects than towards business planning and strategy. The reason is that the term data scientist emerged from businesses like Google or Facebook, where the data is the business, so understanding the data is equivalent to understanding the business. This is clearly not the case for most organizations. We see very few Type II data scientists with the kind of in-depth knowledge of the whole business that, say, actuaries in the insurance business have; their extensive training should be a model for the newly designated data scientists (see my blogs “Who Needs Analytics PhD’s? Grow Your Own” and “What is a Data Scientist and What Isn’t”).

 

 

 

 

 

 
