SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.
The First Data Scientist on the Evolution of Data Science

gilpress

Norman Nie was not surprised by the accurate predictions of the presidential election results from Nate Silver and others. “A lot of it,” he told me recently, “is good statistics and good science and good statistical programming packages.” The increasing amount of money spent by the media on polling, Nie believes, improved the accuracy of predictions by increasing the number of observations. In addition, with so much knowledge available now about every individual, the hypotheses and models used by the forecasters were developed on solid foundations. Says Nie: “40 years after The Changing American Voter, we really understand the voter’s decision.”

Nie is referring to the seminal work he (together with Sidney Verba and John Petrocik) published in 1976, itself a response to the landmark 1960 study The American Voter. While the latter described a passive electorate, unconcerned with political issues, and guided in its voting decisions primarily by party allegiance, Nie and his co-authors showed that contemporary voters had higher political awareness and were guided by their own position on the issues rather than political parties. Tracing the changes to the emergence of divisive issues and new voters, they outlined a new political landscape of issue-based factions and a more sophisticated electorate. 

The two studies differed also in the tools they used to reach their conclusions. Working on his Ph.D. in political science at Stanford University in the mid-1960s, Nie became frustrated with the computer-based statistical analysis tools available at the time. They could not handle the amount of data that he wanted to analyze for his dissertation, which was part of a comparative study based on surveys in seven nations. In Participation and Political Equality (1978), the book summarizing the results of the study, Nie and his co-authors provided us with a glimpse at the big data challenges of the time: “If surfers travel the world to find that perfect wave, and mountain climbers do the same to climb the unclimbable, cross-national survey researchers, burdened with the immense data files, travel anywhere to find the cheaper computer.”

But it was not only the cost and limited capacity of the mainframe he was working with that frustrated Nie. The statistical analysis tools of the time were difficult to use, and the leading tool, BMD, was originally developed for bio-medical researchers, not for social scientists.

So Nie became the first “data scientist,” I would argue. The term has been used recently to describe the new generation of data massagers and miners, emerging first at Web-based companies that were, to borrow a phrase, “burdened with immense data files.” Nie and the solution to his frustration embodied the 1960s version of not only the challenge created by the growing volumes of data but all the other themes prevalent in today’s discussions of data science: the combination of software engineering, software design, and statistical skills, frequently achieved through teamwork; the importance of domain knowledge; the creativity and drive leading to new products and services for data analysis; and the scientific, empirical bent characterizing the work of finding new insights in data.

Once he decided to alleviate his frustration by developing a new data analysis tool, Nie demonstrated the “soft skills” required of a data scientist. He convinced two other people with complementary skills to work with him: Dale Bent, who was getting his Ph.D. in Operations Research at Stanford, and Tex Hull, a top-notch programmer. Together, they invented the Statistical Package for the Social Sciences, or SPSS.

As Nie sees it, his domain knowledge was instrumental in SPSS’ initial success. He was a social scientist first and foremost, developing a tool for other social scientists: one with an easy-to-use interface, requiring little knowledge of programming, and including the most popular statistical analysis procedures. Designing SPSS from the point of view of “a social scientist looking at observations about people,” says Nie, also helped the software become popular among data analysts outside of academia, where it initially spread by word of mouth. First, insurance companies started using it for mortality or risk analysis. They were joined by many other companies, from a variety of industries, all needing to analyze data they had collected about the behavior and attitudes of their customers or any other relevant constituency. Of great help was the thick manual that came with the software, which established a new standard for software documentation. Again, domain knowledge was important: Nie wanted the manual to serve as a tool for teaching social science research methods, as he did at the University of Chicago, where he was a professor of political science from 1968 to 1998.

While domain knowledge may have been instrumental in launching SPSS, the same statistics and data analysis methods apply across many knowledge domains. Indeed, the two most successful statistical analysis programs, SPSS and SAS, expanded from their domain-specific base (social science for SPSS and bio-medical research for SAS) to become tools used wherever there was data to analyze. “One of the interesting things about statistics is that the techniques you use tend to be very horizontal in terms of the application,” says Nie, which is why they can be used in a wide variety of fields.

The common denominator for all the various applications and domains where SPSS and other statistical packages were applied, according to Nie, was that “hard data drives model building and model testing.” Empiricism became cool in the 1960s and 1970s and helped drive the widespread use of computer-based statistical tools. The growing complexity of the social and physical world gave rise to many new challenges and there was a growing realization, according to Nie, “that the best way to understand all of these problems is with empirical models.”

“Empirical model-building” is also how “data scientists approach the world” today, says Nie. But big data means that a lot has changed in the intervening years. Specifically, Nie argues, with more data and better tools—both more powerful computers and statistical analysis programs such as R—we have more sophisticated models. The limitation of the technologies of the past forced the use of limited-size samples and approximation methods. Today, says Nie, “we can move beyond linear approximation models” and achieve greater precision and accuracy in forecasts.

This new stage in the evolution of data science holds a lot of promise, but it also requires people who can take advantage of the new technologies and techniques for data analysis. Nie cautions about what he says is “the part that’s a little scary,” the challenge of imparting domain knowledge to people trained in data science today. The early “data scientists” were people like him, with deep knowledge of a specific domain and the way questions were asked and answered in it, and he thinks it is difficult to capture that knowledge and incorporate it in a data scientist’s education.

The bigger challenge is the education of just about everybody else: “For several generations after World War II, we have told people that they can opt out very early from basic math training.” This has led to a crisis and the failure of the educational system to prepare students for today’s and tomorrow’s jobs. Nie’s advice? “To any entering undergraduate who says ‘What do I do in the American educational system to make sure I have a job when I get out?’ I would say take math and statistics and you’re guaranteed a job.”

Beyond providing more people with the opportunity to get a good job, an investment in basic data science education can lead to a more informed citizenry, making smarter decisions in a political environment characterized by increasing complexity and rising polarization. “Today we’ve become this incredibly polarized society that can’t agree on anything,” says Nie. “Any new issue that comes up is painted either red or blue. We went from high agreement, low partisanship, end-of-ideology at the end of World War II to this absolute bitter struggle between red and blue that we’re in now.”

Ever the inquisitive data scientist, Nie is analyzing survey data to figure out how it all came about. He hopes to publish the results of his research in a forthcoming book, tentatively titled The Ever-Changing American Voter.
