By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData Collective
  • Analytics
    AnalyticsShow More
    predictive analytics in dropshipping
    Predictive Analytics Helps New Dropshipping Businesses Thrive
    12 Min Read
    data-driven approach in healthcare
    The Importance of Data-Driven Approaches to Improving Healthcare in Rural Areas
    6 Min Read
    analytics for tax compliance
    Analytics Changes the Calculus of Business Tax Compliance
    8 Min Read
    big data analytics in gaming
    The Role of Big Data Analytics in Gaming
    10 Min Read
    analyst,women,looking,at,kpi,data,on,computer,screen
    Promising Benefits of Predictive Analytics in Asset Management
    11 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-23 SmartData Collective. All Rights Reserved.
Reading: An Inconvenient Truth: Crowd-Sourced Approaches to Statistical Analysis in Science – and what this means for you
Share
Notification Show More
Latest News
ai software development
Key Strategies to Develop AI Software Cost-Effectively
Artificial Intelligence
ai in omnichannel marketing
AI is Driving Huge Changes in Omnichannel Marketing
Artificial Intelligence
ai for small business tax planning
Maximize Tax Deductions as a Business Owner with AI
Artificial Intelligence
ai in marketing with 3D rendering
Marketers Use AI to Take Advantage of 3D Rendering
Artificial Intelligence
How Big Data Is Transforming the Maritime Industry
How Big Data Is Transforming the Maritime Industry
Big Data
Aa
SmartData Collective
Aa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Uncategorized > An Inconvenient Truth: Crowd-Sourced Approaches to Statistical Analysis in Science – and what this means for you
Uncategorized

An Inconvenient Truth: Crowd-Sourced Approaches to Statistical Analysis in Science – and what this means for you

Peter James Thomas
Last updated: 2015/11/04 at 8:43 AM
Peter James Thomas
8 Min Read
SHARE

Frequentists vs. Bayesians - © xkcd.com
© xkcd.com (adapted from the original to fit the dimensions of this page)

Frequentists vs. Bayesians - © xkcd.com
© xkcd.com (adapted from the original to fit the dimensions of this page)

No, not a polemic about climate change, but instead some observations on the influence of statistical methods on statistical findings. It is clearly a truism to state that there are multiple ways to skin a cat, what is perhaps less well-understood is that not all methods of flaying will end up with a cutaneously-challenged feline and some may result in something altogether different.

So an opaque introduction, let me try to shed some light instead. While the points I am going to make here are ones that any statistical practitioner would (or certainly should) know well, they are perhaps less widely appreciated by a general audience. I returned to thinking about this area based on an article by Raphael Silberzahn and Eric Uhlmann in Nature [1], but one which I have to admit first came to my attention via The Economist [2].

More Read

big data improves

3 Ways Big Data Improves Leadership Within Companies

IT Is Not Analytics. Here’s Why.
Romney Invokes Analytics in Rebuke of Trump
WEF Davos 2016: Top 100 CEO bloggers
In Memoriam: Robin Fray Carey

Messrs Silberzahn and Uhlmann were propounding a crowd-sourced approach to statistical analysis in science, in particular the exchange of ideas about a given analysis between (potentially rival) groups before conclusions are reached and long before the customary pre- and post-publication reviews. While this idea may well have a lot of merit, I’m instead going to focus on the experiment that the authors performed, some of its results and their implications for more business-focussed analysis teams and individuals.

The interesting idea here was that Silberzahn and Uhlmann provided 29 different teams of researchers the same data set and asked them to investigate the same question. The data set was a sporting one covering the number of times that footballers (association in this case, not American) were dismissed from the field of play by an official. The data set included many attributes from the role of the player, to when the same player / official encountered each other, to demographics of the players themselves. The question was – do players with darker skins get dismissed more often than their fairer teammates?

Leaving aside the socio-political aspects that this problem brings to mind, the question is one that, at least on first glance, looks as if it should be readily susceptible to statistical analysis and indeed the various researchers began to develop their models and tests. A variety of methodologies was employed, “everything from Bayesian clustering to logistic regression and linear modelling” (the authors catalogued the approaches as well as the results) and clearly each team took decisions as to which data attributes were the most significant and how their analyses would be parameterised. Silberzahn and Uhlmann then compared the results.

Below I’ll simply repeat part of their comments (with my highlighting):

Of the 29 teams, 20 found a statistically significant correlation between skin colour and red cards […]. The median result was that dark-skinned players were 1.3 times more likely than light-skinned players to receive red cards. But findings varied enormously, from a slight (and non-significant) tendency for referees to give more red cards to light-skinned players to a strong trend of giving more red cards to dark-skinned players.

This diversity in findings is neatly summarised in the following graph (please click to view the original on Nature’s site):

Nature Graph

© NPG. Used under license 3741390447060 Copyright Clearance Center

To be clear here, the unanimity of findings that one might have expected from analysing what is essentially a pretty robust and conceptually simple data set was essentially absent. What does this mean aside from potentially explaining some of the issues with repeatability that have plagued some parts of science in recent years?

Well the central observation is that precisely the same data set can lead to wildly different insights dependent on how it is analysed. It is not necessarily the case that one method is right and others wrong, indeed in review of the experiment, the various research teams agreed that the approaches taken by others were also valid. Instead it is extremely difficult to disentangle results from the algorithms employed to derive them. In this case methodology had a bigger impact on findings than any message lying hidden in the data.

Here we are talking about leading scientific researchers, whose prowess in statistics is a core competency. Let’s now return to the more quotidian world of the humble data scientist engaged in helping an organisation to take better decisions through statistical modelling. Well the same observations apply. In many cases, insight will be strongly correlated with how the analysis is performed and the choices that the analyst has made. Also, it may not be that there is some objective truth hidden in a dataset, instead only a variety of interpretations of this.

Now this sounds like a call to abandon all statistical models. Nothing could be further from my point of view [3]. However caution is required. In particular those senior business people who place reliance on the output of models, but who maybe do not have a background in statistics, should maybe ask themselves whether what their organisation’s models tell them is absolute truth, or instead simply more of an indication. They should also ask whether a different analysis methodology might have yielded a different result and thus dictated different business action.

At the risk of coming over all Marvel, the great power of statistical modelling comes with great responsibility.

In 27 years in general IT and 15 in the data/information space (to say nothing of my earlier Mathematical background) I have not yet come across a silver bullet. My strong suspicion is that they don’t exist. However, I’d need to carry out some further analysis to reach a definitive conclusion; now what methodology to employ…?
 


 
Notes

 
[1]
 
Crowdsourced research: Many hands make tight work. Raphael Silberzahn &a Eric L. Uhlmann. Nature. 07 October 2015.
07 October 2015
 
[2]
 
On the other hands – Honest disagreement about methods may explain irreproducible results.The Economist 10th Oct 2015.
 
[3]
 
See the final part of my trilogy on using historical data to justify BI investments for a better representation of my actual views.

 

Peter James Thomas November 4, 2015
Share this Article
Facebook Twitter Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

ai software development
Key Strategies to Develop AI Software Cost-Effectively
Artificial Intelligence
ai in omnichannel marketing
AI is Driving Huge Changes in Omnichannel Marketing
Artificial Intelligence
ai for small business tax planning
Maximize Tax Deductions as a Business Owner with AI
Artificial Intelligence
ai in marketing with 3D rendering
Marketers Use AI to Take Advantage of 3D Rendering
Artificial Intelligence

Stay Connected

1.2k Followers Like
33.7k Followers Follow
222 Followers Pin

You Might also Like

big data improves
Big DataJobsKnowledge ManagementUncategorized

3 Ways Big Data Improves Leadership Within Companies

6 Min Read
Image
Uncategorized

IT Is Not Analytics. Here’s Why.

7 Min Read

Romney Invokes Analytics in Rebuke of Trump

4 Min Read

WEF Davos 2016: Top 100 CEO bloggers

14 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

AI and chatbots
Chatbots and SEO: How Can Chatbots Improve Your SEO Ranking?
Artificial Intelligence Chatbots Exclusive
data-driven web design
5 Great Tips for Using Data Analytics for Website UX
Big Data

Quick Link

  • About
  • Contact
  • Privacy
Follow US

© 2008-23 SmartData Collective. All Rights Reserved.

Removed from reading list

Undo
Go to mobile version
Welcome Back!

Sign in to your account

Lost your password?