Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    image fx (67)
    Improving LinkedIn Ad Strategies with Data Analytics
    9 Min Read
    big data and remote work
    Data Helps Speech-Language Pathologists Deliver Better Results
    6 Min Read
    data driven insights
    How Data-Driven Insights Are Addressing Gaps in Patient Communication and Equity
    8 Min Read
    pexels pavel danilyuk 8112119
    Data Analytics Is Revolutionizing Medical Credentialing
    8 Min Read
    data and seo
    Maximize SEO Success with Powerful Data Analytics Insights
    8 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: The Amazing Big Data World of Kaggle and the Crowd-Sourced Data Scientist
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Big Data > The Amazing Big Data World of Kaggle and the Crowd-Sourced Data Scientist
Big Data

The Amazing Big Data World of Kaggle and the Crowd-Sourced Data Scientist

Bernard Marr
Bernard Marr
8 Min Read
Image
SHARE

Image

If you’re looking for a company which seems to embody all the principles of big data entrepreneurship under one roof, then look no further than Kaggle.

Crowd sourcing, predictive modelling, gamification – Kaggle has it all – and has worked out how to turn a profit from them.

Image

More Read

Too much information
Data as an Indispensable Organizational Asset
4 Data Goldmines Your Company Should Not Ignore
A year on: The promise of SAP HANA for Big Data analytics (Part Two)
Images to Data :OCR Softwares

If you’re looking for a company which seems to embody all the principles of big data entrepreneurship under one roof, then look no further than Kaggle.

Crowd sourcing, predictive modelling, gamification – Kaggle has it all – and has worked out how to turn a profit from them.

The San Francisco-based business awards cash prizes to its teams of “citizen scientists” who compete to untangle big data challenges of all shapes and sizes.

And it isn’t just businesses which are benefitting – by applying the concept of crowd-sourcing to data analytics, they are helping to further scientific and medical research. Their projects include looking deep into the cosmos for traces of dark matter, and furthering research into HIV treatment.

Chief scientist at Google (which has itself benefitted from Kaggle’s research) and Kaggle investor, Hal Varian, describes it as “a way to organise the brainpower of the world’s most talented data scientists and make it accessible to organizations of every size.”

And that’s certainly an intriguing aim – as well as a highly profitable one – in a world where businesses of all sizes are beginning to cotton on to the benefits of big data. Even if every company could afford to set up its own data analytics department, there aren’t nearly enough people trained to do the job to go around!

As with all emerging sciences, there is a shortage of trained data scientists at the moment – but Kaggle has 150,000 of them, ready to farm out to the highest bidder.

As well as charging companies they work with (including Amazon, Facebook, Microsoft and Wikipedia) up to $300 per hour for consultancy work, the company organises competitions – which is where the gamification comes in.

I’ve written about gamification before here – and Kaggle works along the same lines, with the theory being that it is easier to get people to take part in something if it is presented to them as a challenge or competition of some sort.

Current challenges include assisting with schizophrenia diagnosis by identifying the condition from MRA neuroimaging data, and finding the Higgs Boson amidst the mountains of data collected by CERN’s Atlas particle physics experiments.

They are open to anybody to take part in, and all the information (as well as the necessary data sets can be found at Kaggle’s website here. 

Although it is frequently reported that they have “over 100,000 data scientists”, these are actually registered users and competitors rather than employees. There are no qualification or experience barriers to registering as a Kaggle data scientist, previous winners have ranged from data science academics and professionals to enthusiastic, knowledgeable amateurs. However certain competitions are occasionally reserved for “masters” – those who have shown they have the right stuff through their previous work with Kaggle.

The company also also recruit its own staff to work on internal projects. In fact they are advertising for recruits now – and although no requirements are listed, other than that applicants be “experienced”, two questions on the application form ask for the mean and standard deviation of two sets of numbers.

The concept is undoubtedly inspired by earlier pioneering work in crowd-sourcing data analysis, such as the Search For Extra-terrestrial Intelligence at Home (SETI @home) project, and a competition organised by Netflix in 2009 offering £1 million to the person who came up with a better algorithm for providing movie recommendations.

Kaggle has taken those idea and expanded on them, basically – it acts as the middle man, with companies or organizations bringing their problems, and Kaggle packaging them into competitions, gathering the contestants and sharing out the rewards.

The data itself is often simulated – and contestants are challenged to come up with methods or algorithms which are more efficient than existing methods at solving the problem in hand. Using simulated data means that issues surrounding access to sensitive data can be sidestepped. Once that is done, the reward – currently up to $30,000, although occasionally much larger for the top projects – is paid.

One of its best known success stories was the Heritage Health Prize, which awarded $3 million last year to the winning entrant, whose algorithm most accurately predicted which patients would be admitted to hospital in the coming 12 months, from a set of medical data.

They also offer the Kaggle In Class service – an academic spin-off of the main brand which offers free data processing tools and simulated challenges. It is intended for use in schools and colleges struggling to meet the challenges of training the first generations of professional data scientists.

Of course like anything new it isn’t without its critics. In particular, questions have been asked about how valuable the research it leads to actually is – often, they say, the biggest challenges in data analysis revolve around what data is needed, and what questions should be asked. Kaggle’s pre-packaged competitions take this element out of the equation. The crowdsourced data scientists might be working on the solution to a particular problem – but is it the correct one? And might there be more relevant data elsewhere, other than that supplied in the competition package?

This might be a fundamental limitation to the competition model, until data collection and distribution evolves to the point where it can be made available to contestants in real-time, and then of course there will be serious privacy and data protection issues to hurdle.

But as it stands today, Kaggle is one of the more forward-thinking innovations in big data, and has done much to raise awareness of the power that crowd sourcing data analysis can bring to businesses and organisations of all sizes. 

—–

Finally, please check out my other posts in The Big Data Guru column and feel free to connect with me via Twitter, LinkedIn, Facebook, slideshare and The Advanced Performance Institute.

About: Bernard Marr is a best-selling author, keynote speaker and consultant in analytics and big data. He helps companies understand and leverage big data and analytics in a way that improves business performance.
TAGGED:The Big Data Guru
Share This Article
Facebook Pinterest LinkedIn
Share
ByBernard Marr
Follow:
Bernard Marr is a best-selling author, keynote speaker, strategic performance consultant and analytics, KPI and Big Data guru.

Follow us on Facebook

Latest News

image fx (2)
Monitoring Data Without Turning into Big Brother
Big Data Exclusive
image fx (71)
The Power of AI for Personalization in Email
Artificial Intelligence Exclusive Marketing
image fx (67)
Improving LinkedIn Ad Strategies with Data Analytics
Analytics Big Data Exclusive Software
big data and remote work
Data Helps Speech-Language Pathologists Deliver Better Results
Analytics Big Data Exclusive

Stay Connected

1.2kFollowersLike
33.7kFollowersFollow
222FollowersPin

You Might also Like

free best big data source top
Big Data

Big Data: 20 Free Big Data Sources Everyone Should Know

5 Min Read
Image
Big Data

Big Data: Gaining Incredible Insights From Employee Records

7 Min Read
Image
Uncategorized

The 5 Sexiest Big Data Jobs Available Today

8 Min Read
Big Data Guru
Big Data

Big Data as a Service (BDaaS) Is the Next Big Thing!

9 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

ai in ecommerce
Artificial Intelligence for eCommerce: A Closer Look
Artificial Intelligence
AI and chatbots
Chatbots and SEO: How Can Chatbots Improve Your SEO Ranking?
Artificial Intelligence Chatbots Exclusive

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Go to mobile version
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?