SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.

The Amazing Big Data World of Kaggle and the Crowd-Sourced Data Scientist

Bernard Marr

If you’re looking for a company which seems to embody all the principles of big data entrepreneurship under one roof, then look no further than Kaggle.

Crowd sourcing, predictive modelling, gamification – Kaggle has it all – and has worked out how to turn a profit from them.

The San Francisco-based business awards cash prizes to its teams of “citizen scientists” who compete to untangle big data challenges of all shapes and sizes.

And it isn’t just businesses which are benefitting – by applying the concept of crowd-sourcing to data analytics, Kaggle is also helping to further scientific and medical research. Its projects include looking deep into the cosmos for traces of dark matter, and furthering research into HIV treatment.

Hal Varian, chief economist at Google (which has itself benefitted from Kaggle’s research) and a Kaggle investor, describes it as “a way to organise the brainpower of the world’s most talented data scientists and make it accessible to organizations of every size.”

And that’s certainly an intriguing aim – as well as a highly profitable one – in a world where businesses of all sizes are beginning to cotton on to the benefits of big data. Even if every company could afford to set up its own data analytics department, there aren’t nearly enough people trained to do the job to go around!

As with all emerging sciences, there is a shortage of trained data scientists at the moment – but Kaggle has 150,000 of them, ready to farm out to the highest bidder.

As well as charging the companies it works with (including Amazon, Facebook, Microsoft and Wikipedia) up to $300 per hour for consultancy work, Kaggle organises competitions – which is where the gamification comes in.

I’ve written about gamification before – and Kaggle works along the same lines, the theory being that it is easier to get people to take part in something if it is presented to them as a challenge or competition of some sort.

Current challenges include assisting with schizophrenia diagnosis by identifying the condition from MRI neuroimaging data, and finding the Higgs boson amidst the mountains of data collected by CERN’s ATLAS particle physics experiment.

The competitions are open to anybody, and all the information (as well as the necessary data sets) can be found on Kaggle’s website.

Although it is frequently reported that they have “over 100,000 data scientists”, these are actually registered users and competitors rather than employees. There are no qualification or experience barriers to registering as a Kaggle data scientist; previous winners have ranged from data science academics and professionals to enthusiastic, knowledgeable amateurs. However, certain competitions are occasionally reserved for “masters” – those who have shown they have the right stuff through their previous work with Kaggle.

The company also recruits its own staff to work on internal projects. In fact, it is advertising for recruits now – and although no requirements are listed, other than that applicants be “experienced”, two questions on the application form ask for the mean and standard deviation of two sets of numbers.
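
For readers curious what such a screening question involves, here is a minimal Python sketch. The two sets of numbers are invented for illustration, since the actual values on Kaggle’s form are not given here, and the choice of the population (rather than sample) standard deviation is also an assumption.

```python
import statistics

# Hypothetical sample values: the real numbers from the application
# form are not known, so these are made up for illustration.
set_a = [2, 4, 4, 4, 5, 5, 7, 9]
set_b = [1, 2, 3, 4, 5]


def summarize(values):
    """Return (mean, population standard deviation) for a list of numbers."""
    # pstdev divides by n; use statistics.stdev instead for the
    # sample standard deviation, which divides by n - 1.
    return statistics.mean(values), statistics.pstdev(values)


mean_a, std_a = summarize(set_a)
mean_b, std_b = summarize(set_b)

print(f"set_a: mean={mean_a}, std={std_a:.4f}")
print(f"set_b: mean={mean_b}, std={std_b:.4f}")
```

Whether a form like this expects the population or the sample figure is exactly the sort of detail an applicant would need to check.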

The concept is undoubtedly inspired by earlier pioneering work in crowd-sourcing data analysis, such as the Search for Extra-Terrestrial Intelligence at Home (SETI@home) project, and the Netflix Prize, a competition which in 2009 awarded $1 million to the team that came up with a better algorithm for providing movie recommendations.

Kaggle has taken those ideas and expanded on them. Essentially, it acts as the middle man: companies or organizations bring their problems, and Kaggle packages them into competitions, gathers the contestants and shares out the rewards.

The data itself is often simulated, which means that issues surrounding access to sensitive data can be sidestepped. Contestants are challenged to come up with methods or algorithms which solve the problem in hand more efficiently than existing methods. Once a winner has been chosen, the reward – currently up to $30,000, although occasionally much larger for the top projects – is paid.

One of its best known success stories was the Heritage Health Prize, which awarded $3 million last year to the winning entrant, whose algorithm most accurately predicted which patients would be admitted to hospital in the coming 12 months, from a set of medical data.

They also offer the Kaggle In Class service – an academic spin-off of the main brand which offers free data processing tools and simulated challenges. It is intended for use in schools and colleges struggling to meet the challenges of training the first generations of professional data scientists.

Of course, like anything new, it isn’t without its critics. In particular, questions have been asked about how valuable the resulting research actually is – often, critics say, the biggest challenges in data analysis revolve around what data is needed, and what questions should be asked. Kaggle’s pre-packaged competitions take this element out of the equation. The crowdsourced data scientists might be working on the solution to a particular problem – but is it the correct one? And might there be more relevant data elsewhere, other than that supplied in the competition package?

This might be a fundamental limitation of the competition model, at least until data collection and distribution evolve to the point where data can be made available to contestants in real time – and then, of course, there will be serious privacy and data protection issues to overcome.

But as it stands today, Kaggle is one of the more forward-thinking innovations in big data, and has done much to raise awareness of the power that crowd sourcing data analysis can bring to businesses and organisations of all sizes. 

-----

Finally, please check out my other posts in The Big Data Guru column and feel free to connect with me via Twitter, LinkedIn, Facebook, slideshare and The Advanced Performance Institute.

About: Bernard Marr is a best-selling author, keynote speaker and consultant in analytics and big data. He helps companies understand and leverage big data and analytics in a way that improves business performance.