The Amazing Big Data World of Kaggle and the Crowd-Sourced Data Scientist

Bernard Marr

If you’re looking for a company which seems to embody all the principles of big data entrepreneurship under one roof, then look no further than Kaggle.

Crowd sourcing, predictive modelling, gamification – Kaggle has it all – and has worked out how to turn a profit from them.

The San Francisco-based business awards cash prizes to its teams of “citizen scientists” who compete to untangle big data challenges of all shapes and sizes.

And it isn’t just businesses which are benefiting – by applying the concept of crowd-sourcing to data analytics, Kaggle’s competitors are helping to further scientific and medical research. Their projects include looking deep into the cosmos for traces of dark matter, and furthering research into HIV treatment.

Hal Varian, chief economist at Google (which has itself benefitted from Kaggle’s research) and a Kaggle investor, describes it as “a way to organise the brainpower of the world’s most talented data scientists and make it accessible to organizations of every size.”

And that’s certainly an intriguing aim – as well as a highly profitable one – in a world where businesses of all sizes are beginning to cotton on to the benefits of big data. Even if every company could afford to set up its own data analytics department, there aren’t nearly enough people trained to do the job to go around!

As with all emerging sciences, there is a shortage of trained data scientists at the moment – but Kaggle has 150,000 of them, ready to farm out to the highest bidder.

As well as charging companies they work with (including Amazon, Facebook, Microsoft and Wikipedia) up to $300 per hour for consultancy work, the company organises competitions – which is where the gamification comes in.

I’ve written about gamification before – and Kaggle works along the same lines, with the theory being that it is easier to get people to take part in something if it is presented to them as a challenge or competition of some sort.

Current challenges include assisting with schizophrenia diagnosis by identifying the condition from MRI neuroimaging data, and finding the Higgs boson amidst the mountains of data collected by CERN’s ATLAS particle physics experiment.

They are open to anybody, and all the information (as well as the necessary data sets) can be found on Kaggle’s website.

Although it is frequently reported that they have “over 100,000 data scientists”, these are actually registered users and competitors rather than employees. There are no qualification or experience barriers to registering as a Kaggle data scientist; previous winners have ranged from data science academics and professionals to enthusiastic, knowledgeable amateurs. However, certain competitions are occasionally reserved for “masters” – those who have shown they have the right stuff through their previous work with Kaggle.

The company also recruits its own staff to work on internal projects. In fact, it is advertising for recruits now – and although no requirements are listed, other than that applicants be “experienced”, two questions on the application form ask for the mean and standard deviation of two sets of numbers.
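For readers wondering what those two application questions involve, here is a minimal sketch in Python. The two number sets are hypothetical (the application form's actual numbers are not given here), and the sketch assumes the sample standard deviation is wanted; the population version is shown alongside for comparison.

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical number sets, chosen only for illustration.
set_a = [2, 4, 4, 4, 5, 5, 7, 9]
set_b = [10, 12, 14, 16, 18]

for name, values in (("A", set_a), ("B", set_b)):
    mean, sample_sd = summarize(values)
    # pstdev divides by n rather than n - 1 (population standard deviation).
    population_sd = statistics.pstdev(values)
    print(f"Set {name}: mean={mean:.2f}, "
          f"sample sd={sample_sd:.2f}, population sd={population_sd:.2f}")
```

The only real decision is whether to divide by n or n - 1, which is why both figures are printed.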

The concept is undoubtedly inspired by earlier pioneering work in crowd-sourcing data analysis, such as the Search for Extra-Terrestrial Intelligence at Home (SETI@home) project, and a competition organised by Netflix which in 2009 awarded $1 million to the team that came up with a better algorithm for providing movie recommendations.

Kaggle has taken those ideas and expanded on them. Basically, it acts as the middle man, with companies or organizations bringing their problems, and Kaggle packaging them into competitions, gathering the contestants and sharing out the rewards.

The data itself is often simulated – and contestants are challenged to come up with methods or algorithms which are more efficient than existing methods at solving the problem in hand. Using simulated data means that issues surrounding access to sensitive data can be sidestepped. Once a winning solution has been chosen, the reward – currently up to $30,000, although occasionally much larger for the top projects – is paid.

One of its best known success stories was the Heritage Health Prize, which awarded $3 million last year to the winning entrant, whose algorithm most accurately predicted which patients would be admitted to hospital in the coming 12 months, from a set of medical data.

They also offer the Kaggle In Class service – an academic spin-off of the main brand which offers free data processing tools and simulated challenges. It is intended for use in schools and colleges struggling to meet the challenges of training the first generations of professional data scientists.

Of course, like anything new, it isn’t without its critics. In particular, questions have been asked about how valuable the research it leads to actually is – often, the critics say, the biggest challenges in data analysis revolve around what data is needed, and what questions should be asked. Kaggle’s pre-packaged competitions take this element out of the equation. The crowdsourced data scientists might be working on the solution to a particular problem – but is it the correct one? And might there be more relevant data elsewhere, other than that supplied in the competition package?

This might be a fundamental limitation of the competition model, at least until data collection and distribution evolve to the point where data can be made available to contestants in real time. And then, of course, there will be serious privacy and data protection issues to overcome.

But as it stands today, Kaggle is one of the more forward-thinking innovations in big data, and has done much to raise awareness of the power that crowd sourcing data analysis can bring to businesses and organisations of all sizes. 

-----

Finally, please check out my other posts in The Big Data Guru column and feel free to connect with me via Twitter, LinkedIn, Facebook, slideshare and The Advanced Performance Institute.

About: Bernard Marr is a best-selling author, keynote speaker and consultant in analytics and big data. He helps companies understand and leverage big data and analytics in a way that improves business performance.