The Amazing Big Data World of Kaggle and the Crowd-Sourced Data Scientist

Bernard Marr
Last updated: 2014/07/09 at 3:56 AM

If you’re looking for a company which seems to embody all the principles of big data entrepreneurship under one roof, then look no further than Kaggle.

Crowd sourcing, predictive modelling, gamification – Kaggle has it all – and has worked out how to turn a profit from them.

The San Francisco-based business awards cash prizes to its teams of “citizen scientists” who compete to untangle big data challenges of all shapes and sizes.

And it isn’t just businesses which are benefitting – by applying the concept of crowd-sourcing to data analytics, they are helping to further scientific and medical research. Their projects include looking deep into the cosmos for traces of dark matter, and furthering research into HIV treatment.

Hal Varian, chief economist at Google (which has itself benefitted from Kaggle’s research) and a Kaggle investor, describes it as “a way to organise the brainpower of the world’s most talented data scientists and make it accessible to organizations of every size.”

And that’s certainly an intriguing aim – as well as a highly profitable one – in a world where businesses of all sizes are beginning to cotton on to the benefits of big data. Even if every company could afford to set up its own data analytics department, there aren’t nearly enough people trained to do the job to go around!

As with all emerging sciences, there is a shortage of trained data scientists at the moment – but Kaggle has 150,000 of them, ready to farm out to the highest bidder.

As well as charging companies they work with (including Amazon, Facebook, Microsoft and Wikipedia) up to $300 per hour for consultancy work, the company organises competitions – which is where the gamification comes in.

I’ve written about gamification before, and Kaggle works along the same lines: the theory is that it is easier to get people to take part in something if it is presented to them as a challenge or competition of some sort.

Current challenges include assisting with schizophrenia diagnosis by identifying the condition from MRI neuroimaging data, and finding the Higgs boson amidst the mountains of data collected by CERN’s ATLAS particle physics experiment.

The competitions are open to anybody, and all the information (as well as the necessary data sets) can be found on Kaggle’s website.

Although it is frequently reported that Kaggle has “over 100,000 data scientists”, these are actually registered users and competitors rather than employees. There are no qualification or experience barriers to registering as a Kaggle data scientist: previous winners have ranged from data science academics and professionals to enthusiastic, knowledgeable amateurs. However, certain competitions are occasionally reserved for “masters” – those who have shown they have the right stuff through their previous work with Kaggle.

The company also recruits its own staff to work on internal projects. In fact, it is advertising for recruits now – and although no requirements are listed, other than that applicants be “experienced”, two questions on the application form ask for the mean and standard deviation of two sets of numbers.
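
For readers curious what such a screening question involves, here is a minimal Python sketch. The two sets of numbers are made up for illustration, since the article does not give the figures on the form, and the form's wording on sample versus population standard deviation is unknown, so the sketch reports both.

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation, population standard deviation)."""
    return (
        statistics.mean(values),
        statistics.stdev(values),   # divides by n - 1
        statistics.pstdev(values),  # divides by n
    )


# Hypothetical sets of numbers; the real ones on the form are not known.
set_a = [2, 4, 4, 4, 5, 5, 7, 9]
set_b = [10, 12, 14, 16, 18]

for name, values in (("A", set_a), ("B", set_b)):
    mean, sample_sd, population_sd = summarize(values)
    print(
        f"Set {name}: mean={mean:.2f}, "
        f"sample sd={sample_sd:.2f}, population sd={population_sd:.2f}"
    )
```

The two standard deviations differ only in the divisor, which matters most for small sets like these.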

The concept is undoubtedly inspired by earlier pioneering work in crowd-sourcing data analysis, such as the Search for Extra-Terrestrial Intelligence at Home (SETI@home) project, and the Netflix Prize, a competition that in 2009 awarded $1 million to the team that came up with a better algorithm for providing movie recommendations.

Kaggle has taken those ideas and expanded on them. Basically, it acts as the middle man, with companies or organizations bringing their problems, and Kaggle packaging them into competitions, gathering the contestants and sharing out the rewards.

The data itself is often simulated, which means that issues surrounding access to sensitive data can be sidestepped. Contestants are challenged to come up with methods or algorithms which solve the problem in hand more efficiently than existing methods. Once a winner has been determined, the reward – currently up to $30,000, although occasionally much larger for the top projects – is paid.

One of its best-known success stories was the Heritage Health Prize, which offered a $3 million grand prize for the algorithm that most accurately predicted, from a set of medical data, which patients would be admitted to hospital in the coming 12 months.

They also offer the Kaggle In Class service – an academic spin-off of the main brand which offers free data processing tools and simulated challenges. It is intended for use in schools and colleges struggling to meet the challenges of training the first generations of professional data scientists.

Of course, like anything new, it isn’t without its critics. In particular, questions have been asked about how valuable the research it leads to actually is. Often, critics say, the biggest challenges in data analysis revolve around what data is needed, and what questions should be asked. Kaggle’s pre-packaged competitions take this element out of the equation. The crowdsourced data scientists might be working on the solution to a particular problem – but is it the correct one? And might there be more relevant data elsewhere, other than that supplied in the competition package?

This might be a fundamental limitation of the competition model until data collection and distribution evolve to the point where data can be made available to contestants in real time. And then, of course, there will be serious privacy and data protection issues to overcome.

But as it stands today, Kaggle is one of the more forward-thinking innovations in big data, and has done much to raise awareness of the power that crowd sourcing data analysis can bring to businesses and organisations of all sizes. 

—–

Finally, please check out my other posts in The Big Data Guru column and feel free to connect with me via Twitter, LinkedIn, Facebook, slideshare and The Advanced Performance Institute.

About: Bernard Marr is a best-selling author, keynote speaker and consultant in analytics and big data. He helps companies understand and leverage big data and analytics in a way that improves business performance.
