Training students on mega-scale data

October 14, 2009

In a New York Times article (sub. req.) published over the weekend, IBM and Google expressed doubts that students graduating from US universities today have the chops to deal with the multi-terabyte datasets that are becoming commonplace online and in domains like bioscience and astronomy. From the article:

For the most part, university students have used rather modest computing systems to support their studies. They are learning to collect and manipulate information on personal computers or what are known as clusters, where computer servers are cabled together to form a larger computer. But even these machines fail to churn through enough data to really challenge and train a young mind meant to ponder the mega-scale problems of tomorrow.

The article reveals how Google and IBM are promoting internet-scale research at places like the University of Washington and Purdue. But a curious omission from the article is any mention of the open-source technologies that are spurring innovation in processing and analyzing these data sets. Tools like Hadoop, for processing internet-scale data sets, and R, for analyzing the processed data (most likely in some parallelized form), along with other open-source projects not yet conceived, are going to be critical in this endeavour.

New York Times: Training to Climb an Everest of Digital Data
