Big Data

February 7, 2010

Several months ago a short video appeared on YouTube with an interview of LinkedIn’s Chief Scientist DJ Patil. In it he discusses how ‘Big Data’ impacts the practice of analytics. I’ve only just got around to posting about it, but I’m doing so now because he offers some insights that I agree with and that are still relevant.

Big data is today most often associated with internet superstars like Google, eBay and Amazon. There are three other, lower-profile areas where big data is important: intelligence (spooks, the military, etc.), scientific and academic research, and the financial markets.

Big data’s future is much bigger than this, because more and more areas of human activity are going to be faced with vast data sets. When you hear people talk about the growth of knowledge, with statements like ‘if this data were printed then the stack would grow faster than NASA’s fastest rocket’, remember that there is a good chance each page of new data is adding to someone’s analytic data set.

I’m not quoting him verbatim, but here’s what I heard, along with my takeaways:

• Open-source, ‘big data ready’ technologies like Hadoop (see my earlier blog post or here) have now come into their own. If you are facing big data challenges, look to people with these skills over those who only know SQL.
• We have reached a tipping point in the use of open source software for commercial solutions to big data problems.
• If you want good analysts, the best place to look is in occupations where people already have practical skills in manipulating big data sets: scientific fields like meteorology, oceanography and the like. I agree, but this is not the only place. In my experience I also need analysts who relate well to business decision makers, i.e. the people who make commercial decisions based on the analytics. This is perhaps less important in pure tech plays like LinkedIn.
• Open source will transform the practice of analytics in the next 3–5 years. I think it will take longer than this to have a real impact on the more traditional industries. I’m not happy about this, but I am realistic about how difficult it is to convince business leaders that open source is superior to proprietary solutions. The money behind the big vendors will keep them going for a number of years yet.

One potential qualifier to DJ Patil’s perspective is that, although he has a very impressive big data background as a mathematician, US Department of Defense analyst (‘Threat Anticipation’) and former eBay Director of Strategy and Analytics, his current employer is LinkedIn.

The core of LinkedIn’s big data is structured and fairly static: profiles of people. So I’m not sure how similar their big data challenges are to, say, those of processing, understanding and predicting large streams of real-time data from financial markets or very large sensor arrays. On the other hand, the growth of LinkedIn communities and their related activities must generate large amounts of semi-structured data.

I also have no idea what LinkedIn’s own analytic goals are beyond what DJ mentions on his profile, where he says his analytics work drives product features such as:

• “People You May Know”
• “Who Viewed My Profile”
• “Groups You Might Like”

Maybe somebody reading this blog knows more?

The video is on YouTube and I embed it here for convenience:

Or you can download ‘DJ Patil on How Big Data Impacts Analytics’ directly from this blog.