Analytics Valley: Big Data, Data Scientists and SAS Programmers
Tom Davenport has observed that Silicon Valley is becoming more analytical, since Valley companies such as Google, Facebook, eBay and LinkedIn all have a strong presence in analytics. To these dominant companies I’d also add Yahoo, though it is no longer at its peak. Yahoo is the largest sponsor of and contributor to Hadoop, an open source framework for distributed processing of so-called “big data”.
Looking at the outstanding data teams at Facebook or LinkedIn, we can see that Hadoop is also one of the key technical factors in their success. These Valley companies are themselves huge consumers of big data and have strong incentives to develop analytical solutions beyond their high technology product pipelines.
Analytics staff at companies such as LinkedIn also help promote wider use of the term “data scientist.” They identify themselves as data scientists. Now more and more statisticians are also happy to adopt this new title. According to a survey at JSM (2011, Miami), more than 85% (164) of the statisticians surveyed there considered themselves to be “data scientists.”
McKinsey also released a report this past May on big data and the huge unfilled gap in qualified analytical talent. When a management consulting firm begins to talk about something technical, you know the concept is no longer just a fashionable topic of discussion. To embrace the challenge of big data, one or more members of the team need a multidisciplinary background, which basically means education in both computer science and statistics (data mining and machine learning are simply interdisciplinary subjects drawing on the two). Here is an ambitious Quora answer to the question, “How do I become a data scientist?”
With these learning plans, take the general idea but don’t take the specifics too seriously. Assess your own background and set your own priorities.
Notes for SAS Programmers
For SAS programmers, I read an exciting post saying that, besides High Performance Computing, SAS will also work with Hadoop by introducing some functionality in SAS/ACCESS and SAS Data Integration Studio.
For SAS programmers with no IT background, it is not a good idea to jump immediately into algorithms, data structures and other hard-core computer science courses. Instead, I recommend taking full advantage of the SAS language and the system itself to enter the computational world gradually:
1. Learn and practice SAS Proc SQL, which is compliant with the SQL-92 standard. SQL is the common language of the database world, and SAS Proc SQL can help you switch smoothly to Oracle SQL, Teradata SQL, MySQL and other SQL implementations, although there are some non-critical differences in the details.
2. Dig into the operating-system-specific documentation of SAS. For SAS 9.3, for example, that means SAS 9.3 Companion for Windows, SAS 9.3 Companion for UNIX Environments, or another companion depending on the OS you are working on. These documents are critically important but unfortunately are often missing from SAS programmers’ reading lists.
Such docs help SAS programmers deal with the machines and expose them to the wider computing world in a way that a SAS programmer can understand. You can’t expect to become a computer expert from such docs, but at least you will be able to communicate fluently with internal IT staff.
3. Then you will have the confidence to work with computers and can move on to any other topic that interests you in the list above!
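To make step 1 a little more concrete, here is a minimal sketch of the kind of plain SELECT / GROUP BY / HAVING query that tends to carry over between SQL implementations. It is not SAS code: it uses Python's built-in sqlite3 module as a stand-in database so it can be run anywhere, and the employees table, its columns and its three rows are invented purely for illustration. In SAS, a query like this would sit between proc sql; and quit; statements, possibly with small dialect differences.

```python
import sqlite3


def run_portable_query():
    """Run a plain SQL aggregate query on a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical toy table; names and values are made up for this sketch.
    cur.execute(
        "CREATE TABLE employees (name VARCHAR(20), dept VARCHAR(20), salary INTEGER)"
    )
    cur.executemany(
        "INSERT INTO employees (name, dept, salary) VALUES (?, ?, ?)",
        [("Ann", "stats", 90), ("Bob", "stats", 70), ("Cy", "it", 80)],
    )

    # Only widely supported constructs: aggregates, GROUP BY, HAVING, ORDER BY.
    query = """
        SELECT dept, COUNT(*) AS n, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY dept
        HAVING COUNT(*) >= 1
        ORDER BY dept
    """
    rows = cur.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for dept, n, avg_salary in run_portable_query():
        print(dept, n, avg_salary)
```

The query text itself is the part worth practicing; the surrounding Python is only there to make the sketch runnable without a SAS installation.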