By BobGourley
Among the many Big Data themes we reported on in 2012, one seemed to resonate the most with our readers: all of us with a techie bent have realized that we need more discipline in our use of the term Big Data. We revisited this need for discipline in our post:
Big Data Defined for 2013: A definition that can help in your interaction with the IT community
In it we suggest everyone follow the lead of the TechAmerica foundation in defining Big Data. At CTOvision we will use the term this way:
Big Data: A phenomenon defined by the rapid acceleration in the expanding volume of high velocity, complex and diverse types of data. Big Data is often defined along three dimensions: volume, velocity and variety.
Big Data Solutions: Advanced techniques and technologies to enable the capture, storage, distribution, management and analysis of information.
Early in the year we provided insights for program managers who want to get started with Big Data solutions. We gave quickstart tips on how you can stand up your own cluster in the cloud. We followed up with ways you can quickly use Whirr to automate that.
Through the year we published several pieces on topics associated with the ethics issues around Big Data. This included a series by Kord Davis who reported on topics like:
- Ethics and Values in Cloud Computing Architecture
- Identity and Reputation in the Age of Big Data
- Data Ownership, Identity and IP Addresses
- How Big Data Exposes Your Values
We reported extensively on new concepts for Big Data involving very large quantities of data in memory. The greatest expert in this field, Terracotta CEO Robin Gilthorpe, provided his views on Big Data trends to watch in 2013 in a YouTube video we highlighted to our readers. His view is that requirements will drive the industry to several new highs, bringing dramatic social change along the way. His five predictions for 2013 are:
- Big Data will be fast data – Enterprises will profit from Big Data intelligence in proportion to how quickly they can act on it.
- Rise of the hybrid cloud – It’s no longer about building your own platform; it’s more efficient to play in ecosystems.
- CIOs and CMOs get a lot closer – Marketing spend on technology is about to eclipse IT spend on technology.
- The Internet of things crosses the chasm – In just a few years, over 25 billion data-producing devices will be connected.
- Social becomes part of life’s fabric – Remember e-business departments? Social will permeate in the same way.
We also wrote about new approaches to the capture, storage, distribution and management of data, such as dispersed compute storage. Solutions like this from Cleversafe (see Cleversafe: how does it really work?) are true game changers, delivering dramatic improvements to security and functionality with a quick return on investment.
We reported on many other firms associated with the fielding of high quality Big Data solutions into the federal enterprise, including MarkLogic, Oracle, Datameer, Cloudera, Terracotta, Cleversafe, Splunk, Kapow, Sitscape, CloudFrontGroup, ClearStory, and Thetus. These firms are fielding real, working solutions for Big Data, and we are sure we will be reporting more on them in 2013.
Another clear theme in our 2012 reporting on Big Data was the importance of mission focus. That is why we are all so excited about the new technical capabilities of Hadoop and the related technologies: it is about impact to mission. Which leads to the Government Big Data Solutions Award:
Our reporting on Big Data for 2012 included announcing the results of the Government Big Data Solutions Award. The award was established to highlight innovative solutions and facilitate the exchange of best practices, lessons learned and creative ideas for addressing Big Data challenges. The Top Five Nominees of 2012 were chosen based on criteria that included:
- Focus on current solutions: The ability to make a difference in government missions in the very near term was the most important evaluation factor.
- Focus on government teams: Industry supporting government was also considered, but this is about government missions.
- Consideration of new approaches: New business processes, techniques, tools, models for enhancing analysis are key.
The NCI-funded Frederick National Laboratory has been using Big Data solutions in pioneering ways to support researchers working on complex challenges around the relationship between genes and cancers. In a recent example, they built infrastructure capable of cross-referencing the relationships between 17,000 genes and five major cancer subtypes across 20 million biomedical publication abstracts. This involves cross-referencing TCGA gene expression data from a simulated 60 million patients and miRNA expression data from a simulated 900 million patients. The result: understanding additional layers of the pathways these genes operate in and the drugs that target them. This will help researchers accelerate their work in areas of importance for all humanity. This solution, based on the Oracle Big Data Appliance with the Cloudera Distribution of Apache Hadoop (CDH), leverages capabilities available from the Big Data community today in pioneering ways that can serve a broad range of researchers. The promising approach of this solution is repeatable across many other Big Data challenges in bioinformatics, making it worthy of its selection for the 2012 Government Big Data Solutions Award.
We also reported on a classification framework for Big Data solutions produced by Daniel Abadi in a very insightful post, Classifying Today’s “Big Data Innovators”. This is an innovative approach that is easy to think through, should be repeatable for many vendors in this space, and should help enterprise technologists think through which vendors may be right for their mission needs. In it he categorizes the 13 Big Data innovators reported on by Information Week. They are:
1. MongoDB
2. Amazon (Redshift, EMR, DynamoDB)
3. Cloudera (CDH, Impala)
4. Couchbase
5. Datameer
6. Datastax
7. Hadapt
8. Hortonworks
9. Karmasphere
10. MapR
11. Neo Technology
12. Platfora
13. Splunk
He classifies them into:
1. Operational data stores that allow flexible schemas
2. Hadoop distributions
3. Real-time Hadoop-based analytical platforms
4. Hadoop-based BI solutions
We will likely return to this classification for reporting in 2013.
What does our reporting over the last 12 months signal for the next 12 months? We believe we will see a continued expansion of the user end of Big Data solutions. It is probably an oversimplification to say it this way, but one way to look at it is that we have an approach to the backend infrastructure, and that is primarily one built on the Apache Hadoop framework of software over commodity IT, integrated into existing but modern enterprise solutions. There is room for innovation here, of course, but in general the path of the backend is set and will continue. The change to expect now is in the user-facing applications. Brace yourself! Changes there will be dynamic.
For reports on Big Data throughout 2013 please sign up for our Government Big Data Newsletter. Find the weekly report at: http://ctovision.com/newsletter-subscriptions/
