The Emerging Big Data Ecosystem

Slowly but surely, big data is becoming mainstream. Of course, if you listened only to the hype from analysts and vendors, you might think this was already the case. I suspect it’s more like teenage sex: more talked about than actually happening. But it seems we’re about to move into the roaring twenties.

I had the pleasure of being invited as the external expert speaker at IBM’s PureData launch in Boston this week. In a theatrical, dry-ice moment, IBM rolled out one of their new PureData machines between the previously available PureFlex and PureApplication models. However, for me, the launch carried a much more complex and, indeed, subtle message than “here’s our new, bright and shiny hardware”. Rather, it played on a set of messages that is gradually moving big data from a specialized and largely standalone concept to an all-embracing, new ecosystem that includes all data and the multifarious ways business needs to use it.

Despite long-running laments to the contrary, IT has had it easy when it comes to data management and governance. Before you flame me, please read at least the rest of this paragraph. Since the earliest days of general-purpose business computing in the 1960s, we’ve worked with a highly modeled and carefully designed representation of reality. Basically, we’ve taken the messy, incoherent record of what really happens in the real world and hammered it into relational (and previously popular hierarchical or network) databases. To do so, we’ve worked with highly simplified models of the world. These simplifications range from the grossly wrong (all addresses must include a 5-digit zip code; yes, there are still a few websites that enforce that rule) through the obviously naive (multiple purchases by a customer correlate to high loyalty) to the highly useful in managing and running a business (there exists a single version of the truth for all data). The value of useful simplifications can be seen in the creation of elegant architectures that enable business and IT to converse constructively about how to build systems the business can use. They also reduce the complexity of the data systems; one size fits all. The danger lies in the longer-term rigidity such simplifications can cause.

The data warehouse architecture of the 1980s, to which I was a major contributor, of course, was based largely on the above single-version-of-the-truth simplification. There’s little doubt it has served us well. But big data and other trends are forcing us to look again at the underlying assumptions, and we find them lacking. IBM (and it’s not alone in this) has recognized that there exist different business use patterns of data, which lead to different technology sweet spots. The fundamental precept is not new, of course. The division of computing into operational, informational and collaborative is closely related. The new news is that the usage patterns are non-exclusive and overlapping, and they need to co-exist in any business of reasonable size and complexity. I can identify four major business patterns: (1) mainstream daily processing, (2) core business monitoring and reporting, (3) real-time operational excellence and (4) data-informed planning and prediction. And there are surely more. This week, IBM announced three differently configured models: (1) PureData System for Transactions, (2) PureData System for Analytics and (3) PureData System for Operational Analytics, each based on existing business use patterns and implementation expertise. Details can be found here. I imagine we will see further models in the future.

All of this leads to a new architectural picture of the world of data: an integrated information platform, where we deliberately move from a layered paradigm to one of interconnected pillars of information, linked via integration, metadata and virtualization. A more complete explanation can be found in my white paper, “The Big Data Zoo–Taming the Beasts: The need for an integrated platform for enterprise information”. As always, feedback is very welcome: questions, compliments and criticisms.