BI designers and vendors are very comfortable with layers of data. Data warehouse architectures dating back to the early 1990s almost all follow the same model: a layer of operational systems, an EDW layer for data reconciliation, and a layer of data marts that serve the users. Sometimes there are additional layers. On slides, the picture is often turned on its side, but the layers remain. Irrespective of their orientation, layers pass data serially from one to the next, each performing its particular role before passing the data on. Architecturally, the thinking is sound: layering enables logical separation of function and responsibility, and it drives progressive refinement of data for specialized purposes. Historically, layering in the data warehouse was physically implemented in response to the performance limitations of relational databases: different workloads demanded different data structures and processing approaches, leading to multiple copies of the data.
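The serial hand-off between layers can be sketched as a chained pipeline. This is a minimal illustration only; the stage names and sample data are hypothetical, not taken from any particular product:

```python
# Minimal sketch of serial layering: each stage refines the data and
# passes it on; nothing reaches the marts until every prior layer runs.

def operational_extract():
    # Hypothetical rows from an operational source system.
    return [{"id": 1, "amount": "100.0"}, {"id": 2, "amount": "250.5"}]

def edw_reconcile(rows):
    # Reconciliation layer: enforce types and consistency rules.
    return [{"id": r["id"], "amount": float(r["amount"])} for r in rows]

def mart_aggregate(rows):
    # Data mart layer: shape the data for a specific user purpose.
    return {"total_amount": sum(r["amount"] for r in rows)}

# The layers run strictly in sequence, one feeding the next.
mart = mart_aggregate(edw_reconcile(operational_extract()))
print(mart)  # {'total_amount': 350.5}
```

The chained call makes the drawback concrete: end-to-end latency is the sum of all the stages, and any change to one layer ripples into the next.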
However, layering of data also has serious drawbacks. It introduces delays in getting data from its initial source to its final destination. In a world where business and customers are increasingly demanding real-time reaction, such delays are unacceptable. Layering adds complexity to development and maintenance. Where market uncertainty demands ever more flexible systems, such complexity is a killer. Layering of data, it seems, has had its day.
As we move from a world of traditional data to modern, unbounded information, the questions about layering become even more pressing. Does it make sense to push social media or sensor data through the layers of the data warehouse? Clearly, the answer from big data advocates is a resounding NO! The speed, scalability and affordability of solutions like Hadoop and NoSQL stores appear compelling for these new types of information. For anybody who understands the importance of data quality and consistency and has seen the disastrous results when they are ignored, the simplistic lure of “Hadumping” is deeply worrisome. But, if layering is no longer a viable approach to ensuring information quality, what is the alternative?
Although it may sound simplistic, the answer is to move from layers to pillars. The main illustration shows the three key pillars of the REAL logical architecture of Business unIntelligence. The pillars are fed in parallel, rather than sequentially. The central pillar holds process-mediated data: traditional operational and core informational data, fed from legally binding transactions (both OLTP and contractual). It is centrally placed because it contains core business information, combining traditional operational data and the informational EDW and data marts in a single logical component. Machine-generated data and human-sourced information are placed as pillars on either side: the leftmost pillar focuses on real-time, well-structured data, while the one on the right emphasizes less-structured and, at times, less timely information. In this architecture, metadata, more correctly labeled context-setting information, is explicitly included as part of the information resource and spans all three pillars. While the pillars are independent, their content is also coordinated and made consistent, as far as necessary, based on the core information in the process-mediated pillar, the context-setting information stored in all three pillars, and the assimilation function shown in the center.

From an implementation point of view, these three pillars require distinctly different types of storage and processing. While the actual platforms required may change as technology evolves, the clearly different characteristics of the three domains suggest that, at any time, there will be three (or perhaps more) different optimal performance points on the spectrum of available technology.
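The parallel feed and the assimilation function can be sketched roughly as follows. All function names, keys, and sample records here are hypothetical, invented purely to show the pattern of independent pillars reconciled against core information:

```python
from concurrent.futures import ThreadPoolExecutor

def ingest_process(transactions):
    # Process-mediated pillar: legally binding transactions, keyed on core IDs.
    return {t["order_id"]: t for t in transactions}

def ingest_machine(events):
    # Machine-generated pillar: real-time, well-structured events.
    return list(events)

def ingest_human(documents):
    # Human-sourced pillar: less-structured, sometimes less timely information.
    return list(documents)

transactions = [{"order_id": "A1", "amount": 99.0}]
events = [{"order_id": "A1", "sensor": "scan", "ts": 1}]
documents = [{"order_id": "ZZ", "text": "complaint about an order"}]

# The pillars are fed in parallel rather than serially.
with ThreadPoolExecutor() as pool:
    f_core = pool.submit(ingest_process, transactions)
    f_machine = pool.submit(ingest_machine, events)
    f_human = pool.submit(ingest_human, documents)
    core, machine, human = f_core.result(), f_machine.result(), f_human.result()

def assimilate(core, records):
    # Assimilation: mark records that can (or cannot) be tied back to the
    # core information in the process-mediated pillar.
    return [dict(r, linked=r.get("order_id") in core) for r in records]

print(assimilate(core, machine))  # the event links to core order "A1"
print(assimilate(core, human))    # the document's "ZZ" has no core match
```

Note that consistency is applied only "as far as necessary": the unmatched human-sourced record is flagged, not discarded, leaving each pillar free to retain its own content.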