SOA for Process and Data Integration

December 16, 2013
154 Views

data integrationTraditionally, BI has been a process-free zone. Decision makers are such free thinkers that suggesting their methods of working can be defined by some stogy process is generally met with sneers of derision. Or worse.

data integrationTraditionally, BI has been a process-free zone. Decision makers are such free thinkers that suggesting their methods of working can be defined by some stogy process is generally met with sneers of derision. Or worse. BI vendors and developers have largely acquiesced; the only place you see process mentioned is in data integration, where activity flow diagrams abound to define the steps needed to populate the data warehouse and marts.

I, on the other hand, have long held – since the turn of the millennium, in fact – that all decision making follows a process, albeit a very flexible and adaptive one. The early proof emerges in operational BI (or decision management, as it’s also called) where decision making steps are embedded in fairly traditional operational processes. As predictive and operational analytics has become increasingly popular, this intermingling of informational and operational is such that these once distinctly different business behaviors are becoming indistinguishable. A relatively easy thought experiment then leads to the conclusion that all decision making has an underlying process.

I was also fairly sure at an early stage that only a Service Oriented Architecture (SOA) approach could provide the flexible and adaptive activities and workflows required. I further saw that SOA could (and would need to) be a foundation for data integration as the demand for near real-time decision making grew. As a result, I have been discussing all this at seminars and conferences for many years now. But every time I’d mention SOA, the sound of discontent would rumble around the room. Too complex. Tried it and failed. And, more recently, isn’t that all old hat now with cloud and mobile?

All of this is by way of introduction to a very interesting briefing I received this week from Pat Pruchnickyj, Director of Product Marketing at Talend, who restored my faith in SOA as an overall approach and in its practical application! Although perhaps best known for its open source ETL (extract, transform and load) and data integration tooling it first introduced in 2006, Talend today take a broader view and offers data focused solutions, such as ETL and data quality, as well as open source application integration solutions, such as enterprise service bus (ESB) and message queuing. These various approaches are united by common metadata, typically created and managed through a graphical, workflow-oriented tool, Talend Open Studio.

So, why is this important? If you follow the history of BI, you’ll know that many well-established implementations are characterized by complex and often long-running batch processes that gather, consolidate and cleanse data from multiple internal operational sources into a data warehouse and then to marts. This is a model that scales poorly in an era where vast volumes of data are coming from external sources (a substantial part of big data) and analysis is increasingly demanding near real-time data. File-based data integration becomes a challenge in these circumstances. The simplest approach may be to move towards ever smaller files running in micro-batches. However, the ultimate requirement is to enable message-based communication between source and target applications/databases. This requires a fundamental change in thinking for most BI developers. So a starting point of ETL and an end point of messaging, both under a common ETL-like workflow, makes for easier growth. Developers can begin to see that a data transfer/cleansing service is conceptually similar to any business activity also offered as a service. And the possibility of creating workflows combining operational and informational processes emerges naturally to support operational BI.

Is this to say that ETL tools are a dying species? Certainly not. For some types and sizes of data integration, a file-based approach will continue to offer higher performance or more extensive integration and cleansing function. The key is to ensure common, shared metadata (or as I prefer to call it, context-setting information, CSI) between all the different flavors of data and application integration.

Process, including both business and IT aspects, is the subject of Chapter 7 of “Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data”.

Sunset Over Architecture (SOA) image: http://vorris.blogspot.com/2012/07/mr-cameron-you-are-darn-right-start.html