The Push and Pull of Data Integration

In my last blog post, I described the reality of so-called analytical data integration, which is really just a fancy name for ETL. Now let’s talk about so-called operational data integration. I’m assuming that when the vendors talk about this, it’s the same thing as “data integration for operational systems.” Most business applications use point-to-point solutions to retrieve and integrate data for their own specific processing needs. This is ETL in reverse: it’s a “pull” process as opposed to a “push” process.Unfortunately this involves a lot of duplicate processing for people to access individual records from source systems. And like…

In my last blog post, I described the reality of so-called analytical data integration, which is really just a fancy name for ETL. Now let's talk about so-called operational data integration. I'm assuming that when the vendors talk about this, it's the same thing as "data integration for operational systems." Most business applications use point-to-point solutions to retrieve and integrate data for their own specific processing needs. This is ETL in reverse: it's a "pull" process as opposed to a "push" process.

Unfortunately this involves a lot of duplicate processing for people to access individual records from source systems. And like their analytical brethren, the moment a source system changes, there is exponential work necessary to support the new modification. Multiply this by thousands of data elements and dozens of source systems, you’ll find a farm of silos and hundreds (if not thousands) of data integration jobs. It's not an uncommon problem.

In most BI environments we begin with a large batch data movement process. We build our ETL so it can occur overnight. But our data volumes are such that overnight isn’t enough. So the next evolution is building "trickle load" ETL. The issue here is that data integration is less about how the data is used as it is when the data is needed and the level of data quality. Most operational systems don’t clean the data, they just move it. And most ETL jobs for data warehouses will standardize the formatting but they won’t change the values. (And if they do fix the values, they don’t communicate those changes back to the source systems.)

If I have specialized data needs I should be building specialized integration logic. If I have commodity or standard needs for data that everyone uses, the data should be highly cleansed.

So it's not about analytical versus operational data integration. It's not even about how the data is used. It's really about one-way versus bi-directional data provisioning. As usual, the word integration is used too loosely. In either case, the presumption that the target is a relational database is naïve. And whether it's for analytical or operational integration is beside the point.

Link to original post