Data Replication: The Reality

For a Business Discovery platform to meet the expectations of today’s information worker (fast response times, high degrees of interactivity, self-service data exploration and discovery) and scale across an enterprise, it’s now widely accepted that in-memory processing is required. Here’s a quote from our partner Teradata, which comes from a disk-based heritage: “Naturally for the data which is being used heavily on a day to day basis then there will [be] a more than convincing business case to store this data in-memory to deliver the performance which is required by the business.”

This is no surprise to QlikTech, as this is the approach we pioneered 20 years ago, which is now being taken up by pretty much all competing vendors.

However, we sometimes come across claims that visualization tools querying disk-based databases directly are a viable alternative approach. The suggestion that a deployment relying solely on dynamic queries to disk will meet performance expectations is simply not realistic. While some business discovery providers (including QlikView via the Direct Discovery capability) can directly query sources such as Teradata, it’s important to acknowledge that direct query alone a) is much slower and b) consumes network bandwidth in an unbounded fashion. A direct query capability such as Direct Discovery is a very valuable ‘relief valve’ for access to very large data sets, but ALL data discovery providers (including QlikView) recommend the use of a performance optimization layer. In fact, according to Gartner, this is one of the defining characteristics of data discovery software (data discovery is their term for Business Discovery):

“Data discovery tools are an increasingly prominent class of BI offering that provide three attributes:

1. A proprietary data structure to store and model data gathered from disparate sources, which minimizes the reliance on predefined drill paths and dimensional hierarchies.

2. A built-in performance layer that obviates the need for aggregates, summaries or pre-calculations.

3. An intuitive interface enabling users to explore data without much training.”*

The reality is that any BI system meant to satisfy business users has to replicate some or all of the data to deliver acceptable performance. Different vendors take different approaches: QlikView uses its associative in-memory engine (which offers up to 90% compression of source data), while other vendors use less intelligent in-memory caches. All the same, they still replicate data. For 20 years QlikTech has developed an in-memory approach that provides a unique high-performance, associative, intelligent data store. In addition, we have developed tooling that manages the data very effectively, allowing QlikView deployments to scale to many thousands of concurrent users. Any vendor claiming to deliver genuinely usable, fast discovery without recourse to some data replication in memory (or the use of an in-memory database further down the stack, which is still a rarity) is misguided.

*Source: Gartner ‘The Rise of Data Discovery Tools’, 26 September 2008, ID:G00161601
