Next query: NoSQL and Business Intelligence
Business intelligence (BI) has long been associated with relational databases and the SQL language. From the earliest days of data warehousing, the qualities of the relational model have been highly valued in the quest for data consistency and quality. In addition, it was assumed that business users are comfortable with tables of information. This has been proven true, especially by spreadsheets, much to IT's chagrin. Tables are also the lingua franca of BI tools, and simple SELECT/WHERE queries are familiar to many users. But, whatever the rationale, the association of BI and SQL is deeply embedded in the minds of most practitioners. So the question arises: what about NoSQL? How does it relate to BI? Can it be of use in data warehousing?
I spoke to David Chancogne, CTO of Traackr, a web business that measures the influence of people who blog, tweet and otherwise contribute to the impression the general public forms of brands, products and more on the web. The goal is to help marketers and advertising agencies track and target such influencers more effectively. Traackr has built a MongoDB database of the contents of blogs, tweets, etc. and gives its customers reports and analyses of the top influencers in their areas of interest. Is this BI? In its broadest sense, yes. The scope is very specific and the queries pre-defined, but this is still BI at its most basic. Did Chancogne think of it as BI? Actually not; it's simply his business to provide analytics to his customers. Probing a little deeper, I discovered that Traackr is continually trying to optimize its algorithm to rate influence. They do this by extracting data from their database and playing with it in (wait for it) Excel! More BI, but as with many a start-up business before them, the choice of Excel was driven more by familiarity and ease of use. Generic BI tools that run against a JSON data store, such as Pentaho's NoSQL solution and Nucleon Software's BI Studio, are beginning to appear; they allow generic querying of the data without extracting it to Excel.
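To make the extract-to-Excel step concrete, here is a minimal sketch of flattening JSON-style documents into CSV that a spreadsheet can open. The field names and sample records are hypothetical and are not Traackr's actual data or code; the point is only that documents with differing fields end up as one table, with blanks where a field is absent.

```python
import csv
import io


def documents_to_csv(documents):
    """Flatten a list of dict-like documents into CSV text.

    Documents may carry different fields; missing values become empty cells.
    """
    # Collect the union of field names, preserving first-seen order.
    fieldnames = []
    for doc in documents:
        for key in doc:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(documents)
    return buffer.getvalue()


# Hypothetical sample documents, invented for illustration only.
docs = [
    {"author": "alice", "source": "blog", "mentions": 12},
    {"author": "bob", "source": "twitter", "mentions": 7, "followers": 3400},
]

csv_text = documents_to_csv(docs)
print(csv_text)
```

The resulting text can be saved with a .csv extension and opened in a spreadsheet, which is one plausible way such an analysis loop might begin.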
A conversation with Julian Browne led to further interesting insights. Browne is the architect of Priority Moments (a location-aware customer loyalty program that offers discounts at affiliated retailers) at O2, the second-largest provider of mobile/cell phone services in the UK, with more than 20 million customers. MongoDB was chosen as the platform for this service largely to deal with the complexity and variability of their product catalog. The challenge is that a bewildering variety of product sets can be offered to different customers, and that variety changes constantly at the whim of marketing. The absence of a predefined schema, a key characteristic of document-oriented data stores, was a compelling argument for the technology choice. But what of BI? Customer loyalty programs are prime BI territory, of course, and in this case tracking the uptake of offers is vital. As with Traackr, initial BI was provided through hand-crafted Java programming, although there is growing interest in using the emerging BI tools. Of more interest, however, is the experimental use of a specific database feature that allows a query to be left open: as records arrive in the database, they automatically appear in the result, which can be routed to a live HTML5 graph(1) to give real-time feedback on program activity.
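The idea of a query that stays open and feeds a live display can be sketched without any database at all. The snippet below is an in-memory stand-in, not the database feature itself or O2's code: a generator plays the part of the open query, and each matching record updates a running count of offer uptake that a chart could redraw. The record fields and offer names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator


def open_query(
    events: Iterable[dict], predicate: Callable[[dict], bool]
) -> Iterator[dict]:
    """Yield matching records one by one as they arrive from the source."""
    for event in events:
        if predicate(event):
            yield event


def live_uptake(
    events: Iterable[dict], predicate: Callable[[dict], bool]
) -> Iterator[Dict[str, int]]:
    """Yield an updated count of matches per offer after each matching record."""
    counts: Counter = Counter()
    for event in open_query(events, predicate):
        counts[event["offer"]] += 1
        yield dict(counts)  # a snapshot the front end could plot


# Hypothetical incoming records; note that the fields vary between documents.
incoming = [
    {"offer": "coffee", "action": "redeemed", "store": "London"},
    {"offer": "cinema", "action": "viewed"},
    {"offer": "coffee", "action": "redeemed"},
    {"offer": "cinema", "action": "redeemed", "channel": "app"},
]

snapshots = list(
    live_uptake(iter(incoming), lambda e: e.get("action") == "redeemed")
)
for snapshot in snapshots:
    print(snapshot)
```

In a real deployment the list would be replaced by whatever stream the data store exposes, and each snapshot would be pushed to the browser rather than printed.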
How would we summarize the situation regarding BI for document-oriented NoSQL databases? What we see is a fairly recent database technology with its query facilities being used for basic, predefined BI. As might be expected, more generic tooling for building queries is appearing. The type of BI supported is focused, application-specific querying and reporting, the type associated with data marts in traditional BI. This is exactly what we saw in the emergence of BI against relational databases. Note that some of the querying is being performed against the live operational sources. Here too there is a parallel with early reporting approaches, which raised the same concerns about performance impacts on operations. MongoDB addresses this through the creation of eventually consistent replicas. Nonetheless, the demand for real-time BI continues to grow, and certain classes of operational analytics will need such real-time or near real-time access.
Where NoSQL does not play a role in BI is also important. Enterprise data warehouses (EDW), with their focus on creating consistent, integrated, historical stores of core business information, are set to remain squarely in the relational database world. But where operational needs drive the choice of a NoSQL document-oriented data store, it is clear that BI can flourish in this environment too. See my latest white paper, "Business Intelligence--NoSQL... No Problem", for further details.
(1) For background on this approach, see hummingbird and data-driven documents.
Dr. Barry Devlin is a founder of the data warehousing industry and among the foremost authorities worldwide on business intelligence (BI) and beyond. He is a widely respected consultant, lecturer and author of “Data Warehouse—from Architecture to Implementation”. Barry has 30 years of experience in the IT industry, previously with IBM, as an architect, consultant, manager and software ...