In-database analytics and Decision Management

One of the hot topics these days in analytics is “in-database” or “in-warehouse” analytics.

One of the hot topics these days in analytics is “in-database” or “in-warehouse” analytics. I have blogged about multiple products in this space including the work SAS has done with Teradata, Aster, DB2 , Netezza and Greenplum, the work Revolution Analytics has done with Netezza, Angoss and FuzzyLogix (described in this report). With all this interest it seemed to me that I should write a piece on the role of in-database analytics and Decision Management.

First, let’s differentiate between the two major cases of in-database analytics – in-database model creation and in-database model execution.

In-database Model Creation

Model creation first. In a standard analytic environment an analyst extracts an analytic dataset from the database or data warehouse – integrating multiple sources, flattening the structure and cleaning things up – before creating potentially large numbers of variables from the data. These variables often collapse behavior over time into a variable – taking payment records and creating a variable “Number of times payment was late in the last 12 months” for instance. This enhanced analytic dataset (original data plus calculated variables) is then run through multiple modeling algorithms and either the best algorithm is selected or an ensemble of the models is built.

In-database model creation involves running some part of this process on the database server – either the data integration and variable creation process alone or the whole process. Not only does this use the server processing capacity it also eliminates the need to extract and move the data. By accessing the data directly from the same server performance can be significantly improved, especially in the integration and variable creation pieces, less so in the modeling piece. Integration, cleansing and variable creation can also be easily shared across multiple models, reducing the management and calculation overhead for multiple models.

In-database Model Execution

The second element is one of model execution – taking the model produced and calculating the result of the model for a specific record or group of records – a customer, a claim or all accounts in the system for instance. Historically lots of model algorithms are re-coded by IT departments for performance or scalability (or simply to allow it to run on their hardware). Some models can be exported as code and then executed while others are exported as Predictive Model Markup Language (PMML) and loaded into a Business Rules Management System or BRMS. Only once this is done can the score, segment assignment, association rules or whatever actually be used in a report or a system.

In-database analytics model execution means pushing this execution onto the database server also. This means being able to score the records in the database or warehouse by running them through the algorithm and storing the result. More interestingly it can also mean being able to treat the model as though it is an attribute and calculate it live when the attribute is requested. So, for instance, the segment to which a customer belongs can be treated by calling applications as though it is a stored a data element even though it is calculated in real-time from a model when it is needed. In this approach the data is not persisted and the in-database engine acts like a layer on top of the stored data, enhancing it with the model results.

Some in-database partnerships or products do both model creation and model execution, some only do the model creation piece and some only do the analytic data set generation part of model creation. In theory some could do the model execution piece only but I am not aware of any. With that primer on in-database analytics, how does Decision Management work with in-database analytics?

Decision Management and In-database Analytics

Decision Management ensures that predictive analytic insights are effectively applied to the decisions that impact the business. The use of in-database analytics reduces the time to build and deploy models, making it quicker and easier to enhance a business decision with predictive analytic models. The support for rapid model updates and the ability to reuse and share elements of the model creation process also mean that models are likely to be refreshed and updated more often, reducing the problem of stale models that are no longer very predictive.

When the predictive analytic model is applied to the decision-making process, especially when this is described and managed in a business rules management system, the operational value becomes very clear to the business. In addition the need to conduct experiments, to test and learn, can be easily handled with Decision Services (which can manage the changing models and logic for these experiments) with easy collaboration across IT, the business and the analytic group.

When to Use a BRMS for Model Execution

If you have in-database analytics it is worthwhile to re-think the balance of pushing models into the database for execution and exporting them using PMML to a BRMS. The former will work well for something like a Regression Model or Neural Network and for models where it is “all or nothing” and the business users are comfortable with the idea of a model as a “black box”. When you need to interact with the model (to see which association rules to use for instance) or when the visibility of the model is critical to the business, a BRMS is proven to be a very effective way to deploy the models. Exposing the models as readable, manageable business rules makes it easier to gain acceptance and to integrate the model results with the rest of the business rules involved in a business decision. Regardless, building the model in-database is still worthwhile.

Personally I see the increasing support for in-database analytics as reflective of the increased interest across industries in using predictive analytics to improve operational decision making.