First Look - KXEN - SmartData Collective

Copyright © 2009 James Taylor. Visit the original article at First Look – KXEN.

I got my first chance to really see KXEN’s product a little while ago and, as I am off to Predictive Analytics World tomorrow, I thought I would blog about it. KXEN was founded in 1998. The product is designed to deliver automated data mining and predictive analytics at a function level – the user doesn’t have to choose variables or algorithms. KXEN essentially takes everything and runs the user-selected function (e.g. classification/regression) to create many models. It then selects from these models the one it expects to handle future data best.

The product is designed to be easily OEMed but also has a lot of direct customers, most of whom also use a “heavy duty” traditional analytic workbench like SAS (they have perhaps an 80% overlap among major customers). Most use is in direct marketing and customer analytics but they have a fairly wide range of OEMs including Alterian, Experian (several groups within Experian including Clarity Blue), Kognitio, Optimine, etc. They generally play a role in situations where in-database scoring, large numbers of models, or speed to market and ease of use are important. Their customers, they say, fall into three groups: visionaries looking to do far more analytics than they could manage with their existing approaches, organizations with no one to do the analytic heavy lifting, and OEMs. The product is easy to use, quick to build models with and easy to integrate.

The product runs locally or in a client-server environment, with either a Windows client or an in-browser Java client. It supports four main modeling functions – Classification/Regression, Time Series, Clustering, and Association Rules. These can be used for acquisition, cross-sell, churn/attrition, segmentation, forecasting, anomaly detection, etc. Each function is backed by one best-of-breed algorithm for its modeling type. Because KXEN is not a “toolbox of algorithms”, KXEN claims to be easier to use (no algorithm choice or tuning) as well as easier to automate.

Once a user selects the modeling approach they want, data can be pulled from a database, a flat file or a SAS file. Although, like all analytic tools, it needs “flattened” data, their new release supports aggregations, joins, pivots, and WHERE clauses through a SQL generator. Users can use the query tool or access a SQL query or SQL view – a nice feature.
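To make “flattened” concrete, here is a minimal sketch of the kind of query such a generator might produce: one row per customer, built with a join, a WHERE filter, aggregations and a pivot. The tables, columns and data are invented for illustration, and the SQL is hand-written for SQLite rather than anything KXEN actually emits.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, channel TEXT, amount REAL, year INTEGER);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south');
    INSERT INTO orders VALUES
        (1, 'web',   20.0, 2009),
        (1, 'store', 35.0, 2009),
        (1, 'web',   15.0, 2008),
        (2, 'web',   50.0, 2009);
""")

# One row per customer: a join, a WHERE filter, two aggregations, and a
# pivot that turns the channel column into one amount column per channel.
FLATTEN_SQL = """
    SELECT c.customer_id,
           c.region,
           COUNT(*)      AS n_orders,
           SUM(o.amount) AS total_amount,
           SUM(CASE WHEN o.channel = 'web'   THEN o.amount ELSE 0 END) AS web_amount,
           SUM(CASE WHEN o.channel = 'store' THEN o.amount ELSE 0 END) AS store_amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    WHERE o.year = 2009
    GROUP BY c.customer_id, c.region
    ORDER BY c.customer_id
"""

flat_rows = conn.execute(FLATTEN_SQL).fetchall()
for row in flat_rows:
    print(row)
```

Each resulting row is a single modeling record, which is the shape any of the four modeling functions would need as input.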

Having brought in the data, KXEN guesses the type of each element and allows the user to edit the guesses to show which fields are really continuous, which are not, etc. Tools to examine the data, see what unique values exist and so on are provided. To generate the models, the user picks a target field and specifies any fields to be ignored. The tool handles splitting the data into training, test and validation sets (or just training and test if there is not much data) and then runs a number of models through the system, adapting and refining the models as it does so. When it is done it typically has a model with a lot of contributing variables (because the tool is happy to use lots) and it reports on both the predictive power and on how reliable the model is likely to be – do you have enough data for a good answer, and will more data improve it?
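The two preparation steps described above, guessing field types and partitioning the data, can be sketched roughly as follows. This is not KXEN’s logic: the type heuristic, the 60/20/20 split, the `min_rows_for_validation` threshold and both function names are assumptions made up for this example.

```python
import random

def guess_type(values, max_nominal_levels=10):
    """Guess 'continuous' or 'nominal' for a column of raw values.

    Assumed heuristic: a column is continuous if every non-empty value
    parses as a number and it has more distinct values than
    max_nominal_levels. Anything else is treated as nominal.
    """
    present = [v for v in values if v not in ("", None)]
    try:
        numbers = [float(v) for v in present]
    except (TypeError, ValueError):
        return "nominal"
    return "continuous" if len(set(numbers)) > max_nominal_levels else "nominal"

def split_rows(rows, percents=(60, 20, 20), min_rows_for_validation=50, seed=0):
    """Shuffle rows and cut them into (training, validation, test) lists.

    With fewer than min_rows_for_validation rows the validation set is
    left empty and its share goes to training, mirroring the
    "just training and test" fallback described in the text.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n = len(shuffled)
    n_test = n * percents[2] // 100
    n_valid = n * percents[1] // 100 if n >= min_rows_for_validation else 0
    n_train = n - n_valid - n_test
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

train, valid, test = split_rows(range(100))
print(len(train), len(valid), len(test))
print(guess_type(["red", "blue", "red"]))
```

A user correcting a guessed type corresponds here to simply overriding the returned label for that column before modeling.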

Users can review the model, see an ordered list of key variables in terms of impact and then drill into bins, influence, etc. They can also see correlated variables (where one appears to impact another) – KXEN allows correlated variables but manages overfitting using a structural risk minimization (SRM) approach. If the model must be explainable, or there is some production reason to keep it small, the user can have the tool reduce the number of variables automatically.
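For readers who have not met SRM before, the textbook idea (from Vapnik’s statistical learning theory) is to bound the error a model will make on future data by its error on the training data plus a penalty that grows with the capacity of the model family. The bound below is the standard form for classifiers. Here R is the expected error, R_emp the training error, h the VC dimension of the family, n the number of training examples, and the bound holds with probability at least 1 − η. How KXEN applies this internally is not something I saw, so treat it as background rather than a description of the product.

```latex
R(f) \;\le\; R_{\mathrm{emp}}(f)
  + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

SRM chooses, among nested model families of increasing capacity, the one that minimizes the right-hand side rather than the training error alone. That is why a model with many correlated inputs need not overfit: adding variables only helps if the drop in training error outweighs the rise in the capacity term.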

Modelers are still able to add fancy variables based on further analysis, and various expert settings are supported. When something complex is necessary, especially on the IT side, it can be loaded in and then hidden from the analyst.

For output, KXEN can update the database directly with the scores, allowing the whole process to run on the database server. It can also output SQL or SAS code, PMML, etc., and the PMML output allows it to integrate with the many rules products that support PMML for deployment. It can also save a KXEN shell script. KXEN scripts can do anything the UI can do and can be shared etc. These scripts can also be parameterized, a useful feature for OEMs and for building many models (per region, product group, etc.).
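As a rough illustration of why SQL output and parameterization matter, the sketch below turns a simple linear model into a scoring query that runs inside the database, then reuses the same template for one model per region. Everything here is assumed for the example: the `scoring_sql` helper, the table and column names, and the coefficients. It does not reflect KXEN’s generated code or its script syntax.

```python
import sqlite3

def scoring_sql(table, key, intercept, weights):
    """Build a SELECT that computes a linear score for each row of `table`.

    Table and column names are interpolated directly, so they must come
    from trusted model metadata, never from user input. The region value
    is bound as a query parameter at execution time.
    """
    terms = [repr(float(intercept))]
    terms += [f"({float(w)!r} * {col})" for col, w in weights.items()]
    return (f"SELECT {key}, {' + '.join(terms)} AS score "
            f"FROM {table} WHERE region = ? ORDER BY {key}")

# Hypothetical flattened table and per-region models, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers_flat (customer_id INTEGER, region TEXT,
                                 n_orders INTEGER, total_amount REAL);
    INSERT INTO customers_flat VALUES
        (1, 'north', 2, 55.0), (2, 'south', 1, 50.0), (3, 'north', 4, 10.0);
""")
models = {
    "north": {"intercept": 0.5, "weights": {"n_orders": 0.25, "total_amount": 0.01}},
    "south": {"intercept": 1.0, "weights": {"n_orders": -0.5, "total_amount": 0.02}},
}

# Same template, one model per region: the scoring runs in the database.
scores = {}
for region, model in models.items():
    sql = scoring_sql("customers_flat", "customer_id",
                      model["intercept"], model["weights"])
    for customer_id, score in conn.execute(sql, (region,)):
        scores[customer_id] = score

print(scores)
```

The point of the design is that only the coefficients and the region change between runs, so building dozens of regional or product-group models is a loop rather than a project.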