First Look: SAS Factory Miner

June 26, 2015

I got a briefing on SAS Factory Miner, to be officially released in July. SAS Factory Miner is designed to help organizations scale their analytic efforts, enabling them to solve more problems, faster, more accurately and without adding resources.

SAS generally presents the analytic lifecycle as two main loops, discovery and deployment, both fed with data. Discovery involves determining the problem, preparing and exploring the data, and modeling, while deployment involves implementing the model, acting on it and evaluating it. The environments for the two can be different, of course: discovery might take place in a Hadoop environment, for instance, while deployment must happen in an operational system like a CRM system or website.

SAS Factory Miner is designed to deliver “modeling at scale”. For SAS this means:

  • Using all the data (not just a sample)
  • Using all the available attributes for the data
  • Applying many data modification approaches and predictive algorithms
  • Building models for more granular segments
  • Enabling less technical users
  • Automated deployment of the results
  • Real-time execution of models once deployed

Modeling at scale means moving to a factory model for development and deployment of models rather than a hand-crafted one (see my white paper on operationalizing analytics for instance).

SAS Factory Miner is an automated environment for building, deploying and retraining models. It is part of the 9.4 platform and it's completely web based, using HTML 5, allowing it to run on a SAS server or in Distributed HPA mode on things like SAS Grid Manager, database appliances or Hadoop clusters. Specific features include:

  • Model Building
      • Designed to work across thousands of segments
      • Uses modern machine learning techniques to develop models quickly
      • Identifies the best performing models and allows these models to be tested quickly
      • Model templates can be developed to share best practices
      • Model results can be ranked using pre-set statistics to quickly identify underperforming models
      • Model runs can be edited so that users can fine tune performance if they want to
  • Deployment
      • Models can be deployed for in-database scoring using the SAS in-database integration with environments like Netezza, Teradata, Oracle, DB2, SAP HANA and Hadoop
      • SAS Factory Miner is integrated with SAS Decision Manager to allow analytic models to be used in decision flows
  • Monitoring
      • Existing models can be rebuilt with a few clicks
The product is web based (HTML 5) and uses the new SAS web-based interface and analytics hub, where a variety of products can be made available. Users can develop models using a simple three-step process:

  1. Pick a data set. SAS Factory Miner will either use metadata to determine which variables should be the target and segmentation variables, or scan through the data to identify the roles of the various fields in the data set (rejecting fields that always have the same value, identifying continuous variables, identifying fields that might be good segmentation options, etc.).
  2. Next, develop a segmentation approach. SAS Factory Miner automatically creates a holdout set for model validation. It generates segments based on the combinations of the selected segmentation variables and automatically calculates the number of observations and the event rate for each segment. Sliders can be used to reduce the number of segments, for instance by excluding those with low event rates.
  3. The third step is to apply predictive modeling templates to this data. Each template usually consists of a sequence of steps for analytical data preparation, data transformation, variable selection and modeling algorithms. A set of templates is shipped with SAS Factory Miner and users can create their own. The templates are configurable using a drag-and-drop interface, allowing users to reorder pre-processing steps, change defaults for steps and so on. In the future there will be some ability to lock down parts of a template to give less experienced users a narrower scope. Multiple templates can be selected for a project.
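To make the first two steps more concrete, here is a minimal Python sketch of the kind of logic they describe: scanning fields to guess their roles, then tabulating observations and event rate per segment and dropping low-event-rate segments. This is an illustration only, not how SAS Factory Miner is implemented. The function names, the field names, the `max_segment_levels` cutoff and the `min_event_rate` threshold (standing in for the slider) are all assumptions made up for the example.

```python
from collections import defaultdict


def infer_roles(rows, target, max_segment_levels=10):
    """Assign each field a rough role by scanning its values (hypothetical rules)."""
    roles = {}
    for field in rows[0]:
        values = [row[field] for row in rows]
        distinct = set(values)
        if field == target:
            roles[field] = "target"
        elif len(distinct) == 1:
            roles[field] = "rejected"  # same value in every row
        elif all(isinstance(v, float) for v in values):
            roles[field] = "continuous"
        elif len(distinct) <= max_segment_levels:
            roles[field] = "segment_candidate"
        else:
            roles[field] = "nominal"
    return roles


def summarize_segments(rows, segment_fields, target):
    """Count observations and compute the event rate for each segment."""
    stats = defaultdict(lambda: [0, 0])  # segment -> [observations, events]
    for row in rows:
        key = tuple(row[f] for f in segment_fields)
        stats[key][0] += 1
        stats[key][1] += row[target]
    return {key: {"n": n, "event_rate": events / n}
            for key, (n, events) in stats.items()}


def filter_segments(summary, min_event_rate):
    """Mimic the slider: keep only segments at or above the event-rate cutoff."""
    return {key: s for key, s in summary.items()
            if s["event_rate"] >= min_event_rate}


# Made-up sample data; "churned" is the 0/1 target.
rows = [
    {"region": "east", "channel": "web", "country": "US", "spend": 12.5, "churned": 1},
    {"region": "east", "channel": "web", "country": "US", "spend": 40.0, "churned": 0},
    {"region": "east", "channel": "store", "country": "US", "spend": 8.0, "churned": 0},
    {"region": "west", "channel": "web", "country": "US", "spend": 22.0, "churned": 1},
    {"region": "west", "channel": "web", "country": "US", "spend": 31.0, "churned": 1},
    {"region": "west", "channel": "store", "country": "US", "spend": 5.5, "churned": 0},
]

roles = infer_roles(rows, target="churned")
summary = summarize_segments(rows, ["region", "channel"], "churned")
kept = filter_segments(summary, min_event_rate=0.25)
```

In this toy data, `country` is rejected because it never varies, `spend` is treated as continuous, and the two `store` segments fall below the cutoff and are excluded.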

The project then creates models for each segment using the templates selected. The best model, the champion, is identified for each segment. Individual models can be investigated and tweaked and new models can be generated. All of this functionality is asynchronous, using the server to run the processing. Models can also be compared. SAS Factory Miner allows users to focus only on under-performing models or only on large segments. The tool will also identify segments where no good model could be built and just assign a baseline probability score.
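The champion selection described here can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the product's actual logic: it assumes each template yields one validation score per segment where higher is better, and that a segment falls back to its baseline event rate when no model clears a cutoff. The template names, the scores, the `min_score` cutoff and the helper names are all hypothetical.

```python
def pick_champions(scores, baseline_rates, min_score=0.6):
    """Pick the best-scoring template per segment, or fall back to a baseline.

    scores: {segment: {template_name: validation_score}}, higher is better.
    baseline_rates: {segment: event rate used when no model is good enough}.
    """
    champions = {}
    for segment, by_template in scores.items():
        best = max(by_template, key=by_template.get, default=None)
        if best is None or by_template[best] < min_score:
            # No good model could be built: assign a baseline probability.
            champions[segment] = {"model": "baseline", "score": None,
                                  "probability": baseline_rates[segment]}
        else:
            champions[segment] = {"model": best, "score": by_template[best]}
    return champions


def weakest_first(champions):
    """List modeled segments from lowest to highest champion score."""
    modeled = {s: c for s, c in champions.items() if c["model"] != "baseline"}
    return sorted(modeled, key=lambda s: modeled[s]["score"])


# Made-up validation scores for two hypothetical templates.
scores = {
    "east/web": {"tree": 0.71, "regression": 0.78},
    "west/web": {"tree": 0.83, "regression": 0.80},
    "east/store": {"tree": 0.52, "regression": 0.55},
}
baseline_rates = {"east/web": 0.12, "west/web": 0.30, "east/store": 0.02}

champions = pick_champions(scores, baseline_rates)
review_order = weakest_first(champions)
```

Sorting the champions from weakest to strongest is one simple way to focus attention on under-performing models first, as the tool allows.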

Once built, the (potentially very large) set of models can be registered to the SAS model repository as a set. These can then be managed using SAS Model Manager so they can be deployed automatically to the various deployment environments, integrated into SAS Decision Manager and so on. SAS Model Manager has a project that matches the SAS Factory Miner project, and this project contains all the models developed. SAS Model Manager can then manage these models more incrementally, though SAS Factory Miner can also retrain the models as a set.

SAS is one of the vendors in our Decision Management Systems Platform Technology Report and you can get more information on SAS Factory Miner here.