What is the scoop behind PMML and Amazon EC2?

April 22, 2009

Organizations increasingly recognize the value that predictive analytics offers to their business. The complexity of developing, integrating, and deploying predictive models, however, is often considered cost-prohibitive for many projects. In light of mature open source solutions, open standards, and SOA principles, we propose an agile model development life cycle that allows us to quickly leverage predictive analytics in operational environments.

Starting with data analysis and model development, you can use the Predictive Model Markup Language (PMML) standard to move complex decision models from the scientist’s desktop into a scalable production environment hosted on the Amazon Elastic Compute Cloud (Amazon EC2).

Expressing Models in PMML

PMML is an XML-based language used to define predictive models. It was specified by the Data Mining Group, an independent group of leading technology companies including Zementis. By providing a uniform standard to represent such models, PMML allows for the exchange of predictive solutions between different applications and various vendors.

Open source statistical tools such as R can be used to develop data mining models based on historical data. R can export these models to PMML, which can then be imported into an operational decision platform and be ready for production use in a matter of minutes.
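To make the exchange idea concrete, here is a minimal Python sketch of how a consuming application might read a PMML regression model and score one record. The PMML document is hand-written for illustration rather than exported from R, the field names (age, income, risk) and coefficients are hypothetical, and the PMML 4.0 namespace is assumed. It handles only the simplest case (numeric predictors in a single regression table) and is not how ADAPA itself is implemented.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical PMML document (not exported from a real model).
PMML_DOC = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="Illustrative linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="risk_model" functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.25"/>
      <NumericPredictor name="income" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# Assumed namespace; it must match the xmlns of the document being read.
NS = {"pmml": "http://www.dmg.org/PMML-4_0"}


def load_regression(pmml_text):
    """Return (intercept, {field: coefficient}) from a PMML regression model."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(record, intercept, coefficients):
    """Evaluate intercept + sum(coefficient * value) for one input record."""
    return intercept + sum(c * record[name] for name, c in coefficients.items())


intercept, coefficients = load_regression(PMML_DOC)
print(score({"age": 40.0, "income": 2.0}, intercept, coefficients))
```

Because the model is plain XML, any tool that understands the standard can read the same file, which is what makes the hand-off from the modeling tool to the production engine possible.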

On-Demand Predictive Analytics

Amazon EC2 is a reliable, on-demand infrastructure on which we offer the ADAPA® (Adaptive Decision And Predictive Analytics) decision engine, following the Software as a Service (SaaS) paradigm. ADAPA Predictive Analytics Edition imports models expressed in PMML and executes them either in batch mode or in real time via web services.

Our service is implemented as a private, dedicated Amazon EC2 instance of the ADAPA® Predictive Analytics Edition. Each client has access to their own ADAPA® engine instance via HTTP/HTTPS. In this way, the models and data of one client never share an ADAPA® engine with those of another client.

The ADAPA Control Center

To make ADAPA readily available on Amazon EC2, we built the ADAPA Control Center application, which allows the user to launch and manage all ADAPA instances from a single location (see figure below).

An ADAPA instance contains all the functionality of the ADAPA Predictive Analytics Edition. Our service scales with the client’s need for more processing power and predictive analytics resources. From the ADAPA Control Center, one can launch new instances as well as terminate existing ones. Although at most 20 instances can be deployed at any one time, Amazon EC2 offers three instance types to address different processing needs: small, large, and extra-large. Whenever an instance is no longer necessary, it can be terminated in a matter of seconds.

The ADAPA Console

Each instance runs a single version of the ADAPA Predictive Analytics engine and can be reached from the Control Center. The engine itself is accessed through the ADAPA Console, which is used to manage predictive models and data files. The instance owner can use the console to upload new models as well as score or classify the records of a data file in batch mode. Real-time execution of models is achieved through web services. The ADAPA Console interface is divided into two main sections, model management and data management, so that existing models can be used to generate decisions on different data sets. New models can be uploaded and existing models removed in a matter of seconds.
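As a rough illustration of what batch-mode scoring means, the Python sketch below reads records from CSV text, applies a simple linear model to each one, and returns the same records with a score column appended. It is a local stand-in only: the coefficients, the column names, and the score_batch helper are hypothetical and do not reflect the ADAPA Console or its web service interface.

```python
import csv
import io

# Hypothetical model parameters; a real deployment would take them
# from an uploaded PMML model rather than hard-coding them.
INTERCEPT = 1.5
COEFFICIENTS = {"age": 0.25, "income": -0.5}


def score_batch(csv_text):
    """Score every record in csv_text and return CSV text with a 'score' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames + ["score"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Linear model: intercept plus the weighted sum of the input fields.
        row["score"] = INTERCEPT + sum(
            c * float(row[name]) for name, c in COEFFICIENTS.items()
        )
        writer.writerow(row)
    return out.getvalue()


print(score_batch("age,income\n40,2\n20,4\n"))
```

Real-time scoring follows the same per-record logic, except that each record arrives as an individual web service request instead of as a row in a file.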

Using a SaaS solution to break down traditional barriers that currently slow the adoption of predictive analytics, our strategy translates predictive models into operational assets with minimal deployment costs and leverages the inherent scalability of utility computing.

In summary, ADAPA revolutionizes the world of predictive analytics, since it allows for:

  • Cost-effective and reliable service based on Amazon’s EC2 infrastructure
  • Secure execution of predictive models through dedicated and controlled instances, including HTTPS and web services security
  • On-demand computing, with a choice of instance type (small, large, and extra-large) and the option to launch multiple instances
  • Superior time-to-market through rapid deployment of predictive models and an agile enterprise decision management environment

