PMML and Open Source Data Mining – Predictive Analytics on the go!

November 14, 2009

Open-source tools provide a cost-effective yet powerful option for data mining. The following contenders adhere to PMML (Predictive Model Markup Language), the standard that facilitates model exchange among open-source and commercial vendors and thereby provides a clear route to deploying predictive models in production.
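To make "model exchange" concrete: a PMML file is an XML document that describes a model's input fields and parameters, so any compliant tool can read it. The Python sketch below builds a minimal PMML 3.2 linear regression model with the standard library's ElementTree. The helper name `build_linear_pmml`, the field names, and the coefficients are hypothetical, and the element names follow our reading of PMML 3.2; we have not validated the output against the official schema.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by PMML 3.2 documents.
PMML_NS = "http://www.dmg.org/PMML-3_2"

# Serialize this namespace as the default (unprefixed) one.
# Note: this updates ElementTree's global prefix registry.
ET.register_namespace("", PMML_NS)


def _q(tag):
    """Qualify a local tag name with the PMML namespace."""
    return f"{{{PMML_NS}}}{tag}"


def build_linear_pmml(intercept, coefficients, target="y"):
    """Return a minimal PMML 3.2 RegressionModel as an XML string.

    `coefficients` maps each input field name to its linear coefficient.
    """
    root = ET.Element(_q("PMML"), version="3.2")
    ET.SubElement(root, _q("Header"), description="Hypothetical linear model")

    # The data dictionary declares every field the model refers to.
    fields = list(coefficients) + [target]
    dictionary = ET.SubElement(root, _q("DataDictionary"), numberOfFields=str(len(fields)))
    for name in fields:
        ET.SubElement(dictionary, _q("DataField"), name=name,
                      optype="continuous", dataType="double")

    # The model itself: a mining schema plus one regression table.
    model = ET.SubElement(root, _q("RegressionModel"), functionName="regression")
    schema = ET.SubElement(model, _q("MiningSchema"))
    for name in coefficients:
        ET.SubElement(schema, _q("MiningField"), name=name)
    ET.SubElement(schema, _q("MiningField"), name=target, usageType="predicted")

    table = ET.SubElement(model, _q("RegressionTable"), intercept=str(float(intercept)))
    for name, coef in coefficients.items():
        ET.SubElement(table, _q("NumericPredictor"), name=name,
                      exponent="1", coefficient=str(float(coef)))

    return ET.tostring(root, encoding="unicode")
```

For example, `build_linear_pmml(1.5, {"x1": 2.0, "x2": -0.5})` returns a document whose `RegressionTable` holds the intercept and one `NumericPredictor` per input field. Real exporters emit considerably richer documents, but the principle is the same: the model travels as plain, tool-independent XML.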

The R Project for Statistical Computing is arguably the most widely used and respected statistical package among advocates of open-source and community computing projects. As with the iPhone App Store, you can find almost anything you need in CRAN, the Comprehensive R Archive Network (statistically speaking, that is; and yep, there is no navigation system for R). CRAN is also where you will find the R PMML package, which allows R users to export PMML for a variety of models, including decision trees and neural networks, among many others. We recently co-authored an article with Graham Williams, the package's original author and maintainer; the article can be downloaded directly from The R Journal website. If you are interested in contributing code to the package, please contact us.

Developed by the University of Konstanz, KNIME is an open-source platform that enables users to visually create and execute data flows. Since KNIME 2.0 (available since December 2008), users can import PMML models into KNIME and export them from it. Because R can be used within KNIME, the R PMML package can also be used to convert R models to PMML from inside KNIME. Future versions of KNIME will most likely expand its PMML support even further.

Developed by the University of Waikato, Weka provides a large collection of machine learning algorithms for solving data mining problems. Although Weka currently has no PMML export functionality, Mark Hall is working on PMML import: Weka can already import models such as regression models, decision trees, and neural networks, and its PMML support keeps expanding with the addition of transformations and built-in functions.
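Importing a regression model from PMML essentially means reading its parameters back and evaluating them. As an illustration only (this is not how Weka implements it), the Python sketch below parses the `RegressionTable` of a PMML `RegressionModel` and scores one record as intercept + sum(coefficient * value ** exponent). It assumes a purely numeric model in a hypothetical sample document; the helper names are our own, and categorical predictors, normalization methods, and classification models, which the PMML specification also supports, are deliberately left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 3.2 linear regression model used for illustration.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-3_2" version="3.2">
  <Header description="Hypothetical linear model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" exponent="1" coefficient="2.0"/>
      <NumericPredictor name="x2" exponent="2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def _ns_prefix(root):
    """Return '{uri}' for a namespaced root element, or '' if it has none."""
    return root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""


def load_regression_model(pmml_text):
    """Parse the first RegressionTable into (intercept, [(field, exponent, coefficient)])."""
    root = ET.fromstring(pmml_text)
    p = _ns_prefix(root)  # reuse whatever namespace the document declares
    table = root.find(f"{p}RegressionModel/{p}RegressionTable")
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    intercept = float(table.get("intercept"))
    predictors = [
        (e.get("name"), int(e.get("exponent", "1")), float(e.get("coefficient")))
        for e in table.findall(f"{p}NumericPredictor")
    ]
    return intercept, predictors


def score(model, record):
    """Evaluate intercept + sum(coefficient * value ** exponent) for one record."""
    intercept, predictors = model
    return intercept + sum(coef * record[name] ** exp for name, exp, coef in predictors)
```

With the sample document, `score(load_regression_model(SAMPLE_PMML), {"x1": 3.0, "x2": 2.0})` evaluates 1.5 + 2.0 * 3.0 - 0.5 * 2.0 ** 2, which is 5.5. Separating loading from scoring mirrors the import-then-execute split between a model-building tool and a scoring engine.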

Most recently, Rapid-I announced that it will extend the latest version of its RapidMiner software to support PMML. RapidMiner, formerly known as YALE, is an open-source platform that offers operators for all aspects of data mining. As with KNIME, Rapid-I is among the latest members to join the ranks of the Data Mining Group (DMG), alongside companies such as IBM, MicroStrategy, SPSS, SAS, and Zementis. The DMG is already hard at work refining PMML and adding yet more capabilities to it.

Validating, Converting and Executing PMML

Zementis offers a free PMML Converter through its website and through the DMG. As its name suggests, the converter upgrades files written in older versions of PMML (version 2.1 onward) to version 3.2. Besides conversion, it also checks that submitted files are syntactically sound by validating them against the PMML schema, and it corrects known issues in files produced by different vendors. Once you have developed a model in your tool of choice, the Zementis ADAPA engine provides a cost-effective, highly scalable deployment option for your PMML models.
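A converter's first decision, which PMML version a file uses, is easy to illustrate. The Python sketch below reads the `version` attribute of the root `PMML` element and sorts a file into one of three buckets based on the 2.1 to 3.2 range described above. The helper names and the three labels are hypothetical; real schema validation would additionally require the official PMML XSD and a validating parser (for example lxml), and the vendor-specific fixes the converter applies are not modeled here.

```python
import xml.etree.ElementTree as ET

# Range described in the text: the converter accepts PMML 2.1 and later
# and produces PMML 3.2.
OLDEST_CONVERTIBLE = (2, 1)
TARGET_VERSION = (3, 2)


def pmml_version(pmml_text):
    """Return the root element's version attribute as a (major, minor) tuple."""
    root = ET.fromstring(pmml_text)
    # Strip any namespace, e.g. '{http://www.dmg.org/PMML-3_2}PMML' -> 'PMML'.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "PMML":
        raise ValueError(f"root element is {local_name!r}, not PMML")
    version = root.get("version")
    if version is None:
        raise ValueError("PMML root element has no version attribute")
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def triage(pmml_text):
    """Hypothetical helper: label a file 'current', 'convert', or 'unsupported'."""
    version = pmml_version(pmml_text)
    if version == TARGET_VERSION:
        return "current"
    if OLDEST_CONVERTIBLE <= version < TARGET_VERSION:
        return "convert"
    # Older than 2.1, or newer than 3.2: outside the converter's stated range.
    return "unsupported"
```

For instance, `triage('<PMML version="3.0"/>')` returns `"convert"`, while a 2.0 file is reported as `"unsupported"`. Comparing `(major, minor)` tuples avoids the pitfalls of comparing version strings lexically.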

PMML Discussion Forums

For ongoing discussion and the latest PMML news, feel free to join the PMML group on LinkedIn or the discussion forum of the PMML group on Analytic Bridge, a social network for analytics professionals.