Examining PMML 4.0 – Part I: Pre-Processing

MichaelZeller
Last updated: 2009/06/23 at 1:14 AM

You may be wondering what all the fuss is about PMML and its latest version, 4.0. So we decided to explore all that PMML 4.0 has to offer in a series of blog posts. In Part I, we explore its improved pre-processing capabilities.

All data mining models manipulate the raw data in one way or another before passing it through a neural network, support vector machine, or regression model. Therefore, a language that aims to represent all the computations that go into a model also needs to be able to represent the data transformations applied to the raw data before scoring takes place. PMML is this language! It is the Yin and Yang of data mining.

Let’s first recap the pre-processing capabilities available in PMML 3.2. This version of PMML allows for the following out-of-the-box data transformations:

  • Normalization of continuous variables: this is accomplished via the NormContinuous element of PMML. It is mostly used to normalize a variable between 0 and 1. For example, two variables can be normalized: the first between 0 and 1 and the second between 0 and 4.
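As a rough illustration of the mapping that NormContinuous describes, here is a minimal Python sketch (not PMML). It assumes the normalization is a piecewise-linear interpolation through (original, normalized) pairs. The function name `norm_continuous` and the 0 to 100 input range are hypothetical, chosen only for illustration, and outlier handling is not modeled.

```python
def norm_continuous(value, points):
    """Piecewise-linear interpolation through (orig, norm) pairs sorted by orig.

    Assumption: values outside the covered range are rejected; how outliers
    are actually treated depends on the PMML definition and is not modeled.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            return y0 + (value - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("value outside the range covered by the points")


# Hypothetical input range 0..100, normalized to 0..1 and to 0..4.
to_unit_range = [(0.0, 0.0), (100.0, 1.0)]
to_zero_four = [(0.0, 0.0), (100.0, 4.0)]

first = norm_continuous(50.0, to_unit_range)
second = norm_continuous(25.0, to_zero_four)
```

With these assumed ranges, the first variable lands halfway through the interval 0 to 1 and the second a quarter of the way through 0 to 4.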

  • Normalizing Categorical Inputs: normally used to transform strings into numerical variables. This is accomplished by the NormDiscrete element. For example, a categorical variable can produce dummy variables that are assigned the value 1 or 0 depending on the category taken by the input variable.
  • Discretization: this is used to transform continuous variables into strings. This is accomplished by the Discretize element. For example, an input value of 500 is transformed to low, 5000 to medium, and 50,000 to high.
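The two transformations above can be sketched in Python (not PMML) under a few assumptions. The category names, the bin boundaries at 1000 and 10000, and the choice of intervals that are open on the left and closed on the right are all hypothetical; the text only fixes the three sample outcomes. Missing-value handling is not modeled.

```python
def norm_discrete(value, category):
    """Dummy variable: 1.0 when the input equals the given category, else 0.0."""
    return 1.0 if value == category else 0.0


def discretize(value, bins):
    """Return the label of the first bin whose interval contains the value.

    Each bin is (lower, upper, label) and covers lower < value <= upper;
    None stands for an unbounded side. Returns None if no bin matches.
    """
    for lower, upper, label in bins:
        above = lower is None or value > lower
        below = upper is None or value <= upper
        if above and below:
            return label
    return None


# Hypothetical boundaries; only the three sample outcomes come from the text.
BINS = [(None, 1000, "low"), (1000, 10000, "medium"), (10000, None, "high")]

dummies = {c: norm_discrete("red", c) for c in ("red", "green", "blue")}
labels = [discretize(v, BINS) for v in (500, 5000, 50000)]
```

One dummy variable per category keeps each output a simple 0/1 indicator, which is the form most scoring models expect for categorical inputs.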

  • Value Mapping: this is accomplished in PMML by the use of a mapping table and the MapValues element. To make things more interesting, we can combine the MapValues and NormDiscrete elements to group small sets of categorical values. Specifically, we want to find out whether the input variable belongs to a specific group of colors. We do that by using MapValues to map different colors to the same number. We then use the NormDiscrete element to create dummy variables that indicate group membership.
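The grouping idea can be sketched in Python (not PMML) as a lookup table followed by an equality test. The colors, the group numbers, and the names `COLOR_GROUP`, `map_values`, and `in_group` are assumptions made for illustration.

```python
# Hypothetical mapping table: several colors share one group number.
COLOR_GROUP = {"red": 1, "orange": 1, "yellow": 1, "blue": 2, "green": 2}


def map_values(value, table, default=None):
    """Look the input up in the mapping table; unmapped inputs get the default."""
    return table.get(value, default)


def in_group(color, group):
    """Dummy variable: 1.0 if the color maps to the given group, else 0.0."""
    return 1.0 if map_values(color, COLOR_GROUP) == group else 0.0


warm_flags = [in_group(c, 1) for c in ("red", "blue", "purple")]
```

Here "purple" is not in the table, so it maps to the default and is flagged as outside every group.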

  • Arithmetic Expressions: PMML offers a range of arithmetic functions (as well as string and date/time manipulation functions) that can be arranged in different ways to express complex arithmetic expressions. As an example, take the following operation:
ResultVar=maximum(round(InputVar1/3.3),2^(1+log(1.3*InputVar2+1)))
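For reference, the same operation can be written as a short Python sketch (not PMML). Two points are assumptions, since the formula does not settle them: `log` is read as the natural logarithm, and `round` is taken to round halves up, which is why the sketch avoids Python's built-in `round` (it rounds halves to the nearest even integer).

```python
import math


def result_var(input_var1, input_var2):
    """ResultVar = maximum(round(InputVar1/3.3), 2^(1+log(1.3*InputVar2+1))).

    Assumptions: log is the natural logarithm, and round() rounds halves up.
    The logarithm requires 1.3 * input_var2 + 1 > 0.
    """
    rounded = math.floor(input_var1 / 3.3 + 0.5)
    power = 2 ** (1 + math.log(1.3 * input_var2 + 1))
    return max(rounded, power)


large_first_input = result_var(33.0, 0.0)   # rounded term dominates
small_first_input = result_var(0.0, 0.0)    # power term dominates
```

In PMML such an expression is built by nesting function applications, so each intermediate value above corresponds to one nested call.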

  • PMML 4.0 – Boolean Operations: Not only does PMML 4.0 allow Boolean operations to be fully expressed, it also allows them to be nested into IF-THEN-ELSE logic. These new built-in functions offer a vast new array of possibilities for representing data transformations in PMML. So, we devote the rest of this review to transformations that can now be easily expressed in PMML 4.0.

We start with PMML code that implements the following logical and arithmetic operations:
IF InputVar1 == “Partner” THEN DerivedVar1 = “P” ELSE DerivedVar2 = 2 * InputVar2


Note that it uses the newly defined 4.0 functions: “if”, “equal”, and “not” as well as function “*”.
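The logic above can be sketched in Python (not PMML). Because the two branches assign different derived variables, the sketch treats each variable as its own conditional, with the second one negating the test. Representing the unassigned variable as `None` (missing) is an assumption of this sketch, as is the function name `derive`.

```python
def derive(input_var1, input_var2):
    """Return (DerivedVar1, DerivedVar2) for one input record.

    Assumption: a derived variable whose condition does not hold is left
    missing, represented here as None.
    """
    is_partner = input_var1 == "Partner"
    derived_var1 = "P" if is_partner else None
    derived_var2 = 2 * input_var2 if not is_partner else None
    return derived_var1, derived_var2


partner_record = derive("Partner", 4)
other_record = derive("Customer", 4)
```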

Next, assume that both the “then” and “else” parts of the “if” use the same derived variable, implementing the following operations:
IF InputVar1 == “Partner” THEN DerivedVar1 = 5.1 * InputVar2 ELSE DerivedVar1 = InputVar2 / 3.3
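To show how such nested function calls compose, here is a minimal Python sketch of an expression evaluator (not PMML and not a PMML engine). The tuple encoding, the `FUNCTIONS` table, and the names `evaluate` and `DERIVED_VAR1` are all hypothetical; only the function names "if", "equal", "*", and "/" mirror the ones discussed in the text.

```python
import operator

# Small subset of functions, keyed by the names used in the text.
FUNCTIONS = {
    "equal": operator.eq,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(expr, fields):
    """Evaluate a constant, a ("field", name) reference, or (function, arg, ...)."""
    if not isinstance(expr, tuple):
        return expr  # plain constant such as 5.1 or "Partner"
    head, *args = expr
    if head == "field":
        return fields[args[0]]
    if head == "if":
        condition, then_expr, else_expr = args
        chosen = then_expr if evaluate(condition, fields) else else_expr
        return evaluate(chosen, fields)  # only the chosen branch is evaluated
    return FUNCTIONS[head](*(evaluate(arg, fields) for arg in args))


DERIVED_VAR1 = (
    "if",
    ("equal", ("field", "InputVar1"), "Partner"),
    ("*", 5.1, ("field", "InputVar2")),
    ("/", ("field", "InputVar2"), 3.3),
)

partner_value = evaluate(DERIVED_VAR1, {"InputVar1": "Partner", "InputVar2": 2.0})
other_value = evaluate(DERIVED_VAR1, {"InputVar1": "Customer", "InputVar2": 6.6})
```

Keeping the expression as data rather than code is the point of the sketch: the same tree could be serialized, exchanged, and evaluated by a different system, which is the role PMML plays for real models.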

Finally, we end our list of PMML pre-processing examples by showing the use of the 4.0 functions “isMissing” and “isIn” combined with the function “if”, implementing the following operations:
IF InputVar is missing THEN DerivedVar = 1 ELSE (IF InputVar is in (“Partner”, “Associate”, “Colleague”) THEN DerivedVar = 2 ELSE DerivedVar = 3)
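The nested logic above reads as follows in a Python sketch (not PMML). A missing input is represented as `None`, which is an assumption of the sketch, and `derived_var` is a hypothetical name.

```python
def derived_var(input_var):
    """Return 1 for a missing input, 2 for a listed role, 3 otherwise.

    Assumption: a missing value is represented as None.
    """
    if input_var is None:
        return 1
    if input_var in ("Partner", "Associate", "Colleague"):
        return 2
    return 3


codes = [derived_var(v) for v in (None, "Associate", "Customer")]
```

Checking for the missing value first matters: it keeps the membership test from ever seeing a missing input.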


We finish part I of our PMML tour hoping that this short description of its pre-processing capabilities can help you to easily navigate through all the data transformations now available in PMML 4.0.
