By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    data Analytics instagram stories
    Data Analytics Helps Marketers Make the Most of Instagram Stories
    15 Min Read
    analyst,women,looking,at,kpi,data,on,computer,screen
    What to Know Before Recruiting an Analyst to Handle Company Data
    6 Min Read
    AI analytics
    AI-Based Analytics Are Changing the Future of Credit Cards
    6 Min Read
    data overload showing data analytics
    How Does Next-Gen SIEM Prevent Data Overload For Security Analysts?
    8 Min Read
    hire a marketing agency with a background in data analytics
    5 Reasons to Hire a Marketing Agency that Knows Data Analytics
    7 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-23 SmartData Collective. All Rights Reserved.
Reading: Building Neural Networks on Unbalanced Data (using Clementine)
Share
Notification Show More
Aa
SmartData CollectiveSmartData Collective
Aa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Big Data > Data Mining > Building Neural Networks on Unbalanced Data (using Clementine)
Data Mining

Building Neural Networks on Unbalanced Data (using Clementine)

TimManns
Last updated: 2009/11/01 at 11:49 PM
TimManns
6 Min Read
SHARE

I got a ton of ideas whilst attending the Teradata Partners conference and also Predictive Analytics World. I think my presentations went down well (well, I got good feedback). There were also a few questions and issues that were posed to me. One issue raised by Dean Abbott was regarding building neural networks on unbalanced data in Clementine.

Rightly so, Dean pointed out that the building of neurals nets can actually work perfectly fine against unbalanced data. The problem is that when the Neural Net determines a categorical outcome it must know the incidence (probability) of that outcome. By default, Clementine will simply take the output neuron values, and if the value is above 0.5 the prediction will be true, else if the output neuron value is below 0.5 the category outcome will be false. This is why in Clementine you need to balance categorical outcome to roughly 50%/50% when you build the neural net model. In the case of multiple categorical values it is the highest output neuron value which becomes the prediction.

But there is a simple solution!

It is something I have always done out of habit because it has proved to …

More Read

data mining

Data Mining Technology Helps Online Brands Optimize Their Branding

Can Data Mining Aid with Off-Page SEO Strategies?
3 Data Mining Tips for Companies Trying to Understand their Customers
5 Data Mining Tips to Leverage the Benefits of Surveys
Perform Data Mining With Web Scrapers to Track Prices



I got a ton of ideas whilst attending the Teradata Partners conference and also Predictive Analytics World. I think my presentations went down well (well, I got good feedback). There were also a few questions and issues that were posed to me. One issue raised by Dean Abbott was regarding building neural networks on unbalanced data in Clementine.

Rightly so, Dean pointed out that the building of neurals nets can actually work perfectly fine against unbalanced data. The problem is that when the Neural Net determines a categorical outcome it must know the incidence (probability) of that outcome. By default, Clementine will simply take the output neuron values, and if the value is above 0.5 the prediction will be true, else if the output neuron value is below 0.5 the category outcome will be false. This is why in Clementine you need to balance categorical outcome to roughly 50%/50% when you build the neural net model. In the case of multiple categorical values it is the highest output neuron value which becomes the prediction.

But there is a simple solution!

It is something I have always done out of habit because it has proved to generate better models, and I find a decimal score more useful. Being a cautious individual (and at the time a bit jet lagged) I wanted to double check first, but simply by converting a categorical outcome into a numeric range you will avoid this problem.

In situations where you have a binary categorical outcome (say, churn yes/no or response yes/no) then in Clementine you can use a Derive (flag) node to create alternative outcome values. In a Derive (flag) node simply change the true outcome to 1.0 and the false outcome to 0.0. 

By changing the categorical outcome values to a decimal range outcome between 0.0 and 1.0, the Neural Network model will instead expose the output neuron values and the Clementine output score will be a decimal range from 0.0 to 1.0. The distribution of this score should also closely match the probability of the data input into the model during building. In my analysis I cannot use all the data because I have too many records, but I often build models on fairly unbalanced data and simply use the score sorted/ranked to determine which customers to contact first. I subsequently use the lift metric and the incidence of actual outcomes in sub-populations of predicted high scoring customers. I rarely try to create a categorical ‘true’ or ‘false’ outcome, so didn’t give it much thought until now.

If you want to create an incidence matrix that simply shows how many ‘true’ or false’ outcomes the model achieves, then instead of using the Neural Net score of 0.5 to determine the true or false outcome, you simply use the probability of the outcome used to build the model. For example, if I *build* my neural net using data balanced as 250,000 false outcomes and 10,000 true outcomes, then my cut-off neural network score should be 0.04. If my neural network score exceeds 0.04, then I predict true; else if my neural network score is below 0.04, then I predict false. A simple derive node can be used to do this.

If you have a categorical output with multiple values (say, 5 products or 7 spend bands), then you can use a Set-To-Flag node in a similar way to create many new fields, each with a value of either 0.0 or 1.0. Make *all* new set-to-flag fields outputs and the Neural Network will create a decimal score for each output field. This is essential exposing the raw output neuron values, which you can then use in many ways similar to above (or use all output scores in a rough ‘fuzzy’ logic way as I have in the past:).

I posted a small example stream on the kdkeys Clementine forum http://www.kdkeys.net/forums/70/ShowForum.aspx
http://www.kdkeys.net/forums/thread/9347.aspx

Just change the file suffix from .zip to .str and open the Clementine stream file. Created using version 12.0, but should work in some older versions.
http://www.kdkeys.net/forums/9347/PostAttachment.aspx

I hope this makes sense. Free feel to post a comment if elboration is needed!

 – enjoy!

Link to original post

TimManns November 1, 2009
Share This Article
Facebook Twitter Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

ai low code frameworks
AI Can Help Accelerate Development with Low-Code Frameworks
Artificial Intelligence
data Analytics instagram stories
Data Analytics Helps Marketers Make the Most of Instagram Stories
Analytics
data breaches
How Hospital Security Breaches Devastate Local Communities
Policy and Governance
analyst,women,looking,at,kpi,data,on,computer,screen
What to Know Before Recruiting an Analyst to Handle Company Data
Analytics

Stay Connected

1.2k Followers Like
33.7k Followers Follow
222 Followers Pin

You Might also Like

data mining
Data Mining

Data Mining Technology Helps Online Brands Optimize Their Branding

7 Min Read
data mining helps with offsite SEO
Data Mining

Can Data Mining Aid with Off-Page SEO Strategies?

10 Min Read
using data mining to learn more about customers
Big Data

3 Data Mining Tips for Companies Trying to Understand their Customers

6 Min Read
surveys data
Data Mining

5 Data Mining Tips to Leverage the Benefits of Surveys

11 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

AI and chatbots
Chatbots and SEO: How Can Chatbots Improve Your SEO Ranking?
Artificial Intelligence Chatbots Exclusive
AI chatbots
AI Chatbots Can Help Retailers Convert Live Broadcast Viewers into Sales!
Chatbots

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
Go to mobile version
Welcome Back!

Sign in to your account

Lost your password?