Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    unusual trading activity
    Signal Or Noise? A Decision Tree For Evaluating Unusual Trading Activity
    3 Min Read
    software developer using ai
    How Data Analytics Helps Developers Deliver Better Tech Services
    8 Min Read
    ai for stock trading
    Can Data Analytics Help Investors Outperform Warren Buffett
    9 Min Read
    media monitoring
    Signals In The Noise: Using Media Monitoring To Manage Negative Publicity
    5 Min Read
    data analytics
    How Data Analytics Can Help You Construct A Financial Weather Map
    4 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: Building Neural Networks on Unbalanced Data (using Clementine)
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Big Data > Data Mining > Building Neural Networks on Unbalanced Data (using Clementine)
Data Mining

Building Neural Networks on Unbalanced Data (using Clementine)

TimManns
TimManns
6 Min Read
SHARE

I got a ton of ideas whilst attending the Teradata Partners conference and also Predictive Analytics World. I think my presentations went down well (well, I got good feedback). There were also a few questions and issues that were posed to me. One issue raised by Dean Abbott was regarding building neural networks on unbalanced data in Clementine.

Rightly so, Dean pointed out that the building of neurals nets can actually work perfectly fine against unbalanced data. The problem is that when the Neural Net determines a categorical outcome it must know the incidence (probability) of that outcome. By default, Clementine will simply take the output neuron values, and if the value is above 0.5 the prediction will be true, else if the output neuron value is below 0.5 the category outcome will be false. This is why in Clementine you need to balance categorical outcome to roughly 50%/50% when you build the neural net model. In the case of multiple categorical values it is the highest output neuron value which becomes the prediction.

But there is a simple solution!

It is something I have always done out of habit because it has proved to …

More Read

CRAN R 2.9.0 now available
ISO TC 184/SC 4 Conference in Canada
Can We Learn From Anti-Social Users?
How to Measure Emotions in Branding and Advertising Research
5 Steps To Winning with Analytics: Have a plan



I got a ton of ideas whilst attending the Teradata Partners conference and also Predictive Analytics World. I think my presentations went down well (well, I got good feedback). There were also a few questions and issues that were posed to me. One issue raised by Dean Abbott was regarding building neural networks on unbalanced data in Clementine.

Rightly so, Dean pointed out that the building of neurals nets can actually work perfectly fine against unbalanced data. The problem is that when the Neural Net determines a categorical outcome it must know the incidence (probability) of that outcome. By default, Clementine will simply take the output neuron values, and if the value is above 0.5 the prediction will be true, else if the output neuron value is below 0.5 the category outcome will be false. This is why in Clementine you need to balance categorical outcome to roughly 50%/50% when you build the neural net model. In the case of multiple categorical values it is the highest output neuron value which becomes the prediction.

But there is a simple solution!

It is something I have always done out of habit because it has proved to generate better models, and I find a decimal score more useful. Being a cautious individual (and at the time a bit jet lagged) I wanted to double check first, but simply by converting a categorical outcome into a numeric range you will avoid this problem.

In situations where you have a binary categorical outcome (say, churn yes/no or response yes/no) then in Clementine you can use a Derive (flag) node to create alternative outcome values. In a Derive (flag) node simply change the true outcome to 1.0 and the false outcome to 0.0. 

By changing the categorical outcome values to a decimal range outcome between 0.0 and 1.0, the Neural Network model will instead expose the output neuron values and the Clementine output score will be a decimal range from 0.0 to 1.0. The distribution of this score should also closely match the probability of the data input into the model during building. In my analysis I cannot use all the data because I have too many records, but I often build models on fairly unbalanced data and simply use the score sorted/ranked to determine which customers to contact first. I subsequently use the lift metric and the incidence of actual outcomes in sub-populations of predicted high scoring customers. I rarely try to create a categorical ‘true’ or ‘false’ outcome, so didn’t give it much thought until now.

If you want to create an incidence matrix that simply shows how many ‘true’ or false’ outcomes the model achieves, then instead of using the Neural Net score of 0.5 to determine the true or false outcome, you simply use the probability of the outcome used to build the model. For example, if I *build* my neural net using data balanced as 250,000 false outcomes and 10,000 true outcomes, then my cut-off neural network score should be 0.04. If my neural network score exceeds 0.04, then I predict true; else if my neural network score is below 0.04, then I predict false. A simple derive node can be used to do this.

If you have a categorical output with multiple values (say, 5 products or 7 spend bands), then you can use a Set-To-Flag node in a similar way to create many new fields, each with a value of either 0.0 or 1.0. Make *all* new set-to-flag fields outputs and the Neural Network will create a decimal score for each output field. This is essential exposing the raw output neuron values, which you can then use in many ways similar to above (or use all output scores in a rough ‘fuzzy’ logic way as I have in the past:).

I posted a small example stream on the kdkeys Clementine forum http://www.kdkeys.net/forums/70/ShowForum.aspx
http://www.kdkeys.net/forums/thread/9347.aspx

Just change the file suffix from .zip to .str and open the Clementine stream file. Created using version 12.0, but should work in some older versions.
http://www.kdkeys.net/forums/9347/PostAttachment.aspx

I hope this makes sense. Free feel to post a comment if elboration is needed!

 – enjoy!

Link to original post

Share This Article
Facebook Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

ai driven task management
Reducing “Work About Work” with AI Task Managers
Artificial Intelligence Exclusive
data center uptime
Why Rodent-Resistant Conduits Are Critical for Data Center Uptime
Big Data Data Management Exclusive Risk Management
big data and AI
The Intersection of Big Data and AI in Project Management
Artificial Intelligence Big Data Exclusive
data migration risk prevention
Best Approach to Risk Management for Data Migration in Data-Driven Businesses
Big Data Data Management Exclusive Risk Management

Stay Connected

1.2KFollowersLike
33.7KFollowersFollow
222FollowersPin

You Might also Like

Blogs are Dead!?! – Not Among Gen Y

3 Min Read

Images to Data :OCR Softwares

3 Min Read

Blogging from the Gartner BI Summit: Day 2

5 Min Read

Data, Data Everywhere

2 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

AI and chatbots
Chatbots and SEO: How Can Chatbots Improve Your SEO Ranking?
Artificial Intelligence Chatbots Exclusive
AI chatbots
AI Chatbots Can Help Retailers Convert Live Broadcast Viewers into Sales!
Chatbots

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?