
Building Neural Networks on Unbalanced Data (using Clementine)

TimManns

I got a ton of ideas whilst attending the Teradata Partners conference and also Predictive Analytics World. I think my presentations went down well (well, I got good feedback). There were also a few questions and issues that were posed to me. One issue raised by Dean Abbott was regarding building neural networks on unbalanced data in Clementine.

Dean rightly pointed out that building neural nets can work perfectly fine on unbalanced data. The problem is that when the neural net determines a categorical outcome, it must know the incidence (probability) of that outcome. By default, Clementine simply takes the output neuron value: if it is above 0.5 the prediction is true, and if it is below 0.5 the prediction is false. This is why in Clementine you need to balance a categorical outcome to roughly 50%/50% when you build the neural net model. With multiple categorical values, the highest output neuron value becomes the prediction.

But there is a simple solution!

It is something I have always done out of habit, because it has proved to generate better models and I find a decimal score more useful. Being a cautious individual (and at the time a bit jet-lagged), I wanted to double-check first, but simply converting a categorical outcome into a numeric range avoids this problem.

In situations where you have a binary categorical outcome (say, churn yes/no or response yes/no) then in Clementine you can use a Derive (flag) node to create alternative outcome values. In a Derive (flag) node simply change the true outcome to 1.0 and the false outcome to 0.0. 
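For readers without Clementine to hand, the recoding step can be sketched in Python. This is only an illustration of the idea, not the Derive (flag) node itself: the function name `derive_numeric_flag`, the "yes"/"no" labels and the sample data are all assumptions made for the example.

```python
def derive_numeric_flag(outcome, true_value="yes"):
    """Recode a categorical outcome as 1.0 (true) or 0.0 (false).

    Hypothetical helper: it mimics what the Derive (flag) node is
    used for in the post, assuming the true category is "yes".
    """
    return 1.0 if outcome == true_value else 0.0


# Made-up churn outcomes for five customers.
churn = ["yes", "no", "no", "yes", "no"]
targets = [derive_numeric_flag(c) for c in churn]
print(targets)  # [1.0, 0.0, 0.0, 1.0, 0.0]
```

The numeric `targets` field, rather than the original yes/no field, would then be used as the model output.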

By changing the categorical outcome values to a decimal range between 0.0 and 1.0, the Neural Network model will instead expose the output neuron values, and the Clementine output score will be a decimal from 0.0 to 1.0. The distribution of this score should also closely match the incidence of the outcome in the data used to build the model. In my analysis I cannot use all the data because I have too many records, but I often build models on fairly unbalanced data and simply use the score, sorted and ranked, to determine which customers to contact first. I then evaluate the model using the lift metric and the incidence of actual outcomes in sub-populations of predicted high-scoring customers. I rarely try to create a categorical ‘true’ or ‘false’ outcome, so I didn’t give it much thought until now.
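As a rough illustration of ranking by score and measuring lift, here is a small Python sketch. The function `lift_at`, the choice of the top 20%, and the scores and outcomes are all made up for the example; lift is taken here to mean the incidence of actual outcomes among the highest-scoring records divided by the overall incidence.

```python
def lift_at(scores, actuals, fraction):
    """Lift in the top `fraction` of records when ranked by score.

    Lift = incidence of actual outcomes in the top group / overall incidence.
    Assumes `actuals` holds 0/1 values and contains at least one 1.
    """
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_incidence = sum(actual for _, actual in ranked[:top_n]) / top_n
    overall_incidence = sum(actuals) / len(actuals)
    return top_incidence / overall_incidence


# Made-up scores for ten customers, two of whom actually responded.
scores = [0.90, 0.80, 0.30, 0.20, 0.10, 0.05, 0.04, 0.03, 0.02, 0.01]
actuals = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(lift_at(scores, actuals, 0.2))  # 5.0
```

In this toy data the top 20% of scores contain both responders, so that group has five times the overall incidence.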

If you want to create an incidence matrix that simply shows how many ‘true’ or ‘false’ outcomes the model achieves, then instead of using a Neural Net score of 0.5 to determine the true or false outcome, you use the incidence of the outcome in the data used to build the model. For example, if I *build* my neural net using data with 250,000 false outcomes and 10,000 true outcomes, then my cut-off neural network score should be roughly 0.04 (10,000 / 260,000 ≈ 0.038). If my neural network score exceeds the cut-off, then I predict true; if it is below the cut-off, then I predict false. A simple Derive node can be used to do this.
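The cut-off arithmetic can be checked with a few lines of Python. This is a sketch rather than the Derive node itself: the function names are hypothetical, and treating a score exactly equal to the cut-off as false is an assumption, since the post does not say what happens in that case.

```python
def incidence_cutoff(n_true, n_false):
    """Cut-off score = incidence of the true outcome in the build data."""
    return n_true / (n_true + n_false)


def predict(score, cutoff):
    """Predict true when the score exceeds the cut-off.

    Assumption: a score exactly at the cut-off is treated as false.
    """
    return score > cutoff


cutoff = incidence_cutoff(n_true=10_000, n_false=250_000)
print(round(cutoff, 4))       # 0.0385
print(predict(0.10, cutoff))  # True
print(predict(0.02, cutoff))  # False
```

Note that a score of 0.10 would be predicted false under the default 0.5 threshold, but is predicted true once the cut-off reflects the incidence in the build data.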

If you have a categorical output with multiple values (say, 5 products or 7 spend bands), then you can use a Set-To-Flag node in a similar way to create many new fields, each with a value of either 0.0 or 1.0. Make *all* the new set-to-flag fields outputs, and the Neural Network will create a decimal score for each output field. This is essentially exposing the raw output neuron values, which you can then use in many ways similar to the above (or use all the output scores in a rough ‘fuzzy’ logic way, as I have in the past).
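The same idea for a multi-valued outcome can be sketched in Python. The function `set_to_flag` only imitates what the Set-To-Flag node is used for here, and the product names, the `_flag` field suffix and the example scores are invented for illustration. Picking the highest-scoring field is just one simple way of using the per-field scores.

```python
def set_to_flag(values, categories):
    """Create one 0.0/1.0 flag field per category for each record.

    Hypothetical stand-in for the Set-To-Flag step; field names are
    built as "<category>_flag" purely for this example.
    """
    return [
        {f"{category}_flag": 1.0 if value == category else 0.0
         for category in categories}
        for value in values
    ]


def highest_scoring(field_scores):
    """Return the name of the output field with the highest score."""
    return max(field_scores, key=field_scores.get)


products = ["phone", "tablet", "laptop"]  # made-up categories
flags = set_to_flag(["tablet", "phone"], products)
print(flags[0])  # {'phone_flag': 0.0, 'tablet_flag': 1.0, 'laptop_flag': 0.0}

# Made-up decimal scores, one per output field, for a single record.
record_scores = {"phone_flag": 0.10, "tablet_flag": 0.70, "laptop_flag": 0.20}
print(highest_scoring(record_scores))  # tablet_flag
```

Each flag field becomes a separate model output, so every record ends up with one decimal score per category instead of a single predicted label.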

I posted a small example stream on the kdkeys Clementine forum http://www.kdkeys.net/forums/70/ShowForum.aspx
http://www.kdkeys.net/forums/thread/9347.aspx

Just change the file suffix from .zip to .str and open the Clementine stream file. The stream was created using version 12.0, but should work in some older versions.
http://www.kdkeys.net/forums/9347/PostAttachment.aspx

I hope this makes sense. Feel free to post a comment if elaboration is needed!

 – enjoy!
