I’ll show you mine if you show me yours…

September 14, 2009

Analysts don’t usually quote predictive model performance. Data mining differs from industry to industry, and even within telecommunications the definitions of churn are inconsistent. This often makes reported outcomes tricky to fully understand.

I decided to post some churn model outcomes after reading a post by the enigmatic Zyxo on his (or maybe her :)) blog: http://zyxo.wordpress.com/2009/08/29/data-mining-for-marketing-campaigns-interpretation-of-lift/

I’d like to know if the models rate well 🙂

I’d love to see reports of the performance of any predictive classification models (anything like churn models) you’ve been working on, but I realize that is unlikely… For like-minded data miners a simple lift chart might suffice.

The availability of data will greatly influence your ability to identify and predict churn (for the purpose of this post, churn is defined as good, fare-paying customers voluntarily leaving). In this case churn incidence is approximately 0.5% per month, and the total population shown in each chart is a few million customers.

Below are two pictures of lift charts for churn models I built recently. Both models use the previous three months’ call summary data and the previous month’s social group analysis data to predict a churn event occurring in the subsequent month. Models are validated against real, unseen historical data.

I’m assuming you know what a lift chart is. Basically, it shows how many times more concentrated your target outcome (in this case churn) is within small sub-groups of your total population than in the population as a whole. Sub-groups are formed by ranking customers by propensity score. For example, in the first chart the top 1% of customers that the model ranked as most likely to churn contain 10 times more churn than average.
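For anyone who wants to compare numbers, here is a minimal sketch of how a single point on a lift chart could be computed. It is not the code behind the charts in this post; the function name `lift_at` and the toy data are assumptions made up for illustration.

```python
def lift_at(scores, churned, top_fraction):
    """Lift within the top `top_fraction` of customers ranked by score.

    scores       -- model propensity per customer (higher = more likely to churn)
    churned      -- 1 if the customer actually churned, else 0
    top_fraction -- e.g. 0.05 for the top 5% of the population
    """
    if len(scores) != len(churned) or not scores:
        raise ValueError("scores and churned must be non-empty and the same length")
    n = len(scores)
    base_rate = sum(churned) / n
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no churners")

    # Rank customers from highest to lowest propensity.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top_n = max(1, round(n * top_fraction))

    # Churn rate inside the top group, relative to the overall churn rate.
    top_rate = sum(churned[i] for i in order[:top_n]) / top_n
    return top_rate / base_rate


# Toy data (hypothetical): 10 customers, 2 of whom churned.
toy_scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.05]
toy_churned = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(lift_at(toy_scores, toy_churned, 0.2))
```

Evaluating this at every percentile and plotting the results gives the lift curve. On real data the scores and outcomes must come from a period the model never saw during training.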

The first model is built for a customer base of prepaid (purchase recharge credit prior to use) mobile customers, where the main sources of data are usage and social network analysis.

The second model is for postpaid (usage is subsequently billed to the customer) mobile customers, where contract information and billing data are additionally available. Obviously contracts commit customers for specified periods of time, so they act as very ‘predictive’ inputs for any model.

– first churn model lift

– second churn model lift

Both charts show our model lift in blue and the best possible result in dotted red. For the first model we are obtaining a lift of approximately 6 or 7 for the top 5% of the population, where the best possible outcome would be 20 (i.e. 100 / 5 = 20).

The second model is significantly better, with our model able to obtain a lift of approximately 10 for the top 5% of the population (halfway to perfection 🙂).

I mention lift at 5% of the population because this gives us a reasonable mailing size and catches a large number of subsequent churners.
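The arithmetic behind these figures can be sketched in a few lines. The helper names below are hypothetical, and the only inputs are the numbers already quoted in this post (0.5% monthly churn, top 5% of the population, lifts of roughly 6 to 7 and 10).

```python
def best_possible_lift(top_fraction, base_rate):
    """Upper bound on lift for a perfect ranking.

    If churners are rarer than the group is large, a perfect model puts
    every churner in the group, so lift = 1 / top_fraction. Otherwise the
    group is filled entirely with churners and lift = 1 / base_rate.
    """
    return 1.0 / max(top_fraction, base_rate)


def share_of_churners_caught(lift, top_fraction):
    """Fraction of all churners that fall inside the top group."""
    return lift * top_fraction


ceiling = best_possible_lift(top_fraction=0.05, base_rate=0.005)
print(ceiling)  # the 100 / 5 = 20 ceiling for the top 5%

for lift in (6, 7, 10):
    caught = share_of_churners_caught(lift, top_fraction=0.05)
    print(f"lift {lift} at 5% -> about {caught:.0%} of churners")
```

Read this way, a lift of 6 or 7 at 5% means the mailing would reach roughly 30% to 35% of the following month's churners, and a lift of 10 would reach about half of them.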

Obviously I can’t discuss the analysis itself in any depth. I’m just curious what the first impressions are of the lift. I think it’s good, but I could be delusional! And just to confirm, it is real and validated against unseen data.

– enjoy!
