Introduction
Customer churn (or customer attrition) is a problem for any business in the service industry: you only make money by keeping customers interested in your product. In the financial services industry this usually takes the form of credit cards, so the more people use a bank's credit card service, the more money the bank makes. Being able to determine which customers are most likely to drop their credit card, and by extension being able to reach out to those customers and fix their problem before they drop the card, could give the bank a competitive advantage by keeping more customers on its credit card than its competitors.
Problem Statement
In this project, a bank manager is concerned that more and more customers are leaving the bank's credit card services. We will tackle this problem by creating a supervised learning model that predicts whether a customer will churn or not. This model can help the manager lower the churn rate by giving special attention to customers who are expected to churn.
Description of Data Source and Tools used
The dataset was acquired from Kaggle. It has 10,127 rows and 23 features describing the customer profile and credit card usage. The tools used were Jupyter Notebook and Python.
Columns:
ClientNum - Unique identifier for the customer holding the account
Attrition_Flag - Whether or not the customer’s account has been closed
Customer_Age - Age of the customer in years
Gender - Gender of the customer (M or F)
Dependent_count - Number of dependents of the customer
Education_Level - Education level of the customer
Marital_Status - Marriage status of the customer
Income_Category - Annual income of the customer
Card_Category - Type of Card held by the customer (Blue, Silver, Gold, and Platinum)
Months_on_book - Number of months the customer has been with the bank
Total_Relationship_Count - Number of bank products owned by the customer
Months_Inactive_12_mon - Number of months of inactivity by the customer in the last 12 months
Contacts_Count_12_mon - Number of contacts by the customer in the last 12 months
Credit_Limit - Credit card limit of the customer
Total_Revolving_Bal - Total Revolving Balance on the customer's Credit Card
Avg_Open_To_Buy - Open to Buy Credit Line of the customer (Average of last 12 months)
Total_Amt_Chng_Q4_Q1 - Change in Transaction Amount of the customer (Q4 over Q1)
Total_Trans_Amt - Total Transaction Amount of the customer (Last 12 months)
Total_Trans_Ct - Total Transaction Count of the customer (Last 12 months)
Total_Ct_Chng_Q4_Q1 - Change in Transaction Count of the customer (Q4 over Q1)
Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1 - Pre-computed Naive Bayes classifier output included in the Kaggle file (not used in this analysis)
Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2 - Pre-computed Naive Bayes classifier output included in the Kaggle file (not used in this analysis)
Avg_Utilization_Ratio - Average customer Card Utilization Ratio
The following code snippet shows an overview of the dataset.
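A minimal sketch of such an overview in Python, assuming the Kaggle CSV is saved as BankChurners.csv and loaded into a DataFrame called df, might look like this:

import pandas as pd

# Load the Kaggle dataset (the file name is an assumption)
df = pd.read_csv("BankChurners.csv")

# Quick overview: dimensions, column types and the first few rows
print(df.shape)
df.info()
print(df.head())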
Data Cleaning
Before we analyze the dataset, we have to ensure that our data is free of duplicates and null values and has the correct data types. Let's first check whether our dataset has any duplicates or null values. As we can see below, we don't have any duplicates or null values.
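A quick way to run these checks with pandas, assuming the DataFrame is named df, could be:

# Count duplicate rows and missing values per column
print(df.duplicated().sum())
print(df.isnull().sum())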
Not all features are helpful for the analysis, which is why we will remove the unhelpful ones to simplify the next steps. There are three features that I deemed unnecessary: the unique client identifier (ClientNum) and the two pre-computed Naive Bayes classifier columns, since none of them carries information useful for predicting churn.
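One way to drop them, assuming the column names as they appear in the raw Kaggle file, is:

# Drop the client identifier and the two pre-computed Naive Bayes columns
cols_to_drop = [
    "CLIENTNUM",
    "Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1",
    "Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2",
]
df = df.drop(columns=cols_to_drop)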
Exploratory Data Analysis (EDA)
Before going to the machine learning section, I will do a little bit of EDA first. EDA is a good starting point for analysis: it helps us discover patterns, check assumptions, and spot unusual data points such as outliers. I will not show all the charts and statistics for each feature, only the ones I consider interesting or useful for business recommendations.
Distribution of numerical variables
-Customer_Age
-Dependent_count
-Months_on_book
-Total_Relationship_Count
-Months_Inactive_12_mon
-Contacts_Count_12_mon
-Credit_Limit
-Total_Revolving_Bal
-Avg_Open_To_Buy
-Total_Amt_Chng_Q4_Q1
-Total_Trans_Amt
-Total_Trans_Ct
-Total_Ct_Chng_Q4_Q1
-Avg_Utilization_Ratio
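These distribution charts can be reproduced with a few lines of pandas and matplotlib; this is only a sketch, not the exact plotting code used for the charts in this project:

import matplotlib.pyplot as plt

# Plot the distribution of every numerical feature in one grid
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols].hist(bins=30, figsize=(16, 12))
plt.tight_layout()
plt.show()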
Box Plots
-Credit Limit – Age
-Credit Limit – Gender
-Credit Limit – Income
-Credit Limit – Marital Status
-Credit Limit – Card Category
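Each of these box plots follows the same pattern; a sketch for the Credit Limit by Income Category plot, assuming seaborn is available, could be:

import seaborn as sns
import matplotlib.pyplot as plt

# Box plot of credit limit per income category; the other pairs only change x
plt.figure(figsize=(10, 5))
sns.boxplot(data=df, x="Income_Category", y="Credit_Limit")
plt.xticks(rotation=45)
plt.show()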
- From the chart below, we can see that the majority of customers are either married or single.
- From the chart below, we can see that an overwhelming majority of customers use the bank's "Blue" card.
- From the chart below, we can see that the majority of customers earn less than $40k a year.
- Since the majority of the customer data we have is from existing customers, I will be using SMOTE to oversample the attrited customers to match the existing customer sample size. This balances out the skewed data and helps improve the performance of the models selected later.
Data Preprocessing
Before we create a machine learning model, we first have to preprocess the data. Preprocessing is done so that the machine can read our data correctly. Since we don't have any missing values (most data from Kaggle is already cleaned), we can proceed.
Feature Encoding
Feature encoding is done so that our machine can read our categorical data. Machines can only work with numbers, which is why we have to turn our categorical data into numerical data. The two most common types of encoding are:
Label encoding: I use label encoding when the feature has ordinal values. For example, my dataset has 'Card_Category', a feature with ordinal values where Blue is the lowest level and Platinum is the highest. Label encoding can also be used for a categorical feature that has only two unique values.
One-hot encoding: I use one-hot encoding for any other feature that doesn't meet the criteria for label encoding. Be careful when using one-hot encoding: unlike label encoding, it creates as many new features/columns as there are unique values. For example, a feature with 100 unique values will be split into 100 new features/columns.
For this project, I used label encoding.
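A sketch of this label encoding, with the ordinal mapping for Card_Category and the raw target labels taken as assumptions based on the column descriptions above, might look like this:

from sklearn.preprocessing import LabelEncoder

# Ordinal mapping for the card tiers (Blue lowest, Platinum highest)
df["Card_Category"] = df["Card_Category"].map(
    {"Blue": 0, "Silver": 1, "Gold": 2, "Platinum": 3}
)

# Binary target: 1 = attrited, 0 = existing customer
df["Attrition_Flag"] = df["Attrition_Flag"].map(
    {"Existing Customer": 0, "Attrited Customer": 1}
)

# Label-encode the remaining categorical columns
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])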
Modelling
Our data has already been preprocessed, which means now we can start our machine learning modeling.
Train Test Split
To be able to evaluate the model on data it has never seen and check that it is not overfitted, we will use a simple train-test split with a 70:30 ratio.
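With scikit-learn this looks roughly like the following (stratifying on the target is my addition so both splits keep the same class ratio):

from sklearn.model_selection import train_test_split

# 70:30 split; random_state is an arbitrary choice for reproducibility
X = df.drop(columns="Attrition_Flag")
y = df["Attrition_Flag"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)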
Standardizing
Some of our features have different units and different ranges of values. For example, Credit_Limit has values in the thousands of dollars while Customer_Age values are below 100, so some algorithms would treat Credit_Limit as having a bigger influence on the target than Customer_Age. That is why we standardize all the features, to make sure our model is not biased. Mind you, not all algorithms are sensitive to this problem, so you don't need to standardize your features with certain algorithms or libraries.
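A typical way to do this with scikit-learn's StandardScaler, fitting on the training set only so no information leaks from the test set, is:

from sklearn.preprocessing import StandardScaler

# Fit on the training data, then apply the same transform to the test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)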
Oversampling
My dataset's target is imbalanced. This happens a lot with fraud or churn datasets because there are not many fraudulent or churned customers. To solve this problem we could use several methods, including:
Undersampling: This method removes some of the majority class so that the ratio of majority to minority class is equal, or at least not heavily imbalanced (some people I know use a 2:1 threshold ratio).
Oversampling: This duplicates some data points from the minority class.
Oversampling with SMOTE: Similar to normal oversampling, but instead of reusing the same data points from the minority class, SMOTE creates new synthetic data points for the minority class.
For this project, I used oversampling with SMOTE.
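A sketch of SMOTE with the imbalanced-learn library, applied to the training data only so the test set keeps the real class balance, could be:

from imblearn.over_sampling import SMOTE

# Create synthetic minority-class samples in the training set
smote = SMOTE(random_state=42)
X_train_res, y_train_res = smote.fit_resample(X_train_scaled, y_train)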
Fitting and Model Evaluation
Three classifier models were used in this project:
Logistic regression: This is a supervised learning classification algorithm used to predict the probability of a target variable. The nature of the target or dependent variable is dichotomous, which means there are only two possible classes.
Decision Tree Classifier: This algorithm belongs to the family of supervised learning algorithms. The decision tree algorithm can be used for solving both regression and classification problems. The goal of using a decision tree is to create a training model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior (training) data.
Random Forest Classifier: Random decision forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the mean or average prediction of the individual trees is returned.
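Fitting the three classifiers on the resampled training data might look like this (default hyperparameters are an assumption, not necessarily the exact settings used in the project):

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Train each classifier on the oversampled, scaled training data
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_train_res, y_train_res)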
All of the models' performance is really good, but this tends to happen only with synthetic data like ours; real data tends to give worse results on the first try. I will pick Random Forest as our model because it has good recall. Recall is the right metric when we want to minimize false negatives; different problems require different evaluation metrics, so pick accordingly.
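Evaluation on the untouched test set, with recall on the attrited class as the metric of interest, can be sketched as:

from sklearn.metrics import classification_report

# Compare precision, recall and F1 for each model on the test set
for name, model in models.items():
    y_pred = model.predict(X_test_scaled)
    print(name)
    print(classification_report(y_test, y_pred))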
Feature importance
We have created a machine learning model, and from that model we can find out which features have a high influence on attrition. According to the chosen model, the total transaction amount (Total_Trans_Amt) is the feature with the highest influence on attrition. Using this information, we can reduce the chance of attrition by giving special attention to customers that have a low total transaction amount.
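The feature importances can be read directly off the fitted random forest; a sketch using the variable names from the earlier snippets:

import pandas as pd

# Rank features by their importance in the random forest
rf = models["Random Forest"]
importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))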
Recommendations
The total transaction amount is the feature considered to be the most influential in predicting attrition.
Platinum card users have the highest attrition rate at 25%. We need to analyze this product further and perhaps give more benefits to Platinum users.
Customers earning less than $40K a year and customers with a graduate degree are our potential customers. These two segments should be our priority in marketing.
Considering the revolving balance and the amount spent per transaction, the coefficients indicate that the influence of these variables on the attrition probability is very low; but when they need to be assessed, the customers to target are those with a low revolving balance and a higher transaction amount.
Conclusion
The aim of today's research in the field of data science is to build systems and algorithms that extract knowledge from data. The results obtained above can be used as a point of reference for other projects on predicting whether a customer will churn or not. This project can also serve as a basis for improving the present classifiers and techniques, resulting in better tools for accurately predicting customer attrition.
Here is a link to the Full Project Code on my GitHub