Summary:
In this project, I aim to clean, explore, and prepare a dataset for machine learning modelling to predict customer churn for a bank. The dataset is provided in a flat Excel file. The goal is to conduct data preparation and exploratory data analysis (EDA) in readiness for future machine learning tasks, but no prediction is performed in this phase.
Tools and Libraries:
I use Pandas, NumPy, Seaborn, and Matplotlib for data manipulation, exploratory data analysis (EDA), and visualisation. These Python libraries allow efficient data handling and detailed insights into the dataset.
Project Steps:
- Data Loading: I load the dataset from an Excel file using Pandas for further processing.
- Data Cleaning: I check for missing values, incorrect data types, and inconsistencies in the dataset. For instance, categorical variables like "Gender" and "Geography" are converted into numeric representations. I also address any outliers or nonsensical values (e.g., negative salaries) through imputation.
- Exploratory Data Analysis (EDA): Using Seaborn and Matplotlib, I create visualisations such as box plots and bar charts to understand key relationships in the data. For example, I explore how features like customer balance, tenure, and age relate to the likelihood of churn. These insights help identify important patterns and guide feature engineering.
- Feature Engineering: New features, such as the ratio of balance to estimated salary, are created to give future models more informative inputs. Categorical variables are transformed into numerical representations to ensure compatibility with machine learning algorithms.
- Data Preparation for Modelling: The cleaned and processed dataset is then structured in a format suitable for machine learning modelling, with target and feature variables ready for training and evaluation in future steps.
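The cleaning, feature engineering, and preparation steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the column names ("Balance", "EstimatedSalary", "Exited"), the tiny in-memory sample standing in for the Excel file, the choice of median imputation, and the helper name `prepare` are all assumptions.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the Excel data; column names are assumed.
raw = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Female"],
    "Geography": ["France", "Spain", "Germany", "France"],
    "Balance": [0.0, 50000.0, 120000.0, 80000.0],
    "EstimatedSalary": [100000.0, -1.0, 60000.0, 40000.0],
    "Exited": [1, 0, 1, 0],  # assumed name of the churn target column
})


def prepare(df):
    """Hypothetical helper: clean, engineer features, and split X / y."""
    df = df.copy()

    # Treat negative salaries as missing, then impute with the median
    # of the remaining valid values (one possible imputation choice).
    df.loc[df["EstimatedSalary"] < 0, "EstimatedSalary"] = np.nan
    median_salary = df["EstimatedSalary"].median()
    df["EstimatedSalary"] = df["EstimatedSalary"].fillna(median_salary)

    # Engineered feature: ratio of balance to estimated salary.
    # A salary of exactly zero would give inf and need separate handling.
    df["BalanceSalaryRatio"] = df["Balance"] / df["EstimatedSalary"]

    # Encode categoricals: binary map for Gender, one-hot for Geography.
    df["Gender"] = df["Gender"].map({"Female": 0, "Male": 1})
    df = pd.get_dummies(df, columns=["Geography"], dtype=int)

    # Separate feature matrix and target for later modelling.
    X = df.drop(columns="Exited")
    y = df["Exited"]
    return X, y


X, y = prepare(raw)
print(X)
```

With the real file, `raw` would instead come from something like `pd.read_excel(...)` with the project's own path. One-hot encoding is used for "Geography" here so that no artificial ordering is implied between countries; a simple integer mapping would also satisfy "numeric representation" but can mislead some models.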
This project focuses on preparing the bank customer dataset for future machine learning tasks by ensuring the data is clean, organised, and well-understood through exploratory analysis.
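As one illustration of the exploratory analysis, churn rates can be summarised per group before plotting. The sketch below uses made-up rows and assumed column names ("Age", "Geography", "Exited"), and the age cut-off of 40 is arbitrary; it is not taken from the project's results.

```python
import pandas as pd

# Made-up sample; values are for illustration only.
sample = pd.DataFrame({
    "Age": [25, 47, 52, 33, 61, 29],
    "Geography": ["France", "Germany", "Germany", "Spain", "France", "Spain"],
    "Exited": [0, 1, 1, 0, 0, 1],  # 1 = customer churned (assumed encoding)
})

# Mean of a 0/1 target within each group is that group's churn rate.
churn_by_geo = sample.groupby("Geography")["Exited"].mean()

# Bucket age into two bands; bins are right-inclusive: (0, 40] and (40, 120].
sample["AgeBand"] = pd.cut(
    sample["Age"], bins=[0, 40, 120], labels=["40 or under", "over 40"]
)
churn_by_age = sample.groupby("AgeBand", observed=True)["Exited"].mean()

print(churn_by_geo)
print(churn_by_age)
```

Summaries like these can then be passed to a Seaborn bar chart, and the raw numeric columns to box plots split by the churn flag.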