The primary goal of this project is to identify customers with low credit scores using features from a dataset of credit transactions. To that end, the project develops a binary classification model that flags these customers.
This project is divided into three main parts, aligning with the mid-course project requirements for the Maven Analytics Classification course. Each part covers a different stage of the workflow: data preparation and exploration, model development and evaluation, and handling class imbalance.
Step 1: Data Preparation and Exploratory Data Analysis (EDA)
Data Import and Conversion:
- Load the CSV file containing credit transaction data.
- Perform necessary datatype conversions to ensure consistency and accuracy.
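As a minimal sketch of the import and conversion step, the snippet below loads a small in-memory CSV with pandas and coerces text columns to numeric types. The column names (`Age`, `Annual_Income`, `Credit_Score`), the 'Poor' label, and the stray underscore characters are hypothetical stand-ins, not taken from the project's actual file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the credit CSV; columns and values are illustrative only.
csv_text = """Customer_ID,Age,Annual_Income,Credit_Score
C001,23,19114.12,Good
C002,28_,34847.84,Standard
C003,34,143162.64_,Poor
"""

df = pd.read_csv(io.StringIO(csv_text))

# Strip stray characters, then coerce to numeric; unparseable values become NaN.
for col in ["Age", "Annual_Income"]:
    cleaned = df[col].astype(str).str.replace("_", "", regex=False)
    df[col] = pd.to_numeric(cleaned, errors="coerce")

# Use a nullable integer type for counts and a categorical type for labels.
df["Age"] = df["Age"].astype("Int64")
df["Credit_Score"] = df["Credit_Score"].astype("category")
```

Using `errors="coerce"` keeps the load from failing on bad values, at the cost of introducing missing values that then need to be handled explicitly.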
Target Variable Modification:
- Modify the target variable by grouping the 'Standard' and 'Good' credit scores into a single 'High' class, leaving the remaining scores as 'Low'. This turns the task into a binary classification problem (Low vs. High).
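The regrouping can be sketched as follows. The series name `Credit_Score`, the third label 'Poor', and the output name `Low_Credit` are assumptions made for illustration; only 'Standard' and 'Good' come from the description above.

```python
import pandas as pd

# Hypothetical label column; 'Poor' is an assumed name for the non-grouped class.
scores = pd.Series(["Good", "Standard", "Poor", "Standard", "Poor"], name="Credit_Score")

# 'Standard' and 'Good' form the High group; everything else is treated as Low.
is_low = ~scores.isin(["Standard", "Good"])

# Encode Low as 1 so that it is the positive class for precision, recall, and F1.
target = is_low.astype(int).rename("Low_Credit")
```

Encoding the low-score group as the positive class means metrics such as recall describe how many low-score customers the model catches.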
Data Exploration:
- Analyse the dataset to identify which features most significantly impact credit scores.
- Check for and address any high correlations between features.
- Remove features that do not contribute to the model's predictive power.
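One possible way to check for highly correlated features is to scan the upper triangle of the absolute correlation matrix and drop one feature from each offending pair. The feature names, the synthetic data, and the 0.9 cut-off below are all assumptions chosen for the sketch.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in features; Monthly_Salary is built to nearly duplicate Annual_Income.
rng = np.random.default_rng(0)
n = 200
income = rng.normal(50_000, 10_000, n)
X = pd.DataFrame({
    "Annual_Income": income,
    "Monthly_Salary": income / 12 + rng.normal(0, 50, n),
    "Num_Credit_Cards": rng.integers(1, 8, n),
})

corr = X.corr().abs()

# Keep only the upper triangle so each feature pair is considered once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

threshold = 0.9  # assumed cut-off for "high" correlation
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
X_reduced = X.drop(columns=to_drop)
```

This rule always drops the later column of a correlated pair; which member to keep is a judgment call that may depend on interpretability or data quality.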
Data Preparation for Modelling:
- Create dummy variables for categorical features.
- Split the data into training and testing sets.
- Scale features if necessary to ensure model stability and performance.
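The three preparation steps above might look like the sketch below, using pandas and scikit-learn. The toy DataFrame, its column names, and the 25% test size are assumptions. The scaler is fitted on the training split only, so no information from the test set leaks into preprocessing.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy data with one numeric and one categorical feature (illustrative only).
df = pd.DataFrame({
    "Annual_Income": [20_000, 35_000, 50_000, 65_000, 80_000, 95_000, 110_000, 125_000],
    "Occupation": ["Teacher", "Engineer", "Teacher", "Doctor",
                   "Engineer", "Doctor", "Teacher", "Engineer"],
    "Low_Credit": [1, 1, 1, 0, 0, 0, 1, 0],
})

# One-hot encode the categorical column; drop_first avoids a redundant dummy.
X = pd.get_dummies(df.drop(columns="Low_Credit"), columns=["Occupation"],
                   drop_first=True, dtype=float)
y = df["Low_Credit"]

# Stratify so both splits keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Fit the scaler on training data, then apply the same transform to the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```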
Step 2: Logistic Regression
Initial Model Fitting:
- Fit a Logistic Regression model using default hyperparameters.
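A baseline fit with scikit-learn's defaults could be sketched like this. The data here is synthetic, generated with `make_classification` as a stand-in for the prepared credit features, with class 1 playing the role of the 'Low' group.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced stand-in data (assumption, not the project dataset).
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.7, 0.3],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Default hyperparameters: this serves as the baseline to tune against.
model = LogisticRegression()
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
```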
Hyperparameter Tuning:
- Tune the hyperparameters to optimize the model's performance.
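One way to tune the model is a cross-validated grid search. The grid below (regularization strength `C` and `class_weight`) and the choice of F1 as the selection metric are assumptions for the sketch; the project may tune other settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.7, 0.3],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Hypothetical search space.
param_grid = {
    "C": [0.01, 0.1, 1, 10],
    "class_weight": [None, "balanced"],
}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="f1",  # select on F1 for the positive (Low) class
    cv=5,
)
search.fit(X_train, y_train)

best_model = search.best_estimator_
best_params = search.best_params_
```

Because the search only sees the training split, the test set remains untouched for the final evaluation.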
Performance Reporting:
- Report key metrics: accuracy, precision, recall, and F1 score.
- Adjust the decision threshold to maximize the F1 score.
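The reporting and threshold adjustment can be sketched as below, again on synthetic stand-in data. The helper name `report` is hypothetical. The threshold is picked by computing F1 at every candidate returned by `precision_recall_curve` and taking the maximum.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, f1_score, precision_recall_curve, precision_score, recall_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); class 1 plays the role of "Low".
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.7, 0.3],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

def report(y_true, y_pred):
    """Collect the four headline metrics in one dictionary."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }

# Metrics at the conventional 0.5 threshold.
default_metrics = report(y_test, (proba >= 0.5).astype(int))

# precision and recall have one more entry than thresholds, so drop the last point.
precision, recall, thresholds = precision_recall_curve(y_test, proba)
f1_per_threshold = (2 * precision[:-1] * recall[:-1]
                    / (precision[:-1] + recall[:-1] + 1e-12))
best_threshold = thresholds[np.argmax(f1_per_threshold)]

tuned_metrics = report(y_test, (proba >= best_threshold).astype(int))
```

For brevity the sketch picks the threshold on the same split it reports on, which gives an optimistic estimate; choosing it on a separate validation split is the more cautious option.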
ROC Curve and AUC:
- Plot the ROC curve for the tuned model.
- Calculate and report the Area Under the Curve (AUC) to evaluate the model's ability to distinguish between classes.
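A minimal sketch of the ROC and AUC computation with scikit-learn follows, using synthetic stand-in data rather than the project's tuned model. `roc_curve` returns the points of the curve, and `roc_auc_score` gives the area under it.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data and model (assumption).
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.7, 0.3],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

# False positive rate and true positive rate at each candidate threshold.
fpr, tpr, thresholds = roc_curve(y_test, proba)

# AUC computed directly from the scores, and again from the curve as a cross-check.
auc_score = roc_auc_score(y_test, proba)
auc_from_curve = auc(fpr, tpr)
```

Passing `fpr` and `tpr` to a line-plot call, for example matplotlib's `plt.plot(fpr, tpr)`, draws the curve; a diagonal from (0, 0) to (1, 1) is commonly added as the no-skill reference.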
Step 3: Addressing Imbalanced Data
SMOTE Application:
- Apply the Synthetic Minority Over-sampling Technique (SMOTE) to the training data, resampling so that both classes have an equal number of instances. The test set is left untouched so that evaluation reflects the real class distribution.
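To illustrate the idea behind SMOTE, the sketch below is a simplified, hand-written NumPy version for a binary target: each synthetic point is placed at a random position on the line between a minority sample and one of its k nearest minority neighbours. It is not the implementation used by any library, and the function name `smote_sketch` and the toy data are hypothetical.

```python
import numpy as np

def smote_sketch(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class of a binary target to a 50/50 balance."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_new = int((y != minority_label).sum() - len(X_min))
    if n_new <= 0:
        return X, y

    # Pairwise squared distances among minority points; exclude self-matches.
    dist = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), n_new)             # random minority points
    picked = neighbours[base, rng.integers(0, k, n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))                          # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])

    X_res = np.vstack([X, synthetic])
    y_res = np.concatenate([y, np.full(n_new, minority_label)])
    return X_res, y_res

# Toy training data: 20 minority rows followed by 80 majority rows.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 3))
y_train = np.array([1] * 20 + [0] * 80)

X_res, y_res = smote_sketch(X_train, y_train)
```

In practice, with the imbalanced-learn package installed, `SMOTE(random_state=42).fit_resample(X_train, y_train)` from `imblearn.over_sampling` performs this resampling in one call.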
Model Re-tuning:
- Re-tune the decision threshold after applying SMOTE to check whether performance improves.
Performance Comparison:
- Compare the model's performance (accuracy, F1 score, and AUC) before and after applying SMOTE to assess the impact of handling data imbalance.
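A side-by-side comparison could be assembled as in the sketch below. The helper `summarise`, the thresholds, and the probability arrays are placeholders invented to show the table layout; they are not results from this project.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def summarise(y_true, proba, threshold):
    """Accuracy and F1 at a given threshold, plus threshold-free AUC."""
    pred = [int(p >= threshold) for p in proba]
    return {
        "accuracy": accuracy_score(y_true, pred),
        "f1": f1_score(y_true, pred),
        "auc": roc_auc_score(y_true, proba),
    }

# Placeholder labels and predicted probabilities (NOT project results).
y_test = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
proba_before = [0.10, 0.20, 0.15, 0.30, 0.55, 0.25, 0.45, 0.60, 0.35, 0.80]
proba_after = [0.20, 0.30, 0.25, 0.45, 0.65, 0.35, 0.60, 0.75, 0.50, 0.90]

# One row per configuration, one column per metric.
comparison = pd.DataFrame(
    [summarise(y_test, proba_before, 0.5), summarise(y_test, proba_after, 0.5)],
    index=["before_smote", "after_smote"],
)
```

Both rows should be computed on the same untouched test set, so that any difference reflects the resampling rather than a change in evaluation data.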
Final Model Evaluation
- Fit the final model using the best-performing configuration and techniques identified through the project.
- Evaluate the final model's performance on the test data to ensure its generalizability and reliability.
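The final fit and evaluation might be sketched as a single scikit-learn pipeline. The hyperparameters and the 0.4 threshold below are hypothetical placeholders for whatever configuration the earlier steps select, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); class 1 plays the role of "Low".
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.7, 0.3],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

final_threshold = 0.4  # hypothetical tuned threshold

# Bundling scaling and the model keeps preprocessing identical at train and test time.
final_model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))
final_model.fit(X_train, y_train)

# Score the held-out test set once, using the chosen threshold.
test_proba = final_model.predict_proba(X_test)[:, 1]
test_pred = (test_proba >= final_threshold).astype(int)
report_text = classification_report(y_test, test_pred, target_names=["High", "Low"])
print(report_text)
```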
By following these steps, this project aims to build a classification model capable of predicting customers with low credit scores, providing insights and tools for managing credit risk.