Credit Score Logistic Regression

About this project

The primary goal of this project is to predict which customers have a low credit score, using the features in a dataset of credit transactions. To do so, it develops a classification model that identifies customers likely to have a low credit score.

This project is divided into three main parts, aligned with the mid-course project requirements for the Maven Analytics Classification course. Each part focuses on a different aspect of the data preparation, model development, and evaluation process.

Step 1: Data Preparation and Exploratory Data Analysis (EDA)

  1. Data Import and Conversion:

    • Load the CSV file containing credit transaction data.
    • Convert columns to the appropriate data types so that numeric and categorical features are handled consistently.
  2. Target Variable Modification:

    • Modify the target variable by grouping 'Standard' and 'Good' credit scores together, creating a binary classification problem (Low vs. High).
  3. Data Exploration:

    • Analyse the dataset to identify which features most significantly impact credit scores.
    • Check for and address any high correlations between features.
    • Remove features that add little or nothing to the model's predictive power.
  4. Data Preparation for Modelling:

    • Create dummy variables for categorical features.
    • Split the data into training and testing sets.
    • Scale features if necessary to ensure model stability and performance.
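
The Step 1 workflow could look roughly like the following pandas/scikit-learn sketch. It is illustrative only: the column names (Annual_Income, Monthly_Inhand_Salary, Outstanding_Debt, Occupation, Credit_Score), the stray "_" in one value, the 'Poor' label for the low class, and the 0.9 correlation cut-off are all assumptions, and a small in-memory table stands in for the real CSV.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the credit CSV; in practice: raw = pd.read_csv(...)
raw = pd.DataFrame({
    "Annual_Income": ["60000", "24000", "84000", "36000_", "72000", "18000",
                      "96000", "30000", "48000", "21000", "66000", "42000"],
    "Monthly_Inhand_Salary": ["5000", "2000", "7000", "3000", "6000", "1500",
                              "8000", "2500", "4000", "1750", "5500", "3500"],
    "Outstanding_Debt": ["900", "2500", "1900", "4100", "600", "3800",
                         "1500", "1200", "3000", "2700", "800", "1400"],
    "Occupation": ["Engineer", "Teacher", "Engineer", "Writer", "Teacher", "Writer",
                   "Engineer", "Teacher", "Writer", "Teacher", "Engineer", "Writer"],
    "Credit_Score": ["Good", "Poor", "Standard", "Poor", "Good", "Poor",
                     "Good", "Standard", "Standard", "Poor", "Good", "Standard"],
})

# 1. Datatype conversion: strip stray '_' characters (hypothetical), then parse numbers
num_cols = ["Annual_Income", "Monthly_Inhand_Salary", "Outstanding_Debt"]
for col in num_cols:
    raw[col] = pd.to_numeric(raw[col].str.strip("_"), errors="coerce")
raw["Occupation"] = raw["Occupation"].astype("category")

# 2. Binary target: 1 = Low (assumed label 'Poor'), 0 = High ('Standard' or 'Good')
y = (raw["Credit_Score"] == "Poor").astype(int)
X = raw.drop(columns="Credit_Score")

# 3. Drop a numeric column if |correlation| > 0.9 with any column before it
corr = X[num_cols].corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
X = X.drop(columns=to_drop)

# 4. Dummy variables, stratified train/test split, scaling fitted on train only
X = pd.get_dummies(X, columns=["Occupation"], drop_first=True, dtype=int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
print(f"dropped: {to_drop}; features: {list(X.columns)}")
```

On this toy table Monthly_Inhand_Salary is exactly one twelfth of Annual_Income, so it is the column the correlation check removes. Fitting the scaler on the training split only keeps test-set statistics out of the model.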

Step 2: Logistic Regression

  1. Initial Model Fitting:

    • Fit a Logistic Regression model using default hyperparameters.
  2. Hyperparameter Tuning:

    • Tune the hyperparameters to optimize the model's performance.
  3. Performance Reporting:

    • Report key metrics: accuracy, precision, recall, and F1 score.
    • Adjust the decision threshold to maximize the F1 score.
  4. ROC Curve and AUC:

    • Plot the ROC curve for the tuned model.
    • Calculate and report the Area Under the Curve (AUC) to evaluate the model's ability to distinguish between classes.
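
A minimal scikit-learn sketch of Step 2 follows, run on synthetic data from make_classification rather than the credit dataset. The grid of C values, 5-fold cross-validation, and choosing the threshold on out-of-fold training predictions (so the test set is not used for tuning) are assumptions, not necessarily the choices made in the project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score, roc_curve)
from sklearn.model_selection import GridSearchCV, cross_val_predict, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared credit data (class 1 = Low credit score)
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
scaler = StandardScaler().fit(X_train)  # fitted on training data only
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 1. Baseline: default hyperparameters
baseline = LogisticRegression().fit(X_train, y_train)

# 2. Tune the inverse regularisation strength C with 5-fold CV, scored on F1
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1, 10, 100]},
                      scoring="f1", cv=5)
search.fit(X_train, y_train)
model = search.best_estimator_


def report(y_true, y_pred):
    """Accuracy, plus precision, recall and F1 for the positive (Low) class."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }


# 3. Pick the threshold that maximises F1 on out-of-fold training predictions
oof_proba = cross_val_predict(model, X_train, y_train, cv=5,
                              method="predict_proba")[:, 1]
thresholds = np.linspace(0.05, 0.95, 91)
oof_f1 = [f1_score(y_train, (oof_proba >= t).astype(int), zero_division=0)
          for t in thresholds]
best_threshold = float(thresholds[int(np.argmax(oof_f1))])

test_proba = model.predict_proba(X_test)[:, 1]
metrics_baseline = report(y_test, baseline.predict(X_test))  # default 0.5 cut-off
metrics_tuned = report(y_test, (test_proba >= best_threshold).astype(int))

# 4. ROC curve points and AUC; plot fpr against tpr (e.g. with matplotlib) for the curve
fpr, tpr, _ = roc_curve(y_test, test_proba)
auc = roc_auc_score(y_test, test_proba)
print(metrics_baseline, metrics_tuned,
      f"threshold={best_threshold:.2f}", f"AUC={auc:.3f}", sep="\n")
```

AUC is computed from the predicted probabilities, so it does not change when the decision threshold is moved; only the threshold-dependent metrics (precision, recall, F1, accuracy) do.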

Step 3: Addressing Imbalanced Data

  1. SMOTE Application:

    • Apply the Synthetic Minority Over-sampling Technique (SMOTE) to balance the classes, oversampling the minority class until both classes have an equal number of instances.
  2. Model Re-tuning:

    • Re-tune the decision threshold after applying SMOTE and check whether performance improves.
  3. Performance Comparison:

    • Compare the model's performance (accuracy, F1 score, and AUC) before and after applying SMOTE to assess the impact of handling data imbalance.
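
SMOTE is usually applied with the SMOTE class from the imbalanced-learn package. To keep this sketch dependency-light, it instead uses a simplified NumPy version (the hypothetical helper smote). For each synthetic row, it picks a minority row and one of its k nearest minority neighbours, then interpolates at a random point between them. It then compares logistic regression with and without resampling on synthetic data. Resampling only the fitting split and tuning thresholds on real (non-synthetic) validation rows are assumptions about good practice, not a description of how the project did it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split


def smote(X, y, minority=1, k=5, seed=0):
    """Simplified SMOTE sketch: add synthetic minority rows until both classes are equal."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum()) - len(X_min)
    if n_new <= 0:
        return X, y
    k = min(k, len(X_min) - 1)
    # Distances between minority rows; a row is never its own neighbour
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    knn = np.argsort(dist, axis=1)[:, :k]
    # Random minority row, one of its k neighbours, random point on the segment between
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = knn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    synthetic = X_min[base] + gap * (X_min[nbr] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority)])


# Synthetic, more strongly imbalanced stand-in data (class 1 = Low credit score)
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           weights=[0.85, 0.15], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)
# Hold out real validation rows for threshold tuning; SMOTE touches only X_fit
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, random_state=1)
X_sm, y_sm = smote(X_fit, y_fit)


def tuned_threshold(model):
    """Threshold that maximises F1 on the validation rows."""
    proba = model.predict_proba(X_val)[:, 1]
    grid = np.linspace(0.05, 0.95, 91)
    scores = [f1_score(y_val, (proba >= t).astype(int), zero_division=0) for t in grid]
    return float(grid[int(np.argmax(scores))])


results = {}
for name, (Xf, yf) in {"original": (X_fit, y_fit), "smote": (X_sm, y_sm)}.items():
    model = LogisticRegression(max_iter=1000).fit(Xf, yf)
    t = tuned_threshold(model)
    proba = model.predict_proba(X_test)[:, 1]
    pred = (proba >= t).astype(int)
    results[name] = {"threshold": t,
                     "accuracy": accuracy_score(y_test, pred),
                     "f1": f1_score(y_test, pred, zero_division=0),
                     "auc": roc_auc_score(y_test, proba)}
for name, scores in results.items():
    print(name, {m: round(v, 3) for m, v in scores.items()})
```

The test set keeps its real class balance in both runs, so the before/after numbers are directly comparable.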

Final Model Evaluation

  • Fit the final model using the best-performing configuration and techniques identified through the project.
  • Evaluate the final model's performance on the test data to estimate how well it generalizes to unseen customers.
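
One way to package the final model is a scikit-learn pipeline that refits the scaler and classifier on the full training set and is then scored once on the held-out test set. In this sketch BEST_C and BEST_THRESHOLD are hypothetical placeholders for whatever Steps 2 and 3 select, and the data is again synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (class 1 = Low credit score)
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           weights=[0.7, 0.3], random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2)

# Hypothetical placeholders for the configuration chosen in Steps 2-3
BEST_C = 1.0
BEST_THRESHOLD = 0.4

# Scaler and model are refitted together, so the scaler never sees test data
final_model = make_pipeline(StandardScaler(), LogisticRegression(C=BEST_C))
final_model.fit(X_train, y_train)

# Single evaluation on the untouched test set
test_proba = final_model.predict_proba(X_test)[:, 1]
test_pred = (test_proba >= BEST_THRESHOLD).astype(int)
cm = confusion_matrix(y_test, test_pred)
test_f1 = f1_score(y_test, test_pred)
test_auc = roc_auc_score(y_test, test_proba)
print(cm)
print(classification_report(y_test, test_pred, target_names=["High", "Low"]))
print(f"AUC: {test_auc:.3f}")
```

If the SMOTE variant performs best, imbalanced-learn's Pipeline accepts a sampler step, which keeps resampling confined to the data the model is fitted on.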

By following these steps, this project aims to build a classification model capable of predicting low credit score customers, providing insights and tools for managing credit risk.

