__STYLES__
The main goal of this analysis was to process and analyze a loan dataset, extract key insights, and perform several calculations related to loan status, repayment, and customer behavior. I also generated a visualization to illustrate the distribution of loan statuses.
First Name
and Last Name
columns into a single column Full Name
. Afterward, I dropped the original name columns, as they were no longer necessary.df['Full Name'] = df['First Name'] + ' ' + df['Last Name'] df.drop(['First Name', 'Last Name'], axis=1, inplace=True)
Released
(loan release date) and Maturity
(loan maturity date). These were converted to the appropriate datetime
format to allow for date-based calculations such as loan tenor.df['Released'] = pd.to_datetime(df['Released']) df['Maturity'] = pd.to_datetime(df['Maturity'])
Principal
, Due
, Monthly repayment
, Balance
) were converted to numeric data types. I used the pd.to_numeric()
function to ensure any non-numeric entries (such as strings or erroneous values) were coerced into NaN, allowing for consistent numerical calculations.df['Principal'] = pd.to_numeric(df['Principal'], errors='coerce') df['Due'] = pd.to_numeric(df['Due'], errors='coerce') df['Monthly repayment'] = pd.to_numeric(df['Monthly repayment'], errors='coerce') df['Balance'] = pd.to_numeric(df['Balance'], errors='coerce')
Loan Status
, to categorize loans as either "Fully Paid" or "Open". This was determined by checking whether the Balance
was zero. If the balance was zero, the loan was considered "Fully Paid", otherwise it was classified as "Open".df['Loan status'] = df.apply(lambda row: 'Fully Paid' if row['Balance'] == 0 else 'Open', axis=1)
Maturity
and Released
dates. I first calculated the difference in days and then divided by 30 to get an approximate number of months.df['Loan Tenor (Months)'] = (df['Maturity'] - df['Released']).dt.days // 30
Balance
from the Principal
. If the Balance
was zero, the entire principal was considered fully paid.df['Principal Paid'] = df['Principal'] - df['Balance'] df['Principal Paid'] = df['Principal Paid'].fillna(0)
I calculated several aggregated metrics that summarize the financial and customer behavior insights within the dataset:
Customer ID
to find how many customers took out more than one loan. This allows us to identify repeat customers.loan_counts = df.groupby('Customer ID')['LoanID'].count() customers_with_multiple_loans = loan_counts[loan_counts > 1].count()
total_principal_given = df['Principal'].sum()
total_due = df['Due'].sum()
total_monthly_repayment = df['Monthly repayment'].sum()
Fully Paid
vs. Open
loans), I plotted a bar chart using Matplotlib. This chart provides a quick overview of how many loans have been fully paid versus those that are still open.status_counts = df['Loan status'].value_counts() status_counts.plot(kind='bar', color=['blue', 'green']) plt.title('Count of Loan Status') plt.xlabel('Loan Status') plt.ylabel('Count') plt.xticks(rotation=0) plt.savefig('loan_status_chart.png') # Save the chart as a PNG file plt.show()
Full Name
, Loan Status
, Loan Tenor
, and Principal Paid
), to an Excel file named loan_analysis_results.xlsx
. A separate sheet was also included, summarizing the key aggregated metrics such as the total principal given, the number of customers with multiple loans, and other relevant statistics.with pd.ExcelWriter('loan_analysis_results.xlsx') as writer: df.to_excel(writer, sheet_name='Loan Data', index=False) summary_df = pd.DataFrame({ 'Metric': ['Number of Customers with Multiple Loans', 'Total Principal Given', 'Total Due', 'Total Monthly Repayment'], 'Value': [customers_with_multiple_loans, total_principal_given, total_due, total_monthly_repayment] }) summary_df.to_excel(writer, sheet_name='Summary', index=False)
This report provides an in-depth overview of the work completed, the insights derived, and future considerations. Let me know if there are any further details you’d like to explore!