Here are some key insights and the final submission details from my journey:
Key Insights:
- Data Preprocessing:
- Handling Missing Values: Implemented strategies to fill missing values in the fuel_type and engine columns using a custom function that mapped the best estimates based on available data.
- Feature Engineering: Calculated the age of the cars by subtracting the model_year from the current year and created a new feature log_price by applying a logarithmic transformation to the price column for better model performance.
- Data Cleaning:
- Transmission Types: Segregated data into Automatic (A/T) and Manual (M/T) transmissions for better categorization.
- Accident Data: Mapped accident data into binary values to indicate whether the car was involved in an accident or not.
- Exploratory Data Analysis:
- Visualizations: Utilized seaborn and matplotlib to create pair plots, histograms, and regression plots to understand the relationships between features and the target variable.
- Correlation Analysis: Generated heatmaps to identify correlations between numerical features.
- Model Building:
- Pipeline Creation: Built a pipeline using ColumnTransformer for preprocessing and RandomForestRegressor for regression.
- Model Training: Trained the model on a sample of the training data to keep training time and computational load manageable.
- Evaluation: Achieved an RMSE (Root Mean Squared Error) of 51355.79 on the sample data. RMSE is expressed in the same units as price, so it reads as the typical size of a prediction error.
- Final Submission:
- Predictions: Applied the trained model to the test data and generated predictions for car prices.
- Submission: Prepared the final submission file with predicted prices for 125,165 cars.
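To make the preprocessing and cleaning steps above concrete, here is a minimal sketch rather than the exact notebook code. Several details are assumptions on my part for illustration: the reference year (2024), the transmission column name, the is_automatic feature name, the accident label text, filling fuel_type with its most common value in place of the custom mapping function, and using log1p as the logarithmic transform.

```python
import numpy as np
import pandas as pd

CURRENT_YEAR = 2024  # assumption: the reference year is not stated in the post
# Assumption: the raw label that marks a reported accident.
ACCIDENT_LABEL = "At least 1 accident or damage reported"


def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning and feature-engineering steps to a copy of df."""
    out = df.copy()

    # Fill missing fuel_type with the most frequent value. This is a simple
    # stand-in for the custom mapping function, which is not shown here.
    out["fuel_type"] = out["fuel_type"].fillna(out["fuel_type"].mode().iloc[0])

    # Car age: current year minus model_year.
    out["age"] = CURRENT_YEAR - out["model_year"]

    # Automatic (A/T) vs. manual (M/T), read from the raw transmission text.
    out["is_automatic"] = (
        out["transmission"].str.contains("A/T", regex=False, na=False).astype(int)
    )

    # Binary accident flag: 1 if an accident was reported, otherwise 0.
    out["accident"] = out["accident"].eq(ACCIDENT_LABEL).astype(int)

    # Log-transform the target; only the training data has a price column.
    if "price" in out.columns:
        out["log_price"] = np.log1p(out["price"])
    return out


# Tiny made-up frame, only to show the function running end to end.
toy = pd.DataFrame({
    "model_year": [2018, 2010, 2021],
    "fuel_type": ["Gasoline", None, "Gasoline"],
    "transmission": ["6-Speed A/T", "5-Speed M/T", "A/T"],
    "accident": ["None reported", ACCIDENT_LABEL, None],
    "price": [25000, 8000, 41000],
})
prepared = prepare_features(toy)
print(prepared[["age", "fuel_type", "is_automatic", "accident", "log_price"]])
```

Keeping every step in one function means the same transformations can be applied to both the training and the test data, which avoids the two drifting apart.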
Final Submission Snapshot:
id        price
0         37625.60
1         23900.31
2         27846.07
...       ...
125164    40589.18
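A submission with these id and price columns could be produced roughly as follows. This is a sketch under stated assumptions, not my actual notebook: the data is synthetic, the feature lists and hyperparameters are illustrative, the RMSE is measured on a held-out split, and the model predicts price directly rather than log_price.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic training data standing in for the real competition file.
rng = np.random.default_rng(0)
n = 200
train = pd.DataFrame({
    "age": rng.integers(0, 20, n),
    "fuel_type": rng.choice(["Gasoline", "Diesel", "Hybrid"], n),
    "accident": rng.integers(0, 2, n),
})
train["price"] = (
    40000 - 1500 * train["age"] - 3000 * train["accident"] + rng.normal(0, 2000, n)
)

numeric = ["age", "accident"]   # illustrative feature choice
categorical = ["fuel_type"]     # illustrative feature choice
features = numeric + categorical

# ColumnTransformer: pass numeric columns through, one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", "passthrough", numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
])

# Hold out a quarter of the rows to measure RMSE (an assumption about the setup).
X_train, X_val, y_train, y_val = train_test_split(
    train[features], train["price"], test_size=0.25, random_state=0
)
model.fit(X_train, y_train)
rmse = float(np.sqrt(mean_squared_error(y_val, model.predict(X_val))))
print(f"Validation RMSE: {rmse:.2f}")

# Made-up test rows; "Electric" is unseen in training and is encoded as all zeros.
test = pd.DataFrame({
    "age": [3, 12, 7],
    "fuel_type": ["Gasoline", "Diesel", "Electric"],
    "accident": [0, 1, 0],
})
submission = pd.DataFrame({
    "id": np.arange(len(test)),
    "price": model.predict(test[features]),
})
print(submission)
```

Calling submission.to_csv("submission.csv", index=False) on the resulting frame would write it out as a two-column file. Wrapping the preprocessing and the regressor in one Pipeline means the encoder is fitted only on the training rows and then reused unchanged on the test data.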
Tools & Libraries Used:
- Python: For data processing and model building.
- Pandas & NumPy: For data manipulation and numerical computations.
- Seaborn & Matplotlib: For data visualization.
- Scikit-Learn: For machine learning model implementation.
Participating in this competition was a fantastic learning experience, and I am excited to apply these insights to future projects. A big thank you to the Kaggle community for providing such a valuable platform for learning and growth.
Feel free to connect with me if you have any questions or would like to discuss this project further!