
Car Predictions_Kaggle Competition

Tools used in this project

Python, Jupyter Notebook

About this project

Here are some key insights and the final submission details from my journey:

Key Insights:

  1. Data Preprocessing:
    • Handling Missing Values: Implemented strategies to fill missing values in the fuel_type and engine columns using a custom function that mapped the best estimates based on available data.
    • Feature Engineering: Calculated the age of the cars by subtracting the model_year from the current year and created a new feature log_price by applying a logarithmic transformation to the price column for better model performance.
  2. Data Cleaning:
    • Transmission Types: Grouped transmission values into Automatic (A/T) and Manual (M/T) categories for cleaner categorization.
    • Accident Data: Mapped accident data into binary values to indicate whether the car was involved in an accident or not.
  3. Exploratory Data Analysis:
    • Visualizations: Utilized seaborn and matplotlib to create pair plots, histograms, and regression plots to understand the relationships between features and the target variable.
    • Correlation Analysis: Generated heatmaps to identify correlations between numerical features.
  4. Model Building:
    • Pipeline Creation: Built a pipeline using ColumnTransformer for preprocessing and RandomForestRegressor for regression.
    • Model Training: Trained the model on a sample of the training data to shorten training time and reduce computational load.
    • Evaluation: Achieved an RMSE (Root Mean Squared Error) of 51,355.79 on the sample data. RMSE is expressed in the same units as price, so lower values mean predictions sit closer to the actual prices.
  5. Final Submission:
    • Predictions: Applied the trained model to the test data and generated predictions for car prices.
    • Submission: Prepared the final submission file with predicted prices for 125,165 cars.
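
The preprocessing and cleaning steps listed above could look roughly like the sketch below. It is an illustration, not the notebook's actual code: the tiny data frame, the raw values, the `transmission` and `accident` column names, the `REFERENCE_YEAR` constant, and the helper functions are all assumptions. In particular, filling with the most frequent value stands in for the custom mapping function, whose details are not given here, and `np.log` is one possible choice of logarithmic transformation.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration. Only fuel_type, engine, model_year and price
# are column names taken from the write-up; everything else is assumed.
cars = pd.DataFrame({
    "model_year": [2015, 2020, 2010, 2018],
    "fuel_type": ["Gasoline", None, "Diesel", "Gasoline"],
    "engine": ["2.0L I4", "3.5L V6", None, "2.0L I4"],
    "transmission": ["6-Speed A/T", "M/T", "Automatic", "5-Speed M/T"],
    "accident": ["None reported", "At least 1 accident or damage reported",
                 None, "None reported"],
    "price": [15000.0, 32000.0, 7000.0, 21000.0],
})

REFERENCE_YEAR = 2024  # assumption: stands in for "the current year"


def fill_with_mode(series):
    """Fill gaps with the most frequent value (stand-in for the custom mapping)."""
    return series.fillna(series.mode().iloc[0])


def simplify_transmission(value):
    """Collapse free-text transmission labels into A/T, M/T or Other."""
    text = str(value).upper()
    if "M/T" in text or "MANUAL" in text:
        return "M/T"
    if "A/T" in text or "AUTO" in text:
        return "A/T"
    return "Other"


def preprocess(df):
    out = df.copy()
    for col in ["fuel_type", "engine"]:
        out[col] = fill_with_mode(out[col])
    # Feature engineering: car age and log-transformed price.
    out["age"] = REFERENCE_YEAR - out["model_year"]
    out["log_price"] = np.log(out["price"])
    # Cleaning: two transmission groups and a binary accident flag.
    out["transmission"] = out["transmission"].map(simplify_transmission)
    out["accident"] = (
        out["accident"].fillna("").str.contains("accident", case=False).astype(int)
    )
    return out


clean = preprocess(cars)
print(clean[["age", "log_price", "fuel_type", "engine", "transmission", "accident"]])
```

Keeping the steps in a single `preprocess` function makes it easy to apply the same transformations to both the training and the test data.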

Final Submission Snapshot:

id      price
0       37625.60
1       23900.31
2       27846.07
...     ...
125164  40589.18
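
For readers who want to see how a file of this shape might be produced, here is a minimal sketch of a `ColumnTransformer` plus `RandomForestRegressor` pipeline that is trained on a sample, scored with RMSE, and used to build an `id`/`price` table. The synthetic data, the feature names, the hyperparameters, the held-out validation split, and the choice to predict `price` directly are assumptions made for illustration; the original notebook may differ on all of them.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the competition data (assumed features and values).
rng = np.random.default_rng(0)
n = 500
train = pd.DataFrame({
    "age": rng.integers(0, 25, size=n),
    "fuel_type": rng.choice(["Gasoline", "Diesel", "Hybrid"], size=n),
    "transmission": rng.choice(["A/T", "M/T"], size=n),
    "accident": rng.integers(0, 2, size=n),
})
train["price"] = (
    40000 - 1200 * train["age"] - 3000 * train["accident"]
    + rng.normal(0, 2000, size=n)
)

categorical = ["fuel_type", "transmission"]
numeric = ["age", "accident"]
features = categorical + numeric

# One-hot encode the categorical columns; pass numeric columns through as-is.
preprocessor = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",
)
model = Pipeline([
    ("preprocess", preprocessor),
    ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
])

# Train on a sample of the training data to keep the run cheap.
sample = train.sample(n=300, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(
    sample[features], sample["price"], test_size=0.25, random_state=0
)
model.fit(X_fit, y_fit)

# RMSE is the square root of the mean squared error, in price units.
rmse = float(np.sqrt(mean_squared_error(y_val, model.predict(X_val))))
print(f"Validation RMSE: {rmse:.2f}")

# Build a submission-shaped table from a few unseen-style rows.
test = train.drop(columns="price").head(5).reset_index(drop=True)
submission = pd.DataFrame({
    "id": test.index,
    "price": model.predict(test[features]),
})
print(submission)
# submission.to_csv("submission.csv", index=False)  # hypothetical file name
```

Scoring on a held-out split, as done here, gives a less optimistic RMSE than scoring on the same rows the model was trained on.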

Tools & Libraries Used:

  • Python: For data processing and model building.
  • Pandas & NumPy: For data manipulation and numerical computations.
  • Seaborn & Matplotlib: For data visualization.
  • Scikit-Learn: For machine learning model implementation.

Participating in this competition was a fantastic learning experience, and I am excited to apply these insights to future projects. A big thank you to the Kaggle community for providing such a valuable platform for learning and growth.

Feel free to connect with me if you have any questions or would like to discuss more about this project!
