Project Overview:
This project is focused on identifying the best Halloween candies in terms of both popularity and affordability. Using data science techniques such as Principal Component Analysis (PCA) and K-Means clustering, I explore the various candy attributes (e.g., chocolate, fruity, nutty, bar form) to classify candies into meaningful groups. Additionally, a custom 'popular_vs_cheap' score is calculated to highlight the candies that are not only well-liked but also cost-effective. The final output is presented in a PowerPoint-style visualisation that helps inform candy selection for Halloween.
Key Steps:
Data Preparation:
- Cleaning and preprocessing the candy dataset (removing unnecessary columns, renaming for clarity).
- Creation of a custom 'popular_vs_cheap' metric based on candy price and win percentage.
- Creation of a 'price_range' column that sorts candies into four price tiers by price percentile:
  - 'Very Low' for the lowest 25% of prices,
  - 'Low' for the next 25% (25th to 50th percentile),
  - 'Medium' for the following 25% (50th to 75th percentile),
  - 'High' for the highest 25%.
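The two derived columns could be built roughly as follows. This is a minimal sketch, not the project's actual code: the column names (`price_percentile`, `win_percent`), the toy values, and above all the scoring formula are assumptions, since the exact definition of 'popular_vs_cheap' is not stated here. The tiers are cut at the price quartiles with `pandas.qcut`.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
candy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "price_percentile": [0.05, 0.15, 0.30, 0.45, 0.55, 0.70, 0.85, 0.95],
    "win_percent": [40.0, 55.0, 62.0, 48.0, 70.0, 81.0, 66.0, 75.0],
})

# Assumed scoring rule (hypothetical): reward a high win percentage
# and a low price. Higher score = more popular and cheaper.
candy["popular_vs_cheap"] = (candy["win_percent"] / 100) * (
    1 - candy["price_percentile"]
)

# Split prices into four equally sized tiers at the quartiles.
candy["price_range"] = pd.qcut(
    candy["price_percentile"],
    q=4,
    labels=["Very Low", "Low", "Medium", "High"],
)

print(candy.sort_values("popular_vs_cheap", ascending=False))
```

If the price column is already a 0 to 1 percentile, `pandas.cut` with fixed edges at 0.25, 0.5 and 0.75 would be an equally reasonable choice; `qcut` guarantees (roughly) equal-sized tiers instead.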
Dimensionality Reduction:
- Applied Principal Component Analysis (PCA) to reduce the feature space and uncover the most important dimensions in the data (e.g., fruity vs. chocolate, bar vs. bag).
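As a rough illustration of the PCA step, the sketch below projects a handful of binary candy attributes onto two components and inspects the loadings, which is how a contrast such as fruity vs. chocolate would show up. The feature names, the random toy data, the choice of two components, and the standardisation step are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random 0/1 attributes standing in for the real candy features (assumed names).
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.integers(0, 2, size=(30, 4)),
    columns=["chocolate", "fruity", "nutty", "bar"],
)

# Standardise so every attribute contributes on the same scale (a design choice).
scaled = StandardScaler().fit_transform(features)

# Keep two components so the result can be drawn as a scatter plot.
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)

# Loadings show how strongly each attribute contributes to each component.
loadings = pd.DataFrame(
    pca.components_.T, index=features.columns, columns=["PC1", "PC2"]
)
print(loadings)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```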
Clustering:
- Applied K-Means clustering to group candies with similar attributes into clusters.
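A minimal K-Means sketch for the clustering step is shown below. The input points are synthetic stand-ins for candy feature vectors (or PCA scores), and the number of clusters is an assumption; the project's actual inputs and cluster count are not stated here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic groups standing in for candy data (assumption).
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=(-2, 0), scale=0.3, size=(15, 2)),
    rng.normal(loc=(2, 0), scale=0.3, size=(15, 2)),
])

# n_clusters=2 matches the toy data; in practice it would be chosen
# with a diagnostic such as the elbow method or silhouette score.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print("Cluster sizes:", np.bincount(labels))
print("Cluster centres:\n", kmeans.cluster_centers_)
```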
Visualisation:
- A combination of Seaborn and Matplotlib was used to generate scatter plots and heatmaps.
- A small amount of random jitter was added to the scatter plots so that overlapping points remain distinguishable.
- Results were visualised in a fun and engaging manner, including the presentation of findings in a Halloween-themed dashboard for easy interpretation.
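The jitter idea can be sketched as below: small Gaussian noise is added to each coordinate before plotting, so points that share the same position no longer sit exactly on top of each other. The helper `add_jitter`, the noise scale, and the toy coordinates are hypothetical; the example uses plain Matplotlib, although the same arrays could equally be passed to Seaborn.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt


def add_jitter(values, scale=0.05, seed=0):
    """Return a copy of `values` with small Gaussian noise added (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    return values + rng.normal(0.0, scale, size=values.shape)


# Toy coordinates with exact duplicates, as happens with binary attributes.
pc1 = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
pc2 = np.array([0.5, 0.5, 0.5, -0.5, -0.5])

x = add_jitter(pc1, seed=1)
y = add_jitter(pc2, seed=2)

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.7)
ax.set_xlabel("PC1 (jittered)")
ax.set_ylabel("PC2 (jittered)")
ax.set_title("Scatter plot with jitter")
```

The noise scale is a trade-off: too small and the points still overlap, too large and the plot misrepresents where the points actually lie.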
Tools and Libraries:
- Pandas: For data manipulation, cleaning, and preparation.
- NumPy: Used for numerical operations and adding jitter to the PCA components.
- Scikit-learn: For applying PCA and K-Means clustering.
- Matplotlib and Seaborn: For data visualisation, including heatmaps and scatter plots.
- PowerPoint-style Visualisation: The final results are presented in a PowerPoint-like format with thematic Halloween elements.
Final Deliverable:
The project concludes with a Halloween-themed visualisation summarising the best candy choices, balancing both popularity and affordability. The visualisation is suitable for decision-making in candy selection, helping keep costs down while still being a hit with the kids!