
Penny-Wise Treats: The Halloween Candy Quest


Jupyter Notebook Script

About this project

Project Overview:

This project identifies the best Halloween candies in terms of both popularity and affordability. Using Principal Component Analysis (PCA) and K-Means clustering, I explore the candy attributes (e.g., chocolate, fruity, nutty, bar form) and group candies into meaningful clusters. A custom 'popular_vs_cheap' score then highlights the candies that are both well-liked and cost-effective. The final output is a PowerPoint-style visualisation that helps inform candy selection for Halloween.
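The overview does not give the formula behind the 'popular_vs_cheap' score. As a minimal sketch, assuming the score is the win percentage rescaled to 0 to 1 minus the price percentile, it could be computed like this. The column names `win_percent` and `price_percent`, the helper `popular_vs_cheap`, and the toy rows are all hypothetical and invented for illustration, not taken from the project.

```python
import pandas as pd

# Hypothetical toy data: names and values are invented for illustration only.
candies = pd.DataFrame({
    "name": ["candy_a", "candy_b", "candy_c", "candy_d"],
    "win_percent": [80.0, 45.0, 70.0, 30.0],    # popularity, 0-100
    "price_percent": [0.90, 0.10, 0.30, 0.50],  # price percentile, 0-1
})


def popular_vs_cheap(df: pd.DataFrame) -> pd.Series:
    """Assumed score: popularity rescaled to 0-1 minus the price percentile.

    Higher values mean a candy is well-liked relative to its cost.
    This is one plausible definition, not the project's confirmed formula.
    """
    return df["win_percent"] / 100.0 - df["price_percent"]


candies["popular_vs_cheap"] = popular_vs_cheap(candies)

# Best value first.
print(candies.sort_values("popular_vs_cheap", ascending=False))
```

A difference of two 0-to-1 quantities avoids dividing by very small prices, which a ratio such as win percentage over price would amplify.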

Key Steps:

  1. Data Preparation:

    • Cleaned and preprocessed the candy dataset (removed unnecessary columns and renamed others for clarity).
    • Created a custom 'popular_vs_cheap' metric from each candy's price and win percentage.
    • Created a 'price_range' column that assigns each candy to a price tier by quartile of the price percentile:
      - 'Very Low' for the lowest 25% of prices,
      - 'Low' for the 25th to 50th percentile,
      - 'Medium' for the 50th to 75th percentile,
      - 'High' for the highest 25%.
  2. Dimensionality Reduction:

    • Applied Principal Component Analysis (PCA) to reduce the feature space to a few components that capture most of the variance in the candy attributes; the leading components can be read as contrasts such as fruity vs. chocolate and bar vs. bag.
  3. Clustering:

    • K-Means clustering was used to group candies into three clusters:

      1. Chocolate Nutty Sweets
      2. Fruity Bags
      3. Chocolate Flavoured Bars
  4. Visualisation:

    • Seaborn and Matplotlib were used to generate scatter plots and heatmaps.
    • Jitter was added to the scatter plots so that candies with identical attributes, which would otherwise sit on the same point, remain visible.
    • Findings were presented in a fun, Halloween-themed dashboard designed for easy interpretation.
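The preparation, PCA and clustering steps above can be sketched end to end with pandas and scikit-learn. This is a minimal sketch on synthetic data, not the project's notebook: the column names (`chocolate`, `fruity`, `nutty`, `bar`, `price_percent`) are hypothetical, the jitter scale of 0.05 is an arbitrary choice, and it assumes K-Means runs on the two PCA scores rather than on the raw attributes, which the overview does not specify.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for the candy table (hypothetical columns):
# binary attribute flags plus a price percentile in [0, 1).
n = 40
features = ["chocolate", "fruity", "nutty", "bar"]
df = pd.DataFrame(rng.integers(0, 2, size=(n, len(features))), columns=features)
df["price_percent"] = rng.uniform(0, 1, size=n)

# Price tiers by quartile: qcut puts roughly 25% of rows in each bin.
df["price_range"] = pd.qcut(
    df["price_percent"], q=4, labels=["Very Low", "Low", "Medium", "High"]
)

# PCA on the attribute flags, keeping two components for plotting.
pca = PCA(n_components=2)
components = pca.fit_transform(df[features])
print("explained variance ratio:", pca.explained_variance_ratio_)

# K-Means with three clusters on the PCA scores (assumed input).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(components)

# Small Gaussian jitter, for plotting only, so candies with identical
# attributes do not sit exactly on top of each other.
jitter = rng.normal(scale=0.05, size=components.shape)
df["pc1"], df["pc2"] = (components + jitter).T
```

The jittered columns could then be drawn with `seaborn.scatterplot(data=df, x="pc1", y="pc2", hue="cluster")`. Because the jitter is added after `fit_predict`, it changes only the picture, not the cluster assignments.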

Tools and Libraries:

  • Pandas: For data manipulation, cleaning, and preparation.
  • NumPy: For numerical operations, including adding jitter to the PCA components.
  • Scikit-learn: For applying PCA and K-Means clustering.
  • Matplotlib and Seaborn: For data visualisation, including heatmaps and scatter plots.
  • PowerPoint-style Visualisation: The final results are presented in a PowerPoint-like format with thematic Halloween elements.

Final Deliverable:

The project concludes with a Halloween-themed visualisation summarising the best candy choices, balancing popularity and affordability. It is designed to support candy selection: keeping costs down while still picking treats that are a hit with the kids!

Discussion and feedback (2 comments)
Jade Handy
about 1 month ago
Nice job Ewa! I see you went the hard way and used K-Means clustering!

Gosse Adema
about 1 month ago
I like your approach. And thanks for sharing the Jupyter notebook.