Trick or treat

About this project

Assignment

Using online votes ranking 85 types of candy, your task is to find the 3 treats you'll give out on Halloween to guarantee that trick-or-treaters of all tastes find something they'll love.

My approach

  • I read "trick-or-treaters of all tastes" as follows: every listed ingredient/flavour must be present in the 3 products, and an ingredient/flavour that is present in one product must not be present in another. Optionally, "candy bar" or "hard candy" can be taken into account as well.
  • This was meant to be a "fast" challenge, with a result in a short time, so I used AI or other resources whenever possible.
  • Among the combinations in which the ingredients/flavours are divided correctly over the products, I need the one with the highest sum of winpercent. This suggests that the combination consists of the products chosen most often in the surveys, where people had to choose between 2 products.
  • I did not take sugar levels or price levels into account.
  • No statistics.
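
The selection rule in the bullets above can be written down compactly. This is only a sketch in notation of my own choosing, not something taken from the assignment: $w_i$ stands for the winpercent of product $i$, $x_{ij}$ is 1 if product $i$ has ingredient/flavour $j$ and 0 otherwise, and $n$ is the number of products left after cleaning.

```latex
\max_{S \subseteq \{1,\dots,n\},\; |S| = 3} \; \sum_{i \in S} w_i
\quad \text{subject to} \quad
\sum_{i \in S} x_{ij} = 1 \quad \text{for every ingredient/flavour } j
```

If products are allowed to share an ingredient/flavour, the constraint relaxes to $\sum_{i \in S} x_{ij} \ge 1$.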

Data

  • I imported the data into a pandas DataFrame.
  • I removed the products that did not have a 1/true value for any of the ingredients/flavours (sorry, no money treats).
  • I removed the columns I did not need.
  • I sorted the resulting DataFrame by winpercent in descending order, which turned out to be useless because ChatGPT came up with a brute-force script.
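
The cleaning steps above can be sketched as follows. This is a minimal illustration on made-up rows, not the survey data: the product names, the values and the reduced column list are assumptions. It also finds the rows without any flavour flag with a mask instead of dropping them by index.

```python
import pandas as pd

# Toy rows; names and values are illustrative, not the survey data.
toy = pd.DataFrame({
    "competitorname": ["Choc Bar", "Fruit Chew", "Coin"],
    "chocolate": [1, 0, 0],
    "fruity": [0, 1, 0],
    "caramel": [0, 0, 0],
    "sugarpercent": [0.5, 0.3, 0.0],
    "winpercent": [60.0, 70.0, 30.0],
})
flavour_columns = ["chocolate", "fruity", "caramel"]

# True for rows with at least one ingredient/flavour flag set.
has_flavour = toy[flavour_columns].sum(axis=1) > 0

cleaned = (
    toy[has_flavour]                             # drop rows without any flag
    .drop(columns=["sugarpercent"])              # drop a column that is not needed
    .sort_values("winpercent", ascending=False)  # most popular first
    .reset_index(drop=True)
)
print(cleaned)
```

A mask like this keeps working when the row order of the .csv changes, which a list of hard-coded index numbers does not.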

Remark: at one point I was looking up pictures for the project image and could not find a Payday variant without caramel, although according to the data it should have none. I added "Caramel" for Payday in the .csv. It made me wonder about the data quality in general. All results mentioned below are based on Payday with caramel and without chocolate (the special version).

Asking ChatGPT

After giving ChatGPT the column and datatype information of the dataframe (the output of df.info()) and explaining how to use it, I prompted for the scripts.

Used prompts

  1. What I need is a Python script which processes the dataframe from top to bottom (most popular to least, stored in winpercent) and collects the 3 products with the highest total winpercent, ticking all the boxes, meaning fruity is 1 and every other column is 1 in at least one of the 3 products. => In this result, two products could share ingredients/flavours.
  2. I like what you did. What if every box may only be ticked once? => A box equals a binary column in the script below.

I got the following code (the comment "#below ChatGPT" marks the start of the AI code):

import pandas as pd
from itertools import combinations
candy_in = pd.read_csv("candy-data.csv")
# Remove the items with no 1 in any of the taste or structure columns:
# 2 money items and 3 candies with a winpercent below 50
candy_combinations_start = candy_in.copy().drop([2,3,8,48,19],axis=0)
df = candy_combinations_start.drop([ 'sugarpercent','pricepercent','pluribus'], axis = 1)
df.sort_values(by='winpercent', ascending=False,inplace=True)
df.reset_index(drop=True,inplace=True)
#below ChatGPT
# Select the binary columns to be checked (excludes 'competitorname' and 'winpercent'; 'hard' and 'bar' removed)
binary_columns = ['fruity', 'chocolate', 'caramel', 'peanutyalmondy', 'nougat', 'crispedricewafer']
# Function to check if a combination of products ticks all boxes exactly once
def ticks_each_box_once(df_subset):
    # Sum the values in each binary column for this combination
    column_sums = df_subset[binary_columns].sum()
    # Check if each binary column is ticked exactly once (sum should be 1 for each column)
    return (column_sums == 1).all()
# Variable to store the best combination and its total winpercent
best_combination = None
highest_winpercent = 0
# Iterate over all combinations of 3 products
for combo in combinations(df.index, 3):
    # Get the subset of the DataFrame for this combination
    df_subset = df.loc[list(combo)]
    
    # Check if this combination ticks each box exactly once
    if ticks_each_box_once(df_subset):
        # Calculate the total winpercent
        total_winpercent = df_subset['winpercent'].sum()
        
        # Update if this combination has a higher winpercent
        if total_winpercent > highest_winpercent:
            best_combination = df_subset
            highest_winpercent = total_winpercent
# Output the result
if best_combination is not None:
    print("Best combination of 3 products:")
    print(best_combination[['competitorname', 'winpercent']])
    print(f"Total winpercent: {highest_winpercent}")
else:
    print("No combination of 3 products ticks all the boxes exactly once.")

I roughly checked the code and the results. Do I trust ChatGPT blindly? In this case, almost.
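
The two prompts differ only in the check applied to each trio: prompt 1 wants every box ticked at least once, prompt 2 exactly once. The sketch below shows both checks side by side on a made-up trio. The function name ticks_boxes, the exactly_once switch and the toy rows are my own assumptions, not part of the ChatGPT output.

```python
import pandas as pd

binary_columns = ["fruity", "chocolate", "caramel"]

def ticks_boxes(df_subset, columns, exactly_once=True):
    """Check whether a set of products covers every column.

    exactly_once=True  -> each column sums to exactly 1 (prompt 2)
    exactly_once=False -> each column sums to at least 1 (prompt 1)
    """
    sums = df_subset[columns].sum()
    if exactly_once:
        return bool((sums == 1).all())
    return bool((sums >= 1).all())

# Hypothetical trio: products B and C both contain chocolate.
trio = pd.DataFrame({
    "competitorname": ["A", "B", "C"],
    "fruity": [1, 0, 0],
    "chocolate": [0, 1, 1],
    "caramel": [0, 0, 1],
})

print(ticks_boxes(trio, binary_columns, exactly_once=False))  # True
print(ticks_boxes(trio, binary_columns, exactly_once=True))   # False
```

The shared chocolate flag is what makes this trio pass the first check and fail the second.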

Results

The lists of 3 products in the results are ordered by winpercent.

#Test all flavours/ingredients with "candybar" and "hard candy" required

  1. Nerds 55.354046
  2. Whoppers 49.524113
  3. Payday 46.296597

Total winpercent: 151.174756

#Test all flavours/ingredients with "candybar" required

  1. Starburst 67.037628
  2. Whoppers 49.524113
  3. Payday 46.296597

Total winpercent: 162.858338

#Test all flavours/ingredients with "hard candy" required

  1. Kit Kat 76.768600
  2. Nerds 55.354046
  3. Payday 46.296597

Total winpercent: 178.419243

#Test all flavours/ingredients, no "hard candy" or "candybar" required

  1. Kit Kat 76.768600
  2. Starburst 67.037628
  3. Payday 46.296597

Total winpercent: 190.102825
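
The four tests above differ only in which columns must be ticked exactly once. A minimal sketch of that idea, run on made-up products rather than the survey data, could look like this. The helper best_trio, the toy rows and the reduced flavour list are assumptions for illustration only.

```python
from itertools import combinations

import pandas as pd

# Toy products; names and numbers are illustrative, not the survey data.
toy = pd.DataFrame(
    [
        ("Fruit Chew", 1, 0, 0, 0, 0, 70.0),
        ("Fruit Drop", 1, 0, 0, 1, 0, 50.0),
        ("Choc Wafer", 0, 1, 0, 0, 0, 75.0),
        ("Choc Bar", 0, 1, 0, 0, 1, 50.0),
        ("Caramel Cube", 0, 0, 1, 0, 0, 60.0),
        ("Caramel Bar", 0, 0, 1, 0, 1, 45.0),
    ],
    columns=["competitorname", "fruity", "chocolate", "caramel",
             "hard", "bar", "winpercent"],
)

base_columns = ["fruity", "chocolate", "caramel"]

def best_trio(df, columns):
    """Brute-force search for the 3 products with the highest total
    winpercent that tick every column in `columns` exactly once."""
    best_names, best_total = None, float("-inf")
    for combo in combinations(df.index, 3):
        subset = df.loc[list(combo)]
        if (subset[columns].sum() == 1).all():
            total = subset["winpercent"].sum()
            if total > best_total:
                best_names = list(subset["competitorname"])
                best_total = total
    return best_names, best_total

# One entry per test: the extra columns are simply added to the list.
tests = {
    "bar and hard required": base_columns + ["bar", "hard"],
    "bar required": base_columns + ["bar"],
    "hard required": base_columns + ["hard"],
    "neither required": base_columns,
}

results = {label: best_trio(toy, cols) for label, cols in tests.items()}
for label, (names, total) in results.items():
    print(f"{label}: {names} (total winpercent {total})")
```

As in the results above, every extra required column can only keep the best total the same or lower it, because it removes trios from the search.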

Conclusion

Based on the test results, and leaving out the optional "hard candy" and "candy bar" requirements, I suggest that the best products to buy, so that trick-or-treaters of all tastes find something they'll love, are Kit Kat, Starburst and Payday (without the chocolate!).

Hours used: approx. 6. It would have been 5 if I had not hit the delete button instead of publish.

Tools and sites used

  • ChatGPT (free version)
  • Freepik.com
  • Google Font: Jolly Lodger
  • Spyder (IDE for Python)
  • Affiliate Design