__STYLES__
The Instacart Market Basket Analysis project aims to explore and analyze customer shopping behavior using transactional data from the Instacart platform. This dataset provides a rich source of information that can be used to derive valuable insights for optimizing operations and enhancing customer experiences.
Problems to solve.
· How can we increase customer retention?
· How can we improve customer segmentation and targeting?
· How can we reduce customer churn?
· How can we identify and promote product associations?
In this analysis, I explored the grocery store data in-depth to derive meaningful insights. Below, I present the findings and insights from our data analysis.
The database contains six tables:
● aisles.csv — Contains information about different product categories (aisles).
● departments.csv — Provides details about various departments within the store.
● order_products__prior.csv — Includes information about products included in prior customer orders.
● order_products__train.csv — Contains details about products in the training set of customer orders.
● orders.csv — Provides information about individual orders and customers.
● products.csv — Contains details about products, including aisle and department IDs.
Refer to the provided data dictionary for a detailed description of each dataset and its columns.
The database contains the following approximate quantities of data:
PRODUCT: 49,755 rows
ORDERS: 1,048,575 rows
DEPARTMENTS: 21 rows
AISLES: 134 rows
ORDER_PRODUCT_PRIOR: 32,434,489 rows
ORDER_PRODUCT_TRAIN: 1,384,617 rows
Missing Values
I began by checking for missing values in the ORDERS table, specifically in the ‘days_since_prior_order’ column. I’m pleased to report that all values were found in this column.
Data Integrity
To ensure data integrity, I undertook the task of removing duplicate rows from the PRODUCT table. I identified duplicates based on product names and retained only one entry with the minimum ‘product_id.’ This data cleaning process ensures the data is accessible from redundancy and maintains consistency.
I conducted a market basket analysis to uncover the top 10 product pairs customers most frequently purchase together. This information is vital for optimizing inventory management and fine-tuning our marketing strategies. The analysis involved joining the order_product_prior data to identify product pairs and their frequencies. To make the insights more actionable, I included product names.
Steps:
Step 1: Join the Orders,
Order_products__prior,
and Products
tables to create a dataset that includes order information and product names.
Step 2: Group the data by order_id and create a list of products in each order.
Step 3: Use these lists to calculate the frequency of product pairs in orders.
Step 4: Rank and select the top 10 product pairs with the highest frequency.
The top 10 product pairs that are most frequently purchased together are:
I also explored which products are most commonly added to the cart first. This insight is crucial for understanding customer preferences and planning promotional activities.
Steps:
Step 1: Analyze the Order_products_prior
table to find the products added to the cart first.
Step 2: Count the occurrences of each product and rank them to find the top 5.
The top 5 products that are most commonly added to the cart first are:
How many unique products are typically included in a single order?
Another aspect of our analysis involved examining the number of unique products typically included in a single order. This insight provides a deeper understanding of customer preferences and choices.
Order_products__prior
table.The average number of unique products included in a single order is 7.
I also conducted customer segmentation based on purchase frequency, classifying them as ‘Frequent Buyers,’ ‘Regular Buyers,’ or ‘Occasional Buyers.’ This segmentation can guide targeted marketing efforts.
Step 1: Join the Orders
and Order_products__prior
tables to calculate the total amount spent by each customer.
Step 2: Segment customers based on their total spending.
Based on the total amount they’ve spent on orders; customers can be categorized as follows:
To gain insights into customer spending behavior, I categorized customers into three groups: ‘High Spenders,’ ‘Medium Spenders,’ and ‘Low Spenders’ based on their total order count.
Step 1: Analyze the Orders
table to calculate the purchase frequency of each customer.
Step 2: Segment customers based on their purchase frequency.
How many orders have been placed by each customer?
Count the number of orders placed by each customer in the Orders
table.
The average number of orders per customer is 16.
I analyzed the distribution of orders placed on different days of the week to identify trends in customer behavior.
Orders
table to calculate the distribution of orders by day of the week.The distribution of orders placed on different days of the week is as follows:
Inactive Customers
One of the critical metrics in our analysis was calculating the number of customers who have yet to place an order in the last 30 days. This information is invaluable for identifying inactive customers and potentially re-engaging them.
Orders
table to identify customers who haven’t placed an order in the last 30 days.The number of customers who last placed an order in the last 30 days is 46,392.
To gauge customer churn, I calculated the customer churn rate. This metric represents the percentage of inactive customers relative to the total customer base.
The customer churn rate is 74%.
Can we find products that are often bought together on weekends vs. weekdays?
I compared product associations on weekends and weekdays to determine if buying patterns varied between these two time periods.
Step 1: Analyze the Orders
table to identify weekends and weekdays.
Step 2: Apply association rule mining separately for weekends and weekdays to find product combinations.
The following product combinations are often bought together on weekends vs. weekdays:
Aisle and Department Analysis
I analyzed the top-performing categories to gain insights into product sales by department and aisle. This information is crucial for inventory management and marketing decisions.
The top 5 departments by sales are:
The top 5 aisles by sales are:
To increase customer retention, we can:
To improve customer segmentation and targeting, we can:
To reduce customer churn, we can:
Cross-Promotion of Frequently Purchased Products: Utilize the insights from frequently purchased product pairs to strategically place complementary items together on store shelves or in online product recommendations. This will boost sales by encouraging customers to buy related products.
Personalized Marketing for Customer Segments: Leverage customer segmentation based on spending behavior and purchase frequency to tailor marketing campaigns. High spenders might benefit from loyalty programs, while occasional shoppers could be enticed with promotions to increase their engagement.
Seasonal Campaigns: Capitalize on seasonal trends by launching targeted marketing campaigns during peak months. For example, the holiday season in December can be optimized with exclusive offers and promotions.
Churn Prevention Strategies: Identify and engage with customers who last purchased in the last 30 days. Send personalized offers or reminders to re-engage them with your products and services.
Product Bundling and Special Offers: Based on product association rules, consider creating bundles or special offers for product combinations that are frequently purchased together. This can increase the average order value.
Data-Driven Inventory Management: Use insights on product popularity to optimize inventory levels. Ensure that high-demand items are well-stocked while low-demand items are managed more efficiently.
By implementing these data-driven suggestions, we can enhance our store layout, marketing strategies, and customer engagement efforts. This approach will not only improve customer satisfaction but also contribute to increased sales and profitability, setting our business on a path to long-term success.
In conclusion, this data analysis gives us the insights to make informed decisions and enhance the grocery store’s overall performance.