E-commerce Treasure Hunt: A Journey from Cleanup to Cohorts with SQL

Tools used in this project
E-commerce Treasure Hunt: A Journey from Cleanup to Cohorts with SQL

Tableau Dashboard

About this project


In the fast-changing world of online shopping, I embarked on an exciting journey to explore the vast e-commerce dataset and unlock interesting insights with SQL as my best friend to work with, along with Tableau for visualization.

This project was about helping e-commerce thrive in the competitive world. It provides a deep insight into customer behaviour and loyalty via cohort analysis, and I was also able to find key trends that help in marketing and profit maximization.

About the dataset:

The dataset contains the customer IDs, transaction and product IDs, the product lines, the transactions made and the date, the order status, available brands, the list price and the standard cost along with the first date of the product sold. There were 13 fields and 20000 records.

Tools Used:

SQL (Data Cleaning and Analysis) - Code here!

Tableau (Data Visualization)

Steps followed:

The overall exploration was divided into 3 key phases.

Phase 1: Data Cleaning

I cleaned up the data to fix errors and to ensure it was ready for the next step of analysis.

Phase 2: Cohort Analysis

This was performed to study a group of customers based on their shopping habits and timelines. This revealed amazing insights about customer loyalty and their behaviour, like deciphering a hidden code.

Phase 3: Insights

I used SQL to find the answers to some important questions, which will provide the key trends and help in decision-making on various factors.

Key Data Insights:

  1. There are 3494 customers.
  2. There are 19821 approved and 179 unapproved transactions.
  3. 69 customers have made at least one transaction per product.
  4. The product line with the highest average list price is “Touring”.
  5. The Brand with the highest profit margin is “Trek Bicycles”.
  6. The total sales of each brand are identified with “Solex” topping the list followed by “WeareA2B”.
  7. The top 5 products with the highest profit margin are identified with product ID 3 and 38 topping the list.
  8. The top product lines with the highest total revenue and their percentage contribution to the overall revenue is identified with “Standard” topping the charts with a whopping difference.
  9. The top 10 customers with highest number of transactions are identified.
  10. The top 10 customers with their average profit per transaction are identified.

Actionable Insights:

  • Identifying the product lines, products and the brand with highest profit margin, the brand with highest total sales, the product line with the highest revenue contribution rate was essential to identify the categories that generate high revenue for the business and hence marketing strategies around this area can be done to improve the business and helps maximize profitability and succeed.
  • Identifying the top 10 customers with highest number of transactions, the total amount they spent shopping and finding customers who made at least 1 transaction per product line helps us to identify customer behaviour, their engagement, buying behaviour and preferences across various product ranges. This might help in future stockings as well and promotions to retain the customers.
  • The Cohort analysis was done to see how each cohort’s size changes over time relative to its initial size. This helps in assessing the effectiveness of the strategies in retaining and engaging customers. It helps improve customer loyalty and satisfaction.

Data Visualization:

A dashboard was created in Tableau to showcase the key trends.


This project was a learning experience and helped me with building up my skills to work with SQL to gain insights and Tableau to visualize. It also helped me learn to analyze the e-commerce data effectively and to perform meaningful cohort analysis.

Discussion and feedback(0 comments)
2000 characters remaining