__STYLES__
This report provides a comprehensive analysis of customer segments based on Recency, Frequency, and Monetary (RFM) metrics. My analysis, conducted using Power BI and SQL in BigQuery, reveals key insights into customer behavior, spending patterns, and the distribution of revenue across different segments. A total of 4,331 customers were analyzed, with €7.95 million in total sales recorded. Key findings highlight that 67.21% of total revenue is generated by the top customer segment, demonstrating significant concentration among high-value customers. The analysis also reveals areas for improvement, particularly in re-engaging at-risk and lost customers.
To perform this analysis, I utilized the following SQL code in BigQuery to extract and compute Recency, Frequency, and Monetary values, along with customer segmentation based on percentile scoring. Below is the SQL query used:
WITH
-- Compute Recency, Frequency, and Monetary Values
t1 AS (
SELECT
CustomerID,
Country,
MAX(InvoiceDate) AS last_purchase_date,
COUNT(DISTINCT InvoiceNo) AS frequency,
SUM(UnitPrice * Quantity) AS monetary
FROM turing_data_analytics.rfm
WHERE InvoiceDate BETWEEN '2010-12-01' AND '2011-12-01'
GROUP BY CustomerID, Country
),
-- Compute Recency
t2 AS (
SELECT *,
DATE_DIFF(DATE('2011-12-01'), DATE(last_purchase_date), DAY) AS recency
FROM (
SELECT *,
DATE_ADD(DATE(MAX(last_purchase_date) OVER ()), INTERVAL 1 DAY) AS reference_date
FROM t1
)
),
t3 AS (
SELECT
a.*,
-- All percentiles for MONETARY
b.percentiles[OFFSET(20)] AS m20,
b.percentiles[OFFSET(40)] AS m40,
b.percentiles[OFFSET(60)] AS m60,
b.percentiles[OFFSET(80)] AS m80,
b.percentiles[OFFSET(100)] AS m100,
-- All percentiles for FREQUENCY
c.percentiles[OFFSET(20)] AS f20,
c.percentiles[OFFSET(40)] AS f40,
c.percentiles[OFFSET(60)] AS f60,
c.percentiles[OFFSET(80)] AS f80,
c.percentiles[OFFSET(100)] AS f100,
-- All percentiles for RECENCY
d.percentiles[OFFSET(20)] AS r20,
d.percentiles[OFFSET(40)] AS r40,
d.percentiles[OFFSET(60)] AS r60,
d.percentiles[OFFSET(80)] AS r80,
d.percentiles[OFFSET(100)] AS r100
FROM
t2 a,
(SELECT APPROX_QUANTILES(monetary, 100) AS percentiles FROM t2) b,
(SELECT APPROX_QUANTILES(frequency, 100) AS percentiles FROM t2) c,
(SELECT APPROX_QUANTILES(recency, 100) AS percentiles FROM t2) d
),
t4 AS (
SELECT *,
CAST(ROUND((f_score + m_score) / 2, 0) AS INT64) AS fm_score
FROM (
SELECT *,
CASE
WHEN monetary <= m20 THEN 1
WHEN monetary <= m40 AND monetary > m20 THEN 2
WHEN monetary <= m60 AND monetary > m40 THEN 3
WHEN monetary <= m80 AND monetary > m60 THEN 4
WHEN monetary <= m100 AND monetary > m80 THEN 5
END AS m_score,
CASE
WHEN frequency <= f20 THEN 1
WHEN frequency <= f40 AND frequency > f20 THEN 2
WHEN frequency <= f60 AND frequency > f40 THEN 3
WHEN frequency <= f80 AND frequency > f60 THEN 4
WHEN frequency <= f100 AND frequency > f80 THEN 5
END AS f_score,
-- Recency scoring is reversed
CASE
WHEN recency <= r20 THEN 5
WHEN recency <= r40 AND recency > r20 THEN 4
WHEN recency <= r60 AND recency > r40 THEN 3
WHEN recency <= r80 AND recency > r60 THEN 2
WHEN recency <= r100 AND recency > r80 THEN 1
END AS r_score
FROM t3
)
),
t5 AS (
SELECT
CustomerID,
Country,
recency,
frequency,
monetary,
r_score,
f_score,
m_score,
fm_score,
CASE
WHEN (r_score = 5 AND fm_score = 5)
OR (r_score = 5 AND fm_score = 4)
OR (r_score = 4 AND fm_score = 5)
THEN 'Champions'
WHEN (r_score = 5 AND fm_score = 3)
OR (r_score = 4 AND fm_score = 4)
OR (r_score = 3 AND fm_score = 5)
OR (r_score = 3 AND fm_score = 4)
THEN 'Loyal Customers'
WHEN (r_score = 5 AND fm_score = 2)
OR (r_score = 4 AND fm_score = 2)
OR (r_score = 3 AND fm_score = 3)
OR (r_score = 4 AND fm_score = 3)
THEN 'Potential Loyalists'
WHEN r_score = 5 AND fm_score = 1 THEN 'Recent Customers'
WHEN (r_score = 4 AND fm_score = 1)
OR (r_score = 3 AND fm_score = 1)
THEN 'Promising'
WHEN (r_score = 3 AND fm_score = 2)
OR (r_score = 2 AND fm_score = 3)
OR (r_score = 2 AND fm_score = 2)
THEN 'Customers Needing Attention'
WHEN r_score = 2 AND fm_score = 1 THEN 'About to Sleep'
WHEN (r_score = 2 AND fm_score = 5)
OR (r_score = 2 AND fm_score = 4)
OR (r_score = 1 AND fm_score = 3)
THEN 'At Risk'
WHEN (r_score = 1 AND fm_score = 5)
OR (r_score = 1 AND fm_score = 4)
THEN 'Cant Lose Them'
WHEN r_score = 1 AND fm_score = 2 THEN 'Hibernating'
WHEN r_score = 1 AND fm_score = 1 THEN 'Lost'
END AS rfm_segment
FROM t4
)
SELECT * FROM t5;
This query enabled the segmentation of customers into various RFM groups, allowing me to derive insights on their purchasing behavior and tailor marketing strategies accordingly.
The analysis reveals significant variation in customer behavior and spending across different regions. Here are the key takeaways:
Countries with High Recency (e.g., Lebanon, Lithuania)
A critical finding from the RFM analysis is that 67.21% of total revenue is generated by the top customer segment. While this highlights the importance of high-value customers, it also poses a significant risk. A downturn in engagement from this segment could lead to a sharp revenue decline. As such, diversifying the customer base is essential. By nurturing segments such as Potential Loyalists and Promising, the business can reduce its reliance on top customers and build a more stable revenue stream.
RFM
The RFM analysis, combined with the SQL extraction, provides critical insights into customer behavior, highlighting both opportunities and risks. By focusing on retention, re-engagement, and market expansion strategies, the business can continue to grow its revenue base while minimizing potential risks related to revenue concentration. Future efforts should prioritize building stronger relationships across all customer segments, with tailored approaches for each group.