__STYLES__
Understanding customers' desires is one of the most, if not the most important aspect of any business. With this knowledge, businesses can adjust their products or services to best fit market requirements. It allows companies to focus attention on characteristics that add value and, at the same time, avoid those perceived negatively or indifferently by the market. One of the means companies have to acquire this knowledge is through “Customer Behavior Analysis.” Customer patterns detected during their interaction with the company’s products or services could shed some light on what customers value the most, what attributes are irrelevant, and how close the company’s offer is to that perfect product.
The project Initiates with an in-depth exploration of customer lifetime value (CLV) dispersion. Utilize a series of box plots, each tailored to a specific customer attribute, to visualize CLV distribution.
Empower users with the ability to select percentile values from a table showcasing the top-bottom 75th percentile. This percentile is derived from an analysis of all available attributes within the dataset. Upon selection, accompanying cards will dynamically display corresponding attributes.
Transition to the second page, offering an overview of flights and distance traveled by customers.
Conduct a Year-over-Year comparison for both metrics, providing insights into trends and patterns over time.
Present forecasts for each metric, offering predictive insights into future performance.
Dedicate the third page to examining key metrics: enrollments and cancellations.
Analyze the evolving relationship between these metrics, including interval trends, Month-Over-Month growth, and the relative percentage of cancellations to enrollments.
Conclude with an in-depth examination of the airline's loyalty program.
Explore points accumulated and redeemed by customers, shedding light on the program's effectiveness and its financial impact on the company.
Skewed Distribution: The distance between the median and maximum values highlights a significant skew in the distribution. The presence of high-value customers pulls the mean away from the median, indicating a reliance on a relatively small number of customers for a significant portion of the total value.
Dependency on High-Value Customers: Notably, 52% of the total value is generated by customers above the 75th percentile. This emphasizes the importance of high-value customers in contributing to the overall revenue and underscores the need to nurture and retain these valuable segments.
Uniform Dispersion Across Attributes: Despite minor variations in attributes such as education level, marital status, and province, the distribution of Customer Lifetime Value remains relatively uniform across all categories. While there may be differences in maximum values for specific attributes, the median values consistently align with the general distribution.
Top 75th Percentile: As expected, the attributes combinations that yield elevated percentiles have a shorter gap between max and median values.
There’s an option in the top right corner of the page that allow users to choose between filter extreme values away from the report (values above three standard deviations of the mean), filter normal values and keeping the extreme, or show both (standard page view, the report will start this way).
Even without extreme values, customers above the 75th percentile are still responsible for almost 50% of CLV.
The patterns observed in metrics for flights booked and distance travelled exhibit striking similarities, with notable seasonal peaks occurring in March, July, and December. These peaks coincide with periods of increased travel demand, likely influenced by factors such as holidays or seasonal events. Additionally, both metrics demonstrate a consistent upward trend, typically ranging between 12% to 14%, indicating a steady growth trajectory over time.
Given that distance travelled is inherently linked to flights booked, variations in flight bookings directly impact distance travelled. This interconnected relationship underscores the importance of understanding flight patterns in predicting and analyzing distance travelled metrics.
It's worth highlighting a significant change in interannual increases observed between April and May 2018, marking a pivotal point where the year-over-year gap became notably relevant, warranting further investigation to discern underlying causes.
To provide insights into future trends, a forecast was generated using month-over-month growth data from 2017 to 2018, accounting for seasonal variations. By leveraging historical patterns and incorporating seasonality factors, the forecast aims to provide informed estimations for future metrics, assisting in strategic planning and decision-making processes.
A critical aspect to highlight in this analysis is the concerning trend of escalating cancellations outpacing enrollments, coupled with a notable reduction in customer retention periods. This alarming pattern underscores the importance of identifying factors contributing to increased cancellations and implementing strategies to mitigate churn.
Of particular interest is the observed spike in enrollments during April 2018, followed by a corresponding surge in cancellations four months later.
Variables follow a similar pattern as flights and distance, March, June, July, August and December have higher levels of accumulation and spending of points.
The comparison between points redeemed as a percentage of point accumulated, reveals a relatively stable pattern with minimal variation over time.
Another noteworthy observation is the slight decrease in the ratio of customers with points accumulated to total customers over the duration of the dataset. This decline, from 64% in January 2017 to 58% in December 2018.
The report also provides a parameter to adjust the percentage of points redeemed by customers, allowing for a customizable analysis of the potential impact on company costs. Higher redemption percentages imply greater financial outlay for the company, highlighting the importance of understanding and managing customer redemption behavior.
Additionally, the "Similar Metrics" section of the report explores correlations between various metrics analyzed in this study, providing insights into potential relationships and interdependencies within the dataset.
Given that customers above the 75th percentile contribute over 50% of the revenue, targeting and expanding this high-value segment through a tailored marketing strategy could yield significant returns. However, the uniformity of customer attributes across all segments suggests that additional metrics not currently included in the dataset may be influencing customer behavior. Exploring factors such as "Profession Type" and "Children" could provide deeper insights into customer preferences and behaviors. The profession attribute can tell us if any particular customer is flying for work or pleasure, and also what types of professions are more likely to travel than others.
While focusing on high-value segments is a valid strategy, it also carries inherent risks due to the company's dependence on a small number of customers. To mitigate this risk, it is advisable to allocate attention to segments with higher 75th percentiles, which exhibit a more diversified Customer Lifetime Value (CLV) distribution. Even in higher percentiles segments, CLV above p75 is more than 50% percent, but median and max distance is usually less than two times the median value. To simplify: purchasing power of those segments are higher, so losing a client above p75 will be less dangerous. Striking the right balance between CLV and segment size can lead to a less risky investment approach.
Identifying the root cause of the gap between April and May 2018 is imperative. Any business decision made during that period resulted in a significant increase in both flights booked and distance travelled. Moreover, this increase was sustained throughout the entire data sample, indicating a permanent shift rather than a temporary fluctuation attributed to seasonality.
Apparently, the same decision that causes and increased in the number of flights in April 2018, also produced 80% increase in enrollments.
A subsequent spike in cancellations, occurring one month after the seasonal peak in flights, raises questions about customer retention strategies. It is possible that some customers enrolled for specific promotions but later opted to cancel their memberships.
The decline in customers with accumulated points may indicate waning interest in the loyalty program. This assumption is supported by the consistent percentage of customers redeeming points throughout the dataset. While it is plausible that an increase in point redemption could reduce the number of accumulated points, this scenario did not materialize, as the redemption rate remained stable at approximately 12%.
Since forecast information was not present in the initial data set, I decided to create a support table. Leveraging the growth patterns observed in previous years, I generated forecasts for both flights booked and distance travelled. This involved creating a virtual table with three columns: months and years, forecast for flights, and forecast for distances. Filtering out data before 2018, I then added a new date column containing month & year for 2019. Tables are connected through a one-column table with a set of dates from 2017 to 2019.
Both forecasts (distance and flights) were made using almost identical DAX Patterns. In the following code you will see the measures I used for Flights. The only difference with distance measures is ‘Customer Flight Activity’ [Distance] instead of is ‘Customer Flight Activity’ [Flights Booked]
The "Customer Loyalty History" table contains both enrollment and cancellation metrics, each utilizing date columns. This practice, however, is not recommended as it hinders direct comparisons between them. Ideally, separate tables should be created for cancellations (similar to handling product sales and returns) if the data is updated daily. This can be achieved using SQL, Power Query, or Excel. Given that this analysis is a one-time assessment over fictional data, and I was interested in experimenting with various DAX techniques, I adopted an alternative approach.
I created a support table comprising only one column with 84 rows containing months and years. This table serves as the basis for evaluating both enrollments and cancellations. While enrollments can be directly evaluated using this support table, cancellations require the use of the RELATED function to temporarily activate the relationship between tables, using the inactive relationship.
Of course, not all customers cancelled their subscriptions, so to calculated average years between enrollments and cancellations, first I had to filter the table to keep only churn customers and then difference can be calculated.
Measures prior and after 2016 have similar patterns, the only difference is a variable with a filter.
Power BI does not have native box plot, and is something I miss from Excel. Fortunately, you can make some tweaks to the “Line and Stacked column chart” to create the box Plots. If you want to create your own, please follow the link below, Bas from “How to Power Bi” Channel makes a good explanation on how to create a Box Plot. I used his guidelines to create my own box plots.
https://www.youtube.com/watch?v=u6JVQKj5ijU
The Entire Model
I created this project as a way to test and improve my data visualization, DAX coding, and business analysis capabilities. I tried to apply everything I learned while working as a business analyst, the technical and soft skills from the data analysis courses I took, as well as what my teachers taught me while studying business in college. Due to the fact that the data used in the project is fictional, there were some limitations regarding the depth of the analysis. However, even with those limitations, I am content with the final result and with what I learned while creating this project. If you have reached this point, I can only say thank you for taking the time to read the entire project, and I hope you found it useful. If you have any questions or recommendations about this project, please feel free to contact me at: