__STYLES__

Unlocking E-commerce Success with SQL & Power BI

Tools used in this project
Unlocking E-commerce Success with SQL & Power BI

Power BI

About this project

It’s been more than a year since I completed my first project. During this time, I've delved deeper into ETL processes, data analysis, and visualization techniques. I now possess a better understanding of what "Data-driven storytelling" entails and how it should be implemented. Whether this newfound knowledge came from work experience, courses, other projects I've made, or a combination of these factors, I now feel more confident in my communication skills and critical thinking abilities than I did a year ago.

For these reasons, and as a means to evaluate my own progress, I decided to revisit my initial project and create a revised version using SQL and Power BI instead of SQL and Excel (though I still consider Excel an excellent tool). My goal was to create a reworked version that maintained a semblance to the classic Excel style to a certain extent.

Introduction

Maven fuzzy Factory is an E-commerce retailer that has been in the market for three years (2012–2014) and it’s ready to become a major player in the E-commerce world. To achieve this goal, the CEO has decided to secure a new round of funding necessary for further expansions and she entrusted the data analyst with the responsibility of creating a compelling story that convince stakeholders to invest in an ever-growing company.

With the previous objective in mind, this report meant to explain how the company was able to produce growth by optimizing key performance indicators, including conversion rates from sessions to orders and their respective increase over time, channel and product diversification, click-through rates, revenue, profit, margin, cross-selling and funnel optimization.

Upon completing this explanatory analysis, stakeholders will gain a comprehensive understanding of a successful startup primed for its next phase of growth and consolidation in the E-commerce market.

Project Structure

1- Explanatory analysis

2- Recommendation to Stakeholders

3- SQL Queries

4- DAX Code

5- Final Words

Explanatory Analysis

Terminology

Sessions are generated each time a customer accesses the company’s website. When a customer purchases a product, an order is created.

Customers can access the website through search engines or by directly typing the website URL. Some customers manage to land on the website thanks to marketing ads with a pay-per-click model, meaning that for each click, the company has to pay a certain amount of money to publishers. Other customers access by typing the name of the company into the search engine; this is called organic search.

Organic searches and direct URL visits are the most valuable ways to obtain customers, as these methods do not incur any implicit costs to the company.

Conversion Rate over total implies how many of the total sessions managed to reach the following pages.

Click-through rate determines the percentage of sessions that reached a certain page and subsequently clicked through to the next page in the funnel.

The dataset spans the period from 2012 to 2014.

Variable Behavior and insights

More than 50% of total orders generated for the company were made in the last year, and that percentage was achieved without reducing profit margins through price cuts. In fact, as elaborated further in this section, revenue per session increased due to cross-selling and conversion rate optimization.

undefined

Initially, the company attracted customers exclusively through search engine optimization and a pay-per-click (PPC) advertising strategy. However, by the first quarter of 2015, orders generated through organic search and direct traffic combined accounted for 22% of the total orders, while orders originating from the pay-per-click strategy decreased to 55%. This not only indicated an improvement in customer loyalty but also led to a reduction in spending on pay-per-click advertising, which had previously been the primary source of customer acquisition.

undefined

Seasonal peaks occur during the second and fourth quarters. However, for customers accessing the website through direct and organic search, seasonality extends until the end of the first quarter and effectively connects with the peak in the second quarter. This suggests that customers who are familiar with the website and access it without the need for advertisements are more likely to purchase products outside of traditional retailer peaks.

undefined

Conversion rates of sessions to orders increased from the initial 3.19% to 7.74%, significantly surpassing the U.S. Average of 2.3% by more than three time.

undefined

The Compound Quarterly Growth Rate (CQGR) is 32.77%, and the Compound Annual Growth Rate (CAGR) is 210.77%, an impressive feat considering that the average U.S. growth between 2012 and 2014 never surpassed 100% on average. (https://www.statista.com/statistics/693669/rate-of-start-up-growth-us)

As expected, there is a clear correlation between revenue per session and conversion rates, since the latter cause more orders. However, an increase in the number of transactions is not the only factor at play; revenue per session can also be enhanced through an increase in revenue per transaction, achieved by progressively adding new products to the website and optimizing cross-selling between them.

undefined

The conversion rate increased from 4.1% in 2012 to 7.2% in 2014, primarily due to extensive funnel optimization efforts. It's crucial for a company to not only assess results but also analyze the processes and strategies that contributed to these results. Implementing solutions should consider both process improvements and their impact on the overall structure.

undefinedundefined

Recommendation to Stakeholders

The comparison of revenue growth between 2014 and 2013 reveals remarkable results. The company achieved revenue growth exceeding 100%, and a staggering 200% compared to 2012, far surpassing the market's average of 60% for third-year startups. Additionally, the conversion rate, at 7.48%, is not only impressive but also more than three times higher than the market's average. This underscores the company’s exceptional ability to attract and retain customers effectively.

Furthermore, our sessions data clearly indicates a consistent upward trend, particularly in direct and organic landings, signifying a growing interest and trust in our brand. Click-through rates demonstrate a well-optimized user experience, while the company's active diversification of product offerings showcases a proactive approach. This not only reduces risks associated with dependency on a single product but also positions the company for continued growth and market leadership.

In conclusion, the robust revenue growth, exceptional conversion rate, increasing direct and organic sessions, optimized click-through rates, and commitment to product diversification collectively make a compelling case for investment. The company exhibits a solid foundation and a clear path to continued success in the market.

SQL Queries

EER Diagram

undefined

Project Approach

I made a deliberate decision to leverage SQL for the majority of necessary aggregations, reserving Power Query and DAX for minor adjustments and complex calculations. This approach was driven primarily by the substantial size of certain tables, such as website_sessions and website_pageviews, which contain 500 thousand and 1 million rows, respectively. Sending the entire model to Power BI would not be an optimal choice given these considerations.

The resulting SQL queries encapsulate all the essential information needed to achieve the objectives outlined for this dashboard. It's worth noting that temporary tables were utilized in this fictional project. However, in real-world scenarios, temporary tables would typically be replaced by Common Table Expressions (CTEs) for improved readability and maintainability.

Some Notes About the Code Used

MySQL does not use start or end of the quarter, instead it uses absolute number (1,2,3,4) to define the quarters, this behavior will cause some inconvenience in Power BI, the best approach here is to create a date column with start or end of the quarter.

undefinedCASE statement is similar to the IF function in other languages, WHEN some conditions are met THEN it returns a value. There is also a COUNT surrounding the CASE statement and is there to perform the aggregation.

undefinedThe use of COUNT DISTINCT (count all the values as 1, even if there was more than one. I’m just clarifying that a distinct count is not the same as unique which means: count only the values that are not repeated).

undefinedThe Queries

undefinedundefinedundefinedundefinedundefinedundefinedundefinedundefinedundefinedundefinedundefinedNOTE: the queries for 2012 and 2014 funnels are almost the same, just change the date range in the WHERE clause.

undefinedundefinedundefined

DAX Code

This section highlights some of the most pertinent DAX queries utilized in the project, along with encountered issues and their respective solutions. While it does not offer an exhaustive overview of every query, it focuses on three calculations deemed the trickiest, while others posed no significant complications.

The Geometric mean problem

DAX inherits its Geometric Mean function from Excel, which unfortunately inherits the same issue: it incorporates the first period into the calculation. Given that growth inherently requires comparison between two or more periods, the initial period lacks a point of comparison, resulting in a growth value of 0. In Excel, one can easily circumvent this issue by skipping the first row. However, in DAX, which operates by columns, the solution is somewhat more complex. I decided to skip the first period of growth (second row), as the startup experienced a 478% growth in the second quarter compared to the initial period. Such extraordinary growth does not recur in subsequent periods, making its exclusion preferable.

undefinedundefinedundefinedundefined

Coefficient of determinations (R²) between Conversion Rates and Revenue per Session

DAX provides a function named LINEST and its iterator version, LINESTX, which generate a table with values necessary for calculating a line that best fits the data. Once the virtual table is created, I took R² from one of the columns.

undefinedundefined

Revenue per Session Increased and Conversion Rate Increased

I tried to use the INDEX function to obtain the first and last value, but because INDEX uses the native column order, it keep returning the minimum and maximum values within the column instead of the first and last. Consequently, I shifted to the old TOPN tactic. CVR % Increased, Revenue per session, and % Revenue per session utilized similar measures as the one below.

undefined

Final Words

If you reached this far, thank you, for taking the time, I hope you enjoyed the project and maybe managed to learned perhaps even gained some new insights. For any analyst, the first project can be dauting, but also immensely rewarding. Upon reaching the final version, one realizes that all the effort put into learning the theorical and technical concepts necessary to excel in the world of analytics are finally materializing.

But as I mentioned at the start of the project, after more than a year and with new knowledge acquired, I started to see some optimization flaws in the code and also realized that the use of multiple pivot tables was not the best approach and they had to be improved.

Even though I migrated the project from Excel to Power BI and changed a couple of SQL queries, the essence of my first project remains.

If you have any question or recommendation to make about this project, please feel free to contact me.

Discussion and feedback(0 comments)
2000 characters remaining
Cookie SettingsWe use cookies to enhance your experience, analyze site traffic and deliver personalized content. Read our Privacy Policy.