__STYLES__
I delved into these questions for my LEGO project:
What were the 10 most common LEGO themes and subthemes for each decade? What was the most common LEGO theme (and subtheme) of all time?
Were the number of Harry Potter LEGO sets released per year correlated with Harry Potter movie release dates?
Were the number of Star Wars LEGO sets released per year correlated with Star Wars movie release dates?
Were the number of Bionicles LEGO sets released per year correlated with Bionicles movie release dates?
I hypothesized that LEGO was more likely to release more Harry Potter-themed/Star Wars-themed sets right before and/or right after a movie release, so sales could be boosted by the movie's popularity and hype.
I imported the Excel data file into Power BI and performed data manipulation in Power Query Editor before answering these questions.
First, I needed to create a new column so that I could sort data by decades (1970s, 1980s, 1990s, and so on), instead of just years. The dataset provided by Maven Analytics only provided the individual years (1971, 1972, etc.).
To enable filtering by the decade in Power BI, I created a new calculated column in my dataset that grouped the years into decades.
This was the DAX formula I used to calculate the decade:
Decade = FLOOR('lego_sets'[Year], 10) & "s"
This formula takes the 'Year' column, rounds it down to the nearest multiple of 10 (to get the decade start), and adds an "s" to make it a string, like "1970s".
'FLOOR' is a function in DAX that rounds a number down to the nearest specified multiple. For instance, if the year is 1975, 'FLOOR(1975, 10)' will round it down to 1970. This DAX formula specifies 10 as the significance so the rounding down happens to the nearest multiple of 10, grouping the years into decades.
Next, I created a DAX measure to find the top 10 themes. I did not want to display all of the themes in my visualization since that would be much more than my dashboard could accommodate (there were 153 themes total!)
To do this, I wrote this DAX formula that ranks themes by the count of 'set_id' and filters to the top 10:
Top 10 Themes =
VAR RankBySetID = RANKX(
ALL('lego_sets'[Theme]),
CALCULATE(COUNT('lego_sets'[set_id])),
, DESC,
Dense)
RETURN
IF(RankBySetID <= 10, COUNT('lego_sets'[set_id]))
Here is a brief explanation of this DAX formula:
"RANKX" ranks the rows in the 'lego_sets' table.
'ALL('lego_sets'[Theme])': The ALL function removes all filters from the 'Theme' column in the 'lego_sets' table. This means the ranking is done across ALL themes, not just the ones currently visible.
The 'CALCULATE' function says to count the number of set IDs for each theme and use that as the context in which the data is evaluated.
'DESC' says that the ranking should be in descending order (larger counts get a lower rank number)
'Dense' is a mode of ranking. If there are ties - meaning themes with the same count - they will receive the same rank, and the next rank will not be skipped. For example, if two themes are ranked 2nd, the next theme will be ranked 3rd (not 4th). This means that even if the chart is supposed to show the "Top 10 Themes", there may be 11 or more themes listed (if there's a tie).
'RETURN' specifies the formula output.
'IF(RankBySETID <=10, COUNT('lego_sets''[set_id]))': checks if the rank of the current theme is 10 or less. If the rank of the current theme IS 10 or less, it returns the count of set_ids for that theme. If not, it returns nothing (blank).
Similarly to the previous step, I created a DAX measure to calculate the top 10 subthemes as well.
To do this, I wrote a similar DAX formula that ranks subthemes by the count of 'set_id' and filters to the top 10:
Top 10 Subthemes =
VAR RankBySetID = RANKX(
ALL('lego_sets'[Subtheme]),
CALCULATE(COUNT('lego_sets'[set_id])),
,
DESC,
Dense
)
RETURN
IF(RankBySetID <= 10, COUNT('lego_sets'[set_id]), BLANK())
The main difference between this and the previous DAX measure is that this calculates the top 10 subthemes, not themes. Also, I included BLANK() for ranks above 10, explicitly stating that ranks above 10 should return no value.
Next, I added the created measures (Top 10 Themes and Top 10 Subthemes) to my Power BI visual and tested them to make sure they worked correctly.
As you can see, I added '"Top 10 Subthemes": is not blank' as a filter on my visual. Now the visual only shows the top 10 Lego subthemes.
I did the same for the Top 10 Themes, adding "'Top 10 Themes": is not blank' as a filter on the visual, so the visual only showed the top 10 Lego themes.
Finally, I added "Decade" as a filter for my slicer visual, so people can choose which decade they want to examine.
Users can use the Decade slicer to go through the different decades and see the most popular LEGO themes.
The top 10 LEGO themes of all time are ranked in this order:
The top LEGO themes for each decade are as follows:
1970s: LEGOLAND
1980s, 1990s: Town
2000s, 2010s, 2020s: Gear
--
The top 10 LEGO subthemes of all time are ranked in this order:
The top LEGO subthemes for each decade are as follows:
1970s, 1980s: Classic
1990s: Technic
2000s: Product Collection
2010s, 2020s: Magazine Gift
I explored the Harry Potter LEGOs to see if there were more sets released around the movie release years. I found that the years with the most Harry Potter LEGO sets were years with movie or TV show releases:
2002: Harry Potter and the Chamber of Secrets movie is released (17 LEGO sets)
2021: Harry Potter 20th Anniversary: Return to Hogwarts TV special comes out (17 LEGO sets)
2004: Harry Potter and the Prisoner of Azkaban movie is released (12 LEGO sets)
I also found a few interesting anomalies:
I was surprised that there were only 2 Harry Potter LEGO sets in 2003, even though it was sandwiched between the release of Chamber of Secrets (2002) and Prisoner of Azkaban (2004)
I was surprised that there was only 1 Harry Potter LEGO set released in 2007, even though that was the release year for Order of the Phoenix.
In 2007, the only Harry Potter LEGO set was the Hogwarts Castle, which was still pretty cool:
I found that the number of Star Wars LEGO sets per year was correlated with the movie release dates. The years with the most number of Star Wars LEGO sets are also years with movie releases:
2018: right before release of Episode IX - The Rise of Skywalker (70 sets)
2017: release of Episode VIII - The Last Jedi (66 sets)
2015: release of Episode VII - The Force Awakens (64 sets)
There is a spike in the number of sets for the years that Episode I, Episode II, and Episode III were released (1990, 2002, and 2005, respectively.)
Interesting facts to note:
The number of LEGO sets dipped dramatically in between the release years for Ep I, II, and III. After the release of Episode I - The Phantom Menace, the number of LEGO sets dipped from 19 (in 2000) to 10 (in 2001), almost dropping in half. Similarly, after the release of Episode II - Attack of the Clones, the number of sets dipped from 24 (in 2002) to 15 (in 2004). Again, after the release of Episode III - Revenge of the Sith, the number of sets dipped from 26 (in 2005) to 14 (in 2006.)
The number of LEGO sets dropped dramatically after 2019 and the release of Episode IX: The Rise of Skywalker. In 2021, there were only 38 sets released. Unfortunately, we do not have the sales information, which may help us figure out why LEGO decided to release fewer Star Wars sets. Perhaps this was due to low Star Wars LEGO sales in 2019.
If we compare the Rotten Tomatoes ratings of the last few Star Wars movies, maybe we can figure out why LEGO decided to release fewer sets:
Episode VII - The Force Awakens (2015) scored 93% on Rotten Tomatoes.
Episode VIII - The Last Jedi (2017) scored 91% on Rotten Tomatoes.
Episode IX - The Rise of Skywalker (2019) scored only 51% on Rotten Tomatoes.
Perhaps the lower ratings of the last movie led to fewer LEGO sales, which resulted in LEGO producing fewer Star Wars sets after 2019.
There was some correlation between the number of Bionicles LEGO sets over time and the movie release dates. The years with the most Bionicles LEGO sets were also the movie release years, or directly before/after:
In 2004, Bionicle 2: Legends of Metru Nui was released, and there were 55 LEGO sets.
In 2005, Bionicle 3: Web of Shadows was released, and there were 46 LEGO sets.
Directly following the release of the last Bionicles movie in 2009, there was a sudden drop in LEGO sets (from 30 in 2009 to 6 in 2010!) In 2014, the number of Bionicles sets dipped to just 1.
However, in 2015, interestingly enough, there was a spike in Bionicles sets to 22.
If I were conducting further research, I would look into the following:
Whether Harry Potter LEGO set numbers were correlated with the popularity of the books and movies.
Whether Harry Potter, Star Wars, and Bionicles LEGO set numbers were correlated with Rotten Tomatoes ratings of the movies (e.g. if the movies scored higher on Rotten Tomatoes, LEGO pushed more sets)
The number of LEGO sales in each of those areas for the movie release years (Harry Potter, Star Wars, Bionicles) and whether certain promotions were more successful in terms of sales than others.
It seems that there is definitely a correlation between the number of LEGO sets released for each franchise and the date of the movie/TV show releases.