__STYLES__

Comparative Analysis: top 100 "Action vs Adventure" (SQL + Excel + R + Tableau)

Tools used in this project
Comparative Analysis: top 100 "Action vs Adventure" (SQL + Excel + R + Tableau)

Tableau Dashboard (use the arrow to change page)

About this project

Check out the dashboard with the appropriate resolution in my Tableau Public page.


On this project inspired by my love for movies, I decided to compare Action and Adventure movies because they are the most diverse genres and my younger self always imagined himself uncovering mysteries, going at full speed, and fighting the bad guys.


Datasets

Two main datasets were used for this analysis: movies1 and movies2, and the join between them was updated to fill in relevant missing information for some movies. You can download all raw and cleaned csv datasets on my GitHub public repository.

The folder contains the initial raw datasets (movies1, movies2), the final data joined of top movies (FinalTop), and the filtered top 100 dataset of action and adventure movies based on ROI (Return of investment) used in this analysis (top100 action vs adventure).

Data Cleaning (SQL + Excel)

The joining and cleaning of movies1 and movies2 raw datasets were performed in SQL, obtaining the FinalTop dataset with the following variables: Title, Genre, Year, Production Cost, Worldwide Gross, Runtime, Avg. Rating; an additional descriptive variable (Cost range) was created as a factor with four levels based on the Production Cost data (50M$ to <100M$ = Low; 100M$ to <150M$ = Medium; 150M$ to <200M$ = High; ≥200M$ = Very High). Afterward, the Return of investment (ROI) was calculated using Excel and the data was filtered to extract the top 100 action and adventure movies based on ROI.

The SQL Scripts are available in the same GitHub repository.

Data Exploration and Analysis (Tableau)

The initial data analysis in Tableau focused on: 1) finding trends and/or differences related to cost, profits, and gross sales, 2) distribution of movies based by Production Cost, and 3) analyzing data variability for the main variables. This constituted the first part of this analysis and, therefore, the first dashboard page.

Statistical Analysis (R)

Another important part of this analysis was assessing the significance of both Genre and Cost Range in the movie Ratings. For this, a Two-Way Analysis of Variance (ANOVA) was performed using the variables Genre and Cost Range as factors. Additionally, Pearson’s coefficients were calculated to compare the linear correlation between all variables, and the variables with the highest coefficients were plotted to build the respective linear regression model.

These findings were digested and presented on the second dashboard page. You can also check the R scripts and outputs in the repository.

Dashboard Design (Tableau)

For the color palette, I chose dark blue as the dominant color, orange for contrast, and white for details. Also, I decided to arrange the visuals two dashboards to avoid oversaturation (you can jump between the pages using the arrow in the top right corner).

The first dashboard showcases the comparison between top action and adventure in terms of financial data, and the second dashboard is oriented towards the statistical analysis results.

Top Findings

  • There is enough evidence to claim that the average rating of adventure movies is higher than action movies (Adventure: 7.10, Action: 6.95; p-value < 0.05).
  • Most of the movies in both top 100 have a production cost between $100M-$200M (Action: 69%, Adventure: 68%).
  • Both genres present high variability in terms of Worldwide Gross and Return of Investment (ROI) across the four Cost Ranges, thus, there were no significant differences between them. Nevertheless, for both genres, worldwide gross tends to increase with cost production, whereas only adventure movies with Low production cost showed a peak in ROI compared to the rest.
  • The highest linear correlation was observed between the variables Worldwide Gross / Return of Investment (Pearson's Coeff.: 0.67), and Worldwide Gross / Product Cost (Pearson’s Coeff.: 0.65). Additionally, there was no correlation between Cost Production and ROI (Pearson's Coeff.: -0.05)
  • As per the linear regression model, the Worldwide Gross for these top movies can be estimated as 4.76 times the Production Cost (WWG = 4.76*PC; R-squared: 0.88).
  • Worldwide Gross for the top 100 adventure movies tends to be lower and more consistent over time than action movies, especially between 2010-2019. In contrast, action movies had a huge increase during the period of 2013-2019. Note: both genres had significant impact during the COVID-19 period.

What's next?

  • Based on the high variability observed in the data, it would be advisable to include the sub-genres of the movies, which usually have two or three in addition to the main one, and perform a multiple-genre analysis.
  • Pearson's correlation results indicate that a k-means clustering analysis could be useful to segment movies beyond genre.
Cookie SettingsWe use cookies to enhance your experience, analyze site traffic and deliver personalized content. Read our Privacy Policy.