In this project, inspired by my love for movies, I decided to compare Action and Adventure movies because they are the most diverse genres, and because my younger self always imagined himself uncovering mysteries, going at full speed, and fighting the bad guys.
Two main datasets were used for this analysis, movies1 and movies2. After the two were joined, the result was updated to fill in relevant missing information for some movies. You can download all raw and cleaned CSV datasets from my public GitHub repository.
The folder contains the initial raw datasets (movies1, movies2), the final joined dataset of top movies (FinalTop), and the filtered dataset of the top 100 action and adventure movies by ROI (return on investment) used in this analysis (top100 action vs adventure).
The raw movies1 and movies2 datasets were joined and cleaned in SQL to produce the FinalTop dataset, which has the following variables: Title, Genre, Year, Production Cost, Worldwide Gross, Runtime, and Avg. Rating. An additional descriptive variable (Cost range) was created as a factor with four levels based on Production Cost ($50M to <$100M = Low; $100M to <$150M = Medium; $150M to <$200M = High; ≥$200M = Very High). Afterward, the return on investment (ROI) was calculated in Excel, and the data was filtered to extract the top 100 action and adventure movies by ROI.
The SQL scripts are available in the same GitHub repository.
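The Cost range binning and the ROI ranking described above can be sketched in a few lines of Python. This is only an illustration, not the SQL and Excel workflow used in the project. The function names (`cost_range`, `roi`, `top_by_roi`) and the dictionary keys (`cost`, `gross`) are hypothetical, and the ROI formula, net profit divided by production cost, is an assumption, since the exact formula is not stated here.

```python
from typing import Optional


def cost_range(production_cost: float) -> Optional[str]:
    """Map a production cost in US dollars to one of the four Cost range levels."""
    millions = production_cost / 1_000_000
    if millions < 50:
        return None  # below the lowest bin listed in the text
    if millions < 100:
        return "Low"
    if millions < 150:
        return "Medium"
    if millions < 200:
        return "High"
    return "Very High"


def roi(production_cost: float, worldwide_gross: float) -> float:
    """ROI as (gross - cost) / cost. ASSUMPTION: the source does not give the formula."""
    return (worldwide_gross - production_cost) / production_cost


def top_by_roi(movies: list, n: int = 100) -> list:
    """Return the n movies with the highest ROI, best first.

    Each movie is a dict with hypothetical keys "cost" and "gross".
    """
    return sorted(movies, key=lambda m: roi(m["cost"], m["gross"]), reverse=True)[:n]
```

For example, `cost_range(120_000_000)` returns `"Medium"`, and a movie that cost $100M and grossed $300M has an ROI of 2.0 under the assumed formula.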
The initial data analysis in Tableau focused on: 1) finding trends and/or differences related to cost, profits, and gross sales, 2) examining the distribution of movies by Production Cost, and 3) analyzing data variability for the main variables. This constituted the first part of the analysis and, therefore, the first dashboard page.
Another important part of this analysis was assessing whether Genre and Cost Range have a significant effect on movie ratings. For this, a two-way analysis of variance (ANOVA) was performed using Genre and Cost Range as factors. Additionally, Pearson's correlation coefficients were calculated to compare the linear correlation between all variables, and the variables with the highest coefficients were plotted to build the respective linear regression model.
These findings were summarized and presented on the second dashboard page. You can also check the R scripts and outputs in the repository.
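To make the correlation step concrete, here is a minimal Python sketch of Pearson's coefficient and a simple least-squares line for one pair of variables. It is not the R code from the repository, and the function names (`pearson_r`, `fit_line`) are hypothetical. It also does not cover the two-way ANOVA.

```python
import math


def _centered_sums(xs, ys):
    """Return the sample means and the sums of squares/cross-products around them."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient between two numeric sequences."""
    _, _, sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y, sxy, sxx, _ = _centered_sums(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

In practice, `pearson_r` would be applied to each pair of numeric columns (for example Production Cost and Worldwide Gross), and `fit_line` to the pairs with the highest coefficients.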
For the color palette, I chose dark blue as the dominant color, orange for contrast, and white for details. I also decided to split the visuals across two dashboards to avoid oversaturation (you can jump between the pages using the arrow in the top right corner).
The first dashboard compares the top action and adventure movies in terms of financial data, and the second dashboard presents the results of the statistical analysis.