__STYLES__
As a kid, I dragged my mom to every single Barnes & Noble midnight release of the Harry Potter books. I was an avid fan. I distinctly remember spending an entire vacation in Cape Cod reading the most recent release. When Maven Analytics released their Maven Magic Challenge I knew I had to participate.
This dataset consisted of 6 tables in CSV format:
As a self-proclaimed Potterhead, I wanted to better understand the magic of the Harry Potter movies. To do that, I looked at the overarching themes of the movies. I considered where the scenes took place, who had the most dialogue, what spells, charms, and curses Harry used to save the Wizarding World from Voldermort, and more. I wanted to practice not cluttering the space on my dashboard so that was a key focus during my analysis.
I wanted to gain a better understanding of the data and how each table interacts with the others. To do so, I used some simple queries with joins. I also looked at basic figures, such as total box office revenue and budget, the number of chapters per movie, the gender breakdown of characters, and more. My exploration of the data was to simply grasp what the tables were conveying and how that information might be important.
##Sample Queries
SELECT
SUM(Box_Office)/1000000000 ||'B'
, SUM(ROUND(Box_Office / Runtime,2)) AS rev_per_minute
, SUM(budget)
, SUM(Box_Office) - SUM(Budget)
FROM `bright-zodiac-346921.harry_potter.movies`
SELECT
place_name
, movie_title
, COUNT(1) AS number_of_scenes
FROM `bright-zodiac-346921.harry_potter.places` AS places
INNER JOIN `bright-zodiac-346921.harry_potter.dialogue` AS dialogue
ON dialogue.place_id = places.place_id
INNER JOIN `bright-zodiac-346921.harry_potter.chapters` AS chapters
ON chapters.chapter_id = dialogue.chapter_id
INNER JOIN `bright-zodiac-346921.harry_potter.movies` AS movies
ON movies.movie_id = chapters.movie_id
GROUP BY place_name, movie_title
ORDER BY 3 DESC
SELECT
movie_title
, runtime
, COUNT(dialogue_id) AS number_of_lines
FROM `bright-zodiac-346921.harry_potter.dialogue` AS dialogue
LEFT JOIN `bright-zodiac-346921.harry_potter.chapters` AS chapters
ON chapters.chapter_id = dialogue.chapter_id
LEFT JOIN `bright-zodiac-346921.harry_potter.movies` AS movies
ON movies.movie_id = chapters.movie_id
GROUP BY 1,2
ORDER BY 3 DESC
As mentioned, I wanted to focus on not cluttering the dashboard with unnecessary charts. I focused on a few different areas including the breakdown of characters per Hogwarts house, the number of lines of dialogue per character, the types of spells Harry uses most often and where, as well as the number of scenes in particular places.
I was particularly interested to see if these things differed per movie. The general consensus is that the Harry Potter books and movies grow progressively darker. I was curious if that would be reflected in the data as well. While not particularly obvious, the number of scenes in place like Hogwarts decreased as the movies progressed.
Based on my analysis and unsurprisingly, Harry Potter was the star of the movie. He had the most lines of dialogue by far (almost double that of any of the other characters), his Hogwarts house is most heavily represented, and he uses the most spells.
I tested my hypothesis that, as the movies progressed, we would see the increasing darkness represented in the data; however, this did not appear to be reflected in the dataset. This could be due to vague categories (i.e. the number of scenes at Hogwarts decreased over the movies but the other settings were too vague to judge if they, in fact, got more dangerous).