__STYLES__

SQL & Excel // Cancer Mortality in Israel and Worldwide

Tools used in this project
SQL & Excel // Cancer Mortality in Israel and Worldwide

About this project

Project code:

Click here to view this project on my GitHub.

Project objective:

Explore cancer mortality rates around the world:

  • Querying data by country / year / cancer type
  • Comparing different countries
  • Looking at mortality rates across time
  • Finding the cancer types that have the highest mortality rates in different countries/years

The data:

I started with a dataset of cancer death counts, by country, year (1990-2016) and cancer type (20 types).

The first problem I encountered was that the numbers were death counts, and since my goal was to compare mortality rates across countries and time, of course I had to take into account population size. I found a dataset which contained population counts by country and year, and then I was able to start working.

Prepping the data in Excel:

The second problem was that the cancer death table was pivoted – each cancer type data was in a different column, instead of a different row. Tableau has a great unpivotting feature, but since it doesn't actually alter the source files, it couldn't help me here. I unpivotted tables in SQL in the past, but here, with 20 columns to unpivot, I looked for a different solution, and eventually used Excel and ~10 minutes of manual work :)

Third problem – the population dataset was pivoted as well, but unlike the cancer dataset, here it wasn't just 20 columns, but 170... so manual cut-and-paste wasn't an option. Instead, I copied the population table to the cancer Excel file and used VLookup() to get the right population size in each row. Since I needed to lookup not only for the right year (lookup by row) but for the right country as well (lookup by column), I combined VLookup() with Match() – it looked like this:

= VLOOKUP (C2, PopulationTable, MATCH (A2, PopulationTableHeader, 0), false)

It was my first time ever combining VLookup with Match, and I loved how it worked like magic!

Fourth problem – so yes, VLookup and Match worked great and most countries got their population sizes, but many didn't, simply because their names were different in the cancer and population tables. There was no easy fix for that – I had to go one by one and check what the problem was. Sometimes it was just due to different spelling, but sometimes I had to look up these countries and find their alternate names, change them in one of the tables, and then the VLookup-Match functions were able to take it from there.

Once this was done, I had one table with both the cancer death counts and population sizes, but I decided to break it up to two, so that I'll be able to practice Join on SQL.

Data exploration in SQL Server:

I loaded the tables to SQL Server and started querying.

First things first, joining the cancer and population tables and calculating the mortality rate per capita. This query results were going to be the basis for the rest of the project, so I turned them into a View. I love how it made all the next queries so much more readable and clear, than if I used a subquery instead.

When the View was ready, I wrote various queries to explore the data, such as:

  • Filtering by country and/or year and/or cancer type.
  • Ranking countries by mortality rate, and looking specifically at Israel's rankings.
  • Finding the five cancer types with the highest rates, in Israel and worldwide, by year.

Other queries helped me answer specific questions that arose when looking at the previous queries' results, such as:

  • When did pancreatic cancer first became one of the five cancer types with the highest mortality rates in Israel?
  • In which years was the mortality rate of breast cancer highest and lowest in Israel?
  • How many people die of cancer each year (on average) worldwide?
  • How many people died of cancer in Israel each year, and what percentage is it of the entire population?

One of the things these queries showed, is that liver cancer has one of the highest mortality rates worldwide (it's among the top five), but in Israel it's only at number 12-14. It might be useful to understand what causes this positive difference in Israel.

Technical skills used in this project:

  • In Excel: data preparation, VLookup and Match functions.
  • In SQL: Joins, Views, Subqueries, Window Functions, Aggregate Functions and more.

What I learned:

  • Even though it's always great fun to find smart ways to do things, sometimes manual work is the best approach, and it doesn't even take as much time as one would think.
  • Combining the VLookup() and Match() Excel functions is powerful.
  • Creating and using Views is so easy, and they improve readability and debugging.
  • All these tables and numbers and queries and code mean that each year, over 8 million people die of cancer worldwide. I wish we'll get to see this number drop significantly as soon as possible.

Additional project images

Discussion and feedback(0 comments)
2000 characters remaining
Cookie SettingsWe use cookies to enhance your experience, analyze site traffic and deliver personalized content. Read our Privacy Policy.