__STYLES__

Analyzing 120 Years of Olympics History with SQL

Tools used in this project
Analyzing 120 Years of Olympics History with SQL

About this project

THE DATA

The data comes from KAGGLE This data set contains about 0.2 million records with 15 columns. In this Project I am going to use the SQL (Structured Query Language) as a tool to analyze the dataset. This is a historical dataset on the modern Olympic Games, including all the Games(Summer/Winter) from Athens 1896 to Rio 2016.

Note that the Winter and Summer Games were held in the same year up until 1992. After that, they staggered them such that Winter Games occur on a four year cycle starting with 1994, then Summer in 1996, then Winter in 1998, and so on. A common mistake people make when analyzing this data is to assume that the Summer and Winter Games have always been staggered.

Also This dataset doesn't have any specific column for the medal which has been won by Group of Athlete like when hockey team won the Gold medal it count the medal to all the athlete who were in the team so the exact medal count will vary upon this when we are counting the Medal.

I have analyzed the the Olympics data then focused my analysis on the Main Business Question which I have explored are below:

1. How many total Olympic Games held (including Summer and Winter).

2. Total No. of Countries Participated in each Summer/Winter Olympics.

3. How many Athletes have participated in Olympics.

4. Sex-wise Athletes participation in Olympics.

5. How many Athletes won any medal.

6. Which year saw the highest and lowest no of countries participating in Olympics.

7. Which Nation has participated in all of the Olympic games?.

8. Identify the sport which was played in all summer Olympics.

9. Which Sports were just played only once in the Olympics?.

10. Fetch the total no of sports played in each Olympic games.

11. Fetch the top 5 most successful countries in Olympics. Success is defined by no of medals won.

12. List down total gold, silver and bronze medals won by each country.

13. List down total gold, silver and bronze medals won by each country corresponding to each Olympic games.

14. Fetch details of the oldest athletes to win a gold medal.

15. Fetch details of the youngest athletes to win a gold medal.

16. Fetch the top 5 athletes who have won the most gold medals.

17. Fetch the top 5 athletes who have won the most medals (gold/silver/bronze).

18. Which countries have never won gold medal but have won silver/bronze medals?

Now let's dig deep in each of the Business Question mentioned above.

1. How many total Olympic Games held (including Summer and Winter).


    SELECT 
         COUNT(DISTINCT games)
    FROM "athlete_events.csv" 
       order by 1 desc;

So there are total 51 Olympics Games has been held till 2016 Summer Olympics including all summer and Winter Olympic starting from 1896(Athens) till 2016(Rio) Olympics.

2. Total No. of Countries Participated in each Summer/Winter Olympics.


       SELECT 
            games,
           COUNT(distinct team) as Participated_Country_Count
       FROM "athlete_events.csv" 
           GROUP BY 1
       order by 1 desc

Highest number of Countries participated in 2008 Summer(292) while minimum was 1896 Summer(18).

3. How many Athletes have participated in Olympics.


     SELECT 
            count(distinct id)
     FROM "athlete_events.csv"

There are total 1,35,571 athlete have participated in this event over 120 years.

4. Sex-wise Athletes participation in Olympics.


     SELECT
                 CASE
                   WHEN sex = 'F' THEN 'Female'
                   WHEN sex='M' THEN 'Male' 
                END as Sex,
                COUNT(distinct id) as "Total Athlete",
               CONCAT(ROUND(COUNT(distinct id)*100.0/
                   (select COUNT(distinct id) from "athlete_events.csv"),2),'%') as "Percentage of athlete"
       FROM "athlete_events.csv" 
             GROUP BY 1
       order by 2 desc

There are about 75% of male athlete have participant which shows the clear dominance of male athlete in Olympic Games

5. How many Athletes won any medal.


    WITH CTE AS
        (
           SELECT
                 id,medal
          FROM "athlete_events.csv"
                 where medal is NOT NULL
         GROUP BY 1,2
       )
      SELECT COUNT(*) FROM CTE; 

There are total 33,745 athlete who won any medal in the Olympic Games which is about 25 % of total athletes.

6. Which year saw the highest and lowest no of countries participating in Olympics.


     (SELECT
                games, COUNT(distinct team) as Max_and_Min_Participated_Country_Count
      FROM "athlete_events.csv" 
              GROUP BY 1
       order by 2 desc
              LIMIT 1)
  UNION
      (SELECT
                games, COUNT(distinct team) as Max_and_MinParticipated_Country_Count
       FROM "athlete_events.csv" 
               GROUP BY 1
       order by 2
        LIMIT 1)

Athens Summer Olympic in 1896 has minimum country participation while Beijing Summer Olympics has maximum country participation 2008.

7. Which Nation has participated in all of the Olympic games?.


      WITH CTE AS 
       (
                     SELECT 
                            team,games,
                            dense_RANK() OVER(partition by team order by games) as dr
                    FROM "athlete_events.csv" 
       )
      SELECT 
              distinct team
      FROM CTE 
             where dr=(SELECT COUNT(distinct games) from "athlete_events.csv" )

There are total 5 countries which have participated in all the Olympics (France, Great Britain, Switzerland, Italy)

8. Identify the sport which was played in all summer Olympics.


     WITH CTE AS
     (
             SELECT
                       sport,games,
                       DENSE_RANK() OVER(partition by sport order by games) as dr
             FROM "athlete_events.csv" as tab1
                      WHERE season='Summer'
             group by games,sport
       )
    SELECT 
          sport
    FROM CTE 
               WHERE dr=(SELECT COUNT(distinct games) FROM "athlete_events.csv" WHERE season='Summer')

Athletics, Cycling, Fencing, Gymnastics and Swimming are the only Sport played over all the Olympic Games.

9. Which Sports were just played only once in the Olympics?.


    SELECT 
            sport, games
    FROM "athlete_events.csv"
            WHERE 
    SPORT in (SELECT sport FROM "athlete_events.csv" GROUP BY 1 
                          HAVING  COUNT(DISTINCT games)=1)
    GROUP BY 1,2

There are total 11 Sports which are played once in Olympic Games.

10. Fetch the total no of sports played in each Olympic games.


     SELECT 
              games,    COUNT(DISTINCT sport) as total_games_count
     FROM "athlete_events.csv" 
            GROUP BY 1
     order by 2 desc

There are highest 34 games has been played in summer Olympics of (2000,2004,2008,2016) while lowest 7 games has been played in (1932 winter)

11. Fetch the top 5 most successful countries in Olympics. Success is defined by no of medals won.


     SELECT 
                 team, COUNT(medal) as medal_count
      FROM "athlete_events.csv" 
               GROUP BY team
    order by medal_count desc
                LIMIT 5

US is having the Highest number of medals.

12. List down total gold, silver and bronze medals won by each country.


     SELECT 
              team,
             COUNT(CASE WHEN medal='Gold' THEN 1 ELSE NULL END) as gold_medal_count,
             COUNT(CASE WHEN medal='Silver' THEN 1 ELSE NULL END) as silver_medal_count,
             COUNT(CASE WHEN medal='Bronze' THEN 1 ELSE NULL END) as bronze_medal_count
     FROM "athlete_events.csv" as tab1
             GROUP BY team
     order by gold_medal_count desc,silver_medal_count desc,bronze_medal_count desc

13. List down total gold, silver and bronze medals won by each country corresponding to each Olympic games.


     SELECT 
              games,
              team,
             COUNT(CASE WHEN medal='Gold' THEN 1 ELSE NULL END) as gold_medal_count,
             COUNT(CASE WHEN medal='Silver' THEN 1 ELSE NULL END) as silver_medal_count,
             COUNT(CASE WHEN medal='Bronze' THEN 1 ELSE NULL END) as bronze_medal_count
     FROM "athlete_events.csv" as tab1
             GROUP BY games, team
     order by games, gold_medal_count desc,silver_medal_count desc,bronze_medal_count desc

This is having the Medal count of countries in each of the Olympics Games held.

14. Fetch details of the oldest athletes to win a gold medal.


    WITH CTE AS
     (
        SELECT
                     *,DENSE_RANK() OVER (order by age desc) as dr
         FROM "athlete_events.csv" as tab1
                     WHERE medal ='Gold' and age is not null
      )
      SELECT 
               name, sex, age, team,
               games, season, sport, event
      FROM CTE 
                WHERE dr=1

There are two Athlete who won the Gold Medal at the age of 64 and Both are Male while one from Shooting while other is from Roque.

15. Fetch details of the youngest athletes to win a gold medal.


  WITH CTE AS
   (
               SELECT 
                        *,DENSE_RANK() OVER (order by age) as dr
               FROM "athlete_events.csv" as tab1
                      WHERE medal ='Gold' and age is not null
      )
    SELECT 
            age, sex, name, team, games, sport, event
   FROM CTE WHERE dr=1

There are Total 7 athlete who won the Gold medal at the age of 13 where 5 out of 7 was Female.

16. Fetch the top 5 athletes who have won the most gold medals.


    WITH CTE AS
       (
          SELECT id, name, team, COUNT(*) as cnt
                      FROM "athlete_events.csv" as tab1
          where medal ='Gold'
                    GROUP BY id,name,team
         ORDER BY 4 desc
     ),
    CTE1 AS
    (
              SELECT
                        *,DENSE_RANK() OVER(order by cnt desc) as dr
              FROM CTE 
   )
  SELECT 
             id, name, team, cnt as "Total Gold medals" 
  FROM CTE1
        WHERE dr<=5

Michael Phelps won the Highest 23 Gold medal.

17. Fetch the top 5 athletes who have won the most medals (gold/silver/bronze).


    WITH CTE AS
       (
          SELECT id, name, team, COUNT(medal) as cnt
                      FROM "athlete_events.csv" as tab1
           GROUP BY id,name,team
                    ORDER BY 4 desc
     ),
    CTE1 AS
    (
              SELECT
                        *,DENSE_RANK() OVER(order by cnt desc) as dr
              FROM CTE 
   )
  SELECT 
             id, name, team, cnt as "Total medals" 
  FROM CTE1
        WHERE dr<=5

18. Which countries have never won gold medal but have won silver/bronze medals?


           SELECT
                  distinct team 
          FROM "athlete_events.csv" 
                 where team IN (SELECT distinct team FROM "athlete_events.csv" WHERE medal IN('Silver', 
                 'Bronze')  
  EXCEPT 
          SELECT 
                  distinct team
          FROM "athlete_events.csv"
                 WHERE medal IN('Gold')
   ) 
  order by 1

There are total 256 Countries who have not win any Gold medal while won silver or Bronze

INSIGHTS

After looking at the data, I noticed that The Olympics held from 1896 till 2016 is having the 75% of male's athlete participation than 25% of Female's. US country has won the highest number of medal won, whereas Michael Phelps is the athlete who won the highest number of individual medal. The youngest athletes(5 male and 2 female) who won the gold medal are of age 13 and the oldest athletes(2 male athlete) of age 64.

Additional project images

Maximum Gold medal won by athlete
Sports Played in all Olympics
Top 5 Countries by medal Count
Youngest Athlete to Won Gold Medal
Data Snapshot
Played once  in whole Olympics
Discussion and feedback(0 comments)
2000 characters remaining
Cookie SettingsWe use cookies to enhance your experience, analyze site traffic and deliver personalized content. Read our Privacy Policy.