
An Exploration of the Tour de France Data

About this project

Summary:

We were given five tables of data from past Tour de France races, covering the years 1903 to 2022. From this data we had to develop questions whose answers would educate new viewers and highlight interesting facts about the race to build anticipation for this year's events. Once the questions were established, we had to present the answers clearly, backed by the data, in a dashboard or report.

Questions:

After reviewing some entries to the challenge and some websites about the Tour de France, it was decided to pose unique questions that cannot easily be answered with a quick web search. The reasoning behind the uniqueness factor was that a client would want something new and out of the box, to catch the attention of both new and existing audiences and so expand the event. The following questions were developed:

Highlight the magnitude of past races
  1. From a historical point of view, which stages would be the ones to attend?
Educate new viewers
  1. What is the life span of the riders? (This question arose after learning that some professional athletes end up with a shorter life span because the body is not designed for that kind of abuse.)

  2. With riots in France and wars elsewhere in the world in the news, is it safe to attend these events?

Build anticipation for this year's race
  1. What is the expected average speed of the riders for this year?

Things considered while developing this report:

The report follows the principle of working in layers: data modeling and visualization are kept as two separate levels. While reviewing the data to establish which questions it could answer, it became clear that some data had to be rearranged and that calculated columns had to be added to the tables to answer the questions above. It was decided to do this in Power Query, which optimizes the refresh of the visuals and simplifies the DAX in the measures behind the visualizations. (Please compare the pictures "Original Tables" and "Data Model Used".)

Note: To build anticipation for this year's race, up-to-date data on this year's race needed to be included in the data model. That is the "2023 Tours" table seen in the "Data Model Used" picture. It is a table from a website, imported using Power Query.

Another consideration was the bar (histogram) chart on the first page of my report. I could have plotted every data point for life span, but the histogram would not have been clear: the visual would treat life span as a number rather than a category, leaving many gaps between the bars. I could have used the "group" option from the drop-down menu in the visualization pane, but that approach can cause sorting issues and offers limited control over the bins. In this case, no filtering was intended for this visual. Therefore, I chose to use Power Query to create an independent table for this visual, with an index column to sort on.
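The binning idea can be sketched in a few lines of Python. This is only an illustration of the approach, not the Power Query steps used in the report: the function name `bin_life_spans`, the 10-year bin width, the column names, and the sample ages are all assumptions made for the example.

```python
from collections import Counter


def bin_life_spans(life_spans, bin_width=10):
    """Group ages into fixed-width bins and return one row per bin.

    Each row carries an "Index" column so the visual can be sorted in
    bin order instead of alphabetically by the text label.
    """
    if not life_spans:
        return []

    # Count how many riders fall into each bin (bin 7 = ages 70-79 when width is 10).
    counts = Counter(age // bin_width for age in life_spans)

    rows = []
    first_bin, last_bin = min(counts), max(counts)
    # Empty bins are kept (assumption) so the axis stays continuous.
    for index, bin_number in enumerate(range(first_bin, last_bin + 1)):
        start = bin_number * bin_width
        rows.append({
            "Index": index,
            "Bin": f"{start}-{start + bin_width - 1}",
            "Riders": counts.get(bin_number, 0),
        })
    return rows


# Hypothetical life spans in years, not taken from the project's data.
sample_life_spans = [23, 67, 71, 78, 82, 85, 88, 91]
for row in bin_life_spans(sample_life_spans):
    print(row)
```

Sorting the text label by the "Index" column is what keeps a bin such as "100-109" from landing before "20-29" on the axis.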

Lessons learned:

Eager to use my new Python programming skills, I first tried to clean the data in Python and then import it into Power Query. However, that made it hard to use parameters to change the location of the data files easily. To avoid this problem, I found it better to import the file into Power Query first and then insert a step that runs a Python script to clean the data. This approach keeps the cleaning process documented in Power Query and gives you the best of both worlds: you can use whichever language, Python or M, makes the task at hand easier.
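A minimal sketch of what such a script step might look like follows. Power Query's "Run Python script" step passes the current table to the script as a pandas DataFrame named `dataset` and offers any DataFrame the script leaves behind as output. The cleaning rules, the function name `clean_table`, and the column names "Rider" and "Year" are assumptions for illustration, not the steps actually used in this project.

```python
import pandas as pd


def clean_table(df: pd.DataFrame, text_columns: list[str]) -> pd.DataFrame:
    """Trim whitespace in the given text columns and drop duplicate rows."""
    out = df.copy()
    for col in text_columns:
        out[col] = out[col].str.strip()
    return out.drop_duplicates().reset_index(drop=True)


# Inside Power Query, 'dataset' is supplied by the script step.
# Outside Power Query it does not exist, so build a small stand-in
# (hypothetical rows) to make the script runnable on its own.
if "dataset" not in globals():
    dataset = pd.DataFrame({
        "Rider": [" Rider A ", "Rider B", "Rider B "],
        "Year": [1903, 1904, 1904],
    })

# Power Query lists this DataFrame as an available result of the step.
cleaned = clean_table(dataset, ["Rider"])
print(cleaned)
```

Because the file path is still handled by the Power Query source step, a parameter for the file location keeps working and the script only has to deal with the table it receives.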
