__STYLES__

UK Rail Travel Patterns Explored

Tools used in this project
UK Rail Travel Patterns Explored

Power BI

About this project

Goal

The task was to play the role of a BI developer for National Rail, a company that provides business services to passenger train operators in England, Scotland, and Wales. The manager asks to create an exploratory dashboard that helps them:

  • Identify the most popular routes
  • Determine peak travel times
  • Analyze revenue from different ticket types & classes
  • Diagnose on-time performance and contributing factors

Data

The data contains mock train ticket sales for National Rail in the UK, from January to April 2024, including details on the type of ticket, the date & time for each journey, the departure & arrival stations, the ticket price, and more.

Each row represents a passenger. The dataset contains about 31’000 rows, meaning that in four months 31’000 passengers were recorded using the UK trains. Yes, this doesn't sound like a lot. Some Online Research shows that about 1.7 billion passengers used the trains in the UK pre-pandemic. Around 15% of these account for long-distance journeys as we have in the dataset. This means in reality around 21 Mio. Passengers are using long-distance trains per month. So, we only look at a fraction of the actual data. Another funny fact about the dataset is that, on average, fewer than two persons were on each train.

From the very beginning it was clear to me to show the data on a map and over time. That’s why I added two more tables to the data model: a date table and a table with the coordinates of each rail station in the dataset.

Data Cleaning

Only a little data cleaning had to be made: In the original dataset the station Edinburgh has two different names as have three delay reasons.

I added a new column for the Train Route (Departure Station – Arrival Station) and a new column for the unique train ID (Departure Date + Departure Time + Train Route).

Moreover, I shortened the names of the train stations to better use the space in the report layout. For all cities but London I just communicate the city name (e.g. “Liverpool” instead of “Liverpool Lime Street”) as they only have one main station. For the London stations, I kept the station name but replaced “London” with “LDN” to get at least a little bit more space on the report.

Report Structure

The main goal is to create an exploratory dashboard. An exploratory dashboard allows users to explore data. This is why the report has a lot of details to give each user the possibility to answer their individual data questions. The report does not include any filters. However, cross-filtering is activated which allows the users to filter by different categories (e.g., departure time, weekday, ticket type or the journey status).

Based on the defined goals, I identified the insights the report should generate in order to better determine the visuals needed to support them:

  • Identify the most popular routes. Insight: Increase services on these routes like snacks and beverages in trains or more shops on train stations. Charts: Table visual with Train Routes by No. of passengers

  • Determine peak travel times. Insight: Adjust train frequency and staffing. Charts: Line Chart of Avg. Daily Passengers by 60 mins / 30 mins / 15 mins; Heatmap of Passenger Count by Hour and Weekday

  • Analyze revenue from different ticket types & classes. Insight: Adjust pricing and target marketing campaigns for ticket types / ticket holders. Charts: Stacked Bar Charts with Revenue by Ticket Type, Ticket Class and Railcard Holder

  • Diagnose on-time performance and contributing factors. Insight: Identify routes and times with delays and address root causes, such as infrastructure issues or scheduling conflicts. Charts: Heat Map of share of delays by Hour and Weekday; Decision Tree to analyze the reasons for delays and how many refunds were requested; Card visual to show delay times

As mentioned above the report has visuals with many details. To reduce the flood of information I gave the report four pages – one for each goal while the page with the answer to the first goal “Identify the most popular routes” also serves as an Overview page.

To engage the user and to make him very quickly familiar with the train routes the Overview page has a large map visual. Another reason is that the page contains a table visual with a lot of data. This reduces the amount of data the user is confronted with. The table holds the relevant information to analyze each route:

  • No. of passengers to identify the popular routes
  • Revenue to identify the routes that generate the most/least revenue
  • Share of Delays (a delay is a train with status “Delayed” or status “Cancelled”) to identify the routes with low on-time performance
  • No. of trains to better grasp the % of delays
  • The average passenger per train to analyze how fully utilized specific train routes are

Note: The bottom of the table shows the total number for each of the KPI’s. In my opinion this makes the use of KPI cards obsolete.

A word in general: As we look at just four months of data and no goals were provided the values are not shown with context. The next step would be to define realistic goals with the manager.

The next three pages

  • Peak Travel Time
  • Revenue
  • and Delays

have one thing in common: a matrix visual (in the style of a train timetable) with the train routes in the rows and the departure times in the columns. Conditional formatting let the user quickly see the routes and times with high passengers, revenue and delays. By clicking on a specific cell in the matrix visual the other visuals on the page are filtered. Like this, the user can easily dig deep into the info of every train route and time.

On the top right of each page are easy to digest visuals like KPI cards or line charts while on the bottom are the matrix visuals with a lot of details.

Design

Colors were generated with a color palette tool and differ for each page to keep the user engaged and to define the different subjects (Overview, Revenue, Peak Travel Time, Delays) by color. The Overview page has a dark blue tone because of the dark blue logo of National Rail. But not too dark to keep the numbers in the table visual readable. To enhance user experience the report has a side panel to navigate between the pages and bookmarks to switch between three map visuals for visualization of passenger count, revenue and delay on train routes and bookmarks for three time slots for travel time analysis.

Insights

The Overview page shows us – either from the map or the table visual – that the train route with the most passengers (Manchester to Liverpool) is by far not the route that generates the most revenue (which is London Kings Cross to York). Luckily, these two routes have a low delay rate (delays are routes with status “Delayed” or “Cancelled”).

However, the route Liverpool to London Euston with the second highest revenue has a delay rate of over 80%. Besides, some lesser frequented routes like Edinburgh to London Kings Cross has a 100% delay rate. Although the other way around, only 5% of the trains from London Kings Cross to Edinburgh have a delay.

The most packed train is the 8 AM train from Liverpool to London Euston.

The peak travel time page shows us that there are passenger peaks at rush hour (6-8 AM and 4-6 PM). A closer look at the matrix visual shows us that the numbers for the 8 AM time slot are skewed due to one outlier: the previous mentioned most packed route Liverpool to London Euston. Interesting is, that the time peaks also show at the weekends.

Revenue is nearly evenly distributed between the three ticket classes “Anytime”, “Off-Peak” and “Advance” with “Advance” tickets generating the most revenue in the four months. This can be traced back to the month of February where around 30% more revenue was generated compared to the other three months while revenue for the other two ticket types was halved.

The main reason for delays or cancellations of trains were weather conditions (maybe snow because of the winter months?). The highest share of delayed trains was between 8-9 AM.

Another interesting analysis that didn’t made the report is the analysis of the time of ticket purchase: „Anytime“ and „Off-Peak“ tickets are always bought on the same day as train departure and around 4 of 5 “Advance” tickets are bought just one day before train departure date.

Additional project images

Discussion and feedback(2 comments)
comment-1867-avatar
Mainak Ghosh
Mainak Ghosh
22 days ago
A detailed information beginning with setting up the project goal, proper data cleaning steps & then the effective reporting. Really loved the Map visuals of the train routes. Keep up the amazing work, Michael!
2000 characters remaining
Cookie SettingsWe use cookies to enhance your experience, analyze site traffic and deliver personalized content. Read our Privacy Policy.