Railway Insight: Exploring UK Train Journey Data

The Dataset:

The dataset contains mock train ticket data for National Rail in the UK, covering journeys from January to April 2024. It includes information such as transaction details, purchase type, ticket class, departure and arrival stations, journey dates and times, prices, journey statuses, reasons for delay, and refund requests.


1. Identify the Most Popular Routes


Departure Station      Arrival Destination    Journey Count
Manchester Piccadilly  Liverpool Lime Street  4628
London Euston          Birmingham New Street  4209
London Kings Cross     York                   3922
London Paddington      Reading                3873
London St Pancras      Birmingham New Street  3471
Liverpool Lime Street  Manchester Piccadilly  3002
Liverpool Lime Street  London Euston          1097
London Euston          Manchester Piccadilly  712
Birmingham New Street  London St Pancras      702
London Paddington      Oxford                 485


  • The most popular route is from Manchester Piccadilly to Liverpool Lime Street with 4628 journeys.
  • Other highly frequented routes include London Euston to Birmingham New Street and London Kings Cross to York.
  • The popularity of routes tends to center around major cities and important transit hubs.
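The table counts each direction separately, so a city pair can appear twice. As a rough sketch (not part of the original analysis), the snippet below folds both directions into one total per station pair. `fold_directions` is a hypothetical helper, and only the ten rows shown above are used, so any reverse direction that falls outside the top ten is missing from these totals.

```python
from collections import defaultdict

# Top-ten directional counts copied from the table above.
DIRECTIONAL_COUNTS = [
    ("Manchester Piccadilly", "Liverpool Lime Street", 4628),
    ("London Euston", "Birmingham New Street", 4209),
    ("London Kings Cross", "York", 3922),
    ("London Paddington", "Reading", 3873),
    ("London St Pancras", "Birmingham New Street", 3471),
    ("Liverpool Lime Street", "Manchester Piccadilly", 3002),
    ("Liverpool Lime Street", "London Euston", 1097),
    ("London Euston", "Manchester Piccadilly", 712),
    ("Birmingham New Street", "London St Pancras", 702),
    ("London Paddington", "Oxford", 485),
]


def fold_directions(rows):
    """Sum journey counts per station pair, ignoring direction (hypothetical helper)."""
    totals = defaultdict(int)
    for departure, arrival, count in rows:
        # Sorting the two names gives the same key for A->B and B->A.
        pair = tuple(sorted((departure, arrival)))
        totals[pair] += count
    # Busiest pair first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


pair_totals = fold_directions(DIRECTIONAL_COUNTS)
for (station_a, station_b), count in pair_totals:
    print(f"{station_a} <-> {station_b}: {count}")
```

On these ten rows, Manchester Piccadilly and Liverpool Lime Street come out well ahead with 7,630 journeys across both directions.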

2. Determine Peak Travel Times


Hour of the Day  Number of Journeys
0                853
1                644
2                942
3                543
4                1041
5                725
6                3112
7                2795
8                2179
9                1230
10               525
11               1143
12               773
13               1276
14               855
15               1220
16               2301
17               2888
18               3113
19               438
20               1058
21               570
22               788
23               641


  • Peak travel times are early in the morning (6 AM and 7 AM) and in the late afternoon to early evening (5 PM to 6 PM).
  • There are notable peaks at 6 AM (3112 journeys) and 6 PM (3113 journeys), suggesting these times are the busiest for travel.
  • Outside these peaks, volumes drop sharply. The overnight hours (midnight to 4 AM) see between 543 and 1,041 journeys each, and the quietest hours in the data are 7 PM (438 journeys) and 10 AM (525 journeys).
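To make the peak pattern concrete, here is a small sketch that ranks the hours and measures how much of the total falls in the commuting windows. The helper names and the choice of 6 to 8 AM and 4 to 6 PM as "peak" windows are assumptions made for illustration; the counts are copied from the table above.

```python
# Journeys per hour of the day, copied from the table above (list index = hour 0-23).
JOURNEYS_BY_HOUR = [
    853, 644, 942, 543, 1041, 725, 3112, 2795, 2179, 1230, 525, 1143,
    773, 1276, 855, 1220, 2301, 2888, 3113, 438, 1058, 570, 788, 641,
]

# Assumed peak windows: 6-8 AM and 4-6 PM (an illustrative choice).
PEAK_HOURS = [6, 7, 8, 16, 17, 18]


def rank_hours(counts, n=3, busiest=True):
    """Return the n busiest (or quietest) hours by journey count."""
    hours = sorted(range(len(counts)), key=lambda h: counts[h], reverse=busiest)
    return hours[:n]


def window_share(counts, hours):
    """Fraction of all journeys that fall in the given hours."""
    return sum(counts[h] for h in hours) / sum(counts)


busiest_hours = rank_hours(JOURNEYS_BY_HOUR)
quietest_hours = rank_hours(JOURNEYS_BY_HOUR, busiest=False)
peak_share = window_share(JOURNEYS_BY_HOUR, PEAK_HOURS)

print("Busiest hours:", busiest_hours)
print("Quietest hours:", quietest_hours)
print(f"Share of journeys in peak windows: {peak_share:.1%}")
```

With these figures, the six assumed peak hours account for a little over half of all 31,653 journeys.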

3. Analyze Revenue from Different Ticket Types and Classes


Ticket Type  Ticket Class  Total Revenue (£)
Advance      First Class   66886
Advance      Standard      242388
Anytime      First Class   37841
Anytime      Standard      171468
Off-Peak     First Class   44672
Off-Peak     Standard      178666


  • Standard class tickets generate far more revenue than First Class tickets for every ticket type, roughly four times as much overall (592,522 versus 149,399).
  • Advance tickets bring in the highest revenue, particularly in the Standard class (£242,388).
  • Off-Peak and Anytime tickets also contribute substantial revenue, but less than Advance tickets.
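The comparisons above rest on simple subtotals. A minimal sketch of that arithmetic, using only the six figures from the table, is shown below; `totals_by` is a hypothetical helper introduced here for illustration.

```python
from collections import defaultdict

# (ticket type, ticket class) -> total revenue, copied from the table above.
REVENUE = {
    ("Advance", "First Class"): 66886,
    ("Advance", "Standard"): 242388,
    ("Anytime", "First Class"): 37841,
    ("Anytime", "Standard"): 171468,
    ("Off-Peak", "First Class"): 44672,
    ("Off-Peak", "Standard"): 178666,
}


def totals_by(position):
    """Sum revenue by one part of the key: 0 = ticket type, 1 = ticket class."""
    totals = defaultdict(int)
    for key, revenue in REVENUE.items():
        totals[key[position]] += revenue
    return dict(totals)


by_type = totals_by(0)
by_class = totals_by(1)
grand_total = sum(REVENUE.values())
standard_share = by_class["Standard"] / grand_total

print("Revenue by ticket type:", by_type)
print("Revenue by ticket class:", by_class)
print(f"Standard class share of revenue: {standard_share:.1%}")
```

On these numbers, Advance tickets total 309,274, Off-Peak 223,338, and Anytime 209,309, with Standard class contributing just under 80% of the 741,921 overall.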

4. Diagnose On-Time Performance and Contributing Factors


Journey Status  Frequency
On Time         27481
Delayed         2292
Cancelled       1880

Reason for Delay  Frequency
Unknown           27481
Weather           1372
Signal Failure    970
Staffing          809
Technical Issue   707
Traffic           314


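  • A large majority of journeys are on time: 27,481 of 31,653, or about 86.8%.
  • Weather is the most common recorded cause of disruption (1,372 journeys), followed by signal failure (970), staffing (809), technical issues (707), and traffic (314).
  • Delayed journeys (2,292, about 7.2%) and cancelled journeys (1,880, about 5.9%) are far fewer than on-time journeys but still significant enough to warrant attention.
  • The 27,481 'Unknown' entries match the on-time count exactly, and the five named reasons add up to 4,172, the number of delayed and cancelled journeys combined. 'Unknown' therefore most likely marks journeys that had no delay to explain rather than a gap in data collection, and relabelling it (for example, 'No Delay') would make the table clearer.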
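A quick consistency check between the two tables helps when reading the 'Unknown' row. The sketch below uses only the counts shown above and tests two things: whether 'Unknown' equals the on-time count, and whether the named reasons add up to the delayed and cancelled journeys. The variable names are illustrative.

```python
# Counts copied from the two tables above.
STATUS = {"On Time": 27481, "Delayed": 2292, "Cancelled": 1880}
REASONS = {
    "Unknown": 27481,
    "Weather": 1372,
    "Signal Failure": 970,
    "Staffing": 809,
    "Technical Issue": 707,
    "Traffic": 314,
}

total_journeys = sum(STATUS.values())
disrupted = STATUS["Delayed"] + STATUS["Cancelled"]
named_reasons = sum(n for reason, n in REASONS.items() if reason != "Unknown")

# Does 'Unknown' line up with on-time journeys, and do the named
# reasons account for every delayed or cancelled journey?
unknown_matches_on_time = REASONS["Unknown"] == STATUS["On Time"]
reasons_cover_disruptions = named_reasons == disrupted

# Percentage of all journeys in each status, to one decimal place.
status_share = {
    status: round(100 * n / total_journeys, 1) for status, n in STATUS.items()
}

print("Unknown == On Time:", unknown_matches_on_time)
print("Named reasons == Delayed + Cancelled:", reasons_cover_disruptions)
print("Status share (%):", status_share)
```

Both checks hold for these figures, which is consistent with 'Unknown' being a placeholder for journeys that were never disrupted.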

Additional Analyses

5. Analyze Payment Methods


Payment Method    Frequency
Contactless       10834
Credit Card       19136
Debit Card        1683


  • Credit cards are the most common payment method (19,136 purchases, about 60% of the total), followed by contactless payments (10,834) and debit cards (1,683).
  • Passengers in this dataset therefore show a strong preference for credit cards, while debit cards account for only about 5% of purchases.
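For reference, the ranking and shares can be reproduced from the three counts in the table. This is a small sketch using the standard library's `Counter`; the variable names are illustrative.

```python
from collections import Counter

# Counts copied from the payment method table above.
PAYMENTS = Counter({"Contactless": 10834, "Credit Card": 19136, "Debit Card": 1683})

total_purchases = sum(PAYMENTS.values())

# most_common() returns (method, count) pairs, largest count first.
ranked = [
    (method, count, round(100 * count / total_purchases, 1))
    for method, count in PAYMENTS.most_common()
]

for method, count, share in ranked:
    print(f"{method}: {count} ({share}%)")
```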

6. Analyze Refund Requests


Refund Request    Frequency
No                30535
Yes               1118


  • Most passengers (30,535) did not request a refund; 1,118 did, about 3.5% of all journeys.
  • The low refund rate is consistent with broadly satisfied passengers, but it is not direct evidence of satisfaction. Set against the 4,172 delayed or cancelled journeys, the 1,118 requests amount to about 27%, so some affected passengers may simply not have claimed.
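How large the refund figure looks depends on the denominator. The sketch below computes the rate two ways, using counts from the tables in this write-up. Comparing requests with delayed and cancelled journeys assumes refunds are mainly requested for disrupted journeys, which the data shown here does not confirm.

```python
# Counts copied from the refund request and journey status tables.
REFUNDS = {"No": 30535, "Yes": 1118}
DELAYED = 2292
CANCELLED = 1880

total_journeys = sum(REFUNDS.values())
disrupted_journeys = DELAYED + CANCELLED

# Refund requests as a fraction of all journeys.
refund_rate_all = REFUNDS["Yes"] / total_journeys

# Refund requests relative to disrupted journeys (assumption: refunds
# are mostly tied to delays and cancellations).
refund_rate_disrupted = REFUNDS["Yes"] / disrupted_journeys

print(f"Refund requests per journey: {refund_rate_all:.1%}")
print(f"Refund requests per disrupted journey: {refund_rate_disrupted:.1%}")
```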


These analyses effectively address the original questions by identifying key trends and insights:

  • Most Popular Routes: Major transit routes between large cities are the most frequented.
  • Peak Travel Times: Morning and evening hours are the busiest, aligning with typical work and commuting hours.
  • Revenue Analysis: Standard class and Advance tickets generate the highest revenue, suggesting that passengers favour lower-cost fares.
  • On-Time Performance: The majority of journeys are on time, with weather and signal failures the most common causes of delays and cancellations.
  • Payment Method Analysis: Credit cards are the preferred payment method, used for about 60% of purchases.
  • Refund Request Analysis: The relatively small number of refund requests suggests overall passenger satisfaction with the service.
