__STYLES__

Google Data Analytics - Case Study - Cyclistic Bike Share

Tools used in this project
Google Data Analytics - Case Study - Cyclistic Bike Share

About this project

Google Data Analytics Capstone: Complete a Case Study

Introduction

Welcome to the Cyclistic bike-share analysis case study! In this case study, I will perform many real-world tasks as a data analyst. I will be working for a fictional company, Cyclistic, and meet different characters and team members. To answer the key business questions, I will be following the steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act.

Scenario

I am a data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, my team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, my team will design a new marketing strategy to convert casual riders into annual members.

Characters and Teams

  • Cyclistic: Cyclistic's bike-share program includes 5,800+ bikes and 600 docking stations. It stands out for inclusivity with reclining bikes, hand tricycles, and cargo bikes, catering to people with disabilities. While 92% prefer traditional bikes, 8% use assistive options. Popular for leisure, 30% use Cyclistic bikes for daily commutes.
  • Lily Moreno: The director of marketing and my manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
  • Cyclistic marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. I joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals — as well as how I can help Cyclistic achieve them.
  • Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

About the company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station.

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as Casual Riders. Customers who purchase annual memberships are Cyclistic Members.

Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign targeting new customers, Moreno believes there is a good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. To do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

Ask

Three questions will guide the future marketing program:

  1. How do annual members and casual riders use Cyclistic bikes differently?

  2. Why would casual riders buy Cyclistic annual memberships?

  3. How can Cyclistic use digital media to influence casual riders to become members?

I have been assigned the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?

I will be producing a report with the following deliverables:

  1. A clear statement of the business task

  2. A description of all data sources used

  3. Documentation of any cleaning or manipulation of data

  4. A summary of my analysis

  5. Supporting visualizations and key findings

  6. Recommendations and areas of improvement based on my analysis

Prepare

Data Collection

I will be using Cyclistic’s historical trip data for the year 2022 to analyze and identify trends. The data has been made available by Motivate International Inc. under this license.

There are a total of 12 files, each adhering to the naming format YYYYMM-divvy-tripdata. Each file encompasses data for a specific month of year 2022, including details like ride ID, bike type, ride start time, ride end time, ride start station, ride end station, ride start location (in latitudes and longitudes), ride end location (in latitudes and longitudes), and the rider's membership type. The corresponding column names are ride_id, rideable_type, started_at, ended_at, start_station_name, start_station_id, end_station_name, end_station_id, start_lat, start_lng, end_lat, end_lng and member_casual.

  • Ride IDs are unique.
  • Bike types are classic bikes, electric bikes, and docked bikes.
  • Membership types are Casual Riders and Cyclistic Members.

Process

I am using Google BigQuery to combine all 12 datasets into one single dataset which will be used to clean, analyze and visualize as a part of this case study.

Merging the Datasets

The 12 CSV files are uploaded to the dataset tripdata_2022. Once these CSV files are imported as tables, I created a new table tripdata_2022 which shows all the values of monthly trip data. The combined table tripdata_2022 contains 56,67,717 rows of data for the entire year 2022.

Merging Dataset - SQL Query

undefinedData Exploration

Data Exploration - SQL Query

In this section, I'll be performing some pre-analysis checks like understanding each column individually, checking the data types of each column to understand the data, checking for null and duplicate values (if any), and many more.

  • Understanding the data types of columns undefined

  • Checking the count of null values (if any) in each column undefinedHere, the fields start_station_name and start_station_id have the same number of null values i.e., 8,33,064. Similarly, the fields end_station_name and end_station_id have the same number of null values i.e., 8,92,742. There are 5,858 null values in the end position as well. The rest of the columns do not have any null values. These records with null values can be deleted to make the data more consistent and improve its efficiency.

  • Checking the count of duplicate values (if any) in the table undefinedThere are no duplicate values in the table. ride_id has been used to check for duplicate values as it acts as the primary key for this table and does not have any null values.

  • Checking the values of ride_id to see if any cleaning is required for this column To examine this, I am checking the length of ride_id values with their count and grouping them by ride_id to see if there are any values with different lengths. If we get multiple results, cleaning has to be performed on this column, else I can proceed with my pre-analysis for other columns. undefinedAll values of ride_id have the same length which states that its data is consistent. Hence, there is no need to perform any cleaning operations on this column.

  • Checking the values of rideable_type and its data consistency undefinedThe column rideable_type gives the types of bikes available within the organisation. There are three unique bike types: electric bikes, classic bikes and docked bikes.

  • Checking the count of invalid trips Invalid trips include every trip for which the trip time is greater than a day or less than (or equal to) a minute. undefinedCount of trips greater than a day: 5,360 Count of trips less than (or equal to) a minute: 1,22,283 Hence, the total count of invalid trips is 1,27,643. These records can be deleted as they affect the data consistency and the overall performance of the data.

  • Checking the data consistency of membership types column (member_casual) undefinedThe table consists of only two membership types: Casual Riders and Cyclistic Members as mentioned by the bike share company and does not have any invalid values.

  • Three columns: ride_length_mins, weekday, and month will be created in the coming analytic process which will help in the detailed analysis and visualization of the data. Columns start_station_id and end_station_id can be dropped as it does not add any value to our analysis concerning the business problem. The columns for the starting location (start_lat / start_lng) and the ending location (end_lat / end_lng) will only be used during the data visualization stage.

Data Cleaning

Data Cleaning - SQL Query

  • Creating a new table cleaned_tripdata_2022 with new columns ride_length_mins, weekday and month. The records with null values in the columns start_station_name, end_station_name, end_lat, and end_lng will not be considered for further analysis. The records with a trip duration greater than a day and less than a minute will also be neglected for further analysis and visualization.
  • Column ride_id is set as the Primary Key of this table. undefinedundefined
  • After deleting the records with null values and having a trip duration greater than a day and less than a minute, the final table cleaned_tripdata_2022 has 42,91,805 records out of 56,67,717 records from the raw dataset (i.e., 13,75,912 records deleted). undefined

Analyze

Data Analysis - SQL Query

In the data analysis of trip data using Google BigQuery, I explored various facets of the dataset. These findings contribute to a comprehensive understanding of user behaviour, aiding in strategic decision-making for optimizing service efficiency and enhancing user experience in the ride-sharing ecosystem.

Key findings include:

Ride Duration Analysis:

  • Calculate average ride duration.
  • Identify rides with the longest and shortest durations.
  • Analyze ride duration trends over different months or weekdays.

Ride Type Analysis:

  • Count the number of rides for each rideable type (classic bike, electric bike, docked bike).
  • Analyze the popularity of different ride types over time.

Station Analysis:

  • Identify the most popular start and end stations.
  • Analyze the traffic between different stations.
  • Investigate station-wise ride durations.

Time-based Analysis:

  • Determine peak hours or days with the highest ride volume.
  • Analyze ride lengths during specific hours or days.

Membership Analysis:

  • Compare the number of rides between casual and member riders.
  • Analyze average ride durations for casual and member riders.

Temporal Trends:

  • Identify any temporal trends, such as increasing or decreasing ride counts over months.
  • Explore seasonal variations in ride frequency or duration.

Day of the Week Analysis:

  • Analyze ride patterns based on the day of the week.
  • Identify any significant differences in ride behaviour on weekends vs. weekdays.

Membership and Ride Type Interaction:

  • Investigate whether certain rideable types are more popular among casual or member riders.

Inference from the above findings to answer "How do annual members and casual riders use Cyclistic bikes differently?"

I am using the Microsoft Power BI tool for data visualization due to its user-friendly interface, rich visualization options, interactive dashboards and many more compelling features. Data Visualization helps in understanding the data, identifying patterns and trends, facilitating data exploration, and supporting effective communication, decision-making and story-telling.

1. Number of trips by membership and type of bike:

undefinedundefinedundefinedThe analysis of rider demographics reveals a clear preference for membership, with members constituting 59.67% of the total riders, while casual riders make up 40.33%. Classic bikes emerge as the top choice among riders, commanding a dominant 59.61%, followed by electric bikes at 36.35%, and docked bikes at 4.04%, primarily preferred only by casual riders.

A closer look at classic bike preferences shows that 39.21% are members, indicating a strong membership preference, and 20.41% are casual riders. Similarly, among electric bike enthusiasts, 20.47% are members, showcasing a significant membership affinity, while 15.88% are casual riders.

This nuanced analysis provides valuable insights into rider preferences, membership distribution, and the hierarchy of popularity among bike types, contributing to a comprehensive understanding of the diverse user base.

2. Number of trips by membership and month, day of the week and hour of the day:

undefinedundefinedundefinedThe analysis of monthly trip trends reveals a noteworthy disparity between members and casual riders, with members consistently surpassing casual riders in the total number of trips. Furthermore, a distinct seasonality pattern is observed, with peak activity during the summer months (June, July and August) and a notable decline in winter (December, January and February). July stands out as a month with comparable journeys for both members and casual riders.

From the analysis based on the day of the week, a clear distinction in riding patterns emerges. Members tend to ride more on weekdays, reflecting utilitarian commuting patterns, while casual riders favour weekends, suggesting a more leisurely approach.

Examining the hourly distribution of trips, a consistent trend unfolds for both members and casual riders. The peak period for total trips spans from 6 AM to 9 AM, corresponding to the morning commute, and from 4 PM to 7 PM, aligning with the evening rush hour. The data indicates a predictable cyclical pattern, with trip counts increasing throughout the day and tapering off in the evening.

In summary, the analysis of trip data across monthly, weekly, and hourly dimensions underscores the distinct usage patterns of members and casual riders, providing valuable insights into the temporal dynamics of bike-sharing activities.

3. Average Ride Duration by membership and month, day of the week and hour of the day:

undefinedundefinedundefinedUpon analysing the average duration of monthly trips, there is a noticeable trend where ride lengths for casual riders peak during the summer months (June, July, August) and decline towards the winter months. This seasonal pattern suggests that weather and temperature may significantly influence ride durations.

In the examination of weekly travel patterns, casual riders tend to have longer ride durations compared to members, with the longest rides occurring on the weekends. This could indicate that casual riders use the service for leisure activities on weekends.

Analyzing the hourly pattern of trip durations, casual riders show a peak in average ride length around midday and the evening, while members' ride lengths are relatively consistent throughout the day with a slight increase in the evening. This could imply that members use the service for commuting, while casual riders may use it for occasional trips or leisure.

These insights can help in understanding the usage patterns of the bike-sharing service and tailoring the service to better meet the needs of different user groups.

4. Geospatial Analysis

undefinedNumber of Trips by Members from the Start Location: Members start their trips predominantly in the central city area but with a slightly more even distribution extending into residential areas. This could indicate that members use the service for daily commuting as well as leisure.

Number of Trips by Members to the End Location: The distribution of trip endpoints for members is similar to their start points, with a good spread across the city. There is a noticeable concentration of endpoints in both the central business district and residential areas, reinforcing the commuting pattern.

undefinedNumber of Trips by Casual Riders from the Start Location: High activity in the central city area, with a significant number of trips starting from there. Fewer trips start from the outskirts, indicating either a lower number of casual riders or less availability of bikes.

Number of Trips by Casual Riders to the End Location: Similar to the start positions, there is a high concentration of trip endpoints in the central area. This suggests that casual riders may be using the bikes for leisure or tourism purposes, focusing on central city attractions.

Share

Based on the complete analysis that I have performed on the given dataset, I am now ready to answer the question/task assigned to me: "How do annual members and casual riders use Cyclistic bikes differently?"

undefinedBased on the analysis of the data, insights from the dashboards and the study on how annual members and casual riders use Cyclistics bikes differently, here are several suggestions and areas of improvement that could help convert casual riders into annual members:

  • Targeted Marketing Campaigns: Focus on membership benefits like cost savings, exclusive bike access, and members-only lanes/routes. Highlight positive experiences of members who transitioned from casual riding.
  • Membership Trial Offers: Offer discounted one-month trials to frequent casual riders, especially during peak months, including free rides or longer ride discounts to showcase membership value.
  • Flexible Membership Plans: Introduce weekend-only or off-peak hours memberships tailored to casual riders' usage patterns. Offer seasonal memberships for those who prefer biking during warmer months.
  • Enhanced Bike Availability: Increase docked and electric bike availability in central areas and popular destinations where casual riders frequently start and end trips. Ensure ample bikes during peak leisure times, such as weekends and holidays.
  • Exclusive Member Benefits: Provide priority bike access during high-demand periods and discounts at local businesses. Implement a loyalty program allowing members to earn points for redeemable rewards.
  • Improved App and Website Experience: Enhance the app and website with features like ride history analysis, personalized ride suggestions, and easier membership signup and renewal processes. Use the app and website to push notifications about membership benefits and promotions to casual riders.
  • Community Building and Engagement: Organize member-exclusive rides, workshops, or community events to build a sense of community among riders. Create a platform for members to share routes, tips, and experiences, fostering a sense of belonging and engagement.
  • Feedback and Personalization: Regularly gather feedback from casual riders to understand barriers to membership and areas for improvement. Use data analytics to personalize marketing messages and offers based on individual rider patterns and preferences.

Additional project images

Discussion and feedback(0 comments)
2000 characters remaining
Cookie SettingsWe use cookies to enhance your experience, analyze site traffic and deliver personalized content. Read our Privacy Policy.