Google Capstone Project

About this project

Introduction

In this case study, I’ll be performing many of the real-world tasks that a data analyst does in their day-to-day job. I’ll be working with a fictional company named Cyclistic to answer key business questions. To do that, I’ll follow the six-step data analysis process: ask, prepare, process, analyze, share and act.

Scenario

I’m a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. That being the case, I need to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, I will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve my recommendations, so they must be backed up with compelling data insights and professional data visualizations, which I’ll do my best to provide below.

What do we know?

Cyclistic has a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, the director of marketing Lily Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

That’s all we know and that brings us to the ask phase.

Ask

Our goal is to design marketing strategies aimed at converting casual riders into annual members. The following three questions will guide the analysis:

  • How do annual members and casual riders use Cyclistic bikes differently?
  • What would make casual riders buy a membership?
  • How can Cyclistic use digital media to influence casual riders to become members?

To answer these questions, our marketing director is interested in analyzing the Cyclistic historical bike trip data to identify trends.

Prepare

In preparation for analysis, I downloaded the Cyclistic trip data for the last 12 months (January 2022 to December 2022). This is all public data that I will be using to answer the questions stated above.

Note: The data has been made available by Motivate International Inc. under this license.

Process

In this step, all the data was stored and cleaned to make it ready for analysis.

Below you can see the cleaning process.

  • Created a folder on my desktop to house the files, using appropriate file-naming conventions (example: “2022_01_tripdata”).
  • Created sub folders for the .CSV files and the .XLS files so that I have a copy of the original data.
  • Checked the “member_casual” column in every spreadsheet to make sure it only contained the two accepted values (member or casual).
  • Created a column called “day_of_week” and calculated the day of the week that each ride started, for each month, using a combination of the WEEKDAY and IF functions (so the output is in text format). WEEKDAY first returns the day number, and the nested IF then maps that number to text: =WEEKDAY(C2, 1) followed by =IF(C2=1,"SUNDAY",IF(C2=2,"MONDAY",IF(C2=3,"TUESDAY",IF(C2=4,"WEDNESDAY",IF(C2=5,"THURSDAY",IF(C2=6,"FRIDAY",IF(C2=7,"SATURDAY")))))))
  • Checked that every “ride_length” column had values, with no negatives or blanks.
  • Checked that every “day_of_week” column only had correct values and no blanks.
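
For comparison, the day-of-week step and the two validation checks from the list above could be sketched in base R roughly as follows. This is only an illustration, not the spreadsheet process that was actually used: the sample start times and the variable names (started_at, day_names, all_membership_valid, all_days_valid) are assumptions made for the example.

```r
# Hypothetical sample of ride start times (not real Cyclistic data).
started_at <- as.POSIXct(
  c("2022-01-02 08:15:00", "2022-01-03 17:40:00", "2022-01-08 23:05:00"),
  tz = "UTC"
)

day_names <- c("SUNDAY", "MONDAY", "TUESDAY", "WEDNESDAY",
               "THURSDAY", "FRIDAY", "SATURDAY")

# as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday,
# so adding 1 gives the same 1-7 numbering as WEEKDAY(date, 1).
day_number  <- as.POSIXlt(started_at)$wday + 1
day_of_week <- day_names[day_number]

# Validation checks: only accepted membership values, no blank or odd days.
member_casual <- c("member", "casual", "member")  # hypothetical sample
all_membership_valid <- all(member_casual %in% c("member", "casual"))
all_days_valid <- !anyNA(day_of_week) && all(day_of_week %in% day_names)

print(day_of_week)
```

Indexing a fixed vector of names avoids the long nested IF and does not depend on the system locale, which weekdays() would.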

After that I went to the analyze phase.

Analyze

R

To analyze the data, I opted for R, which is well suited to handling the large volume of data the company has. Below you can find a brief summary of the steps I took to analyze the data. The full process (calculations, filtering, etc.) can be found on my GitHub.

Loaded the libraries needed.

Imported the original months data into individual data frames.

Merged all the months into a full year data frame called cyclistic_2022.
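
A minimal base R sketch of the import and merge steps is shown below. It writes two tiny made-up monthly files to a temporary folder so that it runs on its own; the folder name, the sample rows and the monthly_frames variable are assumptions for illustration, and the real project may well have used different functions (the full code is on GitHub).

```r
# Write two tiny hypothetical monthly files so the sketch is self-contained.
data_dir <- file.path(tempdir(), "cyclistic_csv")
dir.create(data_dir, showWarnings = FALSE)
jan <- data.frame(ride_id = c("A1", "A2"), member_casual = c("member", "casual"))
feb <- data.frame(ride_id = "B1", member_casual = "member")
write.csv(jan, file.path(data_dir, "2022_01_tripdata.csv"), row.names = FALSE)
write.csv(feb, file.path(data_dir, "2022_02_tripdata.csv"), row.names = FALSE)

# Read every monthly file into its own data frame...
csv_files <- list.files(data_dir, pattern = "_tripdata\\.csv$", full.names = TRUE)
monthly_frames <- lapply(csv_files, read.csv, stringsAsFactors = FALSE)

# ...then stack them into one data frame for the whole period.
# rbind() requires every file to have the same column names.
cyclistic_2022 <- do.call(rbind, monthly_frames)

print(nrow(cyclistic_2022))
```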

Created a copy of the cyclistic_2022 data frame called cyclistic_data where all my calculations would take place.

Created and calculated the ride_length column by subtracting the ride’s start time (started_at) from its end time (ended_at).

Created new columns called date, year, hour, time, day, month, quarter and time_of_day.
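
The ride_length and derived-column steps could look roughly like this in base R. The two sample rides are made up, and the hour cut-offs used for time_of_day are an assumption chosen for illustration, since the boundaries actually used are not stated here.

```r
# Hypothetical rides; started_at and ended_at follow the trip data column names.
cyclistic_data <- data.frame(
  started_at = as.POSIXct(c("2022-07-04 08:30:00", "2022-12-24 22:10:00"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2022-07-04 08:50:30", "2022-12-24 22:25:00"), tz = "UTC")
)

# Ride length in seconds: end time minus start time.
cyclistic_data$ride_length <- as.numeric(
  difftime(cyclistic_data$ended_at, cyclistic_data$started_at, units = "secs")
)

# Date parts derived from the start time.
cyclistic_data$date    <- as.Date(cyclistic_data$started_at, tz = "UTC")
cyclistic_data$time    <- format(cyclistic_data$started_at, "%H:%M:%S")
cyclistic_data$year    <- as.integer(format(cyclistic_data$started_at, "%Y"))
cyclistic_data$month   <- as.integer(format(cyclistic_data$started_at, "%m"))
cyclistic_data$day     <- as.integer(format(cyclistic_data$started_at, "%d"))
cyclistic_data$hour    <- as.integer(format(cyclistic_data$started_at, "%H"))
cyclistic_data$quarter <- paste0((cyclistic_data$month - 1) %/% 3 + 1, "Q")

# ASSUMPTION: illustrative cut-offs only (0-5 Night, 6-11 Morning,
# 12-17 Afternoon, 18-23 Evening); the real boundaries may differ.
cyclistic_data$time_of_day <- as.character(cut(
  cyclistic_data$hour,
  breaks = c(-1, 5, 11, 17, 23),
  labels = c("Night", "Morning", "Afternoon", "Evening")
))

print(cyclistic_data[, c("ride_length", "hour", "quarter", "time_of_day")])
```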

Changed the name of the column member_casual to membership to make it more intuitive.

Cleaned the data by removing duplicates and the unnecessary columns: ride_id; rideable_type; start_station_id; start_station_name; end_station_name; end_station_id; start_lat; start_lng; end_lat; end_lng.
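
As a hedged sketch, the rename and cleaning steps might be written in base R as below. The sample rows are invented, and the order of operations (dropping duplicates before dropping ride_id) is an assumption; the source does not say which order was used.

```r
# Hypothetical sample with one exact duplicate row.
cyclistic_data <- data.frame(
  ride_id       = c("A1", "A1", "B2"),
  rideable_type = c("classic_bike", "classic_bike", "electric_bike"),
  member_casual = c("member", "member", "casual"),
  ride_length   = c(600, 600, 1500),
  stringsAsFactors = FALSE
)

# Rename member_casual to membership.
names(cyclistic_data)[names(cyclistic_data) == "member_casual"] <- "membership"

# Drop exact duplicate rows while ride_id can still tell rides apart.
cyclistic_data <- cyclistic_data[!duplicated(cyclistic_data), ]

# Remove the columns that are not needed for the analysis.
drop_cols <- c("ride_id", "rideable_type", "start_station_id", "start_station_name",
               "end_station_name", "end_station_id",
               "start_lat", "start_lng", "end_lat", "end_lng")
cyclistic_data <- cyclistic_data[, !(names(cyclistic_data) %in% drop_cols), drop = FALSE]

print(cyclistic_data)
```

Removing duplicates after dropping ride_id would also discard distinct rides that happen to share every remaining value, which is why the sketch de-duplicates first.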

Calculated the number of rides made by all riders (number of rows), by member type, time of the day, hour, day of the week, day of the month, month and quarter.

##Total number of rides
> nrow(cyclistic_data)
[1] 5667717

##Number of rides by member type
> cyclistic_data %>%
+   group_by(membership) %>% 
+   count(membership)
  membership       n
1 casual     2322032
2 member     3345685

##Number of rides by time of day
> cyclistic_data %>%
+   group_by(time_of_day) %>% 
+   count(time_of_day)
  time_of_day       n
1 Afternoon   2470323
2 Evening     1600259
3 Morning     1350303
4 Night        246832

(...)
##Overall avg ride length
> cyclistic_avgRide <- mean(cyclistic_data$ride_length)
> print(as_hms(cyclistic_avgRide)) #to get the result in hh:mm:ss format
00:19:26.281596

##by member type
> avgMember <- cyclistic_data %>% group_by(membership) %>% 
+   summarise_at(vars(ride_length),
+                list(time = mean))
> avgMember$time <- as_hms(avgMember$time) #formats time as mm:ss
> print(avgMember)
  membership time         
  <chr>      <time>       
1 casual     29:07.771699
2 member     12:42.705460

#by quarter
> cyclistic_data %>% 
+   group_by(quarter) %>% 
+   summarise_at(vars(ride_length),
+                list(time = mean))
  quarter time         
1 1Q      16.74187 mins
2 2Q      21.06115 mins
3 3Q      20.51899 mins
4 4Q      15.70774 mins

(...)

Share

Tableau

To visualize the findings, I opted for Tableau, since a dashboard suits this project well. Below is a brief summary of the process:

  • Created a separate R script with some minor changes to the cyclistic_data data frame to use in Tableau.
  • Created a copy of the cyclistic_2022 data frame called cyclistic_tableau where all my calculations would take place.
  • Changed the content of the month column so it shows the month name (January) instead of its corresponding number (1).
  • Cleaned the data by removing duplicates and the same unnecessary columns as in the analysis code, plus the started_at and ended_at columns, as they are not needed for the visualizations and would only slow down Tableau.
  • Exported the final data frame into a .csv file to use in Tableau.
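
The Tableau preparation steps above could be sketched in base R as follows. The sample rows and the output file name cyclistic_tableau.csv are hypothetical, and the file is written to a temporary folder only so the example runs anywhere.

```r
# Hypothetical sample; month is stored as a number, as in the analysis data.
cyclistic_tableau <- data.frame(
  membership = c("member", "casual", "casual"),
  month      = c(1L, 7L, 12L),
  started_at = c("2022-01-05 09:00:00", "2022-07-16 15:30:00", "2022-12-24 22:10:00"),
  ended_at   = c("2022-01-05 09:12:00", "2022-07-16 16:05:00", "2022-12-24 22:25:00"),
  stringsAsFactors = FALSE
)

# month.name is a built-in vector of English month names ("January", ...).
cyclistic_tableau$month <- month.name[cyclistic_tableau$month]

# Drop the timestamp columns that the dashboard does not need.
cyclistic_tableau$started_at <- NULL
cyclistic_tableau$ended_at   <- NULL

# Export to .csv for Tableau (hypothetical file name and location).
out_file <- file.path(tempdir(), "cyclistic_tableau.csv")
write.csv(cyclistic_tableau, out_file, row.names = FALSE)

print(cyclistic_tableau)
```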

Tableau graphs were created for:

  • Total Rides by Membership
  • Total Rides by Weekday
  • Average Ride Length by Weekday
  • Total Rides by Hour
  • Total Rides by Month

Act

Below you will find a summary of my key findings. Based on these findings, I will answer the questions that were initially asked in the Ask phase.

Key Findings

  • Members had the bigger share of rides, amounting to a total of 59% of all rides.
  • The average ride length for annual members (12m42s) was less than half of the average ride length of casual riders (29m07s).
  • Casual riders tend to use Cyclistic much more on weekends, with a huge difference compared to weekdays. That trend is not observed for members, who use Cyclistic even more on weekdays.
  • However, in terms of average ride length, both members and casual riders tend to take longer rides on weekends.
  • Both members and casual riders use Cyclistic most in the afternoon, which accounts for 43.59% of all rides. The busiest hour turned out to be 16:00 (4 PM) for both members and casual riders, with 10% of all rides.
  • The busiest month for casual riders was July, while for members it was August. The 3rd quarter was the busiest, accounting for 40.77% of all rides, which was expected given that it includes most of the summer season.
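
As a quick arithmetic check, the shares and the ride-length ratio quoted in the findings can be recomputed from the counts and averages printed in the Analyze section. The variable names below are only for illustration.

```r
# Counts taken from the R output in the Analyze section.
total_rides     <- 5667717
member_rides    <- 3345685
casual_rides    <- 2322032
afternoon_rides <- 2470323

# Shares of all rides, in percent.
member_share    <- round(100 * member_rides / total_rides, 2)     # about 59.03
casual_share    <- round(100 * casual_rides / total_rides, 2)     # about 40.97
afternoon_share <- round(100 * afternoon_rides / total_rides, 2)  # about 43.59

# Average ride lengths (mm:ss.ffffff from the output) converted to seconds.
casual_avg_secs <- 29 * 60 + 7.771699
member_avg_secs <- 12 * 60 + 42.705460

# How many times longer the average casual ride is (a bit under 2.3).
length_ratio <- casual_avg_secs / member_avg_secs

print(c(member_share, casual_share, afternoon_share, round(length_ratio, 2)))
```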

Suggestions

  • How do annual members and casual riders use Cyclistic bikes differently? Casual riders tend to use Cyclistic for longer rides: their average ride length (29 min) is more than double that of annual members (12 min). This suggests that, for casual riders, buying a single-ride or day pass only makes sense if they are going to take a longer ride. It is consistent with casual riders accounting for a smaller share of total rides (~41%). Annual members, by contrast, use Cyclistic with more freedom: since they don’t have to worry about getting the most out of each ride, they can take a short ride, stop, and take another with no downside.
  • What would make casual riders buy a membership? Adding more stations to cover more area is always a good start. I would also advise offering a discount to casual riders buying their first annual membership. This would attract more annual members, and after the first year riders would have seen the value of an annual membership and be more likely to renew. Another good option would be to offer free months to new members instead of an overall first-year discount.
  • How can Cyclistic use digital media to influence casual riders to become members? Investment in ads is needed. An ad on a platform like Spotify promoting the discounts suggested above could bring in many annual memberships. I also suggest investing in ads on podcasts, YouTube channels and Twitter pages whose audience is mostly cyclists; YouTube channels about biking and exercise would be a good example. Finally, a Cyclistic app could be created with a personal profile that takes into consideration how each rider uses the company’s services and recommends how to get the most out of Cyclistic. It should include the available promotions and the benefits of becoming an annual member.
