
Pedaling into Insights: A Cyclistic Bike-share Analysis Using R


About this project

Introduction

Cyclistic is a bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make this possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, the marketing team believes that maximizing the number of annual members will be key to future growth.

Statement of Business Task

The task is to identify opportunities for targeted marketing campaigns that convert casual riders into annual members. This will be done by analyzing bike trip data to understand user behaviour and preferences. The ultimate goal is to increase profitability and drive future growth for the company.

Description of Data Source and Tools used

The data is organized in monthly and quarterly periods. Since this project was started in May 2023, the data from April 2022 to May 2023 was selected. The data has been made available by Motivate International Inc. It is public data that can be used to explore how different customer types use Cyclistic bikes. The data contains the following columns:

[Figure: list of columns in the data]

For this project, RStudio was the preferred tool because it offers a wide range of packages and can handle large amounts of data with ease.

Data preparation

The data preparation phase is where we attempt to understand the data. It might require cleaning, transformation, and integration. Data analysis is not magic: it only works when the problem to be solved is represented as accurately as possible from the real world. Cleaning and preparing the data is therefore one of the most important tasks in data analysis.

The datasets were loaded into the IDE using the “readr” library. Since there were 12 datasets, one for each month of the year, I bound all 12 into one combined dataset to make analysis and manipulation easier.

[Figure: code snippet showing an overview of the combined dataset]

The combined dataset consists of 13 attributes, ranging from the ride ID and station names to a label identifying whether a ride was made by a Cyclistic member or a casual rider. The dataset contained null values, especially in the start station and end station columns. With the help of the “dplyr” and “tidyverse” libraries, I removed these null values, then dropped duplicate rows and rows with an incorrect ride timeline. An incorrect ride timeline is one where the recorded end time is earlier than the recorded start time.
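The loading and cleaning steps described above can be sketched roughly as follows. This is a minimal illustration on two toy files rather than the project's actual code; the column names (`ride_id`, `started_at`, `ended_at`, `start_station_name`, `end_station_name`, `member_casual`), the file names, and the station names are assumptions made for the example.

```r
library(readr)
library(dplyr)

# Toy stand-ins for the monthly CSV files (file names and values are hypothetical).
data_dir <- tempfile("trips_")
dir.create(data_dir)

write_csv(tibble(
  ride_id            = c("A1", "A2", "A2"),             # A2 appears twice (duplicate row)
  started_at         = c("2022-06-01 08:00:00", "2022-06-01 09:00:00", "2022-06-01 09:00:00"),
  ended_at           = c("2022-06-01 08:20:00", "2022-06-01 09:15:00", "2022-06-01 09:15:00"),
  start_station_name = c("Station X", "Station Y", "Station Y"),
  end_station_name   = c("Station Y", "Station X", "Station X"),
  member_casual      = c("member", "casual", "casual")
), file.path(data_dir, "202206.csv"))

write_csv(tibble(
  ride_id            = c("B1", "B2"),
  started_at         = c("2022-07-01 10:00:00", "2022-07-01 12:00:00"),
  ended_at           = c("2022-07-01 09:50:00", "2022-07-01 12:30:00"),  # B1 ends before it starts
  start_station_name = c("Station X", NA),                               # B2 has no start station
  end_station_name   = c("Station Z", "Station X"),
  member_casual      = c("casual", "member")
), file.path(data_dir, "202207.csv"))

# Read every monthly file and stack them into one data frame.
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
all_trips <- files %>%
  lapply(read_csv, show_col_types = FALSE) %>%
  bind_rows()

# Drop rows with missing stations, exact duplicates, and impossible timelines.
clean_trips <- all_trips %>%
  filter(!is.na(start_station_name), !is.na(end_station_name)) %>%
  distinct() %>%
  filter(ended_at > started_at)

print(clean_trips)
```

In this toy data, one row is dropped at each cleaning step, so it is easy to check that every filter does what it is meant to do before running it on the full dataset.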

To further ensure that my dataset was of the highest quality before analysis, I treated the date-time columns in the dataset using the “lubridate” library, making sure they were in the proper format. To help during analysis, I created a new column named “ride_length”, which is the difference between the time a ride started and the time it ended. I encoded this column in different date-time formats for future analysis. I also dropped rows with a ride length of less than two minutes or more than 1,500 minutes, as these rows are considered outliers. After the necessary manipulation and cleaning, the dataset contains 4,370,222 rows and 14 columns. A snapshot of the dataset is shown below.
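The “ride_length” step can be sketched along these lines. It is an illustrative example on toy data, assuming the timestamp columns are named `started_at` and `ended_at` and that ride length is stored in minutes; the project's actual encoding may differ.

```r
library(dplyr)
library(lubridate)

# Toy trips with timestamps stored as text (values are hypothetical).
trips <- tibble(
  ride_id    = c("A1", "A2", "A3", "A4"),
  started_at = c("2022-06-01 08:00:00", "2022-06-01 09:00:00",
                 "2022-06-01 10:00:00", "2022-06-01 11:00:00"),
  ended_at   = c("2022-06-01 08:20:00", "2022-06-01 09:01:00",
                 "2022-06-02 12:00:00", "2022-06-01 11:45:30")
)

trips_with_length <- trips %>%
  mutate(
    # Parse the text columns into proper date-time values.
    started_at  = ymd_hms(started_at),
    ended_at    = ymd_hms(ended_at),
    # Ride length in minutes, as a plain number.
    ride_length = as.numeric(difftime(ended_at, started_at, units = "mins"))
  ) %>%
  # Keep rides between 2 and 1,500 minutes; anything outside is treated as an outlier.
  filter(ride_length >= 2, ride_length <= 1500)

print(trips_with_length)
```

Here the 1-minute ride (A2) and the 26-hour ride (A3) are removed, leaving rides of 20 and 45.5 minutes.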

[Figure: snapshot of the cleaned dataset]

Exploratory Data Analysis

Exploratory data analysis (EDA) is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods. It helps determine how best to manipulate data sources to get the answers you need, making it easier for data scientists to discover patterns, spot anomalies, test a hypothesis, or check assumptions.


Ride Length Analysis

I carried out basic statistical analysis on the ride length column:

· The average ride lasts around 17 minutes

· The median ride length is around 10.7 minutes

· The longest ride lasted about 1,499 minutes (roughly 25 hours), and the shortest lasted 2 minutes.
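Summary figures like these can be computed with a single `summarise()` call. The sketch below uses made-up ride lengths in minutes and an assumed column name `ride_length`, so the numbers it produces are not the project's results.

```r
library(dplyr)

# Toy ride lengths in minutes (hypothetical values).
trips <- tibble(ride_length = c(2, 5, 10, 12, 30, 61))

ride_stats <- trips %>%
  summarise(
    mean_minutes   = mean(ride_length),
    median_minutes = median(ride_length),
    shortest       = min(ride_length),
    longest        = max(ride_length)
  )

print(ride_stats)
```

A mean well above the median, as in this toy data and in the findings above, suggests a small number of very long rides pulling the average up.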

Comparison between members and casual riders

Using summary statistics and visualizations, I compared the two customer types.

· Overall, within the time frame considered, members took more rides than casual riders.

· Even so, the total duration of rides by casual riders is higher than that of members.

· On average, rides by casual riders also last longer, and their median ride length is higher.
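A per-rider-type comparison of this kind can be sketched with `group_by()` and `summarise()`. The data below is a toy example built to show the same pattern (more member rides, longer casual rides); the column names `member_casual` and `ride_length` are assumptions.

```r
library(dplyr)

# Toy trips: rider type and ride length in minutes (hypothetical values).
trips <- tibble(
  member_casual = c("member", "member", "member", "casual", "casual"),
  ride_length   = c(10, 12, 8, 30, 40)
)

by_rider_type <- trips %>%
  group_by(member_casual) %>%
  summarise(
    rides          = n(),
    total_minutes  = sum(ride_length),
    mean_minutes   = mean(ride_length),
    median_minutes = median(ride_length),
    .groups = "drop"
  )

print(by_rider_type)
```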

· Members take more rides than casual riders during the work week (Monday to Friday), while casual riders take more rides during the weekend.
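One way to split rides into weekday and weekend is with `lubridate::wday()`. This is a sketch on toy data, assuming a parsed `started_at` column; using `week_start = 1` makes Monday 1 and Sunday 7, so values of 6 and 7 mark the weekend regardless of locale.

```r
library(dplyr)
library(lubridate)

# Toy trips (hypothetical values); dates chosen to cover weekdays and a weekend.
trips <- tibble(
  member_casual = c("member", "member", "casual", "casual", "casual"),
  started_at    = ymd_hms(c(
    "2022-06-06 08:00:00",   # Monday
    "2022-06-08 08:00:00",   # Wednesday
    "2022-06-10 18:00:00",   # Friday
    "2022-06-11 11:00:00",   # Saturday
    "2022-06-12 14:00:00"    # Sunday
  ))
)

rides_by_day_type <- trips %>%
  mutate(
    day_number = wday(started_at, week_start = 1),            # Monday = 1, Sunday = 7
    day_type   = if_else(day_number >= 6, "weekend", "weekday")
  ) %>%
  count(member_casual, day_type, name = "rides")

print(rides_by_day_type)
```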

· There is a clear reduction in the number of rides for both casual riders and members between November 2022 and March 2023

· Casual riders have their peak ride months in June, July & August

· Member riders also have their peak ride months in June, July & August

· Only casual riders used docked bikes within the timeline considered.

· There is no clear preference for either classic bikes or electric bikes among casual riders, but classic bikes seem to have a slight edge.

· 64% of member riders used classic bikes within the timeline.
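Shares like the 64% figure can be derived by counting rides per bike type and dividing by each rider type's total. The sketch below uses toy data; the column name `rideable_type` and its values (`classic_bike`, `electric_bike`, `docked_bike`) are assumptions for illustration.

```r
library(dplyr)

# Toy trips (hypothetical values).
trips <- tibble(
  member_casual = c(rep("member", 4), rep("casual", 4)),
  rideable_type = c("classic_bike", "classic_bike", "classic_bike", "electric_bike",
                    "classic_bike", "electric_bike", "docked_bike", "classic_bike")
)

bike_type_share <- trips %>%
  count(member_casual, rideable_type, name = "rides") %>%
  group_by(member_casual) %>%
  mutate(pct = 100 * rides / sum(rides)) %>%   # share within each rider type
  ungroup()

print(bike_type_share)
```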

· With few exceptions, the top 10 start and end stations for casual riders are nearly identical. In particular, the top 5 stations (listed above) serve as both start and end points for casual riders. This could be useful for making recommendations for a targeted ad campaign to convert casual riders to members.
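A station ranking of this kind can be sketched by counting casual rides per station and checking which names appear in both top lists. The station names and the column names `start_station_name` and `end_station_name` below are hypothetical, and the toy example takes the top 2 rather than the top 5 or 10.

```r
library(dplyr)

# Toy trips (hypothetical values).
trips <- tibble(
  member_casual      = c(rep("casual", 6), "member"),
  start_station_name = c("Station X", "Station X", "Station X",
                         "Station Y", "Station Y", "Station Z", "Station Z"),
  end_station_name   = c("Station Y", "Station X", "Station X",
                         "Station X", "Station Y", "Station Z", "Station Z")
)

casual_trips <- filter(trips, member_casual == "casual")

# Most frequent start and end stations for casual riders.
top_start <- casual_trips %>%
  count(start_station_name, sort = TRUE, name = "rides") %>%
  head(2)

top_end <- casual_trips %>%
  count(end_station_name, sort = TRUE, name = "rides") %>%
  head(2)

# Stations that rank highly as both a start and an end point.
shared_stations <- intersect(top_start$start_station_name, top_end$end_station_name)

print(shared_stations)
```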

Overall, these insights from the analysis above can help guide the marketing team toward its goal of converting casual riders to members.

Recommendations

In particular, my recommendations for the marketing campaign are as follows:

  1. Casual riders prefer to take longer trips, averaging 24 minutes per trip compared with only 12 minutes for members.
  • Use this statistic to show casual riders how they could save money in the long run by becoming a member instead of paying for rides based on trip duration.

  • Introduce a members-only rewards program based on trip duration to incentivize casual riders to sign up as members and become eligible for the rewards.

  2. Casual riders prefer to use Cyclistic bikes on weekends, when the number of users is almost twice that of the middle of the week.
  • Develop a weekend membership plan in which weekend rides are included in the base price and members have the option to book weekday rides at a lower rate.
  3. Cyclistic could partner with restaurants and recreational facilities near the top 5 start/end locations.
  • For example, a discounted bike-share membership could be offered to customers who purchase an item at a local business near these locations.
  4. Casual riders have their peak ride times during June, July, and August. To encourage them to continue using the bike-share service:
  • Special membership promotions could be offered outside of peak times.

  • This could also be applied during the work week (Monday to Friday), when casual riders are less likely to use the service.

Here is the link to the Full Project Code on GitHub.
