Ask: A clear statement of the business task
I’m trying to find the difference between annual members and casual riders. By finding the differences I can make a better business decision where finding the difference will lead to knowing better how to convert casual riders into annual members by doing the best marketing campaign that targets all-new customers
Prepare: A description of all data sources used
The data I will use to analyze are from April 2020 to March 2021, the data is organized. There is no issue with bias or credibility in this data as follows:
- The data are reliable.
- The data are original.
- The data are comprehensive.
- The data are from the year 2020 (Current).
- The data are cited.
I took a quick look at the data, and it looks ready to work on it and there is good data integrity.
The data will give me the perfect information to see how casual customers are different from annual members.
Though the data seems to be missing some information or some mistakes with specific dates, it won’t affect the analysis and any wrong data will be neglected from the analysis.
You can find the link to the data used in this report here: divvy-trip data
Process: Documentation of any cleaning or manipulation of data
I used R Language to clean the data and make it ready.
First of all, I imported the data and rename it with a unified name, unify the column type where the target columns are the end_station_id and start_station_id columns from April to November to be its type changed into character, then combine tables into one integrated file, then check if all columns with the right type and change the column that isn’t in its right type.
Analyze: A summary of your analysis
Start by creating new columns such as date, year, month, day, day_of_Week, by extract separately from the started_at column.
Then the length of each ride (ride_length) is created by subtracting the value of ended_at from the value of started_at. Then I check the format of the new columns, and then found and change the ride_length from wrong format (difftime) to numeric.
Data should be organized by cleaning it of any errors and making it ready for analysis. After taking a look at ride_length values, I found some values are less than zero, so I decided to remove them from the data for a better analysis.
These values happened when items took to maintenance by the headquarter company then they format the date and time therefore the ride_length will give a negative value.
Then lastly, I do a final check before extracting the organized and cleaned integrated file. After that, the cleaning process is done! Let’s extract the file, and then analyze and share the data in Tableau!
You can find the progress of both Process phase and Analyze phase in R language here: Markdown Report
Share: Supporting visualizations and key finding
I start by importing the cleaned data into my Tableau software, and then I start visualizing as follows:
- The first thing to visualize using the bar chart is how many casual riders there are compared to member riders. The ‘Count of Rides’ bar chart indicates the ride count share between casual and member riders, while the ‘Median of Ride Length (Minutes)’ bar chart indicates the ride length share between casual and member riders. From these two charts, I discover that members used more rides than casual riders, while casual riders went on longer rides compared to member riders.
- The ‘Count of Rides per month’ chart will be an area chart, the y-axis is the count of rides, while the x-axis is the month, and the data is stacked in color between casuals and members. I discovered from this chart that the maximum rides happened during the summer (Jul 20, Aug 20 and Sep 20) while the least rides happened during the winter (Dec 20, Jan 21 and Feb 21) and that’s for both types of riders.
- The ‘Count of Rides per hour’ chart is also an area chart, the y-axis is the count of rides, while the x-axis is the hours. The data is stacked in color between casuals and members. I discovered from the chart that the activity of both types of riders starts at hour 05 and starts slowly to increase over time, then it has good activity at hour 12 and rises till reaching maximum activity at hour 17, then it starts to get less over time overnight. For members specifically, there is a noticeable peak at hours 07–08, when most people are going to school or work.
- In ‘Median of Ride Length per day of week (Minutes)’ bar chart, the y-axis is the median of ride length in minutes, while the x-axis is the day of the week. The data is stacked in color between casuals and members. I discovered here that member riders have a stable ride length during workdays, while it gets a little bit longer on the weekend, which is similar to casual riders, but the peak of casual riders on Saturday and Sunday is higher. This means that weekends are the busiest days for riders.
- In ‘Median of Ride Length per months (Minutes)’ bar chart, the y-axis is the median of ride length in minutes, while the x-axis is the month. The data is stacked in color between casuals and members. I discovered from this chart that people took the longest rides during the spring months (Apr20, May 20, Jun 20 and Jul 20) which indicates that good weather has a direct correlation on riders, but then after it, the rides start to be shorter after each month till the median reaches its minimum during the winter season (Nov20, Dec20 and Jan21) and this is for both types of riders.
You can find the Tableau results of the Share phase here: Tableau Public Dashboard
Act: Top recommendations based on the analysis
Now, to end this case study, here are my recommendations for the Cyclistic marketing team:
- Add monthly membership and 6 months membership.
- Add a discount for the annually membership, and also giving a free first month for the new annual member.
- Reward existing members to share their personal stories about how they use Cyclistic and how it has become a vital part of their lives on social media using a hashtag to create a sense of community.
- The marketing campaign that targets all new customers should increase during the summer to capture the huge market share.
- Convince local influencers on social media to do ads and make a special discount code for these influencers.
- Make an impressive discount during the winter season, recommended to make it during Black Friday or the holidays. This discount will attract people to join during the low season.
Now if there is additional data, I could analyze to enhance the work I would add the following:
- Rider ID: This would help identify the pattern of each rider, especially casual riders. Understanding the pattern of casual riders means better-targeted marketing campaigns.
- The reason of leaving: In case an annual member left, the reason will help by understand what can churn customers.
- Date of Joining Annual Membership: This can help us see if our ads are making a difference and also identify in which seasons most people are joining to help target the ads.