Case Study 1 Case Study: How Does a Bike-Share Navigate Speedy Success?

Introduction

 Welcome to the Cyclistic bike-share analysis case study! In this case study, you will perform many real-world tasks of a junior data analyst. You will work for a fictional company, Cyclistic, and meet different characters and team members. In order to answer the key business questions, you will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act. Along the way, the Case Study Roadmap tables — including guiding questions and key tasks — will help you stay on the right path. By the end of this lesson, you will have a portfolio-ready case study. Download the packet and reference the details of this case study anytime. Then, when you begin your job hunt, your case study will be a tangible way to demonstrate your knowledge and skills to potential employers.

 Scenario

 You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

Characters and teams

 ● Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.

● Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.

 ● Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals — as well as how you, as a junior data analyst, can help Cyclistic achieve them.

 ● Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

About the company

 In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs. Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

Ask Phase

Business Task:  The using pattern of annual members and casual riders in order to launch marketing campaign.

Tools to be used : SQL , R , Excel

Additionally I used Power BI as well to create some meaningful visualizations.

Prepare Phase

Dataset is available under this license  and can be found here .

I downloaded the data from the divvy trip data and stored in desktop.

  • I renamed the folder to make it simple.
  • I renamed all the files as per the standard naming conventions.
  • I took 12 months of data and merged into the single folder.

Process and analyze Phase

In SSMS:

I used a variety of tools in-order to process and standardize the data. First I imported all the datasets in Microsoft sql server management studio. I appended all the datasets with the join query after making sure all the columns were matched and the datatypes were synchronized.

Problems : initially the datatypes were little bit different at the time of importing the data as the ssms automatically adjusted the datatype.

After join query I deleted the unwanted columns, for this particular case I deleted the latitude , longitude , start station Id , end Station Id.

I imported this data to the power BI to check the quality of columns as Power BI has capacity to autodetect errors, blank spaces. I replaced null station names to NA as to avoid confusion.

IN R :

I did the rest of the things in R

Here is the process and Steps

installing and reading Packages

library(tidyverse)

## — Attaching packages ————————————— tidyverse 1.3.1 —

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## — Conflicts —————————————— tidyverse_conflicts() —
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(lubridate)

##
## Attaching package: ‘lubridate’

## The following objects are masked from ‘package:base’:
##
##     date, intersect, setdiff, union

library(ggplot2)
library(dplyr)
library(skimr)

Reading the cleaned data set /checking the Structure

`biketripdata` <- read.csv(“C:/Users/rayam/Desktop/Blogging/swapped capstone.csv”)
head(`biketripdata`)

##            ride_id rideable_type                  started_at
## 1 169C7CEEFA777325   docked_bike 2021-04-17 04:19:01.0000000
## 2 169C90EB4F725D4F   docked_bike 2021-07-16 14:04:39.0000000
## 3 169C9918EF1365A2 electric_bike 2021-06-11 14:50:27.0000000
## 4 169CA05C802CECBC electric_bike 2021-07-09 16:39:20.0000000
## 5 169CA7E2D1C4DF8A   docked_bike 2020-08-01 11:38:55.0000000
## 6 169CAC157CA81FFE  classic_bike 2021-06-04 17:06:05.0000000
##                      ended_at            start_station_name
## 1 2021-04-17 04:24:02.0000000 Greenview Ave & Fullerton Ave
## 2 2021-07-16 14:59:07.0000000                Dusable Harbor
## 3 2021-06-11 15:02:48.0000000  Racine Ave & Washington Blvd
## 4 2021-07-09 17:04:37.0000000        East End Ave & 87th St
## 5 2020-08-01 11:39:14.0000000          Morgan Ave & 14th Pl
## 6 2021-06-04 17:27:25.0000000                    Walsh Park
##                 end_station_name member_casual
## 1 Sheffield Ave & Wrightwood Ave        casual
## 2     Lake Shore Dr & North Blvd        casual
## 3          Wood St & Chicago Ave        casual
## 4                   Calumet Park        member
## 5           Morgan Ave & 14th Pl        casual
## 6   Humboldt Blvd & Armitage Ave        casual

Checking the structure

str(biketripdata)

## ‘data.frame’:    4731081 obs. of  7 variables:
##  $ ride_id           : chr  “169C7CEEFA777325” “169C90EB4F725D4F” “169C9918EF1365A2” “169CA05C802CECBC” …
##  $ rideable_type     : chr  “docked_bike” “docked_bike” “electric_bike” “electric_bike” …
##  $ started_at        : chr  “2021-04-17 04:19:01.0000000” “2021-07-16 14:04:39.0000000” “2021-06-11 14:50:27.0000000” “2021-07-09 16:39:20.0000000” …
##  $ ended_at          : chr  “2021-04-17 04:24:02.0000000” “2021-07-16 14:59:07.0000000” “2021-06-11 15:02:48.0000000” “2021-07-09 17:04:37.0000000” …
##  $ start_station_name: chr  “Greenview Ave & Fullerton Ave” “Dusable Harbor” “Racine Ave & Washington Blvd” “East End Ave & 87th St” …
##  $ end_station_name  : chr  “Sheffield Ave & Wrightwood Ave” “Lake Shore Dr & North Blvd” “Wood St & Chicago Ave” “Calumet Park” …
##  $ member_casual     : chr  “casual” “casual” “casual” “member” …

Checking the data summary

skim(biketripdata)

Data summary

Namebiketripdata
Number of rows4731081
Number of columns7
_______________________ 
Column type frequency: 
character7
________________________ 
Group variablesNone

Variable type: character

skim_variablen_missingcomplete_rateminmaxemptyn_uniquewhitespace
ride_id011616047308720
rideable_type011113030
started_at012727040068080
ended_at012727039890880
start_station_name010533691827390
end_station_name010534073007360
member_casual0166020

Replace all the null value to N/A

biketripdata$start_station_name[biketripdata$start_station_name ==””]<- “N/A”

biketripdata$end_station_name[biketripdata$end_station_name ==””]<- “N/A”

As the date in not in data format I created new column as ridedate and assigned data type in ride_date

biketripdata$rideDate<-as.Date(biketripdata$started_at)

as we need the ride duration ; start date and end data are also assigned as Datetime data type

biketripdata$started_at<-as_datetime(biketripdata$started_at)
biketripdata$ended_at<-as_datetime(biketripdata$ended_at)

Creating the month , day , year , day of week as we need them further in analysis or the ride

biketripdata$month<-format(as.Date(biketripdata$rideDate),”%B”)
biketripdata$day <-format(as.Date(biketripdata$rideDate),”%d”)
biketripdata$year<-format(as.Date(biketripdata$rideDate),”%Y”)
biketripdata$day_of_week<-format(as.Date(biketripdata$rideDate),”%A”)

##checking the added column names again

colnames(biketripdata)

##  [1] “ride_id”            “rideable_type”      “started_at”       
##  [4] “ended_at”           “start_station_name” “end_station_name” 
##  [7] “member_casual”      “rideDate”           “month”            
## [10] “day”                “year”               “day_of_week”

checking the quality of all things

skim(biketripdata)

Data summary

Namebiketripdata
Number of rows4731081
Number of columns12
_______________________ 
Column type frequency: 
character9
Date1
POSIXct2
________________________ 
Group variablesNone

Variable type: character

skim_variablen_missingcomplete_rateminmaxemptyn_uniquewhitespace
ride_id011616047308720
rideable_type011113030
start_station_name0135307390
end_station_name0135307360
member_casual0166020
month01390120
day01220310
year0144020
day_of_week0169070

Variable type: Date

skim_variablen_missingcomplete_rateminmaxmediann_unique
rideDate012020-08-012021-07-312021-04-05365

Variable type: POSIXct

skim_variablen_missingcomplete_rateminmaxmediann_unique
started_at012020-08-01 00:00:012021-07-31 23:59:582021-04-05 13:41:294006808
ended_at012020-08-01 00:04:412021-08-12 17:45:412021-04-05 14:03:513989088

##Creating the column – length of Ride and checking the summary

biketripdata$length_of_ride=difftime(biketripdata$ended_at,biketripdata$started_at)
summary(biketripdata$length_of_ride)

##   Length    Class     Mode
##  4731081 difftime  numeric

##Changing the column into numeric

biketripdata$length_of_ride=as.numeric(biketripdata$length_of_ride)

##filtering the length of ride less than 0 seconds

biketripdata_V2<-filter(biketripdata,length_of_ride>0)

##minimum Length of ride and maximum length of ride

min(biketripdata_V2$length_of_ride)

## [1] 1

max(biketripdata_V2$length_of_ride)

## [1] 3356649

##average length of ride along with min and max

biketripdata_V2%>%summarise(min_ride_length=min(length_of_ride),max_ride_length=max(length_of_ride),average_length_ride=mean(length_of_ride))

##   min_ride_length max_ride_length average_length_ride
## 1               1         3356649            1606.376

##length of Ride by member_type

aggregate(biketripdata_V2$length_of_ride~biketripdata_V2$member_casual,FUN = mean)

##   biketripdata_V2$member_casual biketripdata_V2$length_of_ride
## 1                        casual                       2267.697
## 2                        member                       1077.608

aggregate(biketripdata_V2$length_of_ride~biketripdata_V2$member_casual,FUN = median)

##   biketripdata_V2$member_casual biketripdata_V2$length_of_ride
## 1                        casual                           1079
## 2                        member                            629

aggregate(biketripdata_V2$length_of_ride~biketripdata_V2$member_casual,FUN = max)

##   biketripdata_V2$member_casual biketripdata_V2$length_of_ride
## 1                        casual                        3356649
## 2                        member                        2005282

Before proceeding with the weekday analysis , lets sort out the things by weekday

biketripdata_V2$day_of_week<-ordered(biketripdata_V2$day_of_week,levels=c(‘Monday’,’Tuesday’,’Wednesday’,’Thursday’,’Friday’,’Saturday’,’Sunday’))

##mean length of ride by membertype and day of week

aggregate(biketripdata_V2$length_of_ride~biketripdata_V2$member_casual+biketripdata_V2$day_of_week,FUN=mean)

##    biketripdata_V2$member_casual biketripdata_V2$day_of_week
## 1                         casual                      Monday
## 2                         member                      Monday
## 3                         casual                     Tuesday
## 4                         member                     Tuesday
## 5                         casual                   Wednesday
## 6                         member                   Wednesday
## 7                         casual                    Thursday
## 8                         member                    Thursday
## 9                         casual                      Friday
## 10                        member                      Friday
## 11                        casual                    Saturday
## 12                        member                    Saturday
## 13                        casual                      Sunday
## 14                        member                      Sunday
##    biketripdata_V2$length_of_ride
## 1                       2170.0176
## 2                        855.2013
## 3                       1958.3320
## 4                        832.1721
## 5                       2527.8431
## 6                       2112.8943
## 7                       1933.2624
## 8                        831.0379
## 9                       2088.6483
## 10                       866.7661
## 11                      2374.0114
## 12                       971.0323
## 13                      2550.7374
## 14                      1003.5724

##Maximum length of ride by member type and weekday name

aggregate(biketripdata_V2$length_of_ride~biketripdata_V2$member_casual+biketripdata_V2$day_of_week,FUN=max)

##    biketripdata_V2$member_casual biketripdata_V2$day_of_week
## 1                         casual                      Monday
## 2                         member                      Monday
## 3                         casual                     Tuesday
## 4                         member                     Tuesday
## 5                         casual                   Wednesday
## 6                         member                   Wednesday
## 7                         casual                    Thursday
## 8                         member                    Thursday
## 9                         casual                      Friday
## 10                        member                      Friday
## 11                        casual                    Saturday
## 12                        member                    Saturday
## 13                        casual                      Sunday
## 14                        member                      Sunday
##    biketripdata_V2$length_of_ride
## 1                         2033524
## 2                         2005282
## 3                         2335375
## 4                          433425
## 5                         3257001
## 6                         1742998
## 7                         2946429
## 8                         1682901
## 9                         3341501
## 10                         713853
## 11                        3356649
## 12                         990269
## 13                        3235296
## 14                        1870176

sorting the data by months

biketripdata_V2$month<-ordered(biketripdata_V2$month,levels=c(‘January’, ‘February’, ‘March’, ‘April’, ‘May’, ‘June’, ‘July’, ‘August’, ‘September’, ‘October’, ‘November’, ‘December’))

mean ride length by months and Rider type

aggregate(biketripdata_V2$length_of_ride~biketripdata_V2$member_casual+biketripdata_V2$month,FUN=mean)

##    biketripdata_V2$member_casual biketripdata_V2$month
## 1                         casual               January
## 2                         member               January
## 3                         casual              February
## 4                         member              February
## 5                         casual                 March
## 6                         member                 March
## 7                         casual                 April
## 8                         member                 April
## 9                         casual                   May
## 10                        member                   May
## 11                        casual                  June
## 12                        member                  June
## 13                        casual                  July
## 14                        member                  July
## 15                        casual                August
## 16                        member                August
## 17                        casual             September
## 18                        member             September
## 19                        casual               October
## 20                        member               October
## 21                        casual              November
## 22                        member              November
## 23                        casual              December
## 24                        member              December
##    biketripdata_V2$length_of_ride
## 1                       1541.0754
## 2                        772.3612
## 3                       2962.6862
## 4                       1081.4072
## 5                       2289.6329
## 6                        838.2321
## 7                       2281.5631
## 8                        881.4281
## 9                       2294.1080
## 10                       878.4114
## 11                      2227.5393
## 12                       880.7155
## 13                      1967.5752
## 14                       854.4311
## 15                      2686.9206
## 16                      1004.9515
## 17                      2287.2709
## 18                       927.9440
## 19                      1809.8691
## 20                       839.4808
## 21                      3439.3526
## 22                      3795.2678
## 23                      1610.9252
## 24                       764.6206

##counting the total ride per month for the member types

biketripdata_V2%>%count(month,member_casual)

##        month member_casual      n
## 1    January        casual  18117
## 2    January        member  78713
## 3   February        casual  10130
## 4   February        member  39488
## 5      March        casual  84029
## 6      March        member 144457
## 7      April        casual 136590
## 8      April        member 200607
## 9        May        casual 256888
## 10       May        member 274693
## 11      June        casual 370639
## 12      June        member 358895
## 13      July        casual 442019
## 14      July        member 380322
## 15    August        casual 289608
## 16    August        member 332642
## 17 September        casual 230669
## 18 September        member 302230
## 19   October        casual 144994
## 20   October        member 243619
## 21  November        casual  88169
## 22  November        member 171897
## 23  December        casual  29998
## 24  December        member 101189

counting the ridable type for Riders

biketripdata_V2 %>%count(member_casual,rideable_type)

##   member_casual rideable_type       n
## 1        casual  classic_bike  695208
## 2        casual   docked_bike  760291
## 3        casual electric_bike  646351
## 4        member  classic_bike 1090190
## 5        member   docked_bike  797881
## 6        member electric_bike  740681

Visualization Phase

Summary of the average ride length per day of week and Rider type

biketripdata_V2%>%group_by(member_casual,day_of_week)%>%summarise(average_ride_length=mean(length_of_ride))%>%
  ggplot(aes(x=member_casual,y=average_ride_length,fill=day_of_week)) +
  geom_bar(position=”Dodge”,stat = “identity”) +
  labs(title=”distribution of average ride length by week”,subtitle=”sorted by membership”)

## `summarise()` has grouped output by ‘member_casual’. You can override using the `.groups` argument.

Total Ride duration per day of week and rider type

biketripdata_V2%>%group_by(member_casual,day_of_week)%>%summarise(total_ride_duration=mean(length_of_ride))%>%
  ggplot(aes(x=member_casual,y=total_ride_duration,fill=day_of_week)) +
  geom_bar(position=”Dodge”,stat = “identity”) +
  labs(title=”average ride length by day of  week”,subtitle=”sorted by membership”)

Trips less than 5 minutes ie. 5*60 =300 secs

biketripdata_V2%>%group_by(day_of_week,member_casual)%>%filter(length_of_ride<300  )%>%

    summarise(average_ride_length=mean(length_of_ride))%>%
  ggplot(aes(x=day_of_week,y=average_ride_length, fill=member_casual)) +
  geom_bar(position=’Dodge’,stat=’identity’) +
  labs(title=”Average ride length among different members by day of week which is less than 5 mins “,subtitle=”Membership comparision”)

Trips more than 5 mins

biketripdata_V2%>%group_by(day_of_week,member_casual)%>%filter(length_of_ride>300  )%>%

    summarise(average_ride_length=mean(length_of_ride))%>%
  ggplot(aes(x=day_of_week,y=average_ride_length, fill=member_casual)) +
  geom_bar(position=’Dodge’,stat=’identity’) +
  labs(title=”Average ride length among different members by day of week which is less than 5 mins “,subtitle=”Membership comparision”)

Total Rides per month number and rider type

biketripdata_V2%>%group_by(month,member_casual)%>%summarise(Ridenumbers=n())%>%
  ggplot(aes(x=month,y=Ridenumbers, fill=member_casual)) +
  geom_bar(position=’Dodge’,stat=’identity’) +
  labs(title=”distribution of ride numbers by month”,subtitle=”Membership comparision”)

Total length of Ride per month among rider types

biketripdata_V2%>%group_by(month,member_casual)%>%summarise(totalrideduration=sum(length_of_ride))%>%
  ggplot(aes(x=month,y=totalrideduration, fill=member_casual)) +
  geom_bar(position=’Dodge’,stat=’identity’) +
  labs(title=”distribution of total ride duration by month”,subtitle=”Membership comparision”)

Total Rides per year and membership type

biketripdata_V2%>%group_by(year,member_casual)%>%summarise(Ridenumbers=n())%>%
  ggplot(aes(x=year,y=Ridenumbers, fill=member_casual)) +
  geom_bar(position=’Dodge’,stat=’identity’) +
  labs(title=”distribution of ride numbers by year”,subtitle=”Membership comparision”)

Alternatively I created the visualization in Power BI as well to utilize my skills ,

ACT Phase

  1. Tuesdays and Sundays are more effective in case of casual riders whereas for riders with membership , Tuesday is the peak. So , Tuesday is the most effective day for marketing and we can bring some discount or promotion package.
  2. Riders with membership prefer the short rides and we can see there are more riders with membership more than casual riders whereas casual riders prefer lengthy rides. So we can encourage discounts on short rides as well.
  3. Between June and July there is a peak in numbers of rides so promotion and packages for membership is recommended at that time.
  4. Base – 2132 W Hubbard Warehouse,Hubbard St Bike checking , Indiana Ave , Lake shore Dr , Michigan ave, millennium park are the most visited locations and marketing can be strictly focused on that area.

5. The most popular bikes are Docked bike which are preferred means of bike for casual riders so company can focus more on Docked bikes.

Published by lokendrakc

Hi , I am Lokendra KC. I am a Biomedical Engineer and an aspiring BI analyst. I am very keen on Data so I took this challenge to keep the learning log throughout my data Journey.

Leave a comment