Bellabeat is a high-tech manufacturer of health products for women. The company believes that analyzing smart-device fitness data from 33 users will provide key insights to help guide its marketing strategy.
Bellabeat company: https://bellabeat.com/
Key findings:
Step counting was by far the most used feature among the device users. Of the 33 device users in the data set, all counted their steps, 24 tracked their sleep, and only eight logged their weight, so step tracking was the strongest point of engagement between users and their smart devices. Most users engaged with their devices and tracked the most steps between 9 am and 7 pm each day, most likely because of a busy work schedule. Monday through Wednesday appear to be the days with the fewest steps. Although the majority of activity from all users was sedentary, there is a strong correlation between calories burned and distance traveled. People need to get up and move! Lastly, there appears to be a trend linking activity level and hours of sleep: the more steps users took, the less sleep they got.
Recommendations:
Based on my analysis, I suggest that Bellabeat engage users with step challenges throughout the work week. People could join challenges with friends and see who can take the most steps! Smart devices should also detect when step counts are high and remind users that sleep is one of the most important aspects of a healthy lifestyle. Users rarely used the weight-logging feature, and the majority of people would rather not go out of their way to log their weigh-ins manually. Perhaps this is where Bellabeat can fill the void: it could develop a scale that logs each weigh-in immediately via Bluetooth. Scales could also be offered as prizes for winning step challenges.
Data Cleaning in R Tutorial:
# Installing and loading packages
install.packages("tidyverse") # tidyverse already includes ggplot2, so no separate install is needed
install.packages("janitor")
library(tidyverse) # attaches ggplot2, readr, dplyr and tidyr, among others
library(janitor)
# Importing and Creating data frames
daily_activity <- read_csv("dailyActivity_merged.csv")
daily_calories <- read_csv("dailyCalories_merged.csv")
daily_intensities <- read_csv("dailyIntensities_merged.csv")
daily_steps <- read_csv("dailySteps_merged.csv")
heartrate_seconds <- read_csv("heartrate_seconds_merged.csv")
hourly_calories <- read_csv("hourlyCalories_merged.csv")
hourly_intensities <- read_csv("hourlyIntensities_merged.csv")
hourly_steps <- read_csv("hourlySteps_merged.csv")
minute_calories_n <- read_csv("minuteCaloriesNarrow_merged.csv")
minute_calories_w <- read_csv("minuteCaloriesWide_merged.csv")
minute_intensities_n <- read_csv("minuteIntensitiesNarrow_merged.csv")
minute_intensities_w <- read_csv("minuteIntensitiesWide_merged.csv")
minute_METs_n <- read_csv("minuteMETsNarrow_merged.csv")
minute_sleep <- read_csv("minuteSleep_merged.csv")
minute_step_n <- read_csv("minuteStepsNarrow_merged.csv")
minute_step_w <- read_csv("minuteStepsWide_merged.csv")
daily_sleep <- read_csv("sleepDay_merged.csv")
weight_log <- read_csv("weightLogInfo_merged.csv")
# Finding unique values
n_distinct(daily_activity$Id) #33
n_distinct(daily_calories$Id) #33
n_distinct(daily_intensities$Id) #33
n_distinct(daily_sleep$Id) #24
n_distinct(daily_steps$Id) #33
n_distinct(heartrate_seconds$Id) #14
n_distinct(hourly_calories$Id) #33
n_distinct(hourly_intensities$Id) #33
n_distinct(hourly_steps$Id) #33
n_distinct(minute_calories_n$Id) #33
n_distinct(minute_calories_w$Id) #33
n_distinct(minute_intensities_n$Id) #33
n_distinct(minute_intensities_w$Id) #33
n_distinct(minute_METs_n$Id) #33
n_distinct(minute_sleep$Id) #24
n_distinct(minute_step_w$Id) #33
n_distinct(weight_log$Id) #8
# Checking for and removing duplicate rows
sum(duplicated(daily_activity))
sum(duplicated(daily_calories))
sum(duplicated(daily_steps))
sum(duplicated(daily_sleep)) # has three duplicate rows
sum(duplicated(weight_log))
daily_sleep <- daily_sleep[!duplicated(daily_sleep), ] # keep the first copy of each row
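As a minimal sketch of what the duplicate check does, the toy data frame below (made-up values, not taken from the Fitbit files) contains one repeated row. `duplicated()` flags the second and later copies of a row, so negating it keeps the first occurrence of each.

```r
# Toy data (hypothetical values) with one fully repeated row
toy_sleep <- data.frame(
  Id = c(1, 1, 2, 2),
  SleepDay = c("4/12/2016", "4/12/2016", "4/12/2016", "4/13/2016"),
  TotalMinutesAsleep = c(400, 400, 380, 410)
)

# Count rows that repeat an earlier row exactly
n_dupes <- sum(duplicated(toy_sleep))

# Keep only the first copy of each row
toy_clean <- toy_sleep[!duplicated(toy_sleep), ]

print(n_dupes)
print(nrow(toy_clean))
```

In a tidyverse pipeline, `dplyr::distinct()` should give the same result.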
# View the data frames I want to use
View(daily_activity)
View(daily_calories)
View(daily_steps)
View(daily_sleep)
View(weight_log)
View(hourly_steps)
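# Separate date and time in daily_sleep and weight_log. hourly_steps was split into date and time directly in the CSV file.
# extra = "merge" keeps any AM/PM suffix inside Time instead of discarding it with a warning
daily_sleep <- daily_sleep %>% separate(SleepDay, c("Date", "Time"), sep = " ", extra = "merge")
weight_log <- weight_log %>% separate(Date, c("Date", "Time"), sep = " ", extra = "merge")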
# Convert date columns to Date class (YYYY-MM-DD). %Y matches four-digit years; %y would read only two digits.
daily_activity$ActivityDate <- as.Date(daily_activity$ActivityDate, format = "%m/%d/%Y")
daily_steps$ActivityDay <- as.Date(daily_steps$ActivityDay, format = "%m/%d/%Y")
daily_calories$ActivityDay <- as.Date(daily_calories$ActivityDay, format = "%m/%d/%Y")
daily_sleep$Date <- as.Date(daily_sleep$Date, format = "%m/%d/%Y")
hourly_steps$ActivityDate <- as.Date(hourly_steps$ActivityDate, format = "%m/%d/%Y")
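The year specifier matters when parsing dates with `as.Date()`. The sketch below uses a made-up date string in month/day/four-digit-year form, which is an assumption about how the raw files are written, to show the difference between `%Y` (four-digit year) and `%y` (two-digit year).

```r
raw_date <- "4/12/2016"  # hypothetical example string

# %Y expects a four-digit year and parses the string as intended
parsed_full <- as.Date(raw_date, format = "%m/%d/%Y")

# %y reads only two digits of the year, so the result does not match
parsed_short <- as.Date(raw_date, format = "%m/%d/%y")

print(parsed_full)
print(parsed_short)

# Simple sanity check after conversion: the intended parse is not NA
stopifnot(!is.na(parsed_full))
```

Printing a few converted values, or checking `range()` of the column, is a quick way to confirm that the dates landed in the expected year.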
# Create a column for day of the week
daily_activity$weekday <- weekdays(daily_activity$ActivityDate)
daily_steps$weekday <- weekdays(daily_steps$ActivityDay)
daily_calories$weekday <- weekdays(daily_calories$ActivityDay)
daily_sleep$weekday <- weekdays(daily_sleep$Date)
# weight_log$Date is still text after separate(), so convert it before calling weekdays()
weight_log$Date <- as.Date(weight_log$Date, format = "%m/%d/%Y")
weight_log$weekday <- weekdays(weight_log$Date)
hourly_steps$weekday <- weekdays(hourly_steps$ActivityDate)
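Once a weekday column exists, it can be used to compare activity across the week. The sketch below uses a toy data frame with made-up step counts. Because `weekdays()` returns names in the session's locale, the example also stores a locale-independent weekday number via `format(x, "%u")` (1 = Monday); the `weekday_num` column is a hypothetical addition, not part of the original script.

```r
# Toy data (hypothetical values): two Mondays and two Tuesdays
toy_steps <- data.frame(
  ActivityDate = as.Date(c("2016-04-11", "2016-04-12", "2016-04-18", "2016-04-19")),
  TotalSteps = c(6000, 8000, 7000, 9000)
)

# Weekday name (depends on the session locale)
toy_steps$weekday <- weekdays(toy_steps$ActivityDate)

# ISO weekday number, 1 = Monday ... 7 = Sunday (locale-independent)
toy_steps$weekday_num <- as.integer(format(toy_steps$ActivityDate, "%u"))

# Mean steps per weekday
avg_steps <- aggregate(TotalSteps ~ weekday_num, data = toy_steps, FUN = mean)
print(avg_steps)
```

Grouping on the number rather than the name also keeps the days in calendar order when the result is plotted.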
# Export the cleaned data frames as CSV files
write.csv(daily_activity, file = "daily_activity.csv")
write.csv(daily_calories, file = "daily_calories.csv")
write.csv(daily_steps, file = "daily_steps.csv")
write.csv(daily_sleep, file = "daily_sleep.csv")
write.csv(hourly_steps, file = "hourly_steps.csv")
write.csv(weight_log, file = "weight_log.csv")
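With the cleaned frames, a relationship such as the one between distance traveled and calories burned noted in the key findings could be checked with a correlation. This is only a sketch on made-up numbers; the column names `TotalDistance` and `Calories` are assumed to match the daily activity file.

```r
# Toy data (hypothetical values) standing in for daily activity records
toy_activity <- data.frame(
  TotalDistance = c(1.2, 3.5, 5.1, 7.8, 9.4),
  Calories      = c(1500, 1800, 2100, 2500, 2900)
)

# Pearson correlation: values near 1 suggest a strong positive linear relationship
r <- cor(toy_activity$TotalDistance, toy_activity$Calories)
print(r)
```

On the real data the equivalent call would be `cor(daily_activity$TotalDistance, daily_activity$Calories)`, assuming those columns exist and contain no missing values.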