
Bellabeat Fitbit Wellness Tracker Analysis

About this project

Introduction:-

Welcome to the Bellabeat data analysis case study! In this case study, I perform many real-world tasks of a junior data analyst. I assume that I am working for Bellabeat, a high-tech manufacturer of health-focused products for women, and that I meet different characters and team members in order to answer the key business questions.

I will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act.

Ask:-

These questions will guide our analysis:

  1. What are some trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat marketing strategy?

Business Task:-

Bellabeat is a small company that creates health-focused smart devices for women. Bellabeat’s executive and marketing teams want insight into how consumers use their smart devices in order to reveal opportunities for growth and to guide the company’s marketing strategy.

This analysis considers Bellabeat products and analyzes smart device usage data in order to gain insight into how people are already using their smart devices. Those insights can then be applied to Bellabeat’s products.

Prepare:-

The datasets were downloaded from here: https://www.kaggle.com/arashnic/fitbit. This Kaggle dataset contains personal fitness tracker data from thirty Fitbit users. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits. I am using Python to analyze this data.

Load required libraries

#importing libraries
import numpy as np  # linear algebra
import pandas as pd # data processing,data structure and data analysis
import matplotlib.pyplot as plt # data visualization
import seaborn as sns  # data visualization
import datetime as dt  # date time

Reading the files

#reading the files in the csv form
data = pd.read_csv('/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv')

#preview first 5 rows
data.head(5) 

Process:-

  • Check for null values and decide how to handle them.
  • Check the data types and convert columns to the right type.
  • Check the dataframe dimensions.
# check null values in data 
null_values = data.isnull().sum()
#show null values
null_values[:]
Id                          0
ActivityDate                0
TotalSteps                  0
TotalDistance               0
TrackerDistance             0
LoggedActivitiesDistance    0
VeryActiveDistance          0
ModeratelyActiveDistance    0
LightActiveDistance         0
SedentaryActiveDistance     0
VeryActiveMinutes           0
FairlyActiveMinutes         0
LightlyActiveMinutes        0
SedentaryMinutes            0
Calories                    0
dtype: int64

#get info of the dataframe using info()
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 940 entries, 0 to 939
Data columns (total 15 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Id                        940 non-null    int64  
 1   ActivityDate              940 non-null    object 
 2   TotalSteps                940 non-null    int64  
 3   TotalDistance             940 non-null    float64
 4   TrackerDistance           940 non-null    float64
 5   LoggedActivitiesDistance  940 non-null    float64
 6   VeryActiveDistance        940 non-null    float64
 7   ModeratelyActiveDistance  940 non-null    float64
 8   LightActiveDistance       940 non-null    float64
 9   SedentaryActiveDistance   940 non-null    float64
 10  VeryActiveMinutes         940 non-null    int64  
 11  FairlyActiveMinutes       940 non-null    int64  
 12  LightlyActiveMinutes      940 non-null    int64  
 13  SedentaryMinutes          940 non-null    int64  
 14  Calories                  940 non-null    int64  
dtypes: float64(7), int64(7), object(1)
memory usage: 110.3+ KB

#check the dimensions of the data
data.shape

(940, 15)

  • The data have 940 rows and 15 columns
#count distinct values of Id column
print(len(pd.unique(data["Id"])))

33

  • The data contain 33 unique ids, slightly more than the thirty users mentioned in the dataset description.
  • Convert ActivityDate into the right format and check the columns.
#convert "ActivityDate" to datetime64 dtype
data["ActivityDate"]= pd.to_datetime(data["ActivityDate"],format="%m/%d/%Y")

#print first 5 row to confirm type of columns
data["ActivityDate"].head(5)
0   2016-04-12
1   2016-04-13
2   2016-04-14
3   2016-04-15
4   2016-04-16
Name: ActivityDate, dtype: datetime64[ns]
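A malformed date would make the strict conversion above raise an error. A minimal sketch, assuming a small hypothetical sample rather than the real file, of how `errors="coerce"` could surface such rows as NaT so they can be inspected:

```python
import pandas as pd

# Hypothetical sample: one valid date and one malformed entry (not from the dataset)
raw = pd.Series(["4/12/2016", "not a date"])

# errors="coerce" turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

# Count the rows that failed to parse so they can be reviewed
n_bad = int(parsed.isna().sum())
print(n_bad)
```

In this dataset the strict conversion succeeded, so this check is only a safeguard for other files with the same layout.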
#create new columns "day_of_week"
data['day_of_week']=data['ActivityDate'].dt.day_name()

#create new column "month"
data['month']=data['ActivityDate'].dt.month_name()

#create new column sum of total minutes
data["total_mins"]=data["VeryActiveMinutes"]+data["FairlyActiveMinutes"]+data["LightlyActiveMinutes"]+data["SedentaryMinutes"]

#create new columns sum of total hours
data["total_hours"]=round(data["total_mins"]/60,1)
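Since the four minute columns describe one day, their sum should not exceed 1440 minutes. The sketch below rebuilds the two derived columns on a small illustrative frame (the values are assumptions, not read from the file) and flags any row that breaks that limit:

```python
import pandas as pd

# Illustrative rows using the original column names (values are assumed)
df = pd.DataFrame({
    "VeryActiveMinutes": [25, 0],
    "FairlyActiveMinutes": [13, 0],
    "LightlyActiveMinutes": [328, 0],
    "SedentaryMinutes": [728, 1440],
})

minute_cols = ["VeryActiveMinutes", "FairlyActiveMinutes",
               "LightlyActiveMinutes", "SedentaryMinutes"]

# Row-wise sum of all tracked minutes, then converted to hours
df["total_mins"] = df[minute_cols].sum(axis=1)
df["total_hours"] = (df["total_mins"] / 60).round(1)

# A day has 1440 minutes, so anything above that points to a data problem
over_limit = df[df["total_mins"] > 1440]
print(df[["total_mins", "total_hours"]])
print(len(over_limit))
```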
#rename columns to snake_case: lowercase names with underscores between words
#(total_mins and total_hours were created above and already follow this style)
data.rename(columns={"Id":"id","ActivityDate":"date","TotalSteps":"total_steps"
                     ,"TotalDistance":"total_dist","TrackerDistance":"track_dist"
                     ,"LoggedActivitiesDistance":"logged_dist","VeryActiveDistance":"very_active_dist"
                     ,"ModeratelyActiveDistance":"moderate_active_dist","LightActiveDistance":"light_active_dist"
                     ,"SedentaryActiveDistance":"sedentary_active_dist","VeryActiveMinutes":"very_active_mins"
                     ,"FairlyActiveMinutes":"fairly_active_mins","LightlyActiveMinutes":"lightly_active_mins"
                     ,"SedentaryMinutes":"sedentary_mins","Calories":"calories"}, inplace=True)
#create a list with the new column order
new_cols=['id','date','month','day_of_week','total_steps','total_dist','total_mins'
          ,'total_hours','calories','track_dist','logged_dist','very_active_dist'
          ,'moderate_active_dist','light_active_dist','sedentary_active_dist','very_active_mins'
          ,'fairly_active_mins','lightly_active_mins','sedentary_mins']
#use reindex to rearrange the columns in the new order
data=data.reindex(columns=new_cols)
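One thing to keep in mind with `reindex` is that it silently drops any column missing from the list and creates an all-NaN column for any misspelled name. A small sketch, using a hypothetical three-column frame, of a check that could be run before reordering:

```python
import pandas as pd

# Hypothetical frame standing in for the renamed data (values are assumed)
df = pd.DataFrame({"id": [1, 2],
                   "calories": [1985, 1797],
                   "total_steps": [13162, 10735]})
new_cols = ["id", "total_steps", "calories"]

# Names in the list that do not exist would become all-NaN columns
missing = set(new_cols) - set(df.columns)
# Existing columns left out of the list would be dropped
dropped = set(df.columns) - set(new_cols)

reordered = df.reindex(columns=new_cols)
print(missing, dropped, list(reordered.columns))
```

If both sets are empty, the reorder changes only the column order.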

Analyze:-

  • Now the data is ready to analyze.
  • Show the data to check that it appears properly for analysis.
  • Check the structure of the data and show the data statistics for further analysis.
#print first 5 row using head function to check all changes
data.head(5)
#get the new shape of dataframe
data.shape

(940, 19)

  • Now, the data have 940 rows and 19 columns
#get statistics of the data
data.describe()
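The summary table returned by `describe()` holds the mean and the quartiles that the later discussion relies on. A minimal sketch, with made-up step counts, of how individual statistics can be pulled out of it:

```python
import pandas as pd

# Made-up daily step counts, used only to illustrate the lookups
steps = pd.Series([0, 4000, 8000, 12000, 16000], name="total_steps")

summary = steps.describe()

# describe() labels the quartiles "25%", "50%" and "75%"
mean_steps = summary["mean"]
q25 = summary["25%"]
q75 = summary["75%"]
print(mean_steps, q25, q75)
```

On the real dataframe the same lookups would be `data["total_steps"].describe()["75%"]` and so on.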


Share:-

  • Now we can check user activity on the basis of different parameters, such as calories burned per hour, calories burned per step, and app uses per day and per month.
#size, style, grid
sns.set_style("whitegrid")
plt.figure(figsize=(8,4))

#set the plot
plt.hist(data.day_of_week, bins=7, width = 0.5, color="orange")

#set labels,title
plt.xlabel("Day of the week", color= 'black', size=14)
plt.xticks(rotation=45, size=14)
plt.ylabel("Frequency", color='black', size=14)
plt.title("App uses per day of week",size=20)

#show the plot
plt.show()
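Because day names are categories rather than numbers, the same counts can also be computed directly and put in calendar order. A sketch on a few hypothetical dates (not the real data), as an alternative to the histogram above:

```python
import pandas as pd

# Hypothetical activity dates (illustrative only)
dates = pd.Series(pd.to_datetime(
    ["4/12/2016", "4/13/2016", "4/13/2016", "4/16/2016"], format="%m/%d/%Y"))

weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Count records per weekday, then force Monday-to-Sunday order;
# days with no records get a count of 0
counts = (dates.dt.day_name()
               .value_counts()
               .reindex(weekday_order, fill_value=0))
print(counts)
```

Calling `counts.plot(kind="bar")` would then draw one bar per weekday in that order.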




#size, style, grid
sns.set_style("whitegrid")
plt.figure(figsize=(4,4))

#set the plot
plt.hist(data.month, bins=3, width = 0.5, color="green")

#set labels,title
plt.xlabel("Month", color= 'black', size=14)
plt.xticks(rotation=30, size=14)
plt.ylabel("Frequency", color='black', size=14)
plt.title("App uses per month",size=20)

#show the plot
plt.show()


#size, style, grid
sns.set_style("whitegrid")
plt.figure(figsize=(8,4))

#set the plot
sns.scatterplot(data=data, x="total_hours", y="calories", hue="calories", palette= "viridis")

#set labels,titles
plt.xlabel("Number of Hours",size=15)
plt.ylabel("Calories",size=15)
plt.title("Calories Burned Per Hour",size=20)
plt.legend()

#show the plot
plt.show()


#size, style, grid
sns.set_style("whitegrid")
plt.figure(figsize=(8,4))

#set the plot
sns.scatterplot(data=data, x="total_steps", y="calories", hue= "calories" ,palette= "viridis")

#set the labels and title
plt.xlabel("Number of Steps",size=15)
plt.ylabel("calories",size=15)
plt.title("Calories Burned in Steps",size=20)
plt.legend()

#show the plot
plt.show()

#size, style, grid
sns.set_style("whitegrid")
plt.figure(figsize=(8,4))

#set the plot
sns.scatterplot(data=data, x="total_dist", y="calories", hue= "calories" ,palette= "viridis")

#set the labels and title
plt.xlabel("Total Distance",size=15)
plt.ylabel("calories",size=15)
plt.title("Calories Burned with Distance",size=20)
plt.legend()

#show the plot
plt.show()

#sum the minutes of each activity level and convert to hours
very_active_mins=data["very_active_mins"].sum()/60
fairly_active_mins=data["fairly_active_mins"].sum()/60
lightly_active_mins=data["lightly_active_mins"].sum()/60
sedentary_mins=data["sedentary_mins"].sum()/60

#pie chart to show the percent size of each usage minutes
slices=[very_active_mins,fairly_active_mins,lightly_active_mins,sedentary_mins]
labels=["Very Active","Fairly Active","Lightly Active","Sedentary"]
colours=["grey", "orange", "pink", "green"]
explode=[0.1,0.1,0.1,0.1]

#size,style and title
plt.style.use("default")
plt.title("% of activity in Hours",size=20)

#set the plot
plt.pie(slices,labels=labels,colors=colours,explode=explode,autopct="%1.1f%%")

#show the plot
plt.show()
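The percentages that `autopct` prints on the pie can also be computed as numbers, which makes them easier to quote. A minimal sketch, assuming made-up minute totals rather than the real sums:

```python
import pandas as pd

# Made-up totals per activity level, in minutes (not the dataset's values)
minutes = pd.Series({"very_active": 20, "fairly_active": 10,
                     "lightly_active": 170, "sedentary": 800})

# Share of each level in the total tracked time, in percent
share = (minutes / minutes.sum() * 100).round(1)
print(share)
```

Dividing the minutes by 60 first, as the pie chart code does, leaves these shares unchanged because every slice is scaled by the same factor.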

#sum the distance covered at each activity level
very_active_dist=data["very_active_dist"].sum()
moderate_active_dist=data["moderate_active_dist"].sum()
light_active_dist=data["light_active_dist"].sum()

#pie chart to show the percent share of distance at each activity level
slices=[very_active_dist,moderate_active_dist,light_active_dist]
labels=["Very Active","Moderate Active","Light Active"]
colours=["grey", "orange", "green"]
explode=[0.1,0.1,0.1]

#size,style and title
plt.style.use("default")
plt.title("% of activity of Distance",size=20)

#set the plot
plt.pie(slices,labels=labels,colors=colours,explode=explode,autopct="%1.1f%%")

#show the plot
plt.show()

Pie chart "% of activity of Distance": Very Active 27.8%, Moderate Active 10.5%, Light Active 61.7%.

Act:-

  • Looking at the analysis and charts above: the histogram "App uses per day of week" shows that the app is used most in the middle of the week (Tuesday to Thursday) and less from Friday to Monday. The app was also used more in April than in May.
  • The first scatter plot shows the relation between the number of hours logged in a day and the calories burned. Even when more hours are logged in a day (15-20 hours), the calories burned do not increase much, because most of that time is spent in sedentary mode.
  • The next scatter plot shows the relation between step count and calories burned: as the number of steps increases, the calories burned increase. Users average 7,637 steps per day; the 75th percentile is 10,727 steps and the 25th percentile is around 3,700. The plot shows a good correlation between steps and calories burned.
  • The scatter plot "Calories Burned with Distance" shows the relation between distance and calories burned, and it looks similar to "Calories Burned in Steps". Calories burned increase with distance. Even at the 25th percentile of distance, 2.62 km, the calories burned are still high.
  • The pie chart "% of activity in Hours" shows the share of each mode. Sedentary mode has by far the highest share (81%) of the total tracked time, while the very active and fairly active modes account for only 1.7% and 1.1%. That is a big difference between the sedentary mode and the active modes.
  • The pie chart "% of activity of Distance" shows that light activity accounts for 61.7% of the distance and very active activity for 27.8%.
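The "good correlation" between steps and calories is read from the scatter plot; it could also be expressed as a number with the Pearson correlation coefficient. A sketch on made-up values (not the dataset) showing the calculation:

```python
import pandas as pd

# Made-up pairs of daily steps and calories, for illustration only
df = pd.DataFrame({
    "total_steps": [1000, 4000, 7000, 10000, 13000],
    "calories": [1500, 1900, 2200, 2700, 3000],
})

# Series.corr uses the Pearson coefficient by default:
# +1 is a perfect increasing line, 0 means no linear relation
r = df["total_steps"].corr(df["calories"])
print(round(r, 3))
```

On the real dataframe the equivalent call would be `data["total_steps"].corr(data["calories"])`; the value it returns is not reported here.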

Recommendation:-

  1. Most users use the app on weekdays and use it less on weekends. Bellabeat should encourage more app usage, even though more usage does not necessarily mean more calories burned.
  2. Offer fitness tracker device services so users can analyze their workouts, and add more features to the devices, such as sleep patterns, heart rate, and water intake, which are useful in daily life.
  3. This analysis and these trends can help the Bellabeat marketing team plan promotions and campaigns. The marketing team can promote app usage by explaining more of the app's benefits, run awareness campaigns on weekends about how to stay fit, and offer rewards for activity so that users open the app more often.

To Access Complete Notebook:- Fitbit Wellness Tracker
