The data give the number of births and deaths per year in two clinics, "clinic 1" and "clinic 2".
import pandas as pd
# Yearly births and deaths for each clinic
yearly = pd.read_csv("datasets/yearly_deaths_by_clinic.csv")
print(yearly.head())
# Deaths per birth for each clinic and year
yearly['proportion_deaths'] = yearly['deaths'] / yearly['births']
# Split the rows by clinic
clinic_1 = yearly[yearly['clinic'] == 'clinic 1']
clinic_2 = yearly[yearly['clinic'] == 'clinic 2']
print(clinic_1.head())
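Splitting by clinic keeps one row per year. If a single summary number per clinic is wanted, one option (an addition, not part of the original analysis) is a pooled proportion: total deaths divided by total births. The sketch below uses a small hypothetical DataFrame with the same column names as `yearly_deaths_by_clinic.csv`; the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical toy numbers for illustration only; NOT the real dataset.
yearly = pd.DataFrame({
    "year": [1841, 1841, 1842, 1842],
    "births": [100, 80, 120, 90],
    "deaths": [10, 2, 18, 3],
    "clinic": ["clinic 1", "clinic 2", "clinic 1", "clinic 2"],
})

# Same per-row ratio as in the notebook.
yearly["proportion_deaths"] = yearly["deaths"] / yearly["births"]

# Pooled proportion per clinic: total deaths / total births across all years.
totals = yearly.groupby("clinic")[["deaths", "births"]].sum()
pooled = totals["deaths"] / totals["births"]
print(pooled)
```

Pooling weights each year by its number of births, whereas averaging the yearly `proportion_deaths` values gives every year equal weight; the two can differ when birth counts vary a lot between years.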
import matplotlib.pyplot as plt
%matplotlib inline
# Draw both clinics on the same axes
ax = clinic_1.plot(x="year", y="proportion_deaths",
                   label="clinic_1")
clinic_2.plot(x="year", y="proportion_deaths",
              label="clinic_2", ax=ax, ylabel="proportion_deaths")
Output:
(Plot of proportion_deaths by year for clinic_1 and clinic_2; image not reproduced here.)
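Plotting each clinic separately and sharing `ax` works. An alternative sketch is to reshape the long table to one column per clinic with `DataFrame.pivot`, after which a single `.plot()` call draws both lines. The toy values below are hypothetical, and the plotting call is left as a comment so the snippet runs without matplotlib.

```python
import pandas as pd

# Hypothetical toy data in the same long format as the yearly table.
yearly = pd.DataFrame({
    "year": [1841, 1841, 1842, 1842],
    "clinic": ["clinic 1", "clinic 2", "clinic 1", "clinic 2"],
    "proportion_deaths": [0.10, 0.025, 0.15, 0.033],
})

# One column per clinic, indexed by year: each column becomes one line when plotted.
wide = yearly.pivot(index="year", columns="clinic", values="proportion_deaths")
print(wide)

# With matplotlib installed, one call draws both clinics:
# wide.plot(ylabel="proportion_deaths")
```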
Interpretation: The proportion of deaths is consistently much higher in Clinic 1. The only difference between the clinics was that many medical students worked at Clinic 1, whereas Clinic 2 was staffed mostly by midwife students. The midwives only tended to the women giving birth, while the medical students also spent time in the autopsy rooms examining corpses.
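Because the medical students moved between the autopsy rooms and the maternity ward, handwashing was made mandatory in Clinic 1 (the cutoff used below is 1 June 1847). The monthly data from Clinic 1 are analyzed next to see whether handwashing had any effect.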
monthly = pd.read_csv('datasets/monthly_deaths.csv', parse_dates=['date'])
monthly["proportion_deaths"] = monthly["deaths"] / monthly["births"]
5. Highlighting the decline in the proportion of deaths:
# Handwashing became mandatory on 1 June 1847
handwashing_start = pd.to_datetime('1847-06-01')
# Split the monthly data at that date
before_washing = monthly[monthly['date'] < handwashing_start]
after_washing = monthly[monthly['date'] >= handwashing_start]
ax = before_washing.plot(x='date', y='proportion_deaths', label='before_washing')
after_washing.plot(x='date', y='proportion_deaths', label='after_washing',
                   ax=ax, ylabel='Proportion deaths')
Output:
(Plot of monthly proportion_deaths, before_washing vs after_washing; image not reproduced here.)
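Interpretation: After handwashing became mandatory in June 1847, the monthly proportion of deaths is lower; the next step quantifies the drop.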
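The split relies on two complementary comparisons (`<` and `>=`), so every month with a valid date falls into exactly one group, with June 1847 itself counted as "after". A minimal check, assuming hypothetical monthly rows with the notebook's column names (the values are illustrative only):

```python
import pandas as pd

# Hypothetical monthly rows around the cutoff; values are illustrative only.
monthly = pd.DataFrame({
    "date": pd.to_datetime(["1847-04-01", "1847-05-01", "1847-06-01", "1847-07-01"]),
    "births": [300, 310, 290, 300],
    "deaths": [30, 35, 6, 3],
})
monthly["proportion_deaths"] = monthly["deaths"] / monthly["births"]

handwashing_start = pd.to_datetime("1847-06-01")
before_washing = monthly[monthly["date"] < handwashing_start]
after_washing = monthly[monthly["date"] >= handwashing_start]

# Every month lands in exactly one group (assuming no missing dates).
print(len(before_washing), len(after_washing), len(monthly))
# The cutoff month itself belongs to the "after" group.
print(after_washing["date"].min())
```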
6. Difference in the mean monthly proportion of deaths:
import numpy as np
before_proportion = before_washing['proportion_deaths']
after_proportion = after_washing['proportion_deaths']
mean_diff = np.mean(after_proportion) - np.mean(before_proportion)
mean_diff
Output:
-0.08395660751183336
Interpretation:
On average, handwashing reduced the monthly proportion of deaths by about 8.4 percentage points.
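The same two means can also be computed in one step by labelling each month as "before" or "after" and grouping on that label. This is a sketch on hypothetical toy values; like the notebook, it averages the monthly proportions without weighting by births.

```python
import pandas as pd

# Hypothetical toy data; values are illustrative only.
monthly = pd.DataFrame({
    "date": pd.to_datetime(["1847-04-01", "1847-05-01", "1847-06-01", "1847-07-01"]),
    "proportion_deaths": [0.10, 0.12, 0.02, 0.01],
})
handwashing_start = pd.to_datetime("1847-06-01")

# Label each month, then average proportion_deaths within each label.
period = monthly["date"].ge(handwashing_start).map({False: "before", True: "after"})
means = monthly.groupby(period)["proportion_deaths"].mean()

mean_diff = means["after"] - means["before"]
print(means)
print(mean_diff)
```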
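7. Bootstrap confidence interval for the difference in means:
The point estimate above says nothing about its uncertainty, so both groups are resampled with replacement 3,000 times, and the 2.5% and 97.5% quantiles of the resampled differences form a 95% confidence interval.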
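boot_mean_diff = []
for i in range(3000):
    # Resample each group with replacement, keeping its original size
    boot_before = before_proportion.sample(frac=1, replace=True)
    boot_after = after_proportion.sample(frac=1, replace=True)
    # Difference in means for this bootstrap replicate
    boot_mean_diff.append(boot_after.mean() - boot_before.mean())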
confidence_interval = pd.Series(boot_mean_diff).quantile([0.025, 0.975])
confidence_interval
Output:
0.025 -0.101535
0.975 -0.067587
dtype: float64
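The loop above can be wrapped in a reusable function. The sketch below is a hypothetical helper, `bootstrap_mean_diff_ci`, that uses NumPy's `default_rng` with a fixed seed so the interval is reproducible between runs (the notebook's `sample` calls draw a new random resample each time). It implements the same percentile bootstrap: resample each group with replacement at its own size, take the difference in means, and read off the 2.5% and 97.5% quantiles. The input values are made up.

```python
import numpy as np


def bootstrap_mean_diff_ci(before, after, n_boot=3000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(after) - mean(before).

    Hypothetical helper: each group is resampled with replacement independently,
    keeping its own size, mirroring sample(frac=1, replace=True) in the notebook.
    """
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible interval
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        b = rng.choice(before, size=before.size, replace=True)
        a = rng.choice(after, size=after.size, replace=True)
        diffs[i] = a.mean() - b.mean()
    alpha = (1 - level) / 2
    # Linear interpolation, the same default as pandas Series.quantile
    lower, upper = np.quantile(diffs, [alpha, 1 - alpha])
    return lower, upper


# Usage with made-up monthly proportions (NOT the real data).
before = [0.10, 0.12, 0.09, 0.11, 0.13]
after = [0.02, 0.01, 0.03, 0.02, 0.015]
lower, upper = bootstrap_mean_diff_ci(before, after)
print(lower, upper)
```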
Final Conclusion:
So it can be inferred that handwashing reduced the proportion of deaths by between about 6.8 and 10.2 percentage points, according to a 95% bootstrap confidence interval.