Introduction: Repair Our Air (ROA) is an environmental think tank dedicated to formulating policy recommendations to improve air quality in the United States. The organization uses the Environmental Protection Agency's (EPA) Air Quality Index (AQI) as a guiding metric to prioritize its strategies. The AQI summarizes the concentration of air pollutants and their potential impact on public health and the environment. ROA has identified three critical decisions that require data-driven analysis using AQI data.
Project Objectives: In this data analysis project, we aim to leverage AQI data to assist ROA in making informed policy decisions for improving air quality in America. Specifically, we will address the following hypotheses and provide recommendations based on statistical significance:
- Hypothesis 1 - Metropolitan-Focused Approach in California: ROA wants to explore if the mean AQI in Los Angeles County is statistically different from the rest of California. This will help determine if a metropolitan-focused approach is appropriate within the state.
- Hypothesis 2 - Regional Office Location Decision: With limited resources, ROA needs to choose between New York and Ohio for its next regional office. We will test whether New York has a lower mean AQI than Ohio, aiding in the decision-making process.
- Hypothesis 3 - New Policy Impact on Michigan: ROA is introducing a new policy that will affect states with a mean AQI of 10 or greater. We will assess if Michigan can be ruled out from being affected by this new policy.
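Stated formally, the three questions above could be written as the following null and alternative hypotheses. This is a sketch rather than a definitive formulation: the symbols $\mu_{\text{LA}}$, $\mu_{\text{CA}\setminus\text{LA}}$, $\mu_{\text{NY}}$, $\mu_{\text{OH}}$, and $\mu_{\text{MI}}$ are notation introduced here for the population mean AQI of each group, and the one-sided directions for Hypotheses 2 and 3 are an assumption read from the wording of the research questions.

```latex
\begin{align*}
\text{Hypothesis 1:}\quad & H_0: \mu_{\text{LA}} = \mu_{\text{CA}\setminus\text{LA}} && H_A: \mu_{\text{LA}} \neq \mu_{\text{CA}\setminus\text{LA}} \\
\text{Hypothesis 2:}\quad & H_0: \mu_{\text{NY}} \geq \mu_{\text{OH}} && H_A: \mu_{\text{NY}} < \mu_{\text{OH}} \\
\text{Hypothesis 3:}\quad & H_0: \mu_{\text{MI}} \leq 10 && H_A: \mu_{\text{MI}} > 10
\end{align*}
```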
Data Source: We have access to a dataset containing national AQI measurements by state, covering readings collected on January 1st, 2018. The dataset is assumed to be a random sample from a larger population, so we treat it as representative AQI data for analysis.
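As a rough illustration of how such a dataset might be summarized and subset, here is a minimal pandas sketch. The column names `state_name`, `county_name`, and `aqi` are hypothetical placeholders, not taken from the actual file, and would need to be adjusted to match it.

```python
from typing import Tuple

import pandas as pd

# Hypothetical column names; adjust them to match the actual AQI file.
STATE_COL = "state_name"
COUNTY_COL = "county_name"
AQI_COL = "aqi"


def summarize_aqi(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count, mean, and standard deviation of AQI for each state."""
    return (
        df.groupby(STATE_COL)[AQI_COL]
        .agg(["count", "mean", "std"])
        .sort_values("count", ascending=False)
    )


def split_los_angeles(df: pd.DataFrame) -> Tuple[pd.Series, pd.Series]:
    """Split California readings into Los Angeles County and the rest of the state."""
    california = df[df[STATE_COL] == "California"]
    is_la = california[COUNTY_COL] == "Los Angeles"
    return california.loc[is_la, AQI_COL], california.loc[~is_la, AQI_COL]
```

The per-state counts from `summarize_aqi` give a quick check of how well each state is represented before any test is run.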
Methodology: The analysis will involve conducting hypothesis tests for each of the three scenarios to make data-driven recommendations. We will follow these steps for each test:
- Data Exploration: We will explore the AQI dataset to understand its structure and characteristics. Descriptive statistics will help us understand the time range, distribution, and representation of states in the dataset.
- Hypothesis Formulation: For each test, we will formulate the null and alternative hypotheses based on the research question and the direction of interest.
- Significance Level: We will set a significance level of 5% for each hypothesis test.
- Appropriate Test Procedure: Depending on the nature of the comparison (e.g., means between two independent samples or one sample mean relative to a particular value), we will select the appropriate test procedure (e.g., two-sample t-test or one-sample t-test).
- Compute P-Values: We will calculate the p-value for each test and compare it with the 5% significance level. A p-value below 0.05 leads us to reject the null hypothesis; otherwise we fail to reject it.
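The steps above can be sketched with `scipy.stats`. This is one possible implementation under stated assumptions, not necessarily the procedure used in the project: it uses Welch's version of the two-sample t-test (`equal_var=False`), which is a choice made here because the text does not say whether equal variances are assumed, and the function names are illustrative. The `alternative` argument requires SciPy 1.6 or later.

```python
from scipy import stats

ALPHA = 0.05  # the 5% significance level set in the methodology


def two_sided_two_sample(sample_a, sample_b):
    """Welch t-test of H0: mean(a) == mean(b) against HA: mean(a) != mean(b)."""
    result = stats.ttest_ind(sample_a, sample_b, equal_var=False)
    return result.statistic, result.pvalue


def one_sided_two_sample(sample_a, sample_b):
    """Welch t-test of H0: mean(a) >= mean(b) against HA: mean(a) < mean(b)."""
    result = stats.ttest_ind(
        sample_a, sample_b, equal_var=False, alternative="less"
    )
    return result.statistic, result.pvalue


def one_sided_one_sample(sample, threshold):
    """One-sample t-test of H0: mean <= threshold against HA: mean > threshold."""
    result = stats.ttest_1samp(sample, popmean=threshold, alternative="greater")
    return result.statistic, result.pvalue


def is_significant(p_value, alpha=ALPHA):
    """Return True when the p-value is below the significance level."""
    return p_value < alpha
```

Under these assumptions, the Los Angeles comparison would use `two_sided_two_sample`, the New York versus Ohio comparison `one_sided_two_sample`, and the Michigan check `one_sided_one_sample` with a threshold of 10.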
Key Insights and Recommendations: Based on the results of each hypothesis test, we will draw meaningful conclusions for ROA's policy decision-making. The key insights and recommendations will be as follows:
- Metropolitan-Focused Approach: If the mean AQI in Los Angeles County is statistically different from the rest of California, we will recommend that ROA adopt a metropolitan-focused approach within the state, considering the unique air quality challenges in Los Angeles.
- Regional Office Location Decision: If the hypothesis test indicates that New York has a lower mean AQI than Ohio, we will recommend choosing New York for the next regional office location due to its comparatively better air quality.
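- New Policy Impact on Michigan: If the hypothesis test fails to show that Michigan's mean AQI is greater than 10, we will recommend excluding Michigan from the states affected by the new policy, focusing resources on other states with higher AQI values. Failing to reject the null hypothesis is an absence of evidence rather than proof that the mean is below 10, so this recommendation should be read with that caveat.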
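The decision rules above can be expressed as a small lookup from p-values to recommendations. This is an illustrative sketch only: the function name, dictionary keys, and the wording used when a result is not significant are assumptions introduced here, and the p-values passed in would come from the tests described in the methodology.

```python
ALPHA = 0.05  # the 5% significance level set in the methodology


def recommend(p_la_vs_ca, p_ny_vs_oh, p_mi_vs_10, alpha=ALPHA):
    """Map the three p-values to the recommendations listed above."""
    return {
        # Two-sided test: Los Angeles County vs. the rest of California.
        "california": (
            "adopt a metropolitan-focused approach"
            if p_la_vs_ca < alpha
            else "no evidence supporting a metropolitan-focused approach"
        ),
        # One-sided test: is New York's mean AQI lower than Ohio's?
        "regional_office": (
            "choose New York"
            if p_ny_vs_oh < alpha
            else "no evidence that New York's mean AQI is lower"
        ),
        # One-sided test: is Michigan's mean AQI greater than 10?
        "michigan": (
            "Michigan cannot be ruled out"
            if p_mi_vs_10 < alpha
            else "exclude Michigan from the affected states"
        ),
    }
```

For example, `recommend(0.01, 0.20, 0.90)` would recommend the metropolitan-focused approach, report no evidence that New York's mean AQI is lower, and exclude Michigan.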
Conclusion: This data analysis project demonstrates the power of hypothesis testing in making evidence-based policy decisions to improve air quality. By leveraging AQI data, ROA can strategically prioritize its initiatives and allocate resources effectively. The statistical significance of the results will provide confidence in the decision-making process, ultimately contributing to a cleaner and healthier environment for the nation.