__STYLES__
Business need: To analyze a dataset containing New York City traffic data collected from Jan. 1, 2022 to Jan. 30, 2023 (which is a subset of the data from here) and to find trends and insights that can aid future traffic accident prevention.
My goal: To create an overview of trends in the factors affecting the number of collisions, injuries and fatalities in the 5 boroughs of NYC in the year 2022 to help the administrators at NYPD come up with a plan to prevent traffic accidents for the entire city.
Insights:
Recommendations:
Collisions, injuries and fatalities can be prevented in the future by allocating more funding and resources to:
And most importantly, by collecting data with no gaps, to aid future analysis and collision prevention.
Data cleaning: The dataset, an xlsx file, contained data about all the traffic accidents that happened in NYC between Jan. 1, 2022 and Jan. 30, 2023, with each of the 110334 rows representing a unique collision, with a unique collision ID. Since I was only interested in finding trends and insights for the year 2022, I omitted the 6686 records from 2023 from my analysis. Among the remaining records, there was data missing/ not collected for several variables (columns). One of these was the 'Borough' column. The name of the Borough in which a collision happened was missing from several records. Since a couple of my metrics were intended to present the data in a Borough specific manner, I omitted the 18,017 records (17.4% of data from 2022) from which the name of the Borough was missing. Several records also had the 'Contributing factor' listed as 'Unspecified'. Although I have not omitted these records from my analysis, these records have not been visualized.