Introduction:
Electricity outages are a growing concern as we enter an age of unprecedented energy demand and climate disasters. In this project, I play the role of a Senior Analytics Consultant hired by the U.S. Department of Energy (DOE), tasked with creating a dashboard or report to help understand the patterns and trends around outages, quantify their impact on communities, and identify possible weak points in the grid.
About the Dataset:
Information on electric disturbance events is collected using Form DOE-417 and published online in an annual summary. The dataset contains four files for download: an Excel spreadsheet containing the annual summaries, and three PDF documents for reference (the survey form, the instructions, and the documentation for online form submissions). The data is event-level power outage data spanning 2002 to 2023. It has severe quality issues, so a rigorous cleanup was required.
Tools Used:
Power Query (Data Cleaning)
Tableau (Data Visualization)
Phases involved:
The overall process is divided into two phases: Data Cleaning, and Data Analysis & Visualization.
Phase 1: Data Cleaning and Preparation:
The Cleaned Dataset:
Events with null dates and fully empty rows were removed. Null NERC Regions were inferred from the Areas Affected column and mapped accordingly, so those events are still taken into consideration. Counties and cities are rolled up to the state level. All fields in the 2002-2022 sheets were prepped to be consistent with the 2023 sheet. The NERC Regions, Alert Criteria (not provided for the early years), and Event Types were grouped into broader categories, and the Areas Affected column was cleaned and updated with the respective state names. The date fields, especially the restoration datetime, were extremely messy; they were thoroughly cleaned into a standard datetime format.
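To illustrate the restoration-datetime cleanup, here is a minimal Power Query (M) sketch rather than the exact steps used: the "Date of Restoration" column name, the fallback format string, and the Source step are all assumptions.

    let
        // Hypothetical parser: try a direct parse first, then an explicit
        // format; anything unparseable (free text such as "Ongoing" or
        // "Unknown", an assumption about the raw values) becomes null.
        ParseRestoration = (raw as nullable text) as nullable datetime =>
            let
                t = if raw = null then null else Text.Trim(raw),
                direct = try DateTime.FromText(t, [Culture = "en-US"]) otherwise null,
                result = if direct <> null then direct
                         else try DateTime.FromText(t,
                                  [Format = "M/d/yyyy h:mm tt", Culture = "en-US"])
                              otherwise null
            in
                result,
        // Source stands for the previous query step over the combined sheets.
        Cleaned = Table.TransformColumns(
            Source,
            {{"Date of Restoration", ParseRestoration, type nullable datetime}}
        )
    in
        Cleaned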
Assumptions made (at a high level; more granular details appear in a later section):
NERC Region: The NERC Regions were separated out, and misspelled or half-written regions were identified (based on the Areas Affected column and online checks) and replaced accordingly. If a value contains two or more regions, it is split into one row per region. Note: a few of the NERC Regions mentioned were either dissolved or merged, but I have kept them so as not to lose information, in case something interesting turns up about a previously existing region. I did not want to reassign them to another region or write them off as Unknown / Other, as the other details recorded alongside them could be affected. Hence, I chose to keep them.
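As a sketch of the multi-region split in Power Query M, assuming the column is named "NERC Region" and regions are separated by slashes, commas, or semicolons (an assumption about the raw formatting):

    let
        // Turn "WECC / SPP" style values into a list of trimmed region codes.
        ToList = Table.TransformColumns(
            Source,
            {{"NERC Region", each
                if _ = null then {null}
                else List.Transform(Text.SplitAny(_, "/,;"), Text.Trim), type list}}
        ),
        // One row per region; the event's unique ID ties the split rows
        // together, so they are not double-counted as separate events.
        Expanded = Table.ExpandListColumn(ToList, "NERC Region")
    in
        Expanded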
Alert Criteria: Many of the listed criteria were incomplete; they were separated out into a clean list of alert criteria, which I then verified against the details in the PDF provided. I also categorized them into broad Alert Reporting System categories - Emergency Alert, Normal Report, and System Report - based on the reporting time mentioned in the PDF.
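One way to express that categorization is a conditional column in M, sketched below; the keyword tests and the "Alert Criteria" column name are assumptions, and the real grouping followed the reporting windows in the DOE-417 instructions PDF.

    // Assumed column "Alert Criteria"; the keyword checks are illustrative only.
    WithReportingSystem = Table.AddColumn(
        Source,
        "Alert Reporting System",
        each
            let c = Text.Lower([Alert Criteria] ?? "")
            in  if Text.Contains(c, "emergency") then "Emergency Alert"
                else if Text.Contains(c, "normal") then "Normal Report"
                else "System Report",
        type text
    )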
Event Type: The Event Types are grouped into broader categories: Cybersecurity Issues, Environmental Factors, Operational Failures, Public Appeal, Security Incidents, System Operations, and Others / Unknown.
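One way to implement this grouping is a small keyword lookup table, sketched in M below; the keyword/category pairs shown are illustrative examples, not the full mapping actually used.

    let
        // Hypothetical keyword-to-category map; extend with the real pairs.
        EventMap = #table(
            {"Keyword", "Event Category"},
            {
                {"cyber", "Cybersecurity Issues"},
                {"weather", "Environmental Factors"},
                {"vandalism", "Security Incidents"},
                {"appeal", "Public Appeal"}
            }
        ),
        // Return the first matching category, else the catch-all bucket.
        Categorize = (evt as nullable text) as text =>
            let
                lower = Text.Lower(evt ?? ""),
                hits = Table.SelectRows(EventMap,
                           each Text.Contains(lower, Text.Lower([Keyword])))
            in
                if Table.RowCount(hits) > 0 then hits{0}[Event Category]
                else "Others / Unknown",
        Grouped = Table.AddColumn(Source, "Event Category",
                      each Categorize([Event Type]), type text)
    in
        Grouped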
Areas Affected: If a value contains multiple states, it is split into one row per state; if it is broken down into cities / counties, those are considered only at the state level and ignored at the county or city level. The split rows share the event's unique ID and hence are not counted as duplicates. The data is standardized / normalized against the correct list of states. Non-US areas were ignored, but PR was included. This was again the most challenging part, where I spent a lot of time deciding how to proceed.
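A minimal M sketch of the state-level normalization, assuming a reference query named States with a single "State" column covering the 50 states, DC, and PR; the semicolon separator is an assumption about the raw data.

    let
        // Split multi-state values into one row per area; the unique event ID
        // carried on each row keeps the split from looking like duplicates.
        Split = Table.ExpandListColumn(
            Table.TransformColumns(
                Source,
                {{"Areas Affected", each
                    List.Transform(Text.Split(_ ?? "", ";"), Text.Trim), type list}}
            ),
            "Areas Affected"
        ),
        // Inner join against the reference list drops anything not matching a
        // US state (after counties/cities have been rolled up to their states
        // in an earlier, assumed step); PR survives because it is on the list.
        UsOnly = Table.Join(Split, "Areas Affected", States, "State", JoinKind.Inner)
    in
        UsOnly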
Refer to the document for more detailed steps involved in the data cleaning process!
Phase 2: Data Analysis & Visualization:
The dashboard has four pages to navigate for insights on Trends, Event Impact, Geography, and Alert Reporting. It also has a few navigation buttons and filters to extract more information.
Insights & Analysis:
Business Suggestions (Actionable Insights):
Conclusion (Personal Experience):
There was a lot of learning involved, from data cleaning to visualization, and this gave me real-world project experience. I survived this and hope I will survive the data world too! I spent countless hours on the data cleaning process - especially the restoration date and time, as well as the NERC Region and Areas Affected columns - deciding how to split values, what to consider for analysis, and so on. There are surely easier ways to get through these challenges, but with the skill set I have, I decided to take up this challenge and survive! With further projects, I'm sure I will find ways to optimize the process!