Tools used in this project

Tableau Dashboard

About this project


Electricity outages are a growing concern as we enter an age of unprecedented energy demand and climate disasters. I will be playing the role of a Senior Analytics Consultant hired by the U.S. Department of Energy (DOE) and have been tasked with creating a dashboard or report to help understand the patterns and trends around the outages, quantify their impacts on the communities, and identify the possible weak points in the grid.

About the Dataset:

Information on electric disturbance events is collected using Form DOE-417 and published online in an annual summary. The dataset contains 4 files for download: An Excel spreadsheet containing the annual summaries, and 3 PDF documents for reference (the survey form, instructions, and documentation for online form submissions). The data is an event-level power outage data spanning from 2002 until 2023. The data has severe issues with quality and hence a rigorous cleanup was required.

Tools Used:

Power Query (Data Cleaning)

Tableau (Data Visualization)

Phases involved:

The overall process is divided into 2 phases - Data cleaning, and Data Analysis & Visualization.

Phase 1: Data Cleaning and Preparation:

The Cleaned Dataset:

The data with the null date events, and empty rows were removed. I assumed and mapped the data for the null NERC Regions by Areas Affected column and hence they are taken into consideration. The Counties and Cities are considered at a State Level. All the fields in the sheet for 2002-2022 are prepped to be consistent with the 2023 sheet. The NERC regions, Alert Criteria (the data for this was not provided for the early years), Events are grouped into broader categories and the Areas affected column is cleaned and updated with the respective State Names. The date fields especially in the restoration datetime field was extremely messy and that was thoroughly cleaned, and a standard time format is achieved.

Assumptions made (on a higher level, more granular details, in the later section):

NERC Region: The NERC Regions are segregated, and the typo or half written Regions are identified (based on the Areas affected and checking them online) and they are replaced accordingly. If a value has got 2 or more regions, they are split accordingly. Note: Few of the NERC Region mentioned were either dissolved or merged but I have kept them not to lose the essence of information, if I find anything interesting with knowing about the previous existent region. I do not want to assume them with any other or completely ignore them as Unknown / Other, as the other details along with them may be affected. Hence, I chose to keep them.

Alert Criteria: There were many incomplete Criteria listed, they are segregated, and I got a clean list of alert criteria. Then I ensured they map with the details in the PDF provided. Also, I have categorized them into broad categories like (Alert Reporting System) - Emergency alert, Normal Report, System report based on the reporting time as mentioned in the PDF.

Event Type: The Event Type are grouped into broader categories like - Cybersecurity Issues, Environmental factors, Operational failures, Public appeal, Security Incidents, System Operations, Others / Unknown.

Areas Affected: If the values contain multiple states, they are split and kept, if they are split into cities / counties they are considered only at a state level and ignored at county or city level. The split ups have unique ID and hence not considered repetitive. The data is standardized / Normalized with the correct State list. Also non US states were ignored but PR was considered. This is again the challenging part where I took a lot of time deciding how to proceed.

Refer to the document for more detailed steps involved in the data cleaning process!

Phase 2: Data Analysis & Visualization:

The dashboard has 4 pages to navigate for insights on Trend, Event Impact, Geographical and Alert Reporting. It also has got few navigations / buttons / filters to extract more information.

Insights & Analysis:

  • The total number of outages spanning from 2002 until 2023 are 3822.
  • The years with the most power outages were 2022, 2021, and 2020.
  • The total demand loss is over 1.1 million MW, a figure indicative of the scale of outages.
  • Over 312 million customers were affected by power outages, which could potentially include widespread and frequent service interruptions or more localized causes.
  • The average restoration time for outages is approximately 35.41 hours, which may indicate a prolonged impact on customers and businesses.
  • There is a significant fluctuation in outages/events over the years, with a noticeable peak in the years 2011 and 2022.
  • The data shows seasonality within a year (Monthly and quarterly). There are peaks around July and August, suggesting a possible correlation with events like weather incidents or increased energy usage during hotter months.
  • The Majority of outages occur on Weekdays with 78.8%. This could be due to increased commercial and industrial activity during the Weekdays.
  • The Majority of outage occurs in the Morning followed by Noon / Evening. This may correlate with peak energy usage times and suggests that infrastructure might be strained during these periods.
  • The average number of customers affected per year shows fluctuations with peaks in intervals. It is noticeable that it is in a decreasing trend which is positive.
  • The demand loss trend over the year shows the year with high demand loss is 2003.
  • The environmental factors are the most common, followed by operational failures and security incidents(Including “vandalism”) in contributing to the power outage.
  • It is noticeable that among the top 2 incidents namely “environmental factors” and “Security incidents” which include theft/vandalism, etc, the trend shows there is an increasing trend in security incidents from the past few years.
  • Environmental factors are by far the leading cause of outages, affecting over 88% of the customers, significantly more than other causes.
  • Similar to the customer's impact, environmental factors lead to the highest demand loss, with operational failures and "other/unknown" causes also contributing to the losses.
  • Cybersecurity issues lead to the longest average restoration times followed by environmental factors.
  • “California”, “Texas”, and “Washington” etc to name a few, are among the States that suffered the most power outages.
  • The “WECC” region has the highest count of outages, followed by RFC and SERC.
  • California, Texas, and Nevada are among the top states in terms of demand loss due to outages.
  • Texas, California, and Michigan are the top states with the highest number of customers affected by outages.
  • The Western Electricity Coordinating Council (WECC) has the highest number of emergency alerts, followed by SERC and TRE.
  • California leads with the highest number of emergency alerts, followed by Texas and Washington. The distribution suggests that some states are more prone to emergency events than others.
  • Environmental factors dominate emergency alert reporting, surpassing system operations, and Security Incidents.
  • The system reports and normal reports account for the majority of alerts, with less number of emergency alerts comparatively. Maybe the current system can handle issues sometimes without escalating to the emergency status.
  • Emergency reports lead to the longest average restoration times, which is consistent with the severity suggested by the emergency nature of the reports. It doesn’t cause just outage but also result in longer durations, amplifying their impact.

Business Suggestions (Actionable Insights):

  1. Prioritize infrastructure upgrades in high-outage areas, particularly focusing on weatherproofing, grid modernization (the increasing rise in trend in recent years may be due to the aging infrastructure), and incorporating smart grid technologies. With the data presented, seems like the grid is vulnerable to extreme weather events, which are becoming more frequent and severe with climate change.
  2. Develop state-specific strategies for areas like California and Texas, where the impact of outages is high in terms of customer numbers and demand loss.
  3. Integrate climate change impact assessments into planning phases to mitigate the trend of environmental-related outages.
  4. Invest in renewable energy sources to reduce the impact of outages.
  5. Strengthen the Cybersecurity infrastructure as it includes high restoration power and hence managing it beforehand would be great.
  6. Streamline the necessary emergency response processes so that it can reduce average restoration hours for emergency alerts.
  7. Invest in prediction analytics to anticipate potential outages and deploy necessary measures.

Conclusion (Personal Experience):

There was a lot of learning involved from data cleaning to visualization. This gave me a real-time project experience. I survived this and hope I will survive the data world too! I have spent countless hours working on the data cleaning process - especially restoration date and time, as well as NERC and Areas, affected by deciding how to split values, what to consider for analysis, etc. There might for sure be a way to get through these challenges a bit easier, but with the skillset I have, I decided to take up this challenge and survive! With further projects, I’m sure I will find ways to optimize the process!

Discussion and feedback(0 comments)
2000 characters remaining