__STYLES__

Maven Power Outages Challenge

Tools used in this project
Maven Power Outages Challenge

Power BI Dashboard

About this project

(Make the zoom on 81% to view the dashboard in full)

(Get ready cause this is going to be a long read ;) )

Data Dictionary:

Date & Time Event Began: Date & Time Event Began

Date & Time of Restoration: The month day year and time (in 24-hour format) when the event no longer met one of the 24 criteria for an emergency alert.

Number Of Customers affected: The total number of customers affected during the entire incident or disturbance which could be more than the peak number in the case of rolling blackouts. If this number cannot be estimated when the form is initially submitted check the unknown box.

Demand Loss (megawatts): The amount of the peak demand involved over the entire incident.

Event Type or Type of Disturbance: Cause Of Incident

Alert Criteria: Emergency criteria met that caused the form to be filled

NERC Region: North American Electric Reliability Corporation (NERC) region responsible for the restoration

Regional entities

  • MRO
  • NPCC
  • RFC
  • SERC
  • TRE
  • WECC
  • HECO

Data Issues:

Inconsistent Header Structure:

The dataset has an irregular header structure. The first two rows appear to be header information, with the second row containing the actual column names (e.g., 'Date', 'NERC Region', 'Time', 'Area', 'Type of Disturbance', 'Loss (megawatts)', 'Number of Customers Affected', 'Restoration Time').

Missing Values:

There are numerous NaN (Not a Number) values, indicating missing data.

Date and Time Columns:

The 'Date' and 'Time' columns are separate.

Data Spread Across Rows:

Some data entries, like the one for January 2002, are spread across multiple rows. This suggests that the dataset may have merged cells or inconsistent row entries.

Inconsistent Formatting:

Some fields might have inconsistent formatting, especially text fields like 'Type of Disturbance' or 'Area'.

Inconsistent Cell Records:

There are many fields that have inconsistent cell records, especially columns like “Area Affected” , “Event Type”, “Alert Criteria”, and Time Columns.

Data Assumptions:

  • I understand that adding missing values to the dataset involves estimating on my part. Although not optimal, there are instances when it is imperative. Depending on what makes the most sense for each individual column, I'll either use a common value or an average or completely remove it.

  • Split Restoration Time column into two columns: Restoration date and Time columns

  • Replaced “Noon” with 12:00:00 PM

  • Evening values is replaced with 7:30 PM, where evening is the period between 6-9 PM

  • Pacific Gas and Electric Company, commonly known as PG&E, is an investor-owned utility that provides natural gas and electricity to residential and business customers in northern and central California, United States. The company operates as a subsidiary of PG&E Corporation and is headquartered in Oakland, California. It services approximately 5.2 million households in California. Thus, I have changed the values with “1 PG&E” in the number of customers affected column with 5.2 million

  • Used 7:59:59 AM values instead of Ongoing values in time, handled it using DAX functions when performing calculations

  • Used Today function for ongoing dates

  • Technical/Operational: "Islanding," "System Operations," "Load Shedding," and "Voltage Reductions.

  • Infrastructure Issues: "Equipment Failure" and "Transmission/Distribution Interruption"

  • Power Generation Issues: "Fuel Supply Emergency" and "Generation Inadequacy."

  • Emergency Responses: "Declared Emergency" and "Public Appeal."

  • Security Concerns: "Cyber Event" and "Suspicious Activity"

  • Environmental Factors: "Weather", “Fire”, “Hurricane”, "Natural Disaster"…etc

  • CAISO (California ISO): California

  • ERCOT (Electric Reliability Council of Texas): Texas

  • MISO (Midcontinent ISO): Arkansas, Illinois, Indiana, Iowa, Kentucky, Louisiana, Michigan, Minnesota, Mississippi, Missouri, Montana (partial), North Dakota, South Dakota, Texas (partial, mainly the eastern portion), Wisconsin

  • NYISO (New York ISO): New York

  • ISO-NE (ISO New England): Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont

  • PJM Interconnection: Delaware, Illinois (partial), Indiana (partial), Kentucky (partial), Maryland, Michigan (partial), New Jersey, North Carolina (partial), Ohio, Pennsylvania, Tennessee (partial), Virginia, West Virginia, and the District of Columbia

  • SPP (Southwest Power Pool): Arkansas, Iowa, Kansas, Louisiana (partial), Minnesota (partial), Missouri, Montana (partial), Nebraska, New Mexico (partial), North Dakota (partial), Oklahoma, South Dakota, Texas (partial), Wyoming (partial

  • Created a data model to handle events appearing in multiple locations and regions, outsourced some data for US states longitude and latitude because they weren’t in data

  • Title Case Conversion: Converts the text in the 'Area Affected' column to title case (each word's first letter is capitalized).

  • Regex-based Cleaning: Removes any characters from the 'Area Affected' column that are not alphanumeric (letters and numbers), periods, commas, hyphens, or spaces.

  • Whitespace Trimming: Trims leading and trailing whitespace from the 'Area Affected' column.

  • State Name Extraction: Defines a list of U.S. state names and creates a function extract_states that identifies these state names in the 'Area Affected' text. If a state name is found, it's included in a new comma-separated string; if no states are found, the original area description is returned.

  • Apply Function: Applies the extract_states function to each entry in the 'Area Affected' column, effectively replacing the original area descriptions with a string of identified state names or leaving them unchanged if no state names are found.

  • Prints the DataFrame: Finally, the updated DataFrame Area is printed

Dashboard Details:

  • KPIs: total outages, total customers affected, total demand loss, and average outage duration.

  • Map: Plot 'Area Affected' to show geographic distribution of outages.

  • Bar Charts: Break down outages by 'NERC Region' to identify which regions are most affected. Number of outages by cause over time to spot emerging trends.

  • Line Chart: Trend of outages over time using 'Date Event Began' to show frequency. Trend of average restoration time using 'Date of Restoration' to monitor performance improvements.

  • Area Charts: Trend of demand loss & total customers affected over time to gauge the severity of outages.

  • Pie or Donut Chart: Proportion of outage types ('Event Type') to identify the most common causes.

  • Scatterplot: Correlation between outage duration and number of customers affected to find outliers where restoration was significantly delayed. (Not Shown in the dashboard because there was no Correlation found between the columns)

Slicers:

Included slicers for:

  • Time Period: Users can filter by year,month,and day.
  • Region: Users can select one or multiple NERC regions.
  • Event Type: Filter by types of outages.
  • State: Filter by Area Affected in the US

Insights:

  • 381 million customers have been affected by power outages, indicating a significant impact on a large population, possibly over an extended period or due to a major event.

  • There has been a total demand loss of 1.38 million megawatts, which highlights the scale of the power disruption in terms of energy supply.

  • There have been 3,949 outages, which could suggest widespread issues across the grid or a series of smaller, localized problems.

  • The average outage duration is 81.40 hours, which is quite long, suggesting that when outages occur, they are not resolved quickly, leading to prolonged periods without electricity for affected customers.

  • WECC is experiencing the most outages, followed by RFC and SERC . This geographical data can be valuable for understanding which regions are most affected and may need more resources for prevention and recovery.

  • The majority of outages are due to environmental factors, followed by vandalism and technical/operational reasons. This information is crucial for utility companies to address the root causes of outages and develop mitigation strategies.

  • California has the highest number of affected customers at 68 million, followed by Texas with 42 million, and Michigan and Florida both at 25 million. This indicates the states where power outages have had the most significant impact on customers.

  • North Carolina and South Carolina have the highest demand loss, each at 0.22 million megawatts. California follows closely with a demand loss of 0.18 million megawatts, indicating significant power supply disruptions.

  • The fluctuations over the years with peaks that may correspond to specific events or conditions that caused higher power demand losses in those years.

  • The number of affected customers also fluctuates year by year, with notable peaks that could suggest major outages or disaster events that disrupted power supply to customers.

  • The monthly distribution of outages with the highest peaks is in January, June, and December. This pattern may suggest seasonal effects on outage frequency, such as winter storms or summer heatwaves.

  • The annual trend of outages shows a generally increasing pattern with some years having significantly higher numbers, such as a peak at 310 outages. The trend appears to be rising over time, which could be a cause for concern and may indicate deteriorating infrastructure, more severe weather events, or other systemic issues.

  • The largest portion of demand loss, 51.39%, is attributed to environmental factors, which could include natural disasters like hurricanes, floods, or wildfires. The second largest cause is technical/operational issues, accounting for 18.69% of demand loss. Infrastructure issues and security concerns also contribute significantly.

  • A majority of customers are affected by environmental factors (77.47%), which again highlights the impact of natural disasters on power infrastructure. Vandalism, surprisingly, affects a significant portion of customers (13%), which could indicate targeted disruptions to the power grid.

  • Environmental factors lead to the highest number of outages at 41.76%, followed by technical/operational issues (19.15%) and vandalism (18.33%). This indicates that while environmental factors are the most common cause of outages, human-related issues like vandalism and technical failures are also significant contributors.

  • There is a trend of increasing outages over time across all event types, with noticeable spikes in some years. This could be due to a variety of factors, including aging infrastructure, increased frequency of severe weather events, or simply better reporting of outages which I think is the most compatible.

Summary:

  • Environmental factors are the leading cause of both demand loss and customer outages, suggesting that the grid is highly vulnerable to natural disasters. This could indicate a need for better weatherproofing of infrastructure and more robust emergency response planning.

  • The increasing number of outages over time may point to aging infrastructure that is failing to withstand current demands and environmental stresses.

  • A surprisingly high number of customers are affected by vandalism, which may reveal security weaknesses. The data also underscores the importance of addressing human-caused disruptions (vandalism) and technical failures through improved security measures and maintenance protocols.

  • These are significant contributors to both demand loss and the number of outages, indicating potential areas for improvement in grid maintenance and operational protocols.

Sources used:

Wikipedia:

Midwest Reliability Organization

Northeast Power Coordinating Council

ReliabilityFirst

SERC Reliability Corporation

Texas Reliability Entity

Western Electricity Coordinating Council

EIA & NERC websites:

Maps - U.S. Energy Information Administration (EIA)

ERO Enterprise | Regional Entities (nerc.com)

NERC Regions | NERC Regions | U.S. Energy Atlas (eia.gov)

Discussion and feedback(0 comments)
2000 characters remaining