__STYLES__
OBJECTIVE
I'll be playing the role of a Senior Analytics Consultant hired by the U.S. Department of Energy (DOE).
As Electricity outages are a growing concern entering an age of unprecedented energy demand and climate disasters, we would like to have a dashboard to understand patterns and trends around outages, quantify their impact on our communities, and identify possible weak points in the grid. But as our data are messy, you first need to consolidate and clean up the raw data.
We have event-level power outage data going back from 2023 to 2002.
Finally, please I’ll explicitly call out any caveats or assumptions I make in regard to data quality issues or missing values.
DATA SET
The provided data contains:
DATA CLEANING
This time the data needed a lot of cleaning. And I really mean a lot 😉
Number of columns/ Order of rows
All the sheets adapted to show the same number of columns in the same order as the sheet from 2023. Rows that summarized more incidents were split in singe rows. Empty rows were deleted. Titles were only kept in the first row of each sheet. Rows with Areas outside of US were also deleted. Like 2005 the Area Hamburg. Entries with no given region were also deleted.
Restoration Dates/ Dates
Dates and Time were given in one column in a not proper way. So, I created two columns out of the given columns showing the date and the time in a proper way. Restoration dates with the year 2001 are changed to 2007 or 2011, it’s assumed that they are not written in a readable way. But as the data start at 2002 2001 makes no sense.
To do so I used Left, Right, Mid and Search in combination with DATE().
For time data I extracted the hours and minutes and calculated the time with them on the basis that 24h equals 1,0. In some cases, time was given with noon or midnight. I replaced them with the exact time 12:00 PM and 12:00 AM.
For time calculation I had 4 cases by using text like 14:50 pm
If (AND(Left(text;2) = “12”; Right(text;2)=”pm”) ) Value = 0,5
If (AND(Left(text;2) = “12”; Right(text;2)=”am”) ) Value = 0
If (Right(text;2)=”pm”) ) Value = 1 / ( 24 * 60) * ((hour +12) * 60 + minutes)
If (Right(text;2)=”am”) ) Value = 1 / ( 24 * 60) * ((hour +0) * 60 + minutes)
Sometimes 12AM was written on the same day. So, the restoration would have been before the event. This was corrected by changing the day on the next day or checking if there was meant PM instead of AM.
Also PM and AM was sometimes mixed up. It was changed in the way that the restoration was after the event.
Mega Watt Loss
Type of Disturbance – Event Categories
The types are reduced to the following categories by using mostly ISNUMBER(FIND()) because by using find I do not need the exact term.
Fun fact: helicopter landing was also a cause for an outage.
NERC Regions
NERC is the abbreviation for North American Electric Reliability Corporation, a nonprofit organization, which is responsible for coordinating the electrical power grids. The supra-regional power grids in US are characterized by 7 larger, mutually, asynchronous interconnected networks, the NERC Regions. In addition to the 7 there is PR for Puerto Rico and HICC for Hawaii.
NERC Regions were checked. Because the list also showed wrong/old ones. For example, WSCC. This should mean WECC. The areas were checked with this.
https://atlas.eia.gov/datasets/eia::nerc-regions/explore
and this pdf
https://www.nerc.com/pa/RAPA/ra/Reliability%20Assessments%20DL/NERC_SRA_2023.pdf
As I was not sure if SPP is still existing or not, I kept it. Maybe the pdf states that it was also included in MRO. But as the exact area slit between MRO and SERC was not given, I decided to keep it.
The definitions are:
And
PR for Puerto Rico and
HICC for Hawaii
Missing Data; N/A, Blanks
If values were missing and I could extract them from the text, I took them from there. For megawatt loss and affected customers:
Tables Used
As there were some rows with several NERC Regions affected and also several States I created two separate tables for NERC Regions and for States.
And I also created a date table with dates from 2002 to 2023 to have a calendar table.
Creating a table with affected States
To create a table with the affected states I copied out the column with the Event_Id and the column with the areas. Afterwards I created for each State a column next to them. So I had 52 Columns (+Canada + Puerto Rico).
Then I used IF(ISNUMBER(FIND())) to check if the state name was given in the Area description. If yes, Excel was creating a 1 in the column, if not excel created a 0. For Washington D.C. I did manual corrections to set it on District of Columbia.
Then I axtended the code with CONCAT() and IF(). If 1 was the result of the step befre, the Event ID and the Name of the state were written separated witha comma in the column.
Then I used UNIQUE() and VSTACK() to create a list out of the matrix. And with TEXTSPLIT() I splitted up the concatenated Text again in Event number and State.
By doing so, I was able to filter out a list where I have single rows with Event_ID and a single state name. The intention of that was to be able to find out later which states were affected most often.
BUILDING THE DASHBOARD
Following the objective, I wanted to create a dashboard where you can
This should give a good overview of what’s going on.
Customer affected and Demand Loss can’t be shown on State Level, because the input Data were not in this detail level. But I used a Mouseover with the Map to show the State, the number of the events in the State and a Table with the different Event Categories and the # of events.
For the sum of affected customers and the average demand loss I used bookmarks and buttons. I really appreciate this way because it gives me more flexibility and it saves space.
Design & Visualization
The design is a typical dashboard design, that can be used to get a feeling for what’s going on.
The header gives a quick overview over the total happened Events, the affected customers, and the restoration Status.
The rest is dominated by bar and column charts, because in my opinion they serve best for comparing categories and this is what the dashboard mainly does.
I went with yellow, because this represents for me energy, combined it with a dark purple, to create a contrast and combined it with a light shade of blue and orange to differentiate between customer and Demand Loss.
Definition of technical Terms
The Amount of Demand Involved (Peak Megawatts), is the amount of the peak demand involved over the entire incident. I gave the average of this dimension for each category to give a feeling how severe the event was.
Used KPIs
Insights
SUMMARY, PERSONAL LEARNINGS & CHALLENGES
In general, it was a nice project, that covered many topics of real world data and I really enjoyed working on it.
Even if Data cleaning challenged me a lot this time. But somehow, I enjoy it to figure out tricky ways how to automate most of it. I also made some time consuming mistakes during the cleaning process that I had to correct. But for the future I know, where the potential for mistake lies by cleaning dates and times. It will certainly be faster the next time.
I also found out, that I must be very careful and rechecking data several times, for not overlooking illogical things. Like the duration of restoration that is minus. Because my time also included the day, but I didn’t see it, because the cell was formatted as time only. This can give you really grey hair.
Also this time I experienced that the dashboard building process is much more efficient if I sketch out a concept before starting to touch Power BI.
And this time, the project really challenged me a lot, so it helped to talk to others who were also working on it. The Power of the community really gets visible, when things get complicated. Especially for keeping the motivation and laughing together about the crazy cleaning process. 😉 Everybody who went through it this time can be really proud.
As always, I am open for any feedback or comments :)
Best, Jasmin