__STYLES__

Mawen Power Outage Challenge

Tools used in this project
Mawen Power Outage Challenge

Electric Grid Outages

About this project

In this project, we had event-level power outage data going back to 2002, but there were also significant issues with data quality and integrity. I combined and cleaned raw data and created a report to help the client understand patterns and trends related to outages.

Since the events were recorded over 10 years and the table formats changed several times, I grouped the tables with the same formats and first brought them to a common format. The data from 2003 to 2005 required the most effort. I completed the entire cleaning process in Power Query.

undefined

  1. The time was indicated in the a.m. p.m. format, to change this I used conditional columns and added 12 hours to the time marked p.m. All values like "ongoing" i replased by null.

  2. I cleaned the data on NERC Region using a conditional column, replacing obvious typos.

  3. I replaced the data on power losses and customers affected manually. Where data ranges were indicated (for example, 700-1000), I replaced them with the average between the minimum and maximum. I replaced all approximate and inaccurate values with null. Thus I didn`t recognize 15 Demand Loss values and 14 Customers Affected values. Total number of missing values out of 3903 events is 1496 and 770 respectively. I didn't touch the 0 values because there is a big difference between null and 0.

  4. I replaced Event Type values using a conditional column. And combined all weather-related causes into the category of natural disaster. "Other" values, as well as all that I could not recognize, I replaced with "Unknown". The total of these values is 210.

  5. The most difficult task was to streamline the states :

  • I added a list of all states with a separate query and used it by adding the name of all states to each row in my table,
  • using a conditional column, I found all matches (1/0) and then filtered only matches,
  • then I grouped the data by event and merged with duplicate of this table prepared in advance,
  • I divided the data on losses equally by the number of states indicated in one event,
  • I removed events outside of US from the dataset,
  • I didn`t recognize 34 Area Affected values, in the report they are replaced by "Unknown".

undefined

This is a great example of real data that a data analysis engineer has to deal with. I am very glad that there was no additional requirement to create functions to automate this entire process for further application to new unknown data.

Discussion and feedback(0 comments)
2000 characters remaining
Cookie SettingsWe use cookies to enhance your experience, analyze site traffic and deliver personalized content. Read our Privacy Policy.