__STYLES__
1. CHALLENGE OBJECTIVES
Role: Senior Analytics Consultant
Data: US Power failure (Jan 2002 – June 2023), Survey form examples
Source: US Department of Energy
Task: Data Cleaning and Dashboard/Report
Upon obtaining the dataset, the initial step involved conducting a comprehensive analysis of the sheets. As an analyst, my strategy involves juxtaposing the Data dictionary with the dataset to enhance comprehension of each column. While scrutinizing the sheets, it was essential to gain insights into the data collection process, particularly as it spanned 22 years and involved survey forms. A critical aspect of understanding the data entailed examining the structure of each sub-sheet from 2002 to 2023.
Notably, it was observed that the 22 sheets exhibited variations in the number of columns and rows.
I organized sheets with similar structures to form four distinct groups.
In order to amalgamate these sheets into a single main sheet, the creation of a master dataset was imperative to harmonize and unify all the individual sheets. The consolidation process involved fundamental splitting and text-to-columns operations, along with the addition of specific columns within groups 1 and 2. The established standard for the master dataset was:
Different set of operations were needed for different year groups.
Group 1:
Group 2:
Group 3: Delete month column
Group 4: Delete month column
Next is the intriguing and challenging task of standardizing each column to a consistent and uniform data type.
IF(CONTAINS)
function to determine if the string contains "/" or "-".FIND AND REPLACE:
REMOVE:
SWAP:
DELETE:
SETTING A STANDARD: DD-MM-YYYY hh:mm REPLACING all date values with DD-MM-YYYY 12:00 AM
REPLACE:
DELETE:
The Time column after this basic cleaning returned 3 different types of format
GROUPING all these times into 5 different groups
= IF(ISNUMBER(SEARCH("-",A1)), IF(ISNUMBER(SEARCH("a.m.",A1)), 2, IF(ISNUMBER(SEARCH("p.m.",A1)), 3, 0)), IF(ISNUMBER(SEARCH("a.m.",A1)), 4, IF(ISNUMBER(SEARCH("p.m.",A1)), 5, 1)))
Group 1: DD-MM-YYYY hh:mm
Group 2: MM/DD/YY, hh:mm a.m.
Group 3: MM/DD/YY, hh:mm p.m
.Group 4: hh:mm a.m. Month DD
Group 5: hh:mm p.m. Month DD
The steps to clean these group were:
=MONTH(DATEVALUE(MONTH&"01"))
=YEAR(YEAR)
=CONCAT(TEXT(TIME,"hh:mm")," ",AM/PM)
=TEXT(TIME,"hh:mm")
=CONCAT("DD,"-", "MM,"-",YYYY," ",TIME)
This column had around 40 unique valuesConvert the column to whole number
This particular column comprised approximately 35 distinct values.
The column consisted repeated string patterns like
Employing basic TRIM, SORT, and UNIQUE functions yielded a set of unique values present in the column. The resulting unique list was structured as follows:
The most straightforward method for organizing such strings involves utilizing VLOOKUP. In this dataset, I segmented the Alert Criteria details into 11 primary alert criteria for ease of grouping.
Like alert criteria, Basic TRIM, SORT and UNIQUE functions returns a unique set of values used in the column.
This column set was divided into 6 main Events and 25 Sub-Events
The initial task involved gaining insight into the NERC region by consulting the official website (https://www.nerc.com/AboutNERC/keyplayers/Pages/default.aspx), where a color-formatted representation of the NERC region, organized by state, was available
The allotment of the regions to states needed an understanding of the how state boundaries were drawn using a state map of USA
Utilizing the aforementioned two maps, I created an Excel sheet in which each state was assigned to a specific NERC region. This allocation was based on the maximum area covered by each state within the respective NERC region, irrespective of state boundaries. For instance, although certain areas of Wyoming fall under WECC and others under MRO, the predominant coverage is under WECC, leading to the assignment of Wyoming to the WECC region.
To ensure the accuracy of my assumptions regarding the NERC regions for each state in the subsequent analysis, I generated a dummy Tableau map. The following map illustrates the appearance of the dummy:
The State-NERC Region was created on a back sheet.
Cleaning of the column “Areas affected”
For every 1 value, selecting again with respect to the counties
New columns have been introduced to the comprehensive dataset, encompassing:
The final dataset looked like this:
The initial concept for the theme aimed to align with elements associated with electricity and darkness. Among various choices such as Blue/White, Yellow/Black, and Green/Grey, I ultimately decided to adopt the BULB-DARKNESS theme.
The initial phase involved the identification of dimensions and measures, followed by the creation of various combinations using these elements. I generated approximately 35 sheets featuring diverse combinations of the dataset and chart styles, encompassing Bar, Line, Text, Heatmap, Table, Scatter Plot, Treemap, Radar, Dendrogram, Donut, and Stacked Bars charts.
Created an Excel spreadsheet to organize all charts, outlining the significance of each chart along with key findings. Assign each chart to one of the dashboards (Dashboard 1, 2, 3, or 4).
Creating the dashboard visuals by referencing the dashboard setup image from the Figma website to design the graphics.
Export the completed images again into SVG format.
Complete the remaining formatting and all the details on dashboards and then adding User friendly navigation buttons for easy understanding.