__STYLES__
This project cleans data from an electricity consumption logbook and then presents it in the form of a pivot table and bar chart which can be used by anyone to get possible insights from the data and make decisions accordingly.
The dataset has been used with permission from modeloff, which is a global financial modelling competition that requires contestants to use Excel to solve financial questions and case studies. Though it is referred to as a “financial modelling” competition, it doesn’t require participants to build a complete valuation model of a business; it mostly focuses on independent questions that require mini exercises in Excel.
The dataset consists of 8k+ rows and contain information about date, time and units of electricity consumed in kWh.
The final excel workbook, after data cleaning and visualization can be found here.
I took the following steps to arrive at Cleaned Data from its raw state.
Extract the units of electricity consumed. For this I do the following:
Extract the date values from the data records. I follow the exact same approach of flash fill to get the dates values. But as can be seen, there are some records where we get numbers instead of dates. So, I select the whole column values and click on format cells. Then I change the cells to date type using Custom Tab with the format dd-mmm-yy. The same can be achieved by using the shortcut Ctrl+Shift+3. Furthermore, I also need to get rid of additional characters in this column like st,rd,th,nd and _, so that Excel can properly read these values as dates. So, I simply select all the cells and press Ctrl H which is a shortcut for find and replace. I replace st,rd,th,nd and _ with empty value as shown below.
Get the weekdays for every date in the records. So, I make one additional column and use the function TEXT() to get the day of week from date value as shown below. Then this function is applied to all the records which gives weekday for each. For applying the function to all records, I simply double-click the fill handle that appears at the bottom right corner of the day cell in which I entered the formula.
Extract the time component from raw data records. Again, I use the same approach of Flash Fill for getting the value of time. However, this in this case, the time values do not get correctly replicated for the other records. Reason being that when I enter 3 PM in the first record, it automatically converts it in to a time format which is given by 15:00:00 as can be seen highlighted by black on the top. And hence when I try to flash fill, it starts getting wrong values because it is actually looking for values in 15:00:00 format. So, a quick way around for this is to enter the time value as text value and then flash fill, which prevents this problem. The text value is entered by typing ‘8 PM. The more data values I give to the flash fill, the more accurate will its results be. So, I enter some values and then press Ctrl E and all the records in that column get populated.
I need to get the time value in proper time format and not text. So I use the function VALUE() here, which converts a text string that represents a number into a number and store it in another column TIME v2. Thus, the function will convert text that appears in a recognized format (a number, date, or time format) into a numeric value. Now after applying this function, I convert the number to time by using shortcut Ctrl+Shift+2, which is used to convert a number into time format. This can also be achieved using format cells option.
So now I have extracted all the data points and it ready for some visual analysis.
Using all the above extracted features, I select the whole data and insert a pivot table on the new sheet.
I drag date to rows and then right click on it, select Group and then group them by Month. So as an output, all the dates collapse to their corresponding months.
Then I drop day to the columns shelf and KWH in the Value Fields Area. Then I simply format the cells to make values consistent by allowing only 2 values after decimal. Then I select all the field values except grand totals and go to conditional formatting and color them with a heatmap of white-red as shown below. Moving forward, to make the visual more appealing, I add sparklines for every month, which shows the trend of electricity consumption for every week day of the given month.
So, I go to Insert > Sparklines > Line. Then I select values of all days in first row except grand total and then click OK. So then I simply drag that cell below for filling out all the records in that column and as a result obtain a column with sparklines for every row data. Further, in order to filter the pivot table by time of day, I just do the following.
Click on any cell value inside Table > Pivot Table Analyze > Insert Slicer > Time v2
This simply gives me a dropdown filter which I can use to alter values in the pivot table and analyze electricity consumption at any time for any day of a month. It can be seen based on the sparklines that for most of the months, electricity consumption is more on weekends as compared to weekdays.
Now based on the above data, I proceed to make some general charts.
Click on any value inside pivot table > Press fn + f11 and we have a chart made out of the pivot table in front of us. The above chart is a bit cluttered and difficult to interpret, so I simply convert it in to a stacked bar chart, by right click on chart > Change Chart Type > Stacked Bar Chart. From above chart, one can see that maximum electricity consumption happens in July followed by January and June. Now this chart will be filtered out as we filter the pivot table using time of the day menu available on the pivot table sheet.
As a final step, I drop the Time Field in the Report Filter, in order to generate different reports for every hour. This can be useful if someone needs to look at report for every hour separately. The steps followed are Click on any cell of Pivot Table > PivotTable Analyze > Pivot Table > Options > Show Report Filter Pages.
As soon as I do that, I get 24 individual reports which have made by filtering out the pivot table for every hour value from 12AM to 11PM present in the data.
Similarly the above data and pivot table can be customized in any way by the user to visualize data according to questions and insights that need to be derived from it.