Data Cleaning
- I used Power Query in Power BI to import and transform the data from CSV files.
- I checked the data types and formats of each column and converted them to appropriate types, such as text, number, or date.
- I removed any unnecessary columns or rows that were not relevant to the analysis, such as blank rows or duplicate rows.
- I merged the tables based on their foreign key because the table contains any or all these column names “Release Period”, “State” and “Measure ID”, using merge query. This created a single table that contained all the relevant information for national and state surveys.
- I created a calculated column using conditional column for "survey count" to extract only numbers which I named the column “Survey Num”.
- I handled some missing values by replacing them with appropriate values, such as substituting “Not Available” with zero or blank.
- I dealt with some outliers by removing or adjusting them, such as removing surveys or "response " that have zero values.
- I used DAX measures to convert National, State box answers and Response Rate figures to percentages.
The assumptions I made while cleaning the data were:
- The average numbers I assigned to the ranges of completed surveys were reasonable and representative of the actual numbers. For example, I assumed that “300 or More” meant 350, “Fewer than 100” meant 50, and so on.
- The missing values I replaced with zero or blank were not significant or influential to the analysis. For example, I assumed that “Not Available” meant no data was collected or reported in the hospital.
- The outliers I removed or adjusted were errors or anomalies that did not reflect the true situation. For example, I assumed that zero completed surveys or "response rate" were either inactive or invalid.
Data Visualization
- I have used words that anyone can easily understand as they go through the dashboard. I have used clear and descriptive labels, titles, and tooltips for each visualization.
- I have used maps and buttons to make the dashboard fun and easy to follow. The user can also use the button to navigate between different pages of the dashboard.
Insights and Recommendations
- The average response rate for the patient satisfaction surveys is 26.6%. This indicates that there is room for engaging and motivating patients to complete the survey.
- The total number of completed surveys has decreased from 29.8 billion in 2016 to 21.6 billion in 2023, which suggests that there is a decline in patient participation and feedback over time.
- The national box answer shows that most patients are satisfied with their hospital experience, as 71.7% of them gave top answers, 20.3% gave middle box answers, and only 8% gave bottom box answers. However, there are some variations across states and measures, as shown below.
- California has the highest number of completed surveys (24.7 billion), followed by Florida (18.2 billion) and Texas (17.6 billion). However, these states also have the lowest response rates (0.13, 0.14, and 0.15 respectively), which means that they have a large population of patients but a low proportion of them completed the surveys.
Recommendations
- To increase the response rate, the hospitals should implement some strategies to encourage and incentivize patients to complete the surveys, such as providing reminders, rewards, or feedback.
- To increase the number of surveys completed, the hospitals should reach out to more patients who are eligible for the surveys, using different channels, methods, or languages.
- To improve the scores for cleanliness and quietness of the hospital environment, the hospitals should invest more resources and efforts in maintaining the physical conditions of their facilities, such as hiring more staff, using better equipment, or noise measures.
- To improve the scores for other measures of patient satisfaction, the hospital should monitor and evaluate their performance and identify their strengths and weaknesses, such as using benchmarks, audits, or feedback loops.
- To learn from the best practices of the high-scoring hospitals, the hospitals should seek opportunities for collaboration and exchange with them, such as forming networks, partnerships, or alliances.
Challenges
- Combining the seven tables in Power BI using Power Query to become one table took a lot of time. This was because the tables had different formats, levels, and sizes, and required multiple steps of merging, transforming, and filtering.
- Saving the project was hard because it failed many times. This was because the data set was large and complex, and required a lot of memory and processing power. I had to save the project and use backup to avoid losing my work.
- Visualizing the data was challenging because it involved many variables, dimensions, and measures. I had to choose the appropriate charts, colors, and interactions to display the data in a clear and meaningful way. I also had to ensure that the visualizations were consistent, accurate, and well-labeled.
- Some tables had different formats or conventions, such as using ranges instead of numbers for completed surveys. This required me to assign average numbers to these values to make them numeric and meaningful.
For more: https://github.com/Star-cj/maven_healthcare_challenge.git