Assessing the Effectiveness of the HCAHPS Survey

Tools used in this project
Assessing the Effectiveness of the HCAHPS Survey

About this project

Please click here for a pdf version of the final report

About the Challenge

This project was completed as an entry into the Maven Analytics’ Healthcare Challenge, wherein I am acting as a data analyst for the American Hospital Association (AHA). I’ve been asked to analyze the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey results for the last 9 years in order to determine whether or not it has been successful in creating incentives for hospitals to improve their quality of care.

The dataset consists of 23 fields with 47,889 records in 7 tables in CSV file format

Researching My Role

It’s important to understand what my role is, who I will be making recommendations to, and what their goals are.

I found the following information on the AHA website:

The American Hospital Association (AHA) is the national organization that represents and serves all types of hospitals, health care networks, and their patients and communities. Nearly 5,000 hospitals, health care systems, networks, other providers of care and 43,000 individual members come together to form the AHA.

Through our representation and advocacy activities, AHA ensures that members' perspectives and needs are heard and addressed in national health policy development, legislative and regulatory debates, and judicial matters. Our advocacy efforts include the legislative and executive branches and include the legislative and regulatory arenas.

Founded in 1898, the AHA provides education for health care leaders and is a source of information on health care issues and trends.

Vision: The AHA vision is of a just society of healthy communities, where all individuals reach their highest potential for health.

Mission: To advance the health of all individuals and communities. The AHA leads, represents and serves hospitals, health systems and other related organizations that are accountable to communities and committed to equitable care and health improvement for all.

I also found the AHA 2023 Advocacy Agenda. Since I am making recommendations to the AHA, I wanted to ensure they were aligned with that agenda in both purpose and form.

Researching Contextual Information

Prior to taking on this project, I had never heard of the HCAHPS survey, or the VBP and QIO programs. I knew little about what the AHA does, and the only thing I knew about Medicare is that I will probably rely on that for medical coverage when I retire (I should probably do more research about this).

ChatGPT was a helpful research assistant on this project. However, I know that it often provides information that is incomplete or inaccurate, so I ensured that I was not relying on it too heavily and double-checked all the information it provided.

In addition to the sources listed in the bibliography, I found the following resources helpful in understanding the subject of my analysis:

HCAHPS: Patients' Perspectives of Care Survey


HCAHPS Fact Sheet


The HCAHPS Survey – Frequently Asked Questions




Hospital Value-Based Purchasing Program


Quality Improvement Organizations


Data Cleaning and Processing

Since this is a one-off analysis and not a dashboard, I opted to do most of the data cleaning, transformations, and calculated columns using Excel prior to creating a data model for analysis in Excel’s Power Pivot.

One of the tables is named “measures”, but that is a reserved name in Power Query. Since the table describes the measures, I renamed it to “measure_desc”.

The ’Release Period’ column of the ‘national_results’, ‘reports’, ‘responses’, and ‘state_results’ tables are formatted as mm_yyyy, which would be difficult to work with since it is not a valid date or number format. Because all the release periods are in July, I used the following formula to extract the year only, and then pasted these values into the ‘Release Period’ columns of the respective tables:


Ensured all the percentages in the ‘national_results’ table summed to 100 (they did)

Checked all tables and columns for value discrepancies

Created a new column in the ‘measure_desc’ table named ‘VBP calc’ to indicate whether or not each measure is included in the VBP calculation (TRUE/FALSE)

Created a new column in the ‘national_results’ and ‘state_results’ tables named ‘Weighted Average’ to provide a single metric from 1-3 indicating the weighted average of the bottom-box, middle-box, and top-box percentages. I used the following formula to calculate this metric and pasted the values into the new columns:


This metric was very helpful in exploring the data and understanding trends, although I opted not to use it in my report because it would require too much explanation. I wanted my visualizations to speak for themselves.

Since the US territories are included in the ‘responses’ table, which contains a primary key, I added them to the ‘states’ table, which contains a corresponding foreign key.

Note that the ‘state_results’ table does NOT include results from any of the US Territories, as the inclusion of the U.S. territories (American Samoa, Northern Mariana Islands, Guam, Puerto Rico, and the Virgin Islands) in the amended regulatory definitions of ‘States’ and ‘United States’ for purposes of the Medicaid Drug Rebate Program (MDRP) has been delayed for 2 years until April 1, 2024 [Federal Register / Vol. 86, No. 102 / Friday, May 28, 2021 / Proposed Rules


In the end, I opted not to include data from the US territories in my report, since they were not included in the VBP program during the timeframe under analysis.

All data for the state of Maryland was missing from the ‘state_results’ table for the 2016 release period (discharges between 10/1/2014 and 09/30/2015). In a real-world situation I would reach out to my contact to inquire about obtaining the missing data. For this project, I reviewed summary information regarding the missing data and found that the response rate and top-box scores for Maryland from this period were very highly consistent with results from the 2015 and 2017 release periods. For the sake of expediency, I opted to average the 2016 and 2017 results in order to obtain imputed results for the 2016 release period. I ensured all percentages summed to 100% and adjusted the middle-box number to compensate for any minor discrepancies. I also filled-in the actual average response rate of 26% into the ‘responses’ table for this same period. https://www.hcahpsonline.org/globalassets/hcahps/summary-analyses/summary-results/july-2016-public-report-october-2014---september-2015-discharges.pdf

Loaded the CSV files into Excel, making any necessary transformations, and created a data model.

Added a calculated column to the ‘reports’ table named ‘COVID PHE’ to denote whether or not the release period included dates from the COVID PHE (TRUE/FALSE).

Analyzing and Reporting the Data

Analyzing the data was an iterative process wherein my exploratory analysis informed my research, and my research informed my exploratory analysis. Keeping in my role in focus, I continued this process until I had a clear vision of what the data was telling me, what problems presented themselves, and what my recommendations would be.

I opted to present my analysis in report form because there was a lot of contextual information, whereas the data visualizations necessary to understand the problems were quite simple.

Additional project images

Discussion and feedback(0 comments)
2000 characters remaining