Coming Back for More | SQL

Coming Back for More | SQL

About this project

You've probably heard all about the newest weight-loss trend. You may have heard references to Ozempic, Wagovey, or Mounjaro, or Semaglutide from celebrities like Elon Musk. It's a hot topic on social media and among friends.

No alt text provided for this image

It's also been in the news lately for its impact on the drug supply for diabetics. Before its introduction in to the weight loss arena, this class of drugs was typically given as a secondary medication to patients with Type 2 Diabetes. Some people now get it prescribed for weight loss and some people are getting it compounded at IV drip bars. There have been success stories and there have been horror stories. All this made me wonder what the criteria for using these meds actually are, which led me to a paper on the impact of the A1c lab (tests average blood sugar over the past 3 months) on hospital readmission rates for diabetic patients.

I used Structured Query Language (SQL) to analyze the 10 years of clinical diabetes care data:

  • The average stay for these patients was 4 days in the hospital
  • 11% of these patients were readmitted in less than 30 days
  • Race does not appear to have an effect on number of labs ordered
  • Hemoglobin A1c was run on only 3% of patients with a primary diagnosis of diabetes

The Data Set

This information was collected from 130 hospitals using the Cerner Electronic Health Record (EHR) System. Data were de-identified in accordance with HIPAA and submitted to the UC Irvine Machine Learning Repository. For the years between 1998 and 2008, the data was filtered for inpatient encounters of diabetic patients who stayed in the hospital between 1 and 14 days, during which labs were performed and medications were administered. Encounters that resulted in death or discharge to hospice were excluded.

Two tables, health and demographics, were imported from here as csv files in to MySQL Workbench for analysis. Health contained features describing diabetic encounters including diagnoses, diabetic medications, labs ordered, length of stay, discharge status, and readmission.

A1c Lab

The A1c lab, reported as a percentage, estimates average blood sugar level over the prior 3 months by looking at the patient's hemoglobin. Hemoglobin is a blood protein that stores iron and carries oxygen/carbon dioxide. The A1c lab measures how much of the patient's hemoglobin**** has glucose (sugar) attached to it. Because of the hemoglobin involvement, A1c can be elevated by medical conditions such as chronic anemia or iron deficiency. This is the lab result that insurance companies who cover weight loss injections use to determine whether they will.


A normal A1c level is below 5.7%. A level of 6.5% or more, along with high blood sugar and low glucose tolerance, may indicate diabetes, an insulin condition. Insulin is a hormone that regulates blood sugar by sending it from the bloodstream in to either the liver/fat cells for storage or to muscle cells for use. It is typically released from the pancreas when blood sugar increases, after a meal for example. Type 1 (juvenile) Diabetes is when the pancreas fails to produce insulin. Type 2 Diabetes (insulin resistant) usually occurs when cells essentially fail to respond to the insulin that is produced. The ICD-9 diagnosis codes used in this data set indicate diabetes diagnoses when the code starts with the number 250.

Analysis and Results

First, I did some basic Exploratory Data Analysis using the SQL COUNT function, DISTINCT statement, and WHERE clause.

For the 101,766 hospital encounters, there were 71,518 separate patients. Re-admissions were labeled as > or < 30 days.

11% of the encounters were <30 day re-admissions

No alt text provided for this imageNo alt text provided for this image

I then used the SQL RPAD Function to visually represent the distribution of length of stay in a histogram. RPAD fills, or right-pads, a string, here just a space created by ' ', with the * symbol to the calculated frequency of patients (total count/100), creating a new column in the Result Grid with a histogram made of asterisks.

No alt text provided for this imageNo alt text provided for this imageThe average hospital stay was 4 days per encounter.

To determine if race was a factor in the number of labs, I used a JOIN to combine the demographics and health tables using the common matching patient numbers in both tables. Then I calculated the average number of labs grouped by race.

No alt text provided for this imageNo alt text provided for this imageNumber of labs ordered did not vary significantly by race.

To determine if the primary diagnosis was related to whether an A1c lab was ordered, I created and saved a query where I used a CASE Statement to split the A1C results in to either no for 'None' or yes for any value that was attained. I used another CASE in that query to split the primary diagnosis (diag_1) in to diabetic diagnosis codes (those starting with 250) or non-diabetic diagnosis codes (those not starting with 250).

No alt text provided for this imageNo alt text provided for this image

I was then able to use that saved query to calculate the proportion of patients who had A1c labs run during these encounters with and without primary diagnosis codes of diabetes.

Only 2,756 patients with a primary diagnosis of diabetes had a hemoglobin A1c test in these encounter data. That is only a rate of 3%.

No alt text provided for this imageNo alt text provided for this image

Of the 46,902 patients (16%) who were readmitted, the majority (84%) did not have a recorded A1c result in this data set. However, the same ratio is seen the patients who were not readmitted and either did (17%) or did not (83%) get tested for hemoglobin A1c levels.

No alt text provided for this imageNo alt text provided for this image

Finally, I created a verbal summary of the 50 patients with the shortest stays, along with their readmit status and A1c results by concatenating strings and values.

No alt text provided for this imageNo alt text provided for this image


  • Most patients' hospital stays were fewer than 5 days, but about half were readmitted at some point, including 11% who were readmitted within 30 days.
  • It is very surprising that only 3% of patients with a primary diagnosis of diabetes had A1c lab results in this data set, given that it is regularly used in the outpatient setting to inform management of hyperglycemia. It is possible, as also noted by Strack et al, that the Cerner EHR stores routine A1c results in a separate table or database.
  • Further studies could investigate how often admitting physicians access the A1c results and whether they take more effort (extra clicks) to access within the EHR system.
Discussion and feedback(0 comments)
2000 characters remaining