Sepsis is a life-threatening medical condition caused by the spread of infection in the body, resulting in multiple organ failure or even death in many cases. Because sepsis can be fatal, predicting survival in such cases is among the top priorities of the medical community today. Sophisticated laboratory tests can provide useful information about a patient, but they take time and may not be immediately available, which can keep medical practitioners from detecting an urgent threat to life and treating it accordingly.
Machine learning can be applied in this setting to get faster results. A model can be built on variables that are easy to retrieve and, if the available dataset is large enough and contains balanced information on surviving and deceased patients, it can give accurate predictions of survival from only a handful of related inputs.
This project is inspired by a research paper that aims to show that survival from sepsis can be predicted with a minimal set of easily available predictor variables, allowing medical practitioners to quickly identify likely patient outcomes. The original researchers used five different machine learning algorithms and reported the accuracy of models trained on only three predictor variables. I have used four machine learning algorithms that I think are useful in this situation to validate the claim made by the researchers in their paper. Please refer to the attached research paper for reference throughout this project. The official research paper can be found here.
The purpose of this project is to predict the survival of patients with sepsis using age, sex, and septic episode number as predictor variables. Different classifiers will be trained on the available training data, and the one with the best performance metrics on the test dataset will be recommended for sepsis survival prediction.
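Since the recommendation rests on performance metrics, here is a minimal sketch of how a few common ones can be computed from true and predicted labels. The choice of metrics (accuracy, sensitivity, specificity, and the Matthews correlation coefficient) and the label encoding (1 = alive, 0 = dead) are assumptions made for illustration; the notebook may use different ones.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN, assuming 1 = alive (positive) and 0 = dead."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Return accuracy, sensitivity, specificity, and MCC as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # MCC is defined as 0 when any row or column of the confusion matrix is empty.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

A classifier that always predicts "alive" can reach high accuracy on a dataset where most patients survive, while its specificity and MCC are both 0. This is why accuracy alone may be a poor basis for comparing classifiers on imbalanced data.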
I have used two datasets for this part of the project:
's41598-020-73558-3_sepsis_survival_study_cohort.csv'
's41598-020-73558-3_sepsis_survival_validation_cohort.csv'
Dataset 1 consists of 19,051 admissions of hospitalized subjects in Norway between 2011 and 2012 who were diagnosed with infections, systemic inflammatory response syndrome (SIRS), sepsis by causative microbes, or septic shock. The data comes from the Norwegian Patient Registry and the Statistics Norway agency (please refer to the research paper for further details).
Dataset 2 consists of critically ill South Korean patients whose medical records were collected between January 2007 and December 2015 and publicly released by Lee and colleagues. From this original dataset, the researchers selected the data of 137 patients who had already had 1 or 2 septic episodes and tested their model on it to further corroborate their findings.
In this project, I train the machine learning models on Dataset 1 (the study cohort) and then test the resulting models on the instances of Dataset 2 (the validation cohort).
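The train-on-one-cohort, test-on-the-other workflow can be sketched roughly as follows. This is an illustrative outline, not the notebook's actual code. The column names in `FEATURES` and `TARGET` are assumptions and should be checked against the CSV headers. Logistic regression is only a stand-in classifier, since the four algorithms used in the project are not named here. The function names are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Assumed column names; verify them against the actual CSV headers.
FEATURES = ["age_years", "sex_0male_1female", "episode_number"]
TARGET = "hospital_outcome_1alive_0dead"


def load_cohorts(study_path, validation_path):
    """Read the study (training) and validation (test) cohorts from CSV files."""
    return pd.read_csv(study_path), pd.read_csv(validation_path)


def train_on_study_test_on_validation(study_df, validation_df):
    """Fit a classifier on the study cohort and score it on the validation cohort."""
    # class_weight="balanced" reweights the classes in case outcomes are imbalanced.
    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    model.fit(study_df[FEATURES], study_df[TARGET])
    predictions = model.predict(validation_df[FEATURES])
    return {
        "accuracy": accuracy_score(validation_df[TARGET], predictions),
        "mcc": matthews_corrcoef(validation_df[TARGET], predictions),
    }
```

With the two files listed above in the working directory, a call could look like `study_df, validation_df = load_cohorts('s41598-020-73558-3_sepsis_survival_study_cohort.csv', 's41598-020-73558-3_sepsis_survival_validation_cohort.csv')` followed by `train_on_study_test_on_validation(study_df, validation_df)`. Keeping the validation cohort entirely out of training means the reported scores reflect how the model behaves on patients from a different population.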
The datasets can be found in the GitHub repository for this code.
The complete data exploration and model building are done in Python. The Python script, along with the research paper and datasets, can be found here.
Each part of the project has been explained in detail in the Python Jupyter Notebook.
I am therefore skipping any further explanation of the project here. If you have any questions or suggestions for improvement, please feel free to contact me; I will be happy to help.