__STYLES__
This project was completed as part of a consulting engagement for a mental health startup. For the sake of privacy and maintaining anonymity, I am using the name "Whole-Health" instead of the actual company's name.
Whole Health leverages AI, Data Science and Clinical Data to provide affordable, high-quality mental health care to patients, all from the comfort and privacy of their homes.
Senior executives of Whole Health engaged my services to help them analyzing patients data in their database, to investigate the impact of patient’s mental health PHQ assessment on drug prescription and frequency.
The provided data are synthetic and were simulated from patients in the Whole-Health network.
Provide a summary of the patients, drugs and scores datasets separately. There are no specific guidelines for what to include but your summary should provide a thorough overview of the observations being analyzed.
Transform the scores dataset to display a comparison view of a patient’s PHQ score on a given date, their score from their previous assessment, and the date of their previous assessment. Calculate the delta between the current and prior PHQ scores and the amount of time elapsed between the score dates. The final dataset should contain seven columns: patient_id, phq_score, score_date, prior_phq_score, prior_score_date, phq_score_delta, score_date_delta.
Facts/Stats -- report the number of patients for each question (a single number is sufficient):
The following Notebook contain all the steps done on Snowflake including:
Creating the Snowflake database
Setting up the integration with AWS S3 where the data was stored
Importing the data from the CSV files in the AWS S3 bucket
Creating the database tables and storing the data
Querying the data for Exploratory Data Analysis (EDA), and to answer the required business questions
The following comprehensive report was provided to the company's executives.