
Covid-19: A Tale of Two Outcomes - Mortality Versus Survival |Excel, Tableau|


Geographical Covid Tracker Tableau Dashboard

About this project

Introduction:

With millions of confirmed cases and deaths around the world, the COVID-19 pandemic has had an unprecedented impact. Given the challenges of containing the virus, data analysis has played a vital role in helping governments and healthcare agencies control its spread. The goal of this project is to gain further insight into the worldwide impact of COVID-19. It explores the relationship between death counts and case fatality rates (CFR) and shows that the virus is still active in several parts of the world. The insights gathered help shed light on the virus's toll on human life and on its continuing impact.

Methodology:

The following steps were taken while analyzing the worldwide impact of COVID-19:

  1. Data collection: Data was retrieved from Kaggle, a platform that hosts datasets, data analytics courses, and data science competitions. Data was also collected from the World Bank to determine whether the impact of the virus varied by each country's income level category.
  2. Data cleaning/preparation: This process included checking for and removing duplicate rows, correcting inconsistent data types, and deleting rows that had numerous blank and/or zero values. The steps were:
     a) Rows containing data for the Diamond Princess and MS Zaandam were removed, since both are ships, not countries.
     b) For rows where the population was listed as zero or blank, the country's population was looked up and entered manually into the spreadsheet.
     c) Numeric columns were switched from the General data type to the Number data type.
     d) Rows that had three or more blank or zero values were deleted from the dataset.
     e) Excel formulas were used to create additional columns: CFR%, recovery rate percentage, percent active (active cases divided by total cases), the percentage of positive tests out of all tests, and the death rate as a percentage of the country's total population.
     f) Median values from the derived columns were used to replace some of the blank and zero values. A check column was created to verify that CFR%, recovery rate percentage, and percent active totaled 100%. On rows where missing values were replaced with median-based estimates, the goal was for the check total to fall within 3 percentage points of 100%. The other option considered was deleting rows that had two or more blanks or zeros. Since the check column values were at 100% or within 3%, it was decided not to drop these rows. Median values from the derived columns were then used to estimate the number of recoveries, the number of current active cases, and the number of tests.
  3. Data analysis: As mentioned in the previous step, Excel formulas were used to derive columns for CFR percentage, recovery rate percentage, percentage of active cases out of total cases, and overall death rate based on population. These calculations showed that a high death count does not indicate a high CFR%. Further analysis was done in Tableau, including comparisons of the above variables across country income levels.
  4. Data visualization: The two main chart types used are bar charts and maps. A dark background was chosen because it better highlights the red coloring of most of the bar charts and maps. All but one of the charts use shades of red. Red was chosen because it is often associated with danger, although context matters: red can also be associated with emotions such as love or with topics such as heart health. The other colors chosen were shades of blue, to provide reasonable contrast for viewers who are color blind or have difficulty distinguishing similar shades.
  5. Communicating the results: Insights gathered from the analysis and the dashboard communicate the worldwide impact of COVID-19. The results can be shared through reports or audio or video presentations, keeping the audience in mind and using terms the audience will understand. They can be shared with government and healthcare agencies so that they can make more informed decisions about strategies to control the spread or allocate resources where active cases may be high.
  6. Reevaluate: As research continues on how the virus evolves, and as more effective vaccines and medications are developed to control its spread or reduce its severity, new data will continually come in. New variants, new treatments, and updated vaccines will all affect the metrics used in this project, so the variables examined here will undoubtedly differ in the future from what they were or are now. To present the current impact on the world, the dashboard would need to be updated. This illustrates that the data analysis process is iterative and that a project can rarely be considered finished. It is also important to consider and accept other interpretations and criticism, which help identify areas for improvement.
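
The derived columns and the check column from step 2 can also be sketched outside Excel. The Python below is a minimal illustration, not the workbook's actual formulas: the field names (`total_cases`, `total_deaths`, and so on), the helper names, and the sample figures are all hypothetical, and the 3-point tolerance is taken from the description above.

```python
from statistics import median


def derive_metrics(row):
    """Compute the percentage columns described in the cleaning step."""
    cases = row["total_cases"]
    return {
        "cfr_pct": 100 * row["total_deaths"] / cases,
        "recovery_pct": 100 * row["total_recovered"] / cases,
        "active_pct": 100 * row["active_cases"] / cases,
        "positive_test_pct": 100 * cases / row["total_tests"],
        "death_rate_pop_pct": 100 * row["total_deaths"] / row["population"],
    }


def check_total(metrics):
    """CFR% + recovery% + active% should come to roughly 100%."""
    return metrics["cfr_pct"] + metrics["recovery_pct"] + metrics["active_pct"]


def within_tolerance(metrics, tolerance=3.0):
    """True if the check total is within `tolerance` points of 100%."""
    return abs(check_total(metrics) - 100) <= tolerance


def too_sparse(row, max_missing=2):
    """True if the row has three or more blank (None) or zero values."""
    missing = sum(1 for value in row.values() if value is None or value == 0)
    return missing > max_missing


def estimate_from_median(total_cases, known_pcts):
    """Estimate a missing count as the median known percentage of total cases."""
    return round(total_cases * median(known_pcts) / 100)


# Hypothetical row, not taken from the project's dataset.
sample = {
    "total_cases": 200_000,
    "total_deaths": 4_000,
    "total_recovered": 190_000,
    "active_cases": 6_000,
    "total_tests": 1_000_000,
    "population": 10_000_000,
}

metrics = derive_metrics(sample)
print(metrics)
print("check total:", check_total(metrics), "ok:", within_tolerance(metrics))
```

Rows that fail `within_tolerance` after a median-based estimate is filled in would be the candidates for dropping instead of keeping.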

Findings:

This project examined the worldwide impact of COVID-19, both on a global scale and by each country's income level category. The key finding was that a high death toll did not necessarily mean a high CFR.

Below is a geographical representation of which countries had the highest number of deaths, with darker shades of red indicating a higher death toll.

However, looking at the CFR% tells a different story.

Yemen, Sudan, and Syria had the three highest CFR% values, in that order. All three countries are classified as low income by the World Bank.

The maps demonstrate that a high death count does not equate to a high CFR. Factors contributing to a higher CFR could include the quality of health care and the availability of resources. As the map above shows, the countries with the highest CFR are concentrated in the Middle East and Africa, with a few higher-CFR locations in Central and South America. It is likely that people in higher-income countries have better access to treatments and resources. Higher-income countries most likely also have populations with better overall health, which decreases the chance of serious disease and death.

The key takeaway is that a high death count is not indicative of a high CFR. Factors such as population size, population density, and culture could contribute to some countries having more people exposed to the virus, which in turn increases the chance of a higher death toll. Findings also showed that countries with high infection rates also had the highest numbers of people who recovered from the virus.
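
To make the takeaway concrete, the short Python sketch below ranks three made-up countries by death count and by CFR. The names and figures are hypothetical and chosen only to show how the two rankings can diverge. They are not values from the project's dataset.

```python
# Hypothetical figures for illustration only.
countries = [
    {"name": "Country A", "cases": 10_000_000, "deaths": 100_000},
    {"name": "Country B", "cases": 50_000, "deaths": 5_000},
    {"name": "Country C", "cases": 2_000_000, "deaths": 30_000},
]


def cfr_pct(country):
    """Case fatality rate: deaths as a percentage of confirmed cases."""
    return 100 * country["deaths"] / country["cases"]


# Rank the same countries two ways: by raw deaths and by CFR.
by_deaths = sorted(countries, key=lambda c: c["deaths"], reverse=True)
by_cfr = sorted(countries, key=cfr_pct, reverse=True)

print("By deaths:", [c["name"] for c in by_deaths])
print("By CFR:   ", [c["name"] for c in by_cfr])
```

Here the country with the most deaths has the lowest CFR, because its deaths are spread over a far larger number of confirmed cases.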

Future research:

The goal of this project was to gain insight into the worldwide impact of COVID-19. Comparing where the most deaths occurred with where the CFR was highest suggests that a number of factors may contribute to the severity of the disease. Future research could look at the percentage of each country's population that was affected. Factors such as climate and/or geography could also be examined for possible trends in how the virus affects a particular location.

Conclusion:

By examining where deaths, current active cases, and CFR are highest, the insights derived from the data can provide valuable information to government, health, and other agencies responsible for helping control the spread of the virus. With continued research and improvements to the analysis process, these agencies will be able to make more informed choices that can potentially save more lives.
