Key Insights
The United States has the highest number of cases and deaths, followed by India, Brazil, and several European countries.
The top 10 countries with the most cases are the USA, India, France, Germany, Brazil, Japan, South Korea, Italy, the UK, and Russia. The top 10 countries with the most deaths are the USA, Brazil, India, Russia, Mexico, Peru, the UK, Italy, Germany, and France. These charts provide an overview of the countries most affected by the pandemic.
Yemen has the highest case fatality rate (CFR), followed by Peru, Mexico, Syria, and Brazil. It's important to consider various factors that can influence the CFR, such as the age distribution of the population and healthcare capacity.
The geographical distribution of deaths varies by region. The United States is an anomaly in that it has a very high death count compared to its neighbours, with only Brazil coming close. In Europe, Scandinavia and most of Eastern Europe have come out relatively unscathed, while Germany, France, the UK, and Russia have taken a big hit. In Asia, there are pockets of high cases, such as in India. In Africa, South Africa fared the worst, with the rest of Africa performing better.
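As a rough illustration of how the CFR mentioned above is usually calculated (deaths divided by confirmed cases, expressed as a percentage), here is a minimal sketch in base R. The country labels, column names, and figures are made up for illustration and are not taken from the project's dataset.

```r
# Hypothetical figures for illustration only; not the project's data.
toy <- data.frame(
  country      = c("A", "B", "C"),
  total_cases  = c(1000, 20000, 500),
  total_deaths = c(20, 100, 15)
)

# CFR (%) = deaths / confirmed cases * 100
toy$cfr_pct <- toy$total_deaths / toy$total_cases * 100

# Rank countries from highest to lowest CFR
ranked <- toy[order(-toy$cfr_pct), ]
print(ranked)
```

In this toy data, country C ranks first even though it has the fewest deaths, because its case count is small. This is one reason a CFR ranking is best read alongside the absolute counts.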
The analytical process for this project involved several steps:
The read.csv() function was used to read the data from a CSV file.
The data was then cleaned by removing missing values with the na.omit() function, and by checking and modifying column names.
The following key skills were used in this project; analysts with these skills can benefit any business:
1. Data Import and Manipulation: The read.csv() function was used to import the dataset. Column types were converted and missing values were cleaned using functions such as as.integer(), na.omit(), and gsub().
2. Data Visualization: The ggplot2 package was used to create various visualizations, including bar charts and a geographical plot. The geom_bar() and geom_polygon() functions were used to create the bar charts and the geographical plot, respectively.
3. Data Aggregation and Summary: The dplyr package was used for data aggregation and summary operations. Functions such as group_by(), summarise(), and arrange() were used to calculate total cases and deaths, aggregate data by country, and sort data.
4. Data Analysis and Calculation: The dplyr package was also used to perform calculations, such as computing active cases and the CFR for each country.
5. Markdown Reporting: The project documentation was written using R Markdown, allowing for the inclusion of code, visualizations, and descriptions in a single document.
These skills can help businesses analyze and interpret data, identify key insights, and present findings clearly and concisely.
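The cleaning, aggregation, and CFR steps described above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: it assumes the dplyr package is installed, and the small data frame, its column names (Country, Total.Cases, Total.Deaths), and the comma-formatted numbers are hypothetical stand-ins for whatever read.csv() returned in the original analysis.

```r
library(dplyr)

# Hypothetical stand-in for the imported CSV; names and values are assumptions.
raw <- data.frame(
  Country      = c("A", "A", "B", "B", "C"),
  Total.Cases  = c("1,000", "2,000", "500", NA, "4,000"),
  Total.Deaths = c("10", "30", "5", "1", "40"),
  stringsAsFactors = FALSE
)

clean <- raw %>%
  # Modify column names to something easier to work with
  rename(country = Country, cases = Total.Cases, deaths = Total.Deaths) %>%
  # Strip thousands separators with gsub(), then convert to integer
  mutate(
    cases  = as.integer(gsub(",", "", cases)),
    deaths = as.integer(gsub(",", "", deaths))
  ) %>%
  # Drop rows that contain missing values
  na.omit()

summary_tbl <- clean %>%
  group_by(country) %>%                                   # aggregate by country
  summarise(cases = sum(cases), deaths = sum(deaths), .groups = "drop") %>%
  mutate(cfr_pct = deaths / cases * 100) %>%              # CFR as a percentage
  arrange(desc(cases))                                    # sort by total cases

print(summary_tbl)
```

Note that na.omit() drops a whole row when any column is missing, so the row for country B with an unknown case count is excluded from both totals. Whether that is the right choice depends on the dataset.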