The dataset contains information about passengers who were aboard the Titanic, including details such as passenger identifiers, survival status, ticket class, names, sex, age, number of siblings/spouses aboard, number of parents/children aboard, ticket number, fare, cabin number, and port of embarkation. The purpose of this report is to provide a preliminary overview of the dataset and highlight initial insights without delving into deep analysis.
The Titanic dataset is divided into three parts:
gender_submission: Contains the predicted survival of passengers based on their gender.
test: Contains passenger information without survival status, used for making predictions.
train: Contains passenger information along with their survival status, used for training models.
Structure and Contents:
gender_submission: 2 columns - PassengerId, Survived
test: 11 columns - PassengerId, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked
train: 12 columns - PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked
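The column lists above imply that train is test plus one extra column. A minimal sketch of that check, using only the column names stated in this report (the helper name `extra_columns` is made up for illustration):

```python
# Column lists exactly as stated in the report.
TEST_COLUMNS = ["PassengerId", "Pclass", "Name", "Sex", "Age", "SibSp",
                "Parch", "Ticket", "Fare", "Cabin", "Embarked"]
TRAIN_COLUMNS = ["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
                 "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"]
SUBMISSION_COLUMNS = ["PassengerId", "Survived"]


def extra_columns(left, right):
    """Return the columns in `left` that are absent from `right`, in order."""
    right_set = set(right)
    return [name for name in left if name not in right_set]


# Columns that train has and test lacks: the prediction target.
train_only = extra_columns(TRAIN_COLUMNS, TEST_COLUMNS)
# Columns that a submission adds on top of the test identifiers.
submission_new = extra_columns(SUBMISSION_COLUMNS, TEST_COLUMNS)

print(train_only)
print(submission_new)
```

Both comparisons single out Survived, which is the value a model trained on train has to predict for each PassengerId in test.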
gender_submission
Basic statistics:
Total entries: 418
Missing values: None.
Survived mean: 0.36, which means the file predicts survival for 36% of passengers.
Outliers: None.
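The mean of the Survived column can be read as a survival rate because the column only holds 0 and 1, so its mean is the share of rows equal to 1. A small sketch of that arithmetic, assuming the flags have already been parsed into integers; the function name `survival_rate` and the toy values are hypothetical and not taken from the real file:

```python
def survival_rate(survived_flags):
    """Return the fraction of 1s in a sequence of 0/1 survival flags."""
    flags = list(survived_flags)
    if not flags:
        raise ValueError("no rows to average")
    # For a 0/1 column, sum counts the survivors and len counts all rows.
    return sum(flags) / len(flags)


# Hypothetical toy column for illustration only.
toy_flags = [0, 1, 0, 0, 1, 0, 1, 0]
rate = survival_rate(toy_flags)
print(f"Predicted survival rate: {rate:.1%}")
```

Applied to the 418 rows of gender_submission, the same calculation is what yields the 0.36 reported above.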
test
Basic statistics:
Total entries: 418
Missing values: Three columns had missing values. The Age column had 86 missing values, the Fare column had 1, and the Cabin column had 327.
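Missing-value counts like these can be reproduced by treating an empty CSV field as missing and tallying per column. The sketch below uses only the standard library and a tiny in-memory sample; the sample rows and the name `count_missing` are made up for illustration, and the assumption is that missing values appear as empty fields in the file.

```python
import csv
import io
from collections import Counter

# Made-up sample in the same shape as a few test columns (not real rows).
SAMPLE_CSV = """PassengerId,Age,Fare,Cabin
1,34.5,7.8292,
2,,7.0,
3,62,,B45
"""


def count_missing(lines):
    """Return (row_count, {column: missing_count}) for CSV text lines."""
    reader = csv.DictReader(lines)
    missing = Counter()
    total = 0
    for row in reader:
        total += 1
        for column, value in row.items():
            if column is None:
                continue  # surplus fields beyond the header are ignored
            # A short row yields None; an empty field yields "".
            if value is None or value.strip() == "":
                missing[column] += 1
    return total, dict(missing)


total_rows, missing_by_column = count_missing(io.StringIO(SAMPLE_CSV))
print(total_rows, missing_by_column)
```

To run it on the real data, pass an open file instead of the sample, for example `open("test.csv", newline="")`, assuming the file sits in the working directory.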
train
Basic statistics:
Missing values: Three columns had missing values. The Age column had 177 missing values, the Cabin column had the highest number with 687, and the Embarked column had 2.
From the initial data exploration, several key observations can be made:
The gender_submission file suggests that survival prediction based on gender shows a survival rate of 36%.
There is significant missing data in the Age and Cabin columns across both test and train.
The Titanic dataset provides a comprehensive snapshot of the passengers aboard the Titanic. Initial observations reveal a predicted survival rate of 36% based on gender, a majority of passengers in the 3rd class, and significant missing data in the Age and Cabin columns. These insights suggest areas for further analysis, such as the relationship between survival and passenger class, gender, age, and fare, as well as how to handle the missing data.
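One possible starting point for the follow-up analysis mentioned above is a survival rate per group, for example per Sex or per Pclass. This is a rough sketch under stated assumptions: rows are dictionaries as `csv.DictReader` would produce from train, the helper name `survival_rate_by` is hypothetical, and the toy rows are invented purely to show the mechanics.

```python
from collections import defaultdict


def survival_rate_by(rows, key):
    """Return {group: fraction survived} for rows grouped by `key`."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        group = row[key]
        totals[group] += 1
        survivors[group] += int(row["Survived"])  # "0" or "1" in the CSV
    return {group: survivors[group] / totals[group] for group in totals}


# Invented rows for illustration only; they do not reflect the real data.
toy_rows = [
    {"Sex": "female", "Pclass": "1", "Survived": "1"},
    {"Sex": "female", "Pclass": "3", "Survived": "1"},
    {"Sex": "male", "Pclass": "3", "Survived": "0"},
    {"Sex": "male", "Pclass": "1", "Survived": "1"},
    {"Sex": "male", "Pclass": "3", "Survived": "0"},
]

by_sex = survival_rate_by(toy_rows, "Sex")
by_class = survival_rate_by(toy_rows, "Pclass")
print(by_sex)
print(by_class)
```

Grouping by Age or Fare would first need the missing values handled and the numbers binned into ranges, which this sketch leaves out.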
This project was the first stage of the HNG internship.
https://hng.tech/internship
https://hng.tech/hire