__STYLES__
Tools used in this project
FIFA 21 Messy Data Analysis

About this project

Dataset Description

  • Dataset name: FIFA 21
  • Data source: Kaggle.com
  • File size: 8.04mb

A link to the FIFA 21 messy raw dataset can be found here. Note: The dataset has approximately 18979 rows and 77 columns and also includes a data dictionary to give you insight into your cleaning process.

Data Cleaning Process

Pay Attention: Before every data cleaning process, be sure to make a duplicate of the dataset. This is to maintain originality and also serves as backup just in case you may need it again.

1. Cleaning the ID, Name, Long Name,FIFA 21 and Club column: After successfully loading data into my Excel worksheet, I noticed the name, long name, and club column had special character’s. The name column contains the players short name while the long name column contains the players full name. I’m guessing you already know the contents of the club column. I was able to clean these columns using power query. Once I loaded the data in UTF-8 type on the query editor it automatically eliminated all the special characters and afterwards I closed the editor and loaded the data on my existing worksheet. Voila! All done.

before

after

2. Cleaning the OVA, POT, BOV columns: The Overall Players’ Analysis (OVA) and Player Potential (POT) and Best Overall (BOV) columns were converted to percentages as written in the data dictionary. A custom column was used to multiply the value by 0.01 (100) to record the answer in percentage (%).

OVA,POV,BOV Columns without %before

after

3. Cleaning the Contract, Joined, and Loan Date End column: While filtering the contract column, I realised it contained different values such as 2004~2021, Free, and Jun 30, 2021, On Loan. Also, the loan date end column had the same values as some of the filtered values in the contract column, so I went on to split the contract column to eliminate the repeated values (Jun 30, 2021) and also give a better analysis, as you can see below. Did you just look below? Yeah, you did.

before

after

  1. Cleaning the Height and Weight column: The height column had values in both cm and ft. inches, while the weight column had values in kg and lbs. to ensure consistency in both columns height and weight, I filtered the values in cm and converted them to ft. inch’s and converted the values in kg to lbs respectively.

before

after

5. Cleaning the value, wage, and release clause columns: Upon loading the data into my worksheet, the columns had data type issues and inconsistent values. The values have suffixes M and K, where M is millions and ‘K’ is thousands. In order to normalise the values, data type, and current currency type, I used the IF function to eliminate the suffixes M and K, where values that end with M are multiplied by a million and values that end in K are multiplied by a thousand.

before

after

6. Cleaning the W/F, SM, IR, and Hits columns: The following columns contain ratings on a scale of 1–5 with a special character in front of each value. The find and replace tool was used to eliminate the special characters. While filtering the Hits column, I found out some values had K meaning thousand in them, so I used the If function to eliminate the K and multiply the value by a thousand.

before

after

Conclusion: Cleaning up the data was a bit of a challenge, but you know what they say—practice makes perfect. It's not the easiest thing, but hey, we're getting the hang of it! You can find the clean data here. Thank you for reading.

Discussion and feedback(0 comments)
2000 characters remaining