__STYLES__
When I work with data, specially with data stored in Excel, I have to deal with not normalized data that require a lot of cleansing and some operations before they can be analyzed.
Even if I work with Excel, I use to clean, standardize and normalize the data to build a data model and analyze them.
If I have the possibility, I prefer to load the data and perform these operations in a relational database such as MySQL.
In this project I have a not normalized table in a MySQL database. I have to shape the data to create a normalized database. To do that I have to :
As a first step I perform a "select *" to analyze the table, then some operations to identify the possible normalized table. As the operation shows that for the selected columns I have the same number of distinct values, I can proceed to create a new table in a new schema.
I do the same to spot the second table, the "Store Table". As there's no more dimension table to create, after the Store Table I create the Inventory Table (Fact Table)
After the creation of the new normalized tables I provide to fill them. As the normalized tables have the same column name of non normalized tables, I can fill them with a simple combination of INSERT INTO and SELECT DISTINCT.
The process is the same for all the tables. After the insert I perform:
"No Action" can be considered the default value for these constraints, the one proposed if you add an index / constraint from the GUI. There's others value like CASCADE and SET NULL for more specific usage and maintenance.
Database normalization optimizes structure for integrity across normal forms, minimizing redundancy. Also it identifies functional dependencies, reducing redundant data and maintaining integrity.
Note: the dataset used for the project was part of the SQL Course by Maven Analytics.