Database Normalization

How to normalize databases several layers deep



What is normalization? It’s a simple method of breaking up tables into smaller tables linking relationships to reduce data redundancy and eliminate insertion, update and deletion anomalies.

There are a few keywords that we should gloss over first before we begin.
‘Non-Atomic’ is the word used to describe a data item that can be broken down further.
Examples of non-atomic data
A full name: John Hunt when a first name and surname is present in the database.
A full address: 6 Picton Road, London, WR1 4PG.
Data that is part of a larger dataset e.g Oxford Book club and Coventry Health club.
Primary Key — Field in a table that uniquely identifies each record in the table, cannot be null and should be rarely changed.
Foreign Key — Linking a table Primary Key to another table.


Let’s get into the several layers of normalisation, starting with the First Normal Form.
We will be using example tables on the left and the normalized tables on the right.

Note the Camel Case Column Titles as per naming conventions.

On the table to the right we’ve made sure, there are no repeating columns, non-atomic attributes are converted to atomic and that there are primary keys.
Note non-atomic data in the empMobile column and how it is broken up.

There will be an update a deletion anomaly for the repeating employees, in order to delete that employee you would have to update both records.

To move to the Second normal form we need to make sure each field is dependent on a whole primary key and there are no partial dependencies.

Partial dependencies can only exist if there is a composite key.

Composite Key refers to cases where more than one column is used to specify the primary key of a table.

A super key with no redundant attribute is known as a candidate key. A super key is a set of one or more attributes (columns)

For the third normal form, the table on the right empState, empCity & empDistrict is dependent on empZip. empZip is dependent on empId that makes non-prime attributes such as empState, empCity & empDistrict transitively dependent on the super key empId. This violates the rule of 3NF.

So we need to make sure that attributes depend on a primary key,

We need to remove the transitive functional dependency of non-prime attributes on any super key.

A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies.
Boyce Codd Normal Form is also known as 3.5 Normal Form.

empId has a Functional dependency empNationality and empDept has functional dependencies deptType and deptNoOfEmp.

The third table on the right is an example of a linking table. This table uses a many-to-many relationship to the other two tables as an entry point when accessing both their records at the same time.
If no database table instance contains two or more, independent and multivalued data describing the relevant entity, then it is in the 4th Normal Form.

It means in this case we need to break up the shooting location column and listing column to convert to the fourth normal form
The fifth normal form is sometimes referred to as a project-join normal form.

To move to the fifth normal form we simply break up the table into as many smaller tables as possible, this is to try to avoid redundancies.
The sixth normal form has yet to be completely defined however essentially it is breaking up tables even further if possible.


To Summarize

Normalization helps produce database systems that have faster data access times, and have better security models.

Functional dependencies are a very important component of the normalised data process.

Most database systems are normalized up to the third normal form.

A primary key uniquely identifies are record in a Table and cannot be null.

A foreign key helps connect tables and references a primary key.

Comments