### Informal normalization guidelines

In order to have a good relational database schema, all relations in the schema should be normalized. Relational schemas that were generated from an (E)ER schema will automatically have this propety, if the translation rules were applied correctly. On the contrary, if the relational schema was not derived from an (E)ER schema, a formal normalization mechanism can be applied to transform the relational schema to a normalized form.

**Advantages :**

- At the
**logical level**, the users can clearly understand the meaning of the data and formulate correct queries. - At
**the implementation level**, storage space is used efficiently and the risk of inconsistent updates is reduced.

Important to design the relational scheme in such a way that it is easy to explain its meaning. Next, attribute types from multiple entity types should not be combined in a single relation. It should also be make sure to mitigate the risk of insertion, delete and update anomalies when designing the relational scheme. Finally, excessive amounts of NULL values in a relationship should be avoided.

**Functional dependency X -> Y**, between two sets of attribute types X and Y implies that a value of X uniquely determines a value of Y. if X -> Y in R, this does not say whether or not Y->X in R.

__Examples__

Ø SSN → Employee NAME (Note: not necessarily other way around!). Employee name is functionally dependent from SSN. In other words, the SSN uniquely determines the employee name. The other way round isn’t necessarily true since multiple employees can share the same name.

Ø PNUMBER → {PNAME, PLOCATION}. Project number uniquely determines a project name and project location. Therefore, PNAME and PLOCATION are functional dependent from PNUMBER.

Ø {SSN, PNUMBER} → HOURS

If X is a candidate key or F, this implies that X -> Y for any subset of attribute types Y or R.

**Prime attribute type **is an attribute type that is part of a candidate key.

**Normalization **of a relational scheme is a process of analyzing the given relation schemas based on their functional dependencies and candidate keys to **minimize redundancy **and **insert**, **deletion **and **anomalies**.

Unsatisfactory relation schemas that do not meet the normal form tests are decomposed into smaller relation schemas.

**1.**** ** **First normal form(1NF)**

The first normal form states that every attribute type of a relation must be **atomic **and **single valued**. This means there cannot be any composite or multivalued attribute types. In case of a multivalued attribute type, remove it and put it in a separate relation, along with the primary key of the original relation as a foreign key. The **primary ****key **of the new relation is than the **combination **of the **attribute ****type and the primary key **of the original relation. Composite attribute types need to be decomposed in their parts.

__Ex__: The relation is not in first normal form since DLOCATION is a multivalued attribute type. Therefore, we should remove DLOCATION from the relation DEPARTMENT and put it together with DNUMBER as a foreign key. The primary key is then the combination of both: DNUMBER and DLOCATION._{ }

_{ }

** Assumptions**:

A department can have multiple locations! Multiple departments are possible at a given location!

**2.** **Second normal form (2NF)**

· A functional dependency X → Y is a ** full functional dependency **if removal of any attribute type A from X means that the dependency does not hold anymore.

__Example__: SSN, PNUMBER → HOURS; indeed, to know the number of hours an employee worked on a project, we need both the SSN and the project number. PNUMBER → PNAME

· A functional dependency X → Y is a ** partial dependency **if some attribute type A ∈ X can be removed from X and the dependency still holds.

__Example__: SSN, PNUMBER → PNAME. It only depends on PNUMBER and not on SSN

A relation R is in **2NF **if it satisfies 1NF and every nonprime attribute type A in R is fully functional dependent on any key of R. As a reminder, a prime attribute type is an attribute type that is part of a candidate key. If this is not the case, **decompose **it and set up a new relation for each partial key with its dependent attribute types.

Furthermore, it is also important to make sure to keep a relation with the original primary key and any attribute types that are fully functional dependent on it.

__Ex__: R1 is in 1NF since there are no multivalued and no composite attribute types. However, it is not in 2NF. The attribute type PNAME is not fully functional dependent on the primary key. Indeed, it only depends on PNUMBER. On the contrary, the attribute type HOURS is fully dependent from SSN and PNUMBER. Therefore, we need to create a new relation.

**3.** **Third normal form (3NF)**

A functional dependency X → Y in a relation R is a **transitive ****dependency **if there is a set of attribute types Z that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold.

A relation is in **3NF **if it satisfies 2NF and ** no **nonprime attribute type of R is transitively dependent on the primary key. If this is not the case, decompose and set up a relation that includes the non-key attribute types that functionally determine other non-key attribute types.

The relation R1 contains information about employees and departments. The SSN attribute type is the primary key of the relation. We have 2 transitive dependencies in R1. DNAME is transitively dependent from SSN via DNUMBER. In other words, DNUMBER is functionally dependent from SSN and DMGRSSN is functionally dependent from DNUMBER. Likewise, the DMGRSSN is transitively dependent from SSN via DNUMBER. To bring in 3NF we remove the attribute type DNAME and DMGRSSN and put it in a new relation R12 with DNUMBER as primary key.

R11 can be called EMPLOYEE and R12 DEPARTMENT.

**4.** **Boyce-Codd normal form**

A functional dependency X → Y is called **trivial **if Y is a subset of X __Ex__: SSN, NAME → SSN. A relation R is in **Boyce-Codd **normal form if and only if, for every one of its non-trivial functional dependencies X → Y, X is a superkey; that is, X is either a candidate key or a superset thereof.

Boyce-Codd normal form is said to be stricter than 3 NF. Hence, every relation in Boyce-Codd normal form is in 3NF. Hence, a relation in 3 NF is not necessarily in BCNF

**5.** **The fourth normal form**

There is a **multi-valued** **dependency** from X to Y X →→ Y if and only if each X value exactly determines a set of Y values, independently of the other attribute types . A relation is in 4NF if it is in Boyce-Codd normal form and for every one of its non-trivial multivalued dependencies X →→ Y, X is a superkey—that is, X is either a candidate key or a superset thereof This normal form is often neglected in database modeling.