Informal normalization guidelines
In order to have a good relational database schema, all relations in the schema should be normalized. Relational schemas that were generated from an (E)ER schema will automatically have this propety, if the translation rules were applied correctly. On the contrary, if the relational schema was not derived from an (E)ER schema, a formal normalization mechanism can be applied to transform the relational schema to a normalized form.
Advantages :
- At the logical level, the users can clearly understand the meaning of the data and formulate correct queries.
- At the implementation level, storage space is used efficiently and the risk of inconsistent updates is reduced.
Important to design the relational scheme in such a way that it is easy to explain its meaning. Next, attribute types from multiple entity types should not be combined in a single relation. It should also be make sure to mitigate the risk of insertion, delete and update anomalies when designing the relational scheme. Finally, excessive amounts of NULL values in a relationship should be avoided.
- Functional dependency X -> Y, between two sets of attribute types X and Y implies that a value of X uniquely determines a value of Y. if X -> Y in R, this does not say whether or not Y->X in R.
Examples
Ø SSN → Employee NAME (Note: not necessarily other way around!). Employee name is functionally dependent from SSN. In other words, the SSN uniquely determines the employee name. The other way round isn’t necessarily true since multiple employees can share the same name.
Ø PNUMBER → {PNAME, PLOCATION}. Project number uniquely determines a project name and project location. Therefore, PNAME and PLOCATION are functional dependent from PNUMBER.
Ø {SSN, PNUMBER} → HOURS
If X is a candidate key or F, this implies that X -> Y for any subset of attribute types Y or R.
Prime attribute type is an attribute type that is part of a candidate key.
Normalization of a relational scheme is a process of analyzing the given relation schemas based on their functional dependencies and candidate keys to minimize redundancy and insert, deletion and anomalies.
Unsatisfactory relation schemas that do not meet the normal form tests are decomposed into smaller relation schemas.
1. First normal form(1NF)
The first normal form states that every attribute type of a relation must be atomic and single valued. This means there cannot be any composite or multivalued attribute types. In case of a multivalued attribute type, remove it and put it in a separate relation, along with the primary key of the original relation as a foreign key. The primary key of the new relation is than the combination of the attribute type and the primary key of the original relation. Composite attribute types need to be decomposed in their parts.
Ex: The relation is not in first normal form since DLOCATION is a multivalued attribute type. Therefore, we should remove DLOCATION from the relation DEPARTMENT and put it together with DNUMBER as a foreign key. The primary key is then the combination of both: DNUMBER and DLOCATION.
Assumptions:
A department can have multiple locations! Multiple departments are possible at a given location!
2. Second normal form (2NF)
· A functional dependency X → Y is a full functional dependency if removal of any attribute type A from X means that the dependency does not hold anymore.
Example: SSN, PNUMBER → HOURS; indeed, to know the number of hours an employee worked on a project, we need both the SSN and the project number. PNUMBER → PNAME
· A functional dependency X → Y is a partial dependency if some attribute type A ∈ X can be removed from X and the dependency still holds.
Example: SSN, PNUMBER → PNAME. It only depends on PNUMBER and not on SSN
A relation R is in 2NF if it satisfies 1NF and every nonprime attribute type A in R is fully functional dependent on any key of R. As a reminder, a prime attribute type is an attribute type that is part of a candidate key. If this is not the case, decompose it and set up a new relation for each partial key with its dependent attribute types.
Furthermore, it is also important to make sure to keep a relation with the original primary key and any attribute types that are fully functional dependent on it.
Ex: R1 is in 1NF since there are no multivalued and no composite attribute types. However, it is not in 2NF. The attribute type PNAME is not fully functional dependent on the primary key. Indeed, it only depends on PNUMBER. On the contrary, the attribute type HOURS is fully dependent from SSN and PNUMBER. Therefore, we need to create a new relation.
3. Third normal form (3NF)
A functional dependency X → Y in a relation R is a transitive dependency if there is a set of attribute types Z that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold.
A relation is in 3NF if it satisfies 2NF and no nonprime attribute type of R is transitively dependent on the primary key. If this is not the case, decompose and set up a relation that includes the non-key attribute types that functionally determine other non-key attribute types.
The relation R1 contains information about employees and departments. The SSN attribute type is the primary key of the relation. We have 2 transitive dependencies in R1. DNAME is transitively dependent from SSN via DNUMBER. In other words, DNUMBER is functionally dependent from SSN and DMGRSSN is functionally dependent from DNUMBER. Likewise, the DMGRSSN is transitively dependent from SSN via DNUMBER. To bring in 3NF we remove the attribute type DNAME and DMGRSSN and put it in a new relation R12 with DNUMBER as primary key.
R11 can be called EMPLOYEE and R12 DEPARTMENT.
4. Boyce-Codd normal form
A functional dependency X → Y is called trivial if Y is a subset of X Ex: SSN, NAME → SSN. A relation R is in Boyce-Codd normal form if and only if, for every one of its non-trivial functional dependencies X → Y, X is a superkey; that is, X is either a candidate key or a superset thereof.
Boyce-Codd normal form is said to be stricter than 3 NF. Hence, every relation in Boyce-Codd normal form is in 3NF. Hence, a relation in 3 NF is not necessarily in BCNF
5. The fourth normal form
There is a multi-valued dependency from X to Y X →→ Y if and only if each X value exactly determines a set of Y values, independently of the other attribute types . A relation is in 4NF if it is in Boyce-Codd normal form and for every one of its non-trivial multivalued dependencies X →→ Y, X is a superkey—that is, X is either a candidate key or a superset thereof This normal form is often neglected in database modeling.