Sunday, 22 November 2015

Forms of Normalization. NF1,NF2,NF3. Detailed.

Database normalization:

Normalization involves decomposing a table into less redundant (and smaller) tables without losing information; defining foreign keys in the old table referencing the primary keys of the new ones.
                       The objective is to isolate data so that additions, deletions, and modifications of an attribute can be made in just one table and then propagated through the rest of the database using the defined foreign keys.

Problem Without Normalization
S_id
S_Name
S_Address
Subject_opted
401
Adam
Noida
Bio
402
Alex
Panipat
Maths
403
Stuart
Jammu
Maths
404
Adam
Noida
Physics
Without Normalization, it becomes difficult to handle and update the database, without facing data loss. Insertion, Updation and Deletion Anamolies are very frequent if Database is not Normalized. To understand these anomalies let us take an example of Student table.

   Updation Anamoly : To update address of a student who occurs twice or more than twice in a table, we will have to update S_Address column in all the rows, else data will become inconsistent.
   Insertion Anamoly : Suppose for a new admission, we have a Student id(S_id), name and address of a student but if student has not opted for any subjects yet then we have to insert NULL there, leading to Insertion Anamoly.
   Deletion Anamoly : If (S_id) 401 has only one subject and temporarily he drops it, when we delete that row, entire student record will be deleted along with it.


Edgar F. Codd, the inventor of the relational model (RM), introduced the concept of normalization and what we now know as the First normal form (1NF) in 1970. Codd went on to define the Second normal form (2NF) and Third normal form (3NF) in 1971, and Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in 1974.[3] Informally, a relational database table is often described as "normalized" if it


First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.
Edgar Codd, in a 1971 conference paper, defined a relation in first normal form to be one such that none of the domains of that relation should have elements which are themselves sets.
First normal form enforces these criteria:
   Eliminate repeating groups in individual tables.
   Create a separate table for each set of related data.
   Identify each set of related data with a primary key


Student Table :
Student
Age
Subject
Adam
15
Biology, Maths
Alex
14
Maths
Stuart
17
Maths
In First Normal Form, any row must not have a column in which more than one value is saved, like separated with commas. Rather than that, we must separate such data into multiple rows.


Student Table following 1NF will be :

Student
Age
Subject
Adam
15
Biology
Adam
15
Maths
Alex
14
Maths
Stuart
17
Maths
Using the First Normal Form, data redundancy increases, as there will be many columns with same data in multiple rows but each row as a whole will be unique.


Second normal form:
A table that is in first normal form (1NF) must meet additional criteria if it is to qualify for second normal form.
Specifically: a table is in 2NF if it is in 1NF and no non-prime attribute is dependent on any proper subset of any candidate key of the table. A non-prime attribute of a table is an attribute that is not a part of any candidate key of the table.
Put simply, a table is in 2NF if it is in 1NF and every non-prime attribute of the table is dependent on the whole of every candidate key.
Tournament Winners
Tournament
Year
Winner
Winner Date of Birth
Des Moines Masters
1998
Chip Masterson
14 March 1977
Indiana Invitational
1998
Al Fredrickson
21 July 1975
Cleveland Open
1999
Bob Albertson
28 September 1968
Des Moines Masters
1999
Al Fredrickson
21 July 1975
Indiana Invitational
1999
Chip Masterson
14 March 1977

Even though Winner and Winner Date of Birth are determined by the whole key {Tournament, Year} and not part of it, particular Winner / Winner Date of Birth combinations are shown redundantly on multiple records. This leads to an update anomaly: if updates are not carried out consistently, a particular winner could be shown as having two different dates of birth.
KEY POINT: THE TABLE ABOVE IS IN 2ND NORMAL FORM AND IT STILL HAS UPDATE ANOMALY.
The underlying problem is the transitive dependency to which the Winner Date of Birth attribute is subject. Winner Date of Birth actually depends on Winner, which in turn depends on the key Tournament / Year.
This problem of transitive dependency is addressed by third normal form (3NF).

Third normal form:
Third normal form is a normal form used in normalizing a database design to reduce the duplication of data and ensure referential integrity by ensuring that
   the entity is in second normal form and
   all the attributes in a table are determined only by the candidate keys of that table and not by any non-prime attributes.

An example of a 2NF table that fails to meet the requirements of 3NF is:
Tournament Winners
Tournament
Year
Winner
Winner Date of Birth
Indiana Invitational
1998
Al Fredrickson
21 July 1975
Cleveland Open
1999
Bob Albertson
28 September 1968
Des Moines Masters
1999
Al Fredrickson
21 July 1975
Indiana Invitational
1999
Chip Masterson
14 March 1977


Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the composite key {Tournament, Year} is a minimal set of attributes guaranteed to uniquely identify a row. That is, {Tournament, Year} is a candidate key for the table.
The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent on the candidate key {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown with different dates of birth on different records.

In order to express the same facts without violating 3NF, it is necessary to split the table into two:

1)   The Tournament winners { tournament, year , winner }
2)   Winner dates of birth { winner, date of birth }

Tournament Winners
Tournament
Year
Winner
Indiana Invitational
1998
Al Fredrickson
Cleveland Open
1999
Bob Albertson
Des Moines Masters
1999
Al Fredrickson
Indiana Invitational
1999
Chip Masterson

Dates of Birth
Winner
Date of Birth
Chip Masterson
14 March 1977
Al Fredrickson
21 July 1975
Bob Albertson
28 September 1968


Note: Tables used in the examples have been taken from various open sources. 

1 comment:

  1. CasinoTacos Review 2021 – Get an Exclusive Welcome Bonus
    CasinoTacos is a licensed online 바카라 시스템 배팅 casino where you can enjoy the 무료 룰렛 게임 best slots and games from the biggest software 암호 화폐 종류 providers in the 해외 축구 스코어 world. 토토 커뮤니티

    ReplyDelete