Database
normalization:
Normalization
involves decomposing a table into less redundant (and smaller) tables without
losing information; defining
foreign keys
in the old table referencing the primary
keys of
the new ones.
The objective is to
isolate data so that additions, deletions, and modifications of an attribute
can be made in just one table and then propagated through the rest of the
database using the defined foreign keys.
Problem Without Normalization
S_id
|
S_Name
|
S_Address
|
Subject_opted
|
401
|
Adam
|
Noida
|
Bio
|
402
|
Alex
|
Panipat
|
Maths
|
403
|
Stuart
|
Jammu
|
Maths
|
404
|
Adam
|
Noida
|
Physics
|
Without Normalization, it becomes difficult to handle and
update the database, without facing data loss. Insertion, Updation and Deletion
Anamolies are very frequent if Database is not Normalized. To understand these
anomalies let us take an example of Student table.
▪
Updation Anamoly : To update address of a student who occurs twice or more than twice in a
table, we will have to update S_Address column in all the rows, else
data will become inconsistent.
▪
Insertion Anamoly : Suppose for a new admission, we have a Student id(S_id), name and
address of a student but if student has not opted for any subjects yet then we
have to insert NULL there, leading to Insertion Anamoly.
▪
Deletion Anamoly : If (S_id) 401 has only one subject and temporarily he drops it, when we
delete that row, entire student record will be deleted along with it.
Edgar
F. Codd, the inventor of the relational
model (RM),
introduced the concept of normalization and what we now know as the First normal form
(1NF) in 1970.
Codd went on to define the Second normal form (2NF) and Third normal form
(3NF) in 1971,
and Codd and
Raymond F. Boyce
defined the Boyce-Codd Normal Form (BCNF) in 1974.[3]
Informally, a relational database table is often described as
"normalized" if it
First normal form (1NF) is a property of a relation
in a relational database. A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute
contains only a single value from that domain.
Edgar Codd,
in a 1971 conference paper, defined a relation in first normal form to be one
such that none of the domains of that relation should have elements which
are themselves sets.
First normal form enforces
these criteria:
▪
Eliminate
repeating groups in individual tables.
▪
Create
a separate table for each set of related data.
▪
Identify
each set of related data with a primary key
Student Table :
Student
|
Age
|
Subject
|
Adam
|
15
|
Biology, Maths
|
Alex
|
14
|
Maths
|
Stuart
|
17
|
Maths
|
In First Normal Form, any row must not have a column in which
more than one value is saved, like separated with commas. Rather than that, we
must separate such data into multiple rows.
Student Table following 1NF will be :
Student
|
Age
|
Subject
|
Adam
|
15
|
Biology
|
Adam
|
15
|
Maths
|
Alex
|
14
|
Maths
|
Stuart
|
17
|
Maths
|
Using the First Normal
Form, data redundancy increases, as there will be many columns with same
data in multiple rows but each row as a whole will be unique.
Second normal form:
A table that is in first normal form
(1NF) must meet additional criteria if it is to qualify for second normal form.
Specifically: a table is in 2NF
if it is in 1NF and no non-prime
attribute is dependent on any proper subset of any candidate
key of the table. A
non-prime attribute of a table is an attribute that is not a part of any
candidate key of the table.
Put simply, a table is in 2NF
if it is in 1NF and every non-prime attribute of the table is dependent on the
whole of every candidate key.
Tournament
Winners
Tournament
|
Year
|
Winner
|
Winner
Date of Birth
|
Des
Moines Masters
|
1998
|
Chip
Masterson
|
14
March 1977
|
Indiana
Invitational
|
1998
|
Al
Fredrickson
|
21
July 1975
|
Cleveland
Open
|
1999
|
Bob
Albertson
|
28
September 1968
|
Des
Moines Masters
|
1999
|
Al
Fredrickson
|
21
July 1975
|
Indiana
Invitational
|
1999
|
Chip
Masterson
|
14
March 1977
|
Even though Winner and Winner
Date of Birth are determined by the whole key {Tournament, Year} and not
part of it, particular Winner / Winner Date of Birth combinations are shown
redundantly on multiple records. This leads to an update anomaly: if
updates are not carried out consistently, a particular winner could be shown as
having two different dates of birth.
KEY POINT: THE TABLE ABOVE IS
IN 2ND NORMAL FORM AND IT STILL HAS UPDATE ANOMALY.
The underlying problem is the transitive
dependency to which the Winner Date of Birth attribute is subject. Winner
Date of Birth actually depends on Winner, which in turn depends on the key
Tournament / Year.
Third normal form:
Third normal form is a normal form
used in normalizing
a database
design to reduce the duplication of data and ensure referential integrity by ensuring that
▪
all
the attributes in a table are determined only by the candidate
keys of that table and not
by any non-prime attributes.
An example of a 2NF table that
fails to meet the requirements of 3NF is:
Tournament
Winners
Tournament
|
Year
|
Winner
|
Winner
Date of Birth
|
Indiana
Invitational
|
1998
|
Al
Fredrickson
|
21
July 1975
|
Cleveland
Open
|
1999
|
Bob
Albertson
|
28
September 1968
|
Des
Moines Masters
|
1999
|
Al
Fredrickson
|
21
July 1975
|
Indiana
Invitational
|
1999
|
Chip
Masterson
|
14
March 1977
|
Because each row in the table
needs to tell us who won a particular Tournament in a particular Year, the
composite key {Tournament, Year} is a minimal set of attributes guaranteed to
uniquely identify a row. That is, {Tournament, Year} is a candidate key for the
table.
The breach of 3NF occurs because the
non-prime attribute Winner Date of Birth is transitively dependent on the
candidate key {Tournament, Year} via the non-prime attribute Winner. The fact
that Winner Date of Birth is functionally dependent on Winner makes the table
vulnerable to logical inconsistencies, as there is nothing to stop the same
person from being shown with different dates of birth on different records.
In
order to express the same facts without violating 3NF, it is necessary to split
the table into two:
1)
The Tournament winners { tournament, year ,
winner }
2)
Winner dates of birth { winner, date of birth }
Tournament
Winners
Tournament
|
Year
|
Winner
|
Indiana
Invitational
|
1998
|
Al
Fredrickson
|
Cleveland
Open
|
1999
|
Bob
Albertson
|
Des
Moines Masters
|
1999
|
Al
Fredrickson
|
Indiana
Invitational
|
1999
|
Chip
Masterson
|
Dates
of Birth
Winner
|
Date of Birth
|
Chip
Masterson
|
14
March 1977
|
Al
Fredrickson
|
21
July 1975
|
Bob
Albertson
|
28
September 1968
|
CasinoTacos Review 2021 – Get an Exclusive Welcome Bonus
ReplyDeleteCasinoTacos is a licensed online 바카라 시스템 배팅 casino where you can enjoy the 무료 룰렛 게임 best slots and games from the biggest software 암호 화폐 종류 providers in the 해외 축구 스코어 world. 토토 커뮤니티