Unit 3 Normalization

dbms normalization
Copyright
© © All Rights Reserved

UNIT- III

Basics of Functional Dependencies and Normalization for Relational Databases:
Functional Dependencies, Normal Forms Based on Primary Keys, General Definitions of Second and Third Normal Forms, Boyce-Codd Normal Form, Multivalued Dependency and Fourth Normal Form.
Data redundancy
• Storing the same information redundantly (repeatedly), i.e. storing the same information in more than one place, or repeating the same data multiple times within a database, is called data redundancy.
Problems Caused by Data Redundancy
• Storing the same information redundantly, that is, in more than one place within a database, can lead to several problems:
• Redundant storage: Some information is stored repeatedly.
• Update anomalies: If one copy of such repeated data is updated, an inconsistency is created unless all copies are similarly updated.
• Insertion anomalies: It may not be possible to store some information unless some other information is stored as well.
• Deletion anomalies: It may not be possible to delete some information without losing some other information as well.
Schema Refinement, Decomposition
• To avoid redundancy and the problems caused by it, we use a schema refinement technique called DECOMPOSITION.
• Decomposition: the process of decomposing a larger relation into smaller relations.
• Each of the smaller relations contains a subset of the attributes of the original relation.
Functional Dependency

• A functional dependency X -> Y holds over relation R if X determines the attribute Y uniquely; we then say that Y is functionally dependent on X.
• Here attribute X is called the determinant and attribute Y is called the dependent.
• A functional dependency X -> Y in a relation holds true if any two tuples having the same value of attribute X also have the same value of attribute Y.
• If t1.X = t2.X then t1.Y = t2.Y, where t1, t2 are tuples and X, Y are attributes.
• Ex: sid -> sname
• Empid -> dept
• courseid -> coursename
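The definition above (if t1.X = t2.X then t1.Y = t2.Y) can be tested against a concrete table. The following is a minimal Python sketch, assuming each row is a dictionary keyed by attribute name; `fd_holds` and the student rows are hypothetical, for illustration only. A check on one instance can only refute an FD: an FD is a constraint on the schema, so passing on sample data does not prove it holds in general.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        x_value = tuple(row[a] for a in lhs)
        y_value = tuple(row[a] for a in rhs)
        # Remember the first Y seen for this X; any later mismatch violates X -> Y.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


# Hypothetical rows, for illustration only.
students: List[Row] = [
    {"sid": 1, "sname": "Asha", "course": "DBMS"},
    {"sid": 1, "sname": "Asha", "course": "OS"},
    {"sid": 2, "sname": "Ravi", "course": "DBMS"},
]

print(fd_holds(students, ["sid"], ["sname"]))   # True
print(fd_holds(students, ["sid"], ["course"]))  # False
```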
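Reasoning about functional dependencies:

• Armstrong Axioms:
• Armstrong's axioms define a set of rules for reasoning about functional dependencies and for inferring all the functional dependencies that hold on a relational database.
Primary Armstrong axiom rules:
• Reflexivity: If Y ⊆ X, then X -> Y.
• Augmentation: If X -> Y, then XZ -> YZ for any Z.
• Transitivity: If X -> Y and Y -> Z, then X -> Z.
Secondary or derived axioms:
• Union: If X -> Y and X -> Z, then X -> YZ.
• Decomposition: If X -> YZ, then X -> Y and X -> Z.
• Pseudo-transitivity: If X -> Y and WY -> Z, then WX -> Z.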
Types of functional dependencies:

1) Trivial functional dependency: If X -> Y is a functional dependency where Y is a subset of X, it is called a trivial functional dependency.
2) Non-trivial functional dependency: If X -> Y and Y is not a subset of X, then it is called a non-trivial functional dependency.
3) Multivalued dependency: a →→ b (read "a multidetermines b") holds when each value of a is associated with a set of b values that is independent of the values of the remaining attributes.
i.e. if a →→ b and a →→ c, where b and c are independent of each other (there is no dependency between b and c), then it is called a multivalued dependency.
4) Transitive functional dependency: If a → b and b → c, then, according to the axiom of transitivity, a → c. This is a transitive functional dependency.
Candidate Key:

• A candidate key is a minimal set of attributes of a relation that can be used to identify a tuple uniquely.
• Consider the student table:
• student(sno, sname, phone, age)
• We can take sno as the candidate key.
Super Key:

• A super key is a set of attributes of a relation that can be used to identify a tuple uniquely (it need not be minimal).
• Consider the student table:
• student(sno, sname, phone, age)
• We can take sno or (sno, sname) as super keys.
Prime and non-prime attributes

• Attributes that are part of any candidate key of a relation are called prime attributes; the others are non-prime attributes.
• Consider the student table:
• student(sno, sname, phone, age)
• We can take sno as the candidate key.
• Here sno is a prime attribute and sname, phone, age are non-prime attributes.
Normalization
• Normalization removes data redundancy and helps in designing a good database. It involves a set of normal forms, as follows:
1) First normal form (1NF)
2) Second normal form (2NF)
3) Third normal form (3NF)
4) Boyce-Codd normal form (BCNF)
5) Fourth normal form (4NF)
6) Fifth normal form (5NF)
First Normal Form (1NF)

• A relation is in 1NF if every attribute contains only atomic values.
• It states that an attribute of a table cannot hold multiple values; it must hold only single-valued attributes.
• First normal form disallows multi-valued attributes, composite attributes, and their combinations.
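First Normal Form (1NF)

The table below is not in 1NF, because EMP_PHONE holds two values for employee 14:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
20 Harry 8574783832 Bihar

First Normal Form (1NF)

In 1NF, each phone number is stored in its own row:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar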
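The conversion to 1NF can be sketched in Python as follows. It assumes, hypothetically, that the multi-valued EMP_PHONE cell is stored as a comma-separated string; `to_1nf` is an illustrative helper, not a DBMS feature. Each value of the multi-valued column gets its own row, and the other columns are copied.

```python
from typing import Dict, List

Row = Dict[str, object]

# The non-1NF rows above; EMP_PHONE holds a comma-separated list (assumption).
non_1nf: List[Row] = [
    {"EMP_ID": 14, "EMP_NAME": "John", "EMP_PHONE": "7272826385, 9064738238", "EMP_STATE": "UP"},
    {"EMP_ID": 20, "EMP_NAME": "Harry", "EMP_PHONE": "8574783832", "EMP_STATE": "Bihar"},
]


def to_1nf(rows: List[Row], multi_col: str) -> List[Row]:
    """Emit one row per value of the multi-valued column, copying the other columns."""
    flat: List[Row] = []
    for row in rows:
        for value in str(row[multi_col]).split(","):
            new_row = dict(row)
            new_row[multi_col] = value.strip()
            flat.append(new_row)
    return flat


for r in to_1nf(non_1nf, "EMP_PHONE"):
    print(r["EMP_ID"], r["EMP_NAME"], r["EMP_PHONE"], r["EMP_STATE"])
```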
Second Normal Form (2NF)

• A relation is in 2NF if it is in first normal form and contains no partial dependency, i.e.
• no non-prime attribute (an attribute that is not part of any candidate key) is dependent on any proper subset of any candidate key of the table.
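Second Normal Form (2NF)

STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 2000
1 C4 3000
4 C3 5000
4 C1 1000
2 C5 6000

Second Normal Form (2NF)

• Here the candidate key is {STUD_NO, COURSE_NO}, but COURSE_FEE depends only on COURSE_NO (COURSE_NO → COURSE_FEE), a proper subset of the key. This partial dependency violates 2NF.
• To convert the above relation to 2NF, we need to split the table into two tables:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE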
2NF
STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
2 C5
2NF
COURSE_NO COURSE_FEE
C1 1000
C2 2000
C4 3000
C3 5000
C5 6000
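The 2NF decomposition can be checked with a small Python sketch that models a relation as a set of tuples (an assumption of this sketch, not a DBMS API). `project` and `natural_join` are hypothetical helpers: projecting removes the repeated C1 fee, and joining the two tables on COURSE_NO gives back exactly the original rows, so the split loses nothing.

```python
from typing import List, Sequence, Set, Tuple

Relation = Set[Tuple]  # a relation is a set of tuples, so duplicates vanish


def project(rows: Relation, schema: Sequence[str], cols: Sequence[str]) -> Relation:
    """Keep only the named columns (relational projection)."""
    idx = [schema.index(c) for c in cols]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(r1: Relation, s1: Sequence[str],
                 r2: Relation, s2: Sequence[str]) -> Tuple[Relation, List[str]]:
    """Join two relations on all attributes with the same name."""
    common = [c for c in s1 if c in s2]
    extra = [c for c in s2 if c not in s1]
    out: Relation = set()
    for a in r1:
        for b in r2:
            if all(a[s1.index(c)] == b[s2.index(c)] for c in common):
                out.add(a + tuple(b[s2.index(c)] for c in extra))
    return out, list(s1) + extra


schema = ["STUD_NO", "COURSE_NO", "COURSE_FEE"]
enrolment = {(1, "C1", 1000), (2, "C2", 2000), (1, "C4", 3000),
             (4, "C3", 5000), (4, "C1", 1000), (2, "C5", 6000)}

table1 = project(enrolment, schema, ["STUD_NO", "COURSE_NO"])
table2 = project(enrolment, schema, ["COURSE_NO", "COURSE_FEE"])
print(len(table2))  # 5: the fee of C1 is now stored once

joined, joined_schema = natural_join(table1, ["STUD_NO", "COURSE_NO"],
                                     table2, ["COURSE_NO", "COURSE_FEE"])
print(joined_schema == schema and joined == enrolment)  # True
```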
Third Normal Form (3NF)

• A relation is in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on a key.
• 3NF is used to reduce data duplication. It is also used to achieve data integrity.
• A relation is in third normal form if at least one of the following conditions holds for every non-trivial functional dependency X → Y:
• X is a super key.
• Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Third Normal Form (3NF)

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal

Third Normal Form (3NF)

• Super keys in the table above:
• {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on
• Candidate key: {EMP_ID}
• Non-prime attributes: in the given table, all attributes except EMP_ID are non-prime.
• Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID), which violates the rule of third normal form.
• That's why we need to move EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with EMP_ZIP as the primary key.
Third Normal Form (3NF)
EMP TABLE

EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007

Third Normal Form (3NF)
EMP_ZIP table

EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
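The 3NF conditions (X is a super key, or every attribute of Y - X is prime) can be tested mechanically once the FDs are written down. A minimal Python sketch follows; the helper names are hypothetical, candidate keys are found by brute force (fine only for small schemas), and the FD set EMP_ID → {EMP_NAME, EMP_ZIP}, EMP_ZIP → {EMP_STATE, EMP_CITY} is an assumption read off the explanation above. It checks only the listed FDs, which textbooks note is enough when testing a single schema against its own FD set; for the two decomposed tables the relevant FDs are passed in by hand.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure: add the RHS of every FD whose LHS is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema: Set[str], fds: List[FD]) -> List[Set[str]]:
    """Brute force: smallest attribute sets whose closure is the whole schema."""
    keys: List[Set[str]] = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys


def violations_3nf(schema: Set[str], fds: List[FD]) -> List[FD]:
    """FDs X -> Y where X is not a super key and Y - X has a non-prime attribute."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        dependent = rhs - lhs  # the trivial part of the FD is ignored
        if dependent and not closure(lhs, fds) >= schema and not dependent <= prime:
            bad.append((lhs, rhs))
    return bad


emp = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
emp_fds = [fd("EMP_ID", "EMP_NAME EMP_ZIP"), fd("EMP_ZIP", "EMP_STATE EMP_CITY")]
for lhs, rhs in violations_3nf(emp, emp_fds):
    print(sorted(lhs), "->", sorted(rhs))  # ['EMP_ZIP'] -> ['EMP_CITY', 'EMP_STATE']

# After the decomposition, neither table has a violation.
print(violations_3nf({"EMP_ID", "EMP_NAME", "EMP_ZIP"}, [emp_fds[0]]))    # []
print(violations_3nf({"EMP_ZIP", "EMP_STATE", "EMP_CITY"}, [emp_fds[1]]))  # []
```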
Boyce Codd normal form
(BCNF, 3.5NF)

• BCNF is an advanced version of 3NF. It is stricter than 3NF.
• A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
• For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.
Boyce Codd normal form
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549

Boyce Codd normal form
• In the above table, the functional dependencies are as follows:
• EMP_ID → EMP_COUNTRY
• EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
• Candidate key: {EMP_ID, EMP_DEPT}
• The table is not in BCNF because neither EMP_ID nor EMP_DEPT alone is a super key.
• To convert the given table into BCNF, we decompose it into three tables:
BCNF
• EMP_COUNTRY table:

EMP_ID EMP_COUNTRY
264 India
364 UK
BCNF
• EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
BCNF
• EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing
BCNF
• Functional dependencies:
• EMP_ID → EMP_COUNTRY
• EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
• Candidate keys:
• For the first table: EMP_ID
• For the second table: EMP_DEPT
• For the third table: {EMP_ID, EMP_DEPT}
BCNF
Id Subject Professor
101 Java Mayank
101 C++ Kartik
102 Java Sarthak
103 C# Lakshay
104 Java Mayank
BCNF
In the table:

• One student can enroll in more than one subject.
• Example: the student with Id 101 has enrolled in Java and C++.
• A professor is assigned to the student for a specified subject, and there is always a possibility that multiple professors teach a particular subject.
BCNF
Finding the solution:
• Using Id and Subject together, we can find all unique records and also the other columns of the table. Hence, Id and Subject together form the primary key.
• The table is in 1NF because all the values inside a column are atomic and of the same domain.
• We can't uniquely identify a record with the help of either the Id or the Subject name alone. As there is no partial dependency, the table is also in 2NF.
• There is no transitive dependency: the only attribute that Professor determines is Subject, which is a prime attribute (part of the key {Id, Subject}). Hence, the table is also in 3NF.
• Note, however, that the table is not in BCNF (Boyce-Codd Normal Form).
Why is the table not in BCNF?
• Each professor teaches only one subject, but one subject may be taught by multiple professors. So there is a dependency between subject and professor: the subject always depends on the professor (Professor → Subject). However, Professor is not a super key of the table, because Professor alone does not determine Id. This is not allowed in BCNF: for BCNF, the determinant (Professor here) must be a super key.
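The 3NF test, extended with the BCNF rule, shows why this table passes 3NF but fails BCNF. A minimal Python sketch, assuming the FDs {Id, Subject} → Professor and Professor → Subject from the discussion above; all helper names are hypothetical. Because Professor → Subject, the brute-force key search also finds {Id, Professor} as a candidate key, so every attribute is prime and no 3NF condition is violated, while Professor → Subject still violates BCNF because Professor is not a super key.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema: Set[str], fds: List[FD]) -> List[Set[str]]:
    keys: List[Set[str]] = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if not any(key <= attrs for key in keys) and closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys


def violations_3nf(schema: Set[str], fds: List[FD]) -> List[FD]:
    """Non-trivial X -> Y with X not a super key and a non-prime attribute in Y - X."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(lhs, rhs) for lhs, rhs in fds
            if (rhs - lhs) and not closure(lhs, fds) >= schema
            and not (rhs - lhs) <= prime]


def violations_bcnf(schema: Set[str], fds: List[FD]) -> List[FD]:
    """Non-trivial X -> Y whose determinant X is not a super key."""
    return [(lhs, rhs) for lhs, rhs in fds
            if (rhs - lhs) and not closure(lhs, fds) >= schema]


table = {"Id", "Subject", "Professor"}
table_fds = [fd("Id Subject", "Professor"), fd("Professor", "Subject")]

print([sorted(k) for k in candidate_keys(table, table_fds)])  # [['Id', 'Professor'], ['Id', 'Subject']]
print(violations_3nf(table, table_fds))                       # []
print(violations_bcnf(table, table_fds))  # [(frozenset({'Professor'}), frozenset({'Subject'}))]
```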
BCNF
• We decompose the table into two tables, the Student table and the Professor table, to satisfy the conditions of BCNF.
Student Table
P_Id S_Id Professor
1 101 Mayank
2 101 Kartik
3 102 Sarthak
4 103 Lakshay
5 104 Mayank
Professor Table
Professor Subject
Mayank Java
Kartik C++
Sarthak Java
Lakshay C#

Professor is now the primary key of the Professor table and determines the Subject column. Every determinant is a super key, hence the table is in BCNF.
Fourth normal form (4NF)

• A relation is in 4NF if it is in Boyce Codd normal form and has no non-trivial multi-valued dependency other than those whose left side is a super key.
• For a dependency A →→ B, if multiple values of B exist for a single value of A, independently of the other attributes, then the relation has a multi-valued dependency.
Fourth normal form (4NF)

STU_ID COURSE HOBBY


21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
Fourth normal form (4NF)

• The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent attributes: there is no relationship between COURSE and HOBBY.
• In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two hobbies, Dancing and Singing.
• So there is a multi-valued dependency on STU_ID (STU_ID →→ COURSE and STU_ID →→ HOBBY). Because courses and hobbies are independent, every course of student 21 must be stored with every hobby, i.e. the rows (21, Computer, Singing) and (21, Math, Dancing) also belong in the table, which leads to unnecessary repetition of data.
• So to convert the above table into 4NF, we can decompose it into two tables:
Fourth normal form (4NF)

• STUDENT_COURSE

STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
Fourth normal form (4NF)

• STUDENT_HOBBY

STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
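A multi-valued dependency X →→ Y can be tested on an instance: within each group of rows sharing an X value, every Y value must appear with every value of the remaining attributes. The minimal Python sketch below (the helper `mvd_holds` is hypothetical) shows that the five rows printed earlier do not satisfy STU_ID →→ COURSE on their own, whereas the table with all four course/hobby combinations for student 21 does, and that STUDENT_COURSE joined with STUDENT_HOBBY on STU_ID rebuilds exactly that table.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Sequence, Set, Tuple


def mvd_holds(rows: Set[Tuple], schema: Sequence[str],
              x: Sequence[str], y: Sequence[str]) -> bool:
    """X ->-> Y holds if, within each X-group, every Y value pairs with every Z value."""
    z = [c for c in schema if c not in x and c not in y]  # the remaining attributes
    groups: Dict[Tuple, Set[Tuple[Tuple, Tuple]]] = defaultdict(set)
    for row in rows:
        rec = dict(zip(schema, row))
        groups[tuple(rec[c] for c in x)].add(
            (tuple(rec[c] for c in y), tuple(rec[c] for c in z)))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True


schema = ["STU_ID", "COURSE", "HOBBY"]
printed = {(21, "Computer", "Dancing"), (21, "Math", "Singing"),
           (34, "Chemistry", "Dancing"), (74, "Biology", "Cricket"),
           (59, "Physics", "Hockey")}
complete = printed | {(21, "Computer", "Singing"), (21, "Math", "Dancing")}

print(mvd_holds(printed, schema, ["STU_ID"], ["COURSE"]))   # False
print(mvd_holds(complete, schema, ["STU_ID"], ["COURSE"]))  # True

# 4NF decomposition: STUDENT_COURSE and STUDENT_HOBBY, rejoined on STU_ID.
course = {(s, c) for s, c, h in complete}
hobby = {(s, h) for s, c, h in complete}
rejoined = {(s, c, h) for s, c in course for s2, h in hobby if s == s2}
print(rejoined == complete)  # True
```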
Fifth normal form (5NF)

• A relation is in 5NF if it is in 4NF, contains no join dependency that is not implied by its candidate keys, and every join of its decomposed tables is lossless.
• 5NF is satisfied when the tables have been broken into as many tables as possible without losing information, in order to avoid redundancy.
• 5NF is also known as Project-join normal form (PJ/NF).
Fifth normal form (5NF)

• STUDENT TABLE

SUBJECT LECTURER SEMESTER


Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1
Fifth normal form (5NF)

• In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't take the Math class for Semester 2. In this case, the combination of all these fields is required to identify valid data.
• Suppose we add a new semester, Semester 3, but do not know the subject or who will be teaching it, so we leave Lecturer and Subject as NULL. But all three columns together act as the primary key, so we can't leave the other two columns blank.
• So to convert the above table into 5NF, we can decompose it into three relations P1, P2 and P3:
Fifth normal form (5NF)

• P1 TABLE

SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
Fifth normal form (5NF)

• P2 TABLE

SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
Fifth normal form (5NF)

• P3 TABLE

SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
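That the three projections rejoin to the original table can be checked directly. A minimal Python sketch using set comprehensions as joins (an illustrative approach, not a DBMS API): joining only P1 and P2 produces spurious rows such as (Math, Akash, Semester 1), while the three-way join with P3 filters them out and returns exactly the original five rows.

```python
# Rows of the original table as (SUBJECT, LECTURER, SEMESTER).
student = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# Projections; set semantics removes the repeated (Semester 1, John) pair in P3.
p1 = {(sem, subj) for subj, lect, sem in student}   # SEMESTER, SUBJECT
p2 = {(subj, lect) for subj, lect, sem in student}  # SUBJECT, LECTURER
p3 = {(sem, lect) for subj, lect, sem in student}   # SEMESTER, LECTURER

# P1 joined with P2 on SUBJECT.
p12 = {(subj, lect, sem) for sem, subj in p1 for subj2, lect in p2 if subj == subj2}
# Then joined with P3 on (SEMESTER, LECTURER).
p123 = {(subj, lect, sem) for subj, lect, sem in p12 if (sem, lect) in p3}

print(len(p12))         # 7: joining only two projections adds spurious rows
print(p123 == student)  # True: the three-way join is lossless
```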
Decomposition
• The process of breaking up or dividing a single relation into two or more sub-relations is called decomposition of a relation.
Properties of Decomposition

1. Lossless join decomposition
2. Lossy join decomposition
3. Dependency preserving decomposition
Lossless Join Decomposition
• Consider a relation R which is decomposed into sub-relations R1, R2, …, Rn.
• This decomposition is called a lossless join decomposition when the join of the sub-relations results in the same relation R that was decomposed.
• For a lossless join decomposition, we always have
• R1 ⋈ R2 ⋈ R3 ⋈ … ⋈ Rn = R
• Consider the following relation R(A, B, C):
• Consider that this relation is decomposed into two sub-relations R1(A, B) and R2(B, C):
• R1 ⋈ R2 = R
• This relation is the same as the original relation R.
• So this is a lossless join decomposition.
Lossy Join Decomposition

• Consider a relation R which is decomposed into sub-relations R1, R2, …, Rn.
• This decomposition is called a lossy join decomposition when the join of the sub-relations does not result in the same relation R that was decomposed.
• R1 ⋈ R2 ⋈ R3 ⋈ … ⋈ Rn ⊃ R (the join contains extra, spurious tuples)
Dependency Preserving decomposition

• In a dependency preserving decomposition, every dependency of the original relation must still be enforceable on the decomposed tables.
• If a relation R is decomposed into relations R1 and R2, then each dependency of R must either be a part of R1 or R2, or be derivable from the combination of the functional dependencies of R1 and R2.
• For example, suppose there is a relation R(A, B, C, D) with the functional dependency set {A -> BC}. R is decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD A -> BC is a part of relation R1(ABC).
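Dependency preservation can be tested without computing F+ in full: for each X -> Y, repeatedly grow X using only what each decomposed table can derive from its own attributes, and check whether Y is reached. The following is a minimal Python sketch of that standard textbook test; attributes are single letters, the helper names are hypothetical, and the second example (R(A, B, C) with A -> B, B -> C split into R1(AB) and R2(AC)) is an illustrative case not taken from the slides.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from single-letter attribute names, e.g. fd("A", "BC")."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fds: List[FD], parts: List[Set[str]]) -> bool:
    """Check each X -> Y using only what the decomposed tables can enforce."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # What this table can derive from the attributes we already have.
                gained = closure(result & part, fds) & part
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True


print(preserves([fd("A", "BC")], [set("ABC"), set("AD")]))              # True
print(preserves([fd("A", "B"), fd("B", "C")], [set("AB"), set("AC")]))  # False
```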
Lossless join decomposition
• Consider a relation schema R(X Y Z W P) (above table R) decomposed into R1(X Y Z W) and R2(W P). Determine whether this decomposition into R1 and R2 is lossless or lossy.
• Solution: For the decomposition of R to be lossless, it should satisfy the following three conditions:
• Attribute(R1) ∪ Attribute(R2) = Attribute(R)
• Attribute(R1) ∩ Attribute(R2) ≠ Φ
• Attribute(R1) ∩ Attribute(R2) → Attribute(R1) or Attribute(R1) ∩ Attribute(R2) → Attribute(R2)
• Cond 1: satisfied, as Attribute(R1) ∪ Attribute(R2) = Attribute(R) = (X Y Z W P)
• Cond 2: satisfied, as Attribute(R1) ∩ Attribute(R2) = (W) ≠ Φ
• Cond 3: satisfied, as the common attribute W is a key (we can check from the table that the values of column W are unique), so W determines all attributes of R2
• Hence the decomposition of R(X Y Z W P) into R1(X Y Z W) and R2(W P) is a lossless decomposition.
Lossless join decomposition
• Consider a relation schema R(X Y Z W P) (above table R) decomposed into R1(X Y Z) and R2(W P). Determine whether this decomposition into R1 and R2 is lossless or lossy.
• Solution: For the decomposition of R to be lossless, it should satisfy the following three conditions:
• Attribute(R1) ∪ Attribute(R2) = Attribute(R)
• Attribute(R1) ∩ Attribute(R2) ≠ Φ
• Attribute(R1) ∩ Attribute(R2) → Attribute(R1) or Attribute(R1) ∩ Attribute(R2) → Attribute(R2)
• Cond 1: satisfied, as Attribute(R1) ∪ Attribute(R2) = Attribute(R) = (X Y Z W P)
• Cond 2: not satisfied, as Attribute(R1) ∩ Attribute(R2) = Φ
• Since condition 2 is not satisfied, we do not check condition 3.
• Hence the decomposition of R(X Y Z W P) into R1(X Y Z) and R2(W P) is a lossy decomposition.
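The three conditions can be coded directly with attribute closure. A minimal Python sketch; since the table R is not reproduced here, the FD W → X Y Z P is an assumption standing in for "W is a key", and the helper names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from single-letter attribute names, e.g. fd("W", "XYZP")."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r: Set[str], r1: Set[str], r2: Set[str], fds: List[FD]) -> bool:
    if r1 | r2 != r:          # cond 1: no attribute is lost
        return False
    common = r1 & r2
    if not common:            # cond 2: the tables share at least one attribute
        return False
    c = closure(common, fds)  # cond 3: the common part is a key of R1 or R2
    return r1 <= c or r2 <= c


R = set("XYZWP")
fds = [fd("W", "XYZP")]  # assumption: W is a key of R
print(is_lossless(R, set("XYZW"), set("WP"), fds))  # True
print(is_lossless(R, set("XYZ"), set("WP"), fds))   # False
```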
Lossless join decomposition
1. Consider a relation schema R(A, B, C, D) with the functional dependencies A → B and C → D. Determine whether the decomposition of R into R1(A, B) and R2(C, D) is lossless or lossy.
2. Consider a relation schema R(A, B, C, D) with the following functional dependencies: A → B, B → C, C → D, D → B. Determine whether the decomposition of R into R1(A, B), R2(B, C) and R3(B, D) is lossless or lossy.
Closure of an Attribute

• Closure of an attribute: the closure of an attribute (or attribute set) X, written X+, is the set of all attributes that can be functionally determined from X.
• The closure of a set F of FDs is the set F+ of all FDs that can be inferred from F.
Closure of an attribute

Given the relational schema R(P Q R S T U V) with attributes P, Q, R, S, T, U and V, and the set of functional dependencies FD = { P->Q, QR->ST, PTV->V }:
• Determine the closures (QR)+ and (PR)+.
• a) Start with QR+ = QR (the closure of an attribute set always contains the set itself).
• Now, as per the algorithm, look for an FD whose complete left side is contained in QR, i.e. consists only of Q, R, or QR. The FD QR → ST has the complete left side QR.
• Hence QR+ = QRST.
• Again, check the remaining two FDs for a left side made up only of Q, R, S, T.
• The left side of neither remaining FD {P->Q, PTV->V} is contained in QRST.
• Therefore QR+ = QRST.
Attribute closure
• Consider a relation R ( A , B , C , D , E , F , G )
with the functional dependencies-
• A → BC
• BC → DE
• D→F
• CF → G
• find the closure of attributes and attribute
sets?
Attribute closure
• Closure of attribute A-
• A+ = { A }
• = { A , B , C } ( Using A → BC )
• = { A , B , C , D , E } ( Using BC → DE )
• = { A , B , C , D , E , F } ( Using D → F )
• = { A , B , C , D , E , F , G } ( Using CF → G )
• Thus,
• A+ = { A , B , C , D , E , F , G }
Attribute closure
• Closure of attribute D-
• D+ = { D }
• = { D , F } ( Using D → F )
• We can not determine any other attribute using
attributes D and F contained in the result set.
• Thus,
• D+ = { D , F }

Attribute closure
• Closure of attribute set {B, C}-
• { B , C }+ = { B , C }
• = { B , C , D , E } ( Using BC → DE )
• = { B , C , D , E , F } ( Using D → F )
• = { B , C , D , E , F , G } ( Using CF → G )
• Thus,
• { B , C }+ = { B , C , D , E , F , G }
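The step-by-step closure computation above is a fixed-point loop: keep adding the right side of any FD whose left side is already in the result, until nothing changes. A minimal Python sketch, assuming single-letter attribute names; `fd` and `closure` are illustrative helpers.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from single-letter attribute names, e.g. fd("BC", "DE")."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Repeat until no FD adds anything new to the result set."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD only when its complete left side is already present.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [fd("A", "BC"), fd("BC", "DE"), fd("D", "F"), fd("CF", "G")]
print("".join(sorted(closure("A", fds))))   # ABCDEFG
print("".join(sorted(closure("D", fds))))   # DF
print("".join(sorted(closure("BC", fds))))  # BCDEFG
```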
Finding the Keys Using Closure

• Super Key:
• If the closure of an attribute set contains all the attributes of the relation, then that attribute set is called a super key of that relation.
• Thus, we can say:
• "The closure of a super key is the entire relation schema."
• In the above example,
• The closure of attribute A is the entire relation schema.
• Thus, attribute A is a super key for that relation.
Candidate Key

• If an attribute set is a super key and no proper subset of it has a closure containing all the attributes of the relation, then that attribute set is called a candidate key of that relation.
• In the above example,
• No proper subset of attribute A determines all the attributes of the relation.
• Thus, attribute A is also a candidate key for that relation.
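Both tests reduce to closures. A minimal Python sketch (helper names hypothetical) in which a candidate key is a super key from which no single attribute can be dropped; checking single-attribute removals is enough because any superset of a super key is also a super key.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from single-letter attribute names, e.g. fd("BC", "DE")."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_super_key(attrs: Iterable[str], schema: Set[str], fds: List[FD]) -> bool:
    return closure(attrs, fds) >= schema


def is_candidate_key(attrs: Iterable[str], schema: Set[str], fds: List[FD]) -> bool:
    attr_set = set(attrs)
    if not is_super_key(attr_set, schema, fds):
        return False
    # Minimal: dropping any one attribute must lose the super key property.
    return all(not is_super_key(attr_set - {a}, schema, fds) for a in attr_set)


schema = set("ABCDEFG")
fds = [fd("A", "BC"), fd("BC", "DE"), fd("D", "F"), fd("CF", "G")]
print(is_candidate_key("A", schema, fds))                                   # True
print(is_super_key("AB", schema, fds), is_candidate_key("AB", schema, fds))  # True False
print(is_super_key("BC", schema, fds))                                      # False
```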
Closure of an Attribute

b) Start with PR+ = PR (the closure of an attribute set always contains the set itself).
• Now look through the set of FDs for one whose complete left side is contained in PR, i.e. consists only of P, R, or PR. In the FD P → Q, the left side P is a subset of PR. Hence PR+ = PRQ.
• Again, check the remaining two FDs for a left side made up only of P, R, Q. In the FD QR → ST, the complete left side QR is contained in PQR.
• Hence PR+ = PRQST.
• Again, check the remaining FD {PTV->V}: its complete left side would have to belong to PRQST. Since V is not in PRQST, PTV is not contained in it, and we ignore this FD.
• Therefore PR+ = PRQST.
Attribute closure
• Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and the set of functional dependencies {{E, F} -> {G}, {F} -> {I, J}, {E, H} -> {K, L}, {K} -> {M}, {L} -> {N}} on R. What is the key for R?
• A. {E, F}
B. {E, F, H}
C. {E, F, H, K, L}
D. {E}
Attribute closure
• Finding the attribute closure of all the given options, we get:
{E, F}+ = {E F G I J}
{E, F, H}+ = {E F H G I J K L M N}
{E, F, H, K, L}+ = {E F H G I J K L M N}
{E}+ = {E}
{E F H}+ and {E F H K L}+ both result in the set of all attributes, but EFH is minimal. So it is the candidate key (option B).
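The option check can be automated with the same closure routine. A minimal Python sketch, with single-letter attributes and hypothetical helper names, that prints each option's closure and whether it is a candidate key.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from single-letter attribute names, e.g. fd("EH", "KL")."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_candidate_key(attrs: Iterable[str], schema: Set[str], fds: List[FD]) -> bool:
    attr_set = set(attrs)
    if closure(attr_set, fds) < schema:
        return False  # not even a super key
    # Minimal: removing any single attribute must break the super key property.
    return all(not closure(attr_set - {a}, fds) >= schema for a in attr_set)


schema = set("EFGHIJKLMN")
fds = [fd("EF", "G"), fd("F", "IJ"), fd("EH", "KL"), fd("K", "M"), fd("L", "N")]
options = {"A": "EF", "B": "EFH", "C": "EFHKL", "D": "E"}

for label, attrs in options.items():
    print(label, "".join(sorted(closure(attrs, fds))), is_candidate_key(attrs, schema, fds))
# A EFGIJ False
# B EFGHIJKLMN True
# C EFGHIJKLMN False
# D E False
```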
