
F U-4 PDF

This document discusses schema refinement by introducing the concept of functional dependencies. It explains that conceptual database designs may contain redundancy that needs to be addressed. Functional dependencies can lead to redundancy if the same attribute values are stored multiple times. The document then provides an example relation to illustrate redundancy issues. It also discusses how decomposing relations and null values can help address problems related to redundancy. Finally, it covers characteristics and inference rules of functional dependencies.

Uploaded by

Riyaz Shaik
Copyright © All Rights Reserved

UNIT - IV

INTRODUCTION TO SCHEMA REFINEMENT


Conceptual database design gives us a set of relation schemas and integrity constraints (ICs) that
can be regarded as a good starting point for the final database design. This initial design must be
refined by taking the ICs into account and also by considering performance criteria and typical
workloads.

A major aim of relational database design is to minimize data redundancy. The problems
associated with data redundancy are illustrated as follows:

Problems caused by redundancy

Storing the same information in more than one place within a database is called redundancy and
can lead to several problems:

• Redundant Storage: Some information is stored repeatedly.
• Update Anomalies: If one copy of such repeated data is updated, an inconsistency is
created unless all copies are similarly updated.
• Insertion Anomalies: It may not be possible to store certain information unless some
other, unrelated, information is stored as well.
• Deletion Anomalies: It may not be possible to delete certain information without losing
some other, unrelated, information as well.

Ex: Consider a relation, Hourly_Emps(ssn, name, lot, rating, hourly_wages, hours_worked)

The key for Hourly_Emps is ssn. In addition, suppose that the hourly_wages attribute is
determined by the rating attribute. That is, for a given rating value, there is only one permissible
hourly_wages value. This IC is an example of a functional dependency. It leads to possible
redundancy in the relation Hourly_Emps, as shown below:

ssn name lot rating hourly_wages hours_worked


567 Adithya 48 8 10 40
576 Devesh 22 8 10 30
574 Ayush Soni 35 5 7 30
597 Rajasekhar 35 5 7 32
5c1 Sunil 35 8 10 40

If the same value appears in the rating column of two tuples, the IC tells us that the same value
must appear in the hourly_wages column as well. This redundancy has the following problems:

• Redundant Storage: The rating value 8 corresponds to the hourly_wages value 10, and this
association is repeated three times.
• Update Anomalies: The hourly_wages value in the first tuple could be updated without making a
similar change in the second tuple.
• Insertion Anomalies: We cannot insert a tuple for an employee unless we know the
hourly_wages value for the employee's rating value.
• Deletion Anomalies: If we delete all tuples with a given rating value (e.g., we delete the
tuples for Ayush Soni and Rajasekhar), we lose the association between that rating value
and its hourly_wages value.

Null Values:

Null values cannot provide a complete solution, but they can provide some help.

Consider the example Hourly_Emps relation. Here null values cannot help to eliminate
redundant storage, update or deletion anomalies. It appears that they can address insertion
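anomalies. For instance, we can insert an employee tuple with a null value in the hourly_wages
field. However, null values cannot address all insertion anomalies: for example, we cannot record
the hourly wage for a rating value unless there is an employee with that rating, because we cannot
store a null value in the key field ssn. Thus, null values do not provide a general solution to the
problems of redundancy, even though they can help in some cases.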

Decompositions:

A decomposition of a relation schema R consists of replacing the relation schema by two or
more relation schemas that each contain a subset of the attributes of R and together include all
attributes in R.
Ex: We can decompose Hourly_Emps into two relations:

Hourly_Emps2(ssn, name, lot, rating, hours_worked)

Wages(rating, hourly_wages)
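A minimal Python sketch of this decomposition, assuming a relation is modelled as a Python set of rows (set semantics, as in the relational model). The helper names `as_rows`, `project` and `natural_join` are illustrative and not from the notes; the ssn values are kept as strings exactly as printed in the table above, including "5c1".

```python
def as_rows(attrs, tuples):
    """Convert positional tuples into canonical rows: sorted (attribute, value) pairs."""
    return {tuple(sorted(zip(attrs, t))) for t in tuples}


def project(relation, attrs):
    """Projection onto `attrs`; using a set removes duplicate rows automatically."""
    return {tuple((a, v) for a, v in row if a in attrs) for row in relation}


def natural_join(r1, r2):
    """Natural join: combine every pair of rows that agree on their common attributes."""
    result = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(tuple(sorted({**d1, **d2}.items())))
    return result


ATTRS = ("ssn", "name", "lot", "rating", "hourly_wages", "hours_worked")
hourly_emps = as_rows(ATTRS, [
    ("567", "Adithya", 48, 8, 10, 40),
    ("576", "Devesh", 22, 8, 10, 30),
    ("574", "Ayush Soni", 35, 5, 7, 30),
    ("597", "Rajasekhar", 35, 5, 7, 32),
    ("5c1", "Sunil", 35, 8, 10, 40),
])

hourly_emps2 = project(hourly_emps, {"ssn", "name", "lot", "rating", "hours_worked"})
wages = project(hourly_emps, {"rating", "hourly_wages"})

print(len(wages))                                        # 2
print(natural_join(hourly_emps2, wages) == hourly_emps)  # True
```

The rating-to-wage association is stored once per rating in Wages (2 rows instead of 5), and joining the two smaller relations on rating gives back exactly the original instance.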

Problems Related to Decomposition:

Two important questions must be asked during the decomposition process:

1. Do we need to decompose a relation?

2. What problems does a given decomposition cause?

To answer the first question, several normal forms have been proposed for relations. If a relation
schema is in one of these normal forms, we know that certain kinds of problems cannot arise.

With respect to the second question, two properties of a decomposition are to be considered:

The lossless-join property enables us to recover any instance of the decomposed relation from the
corresponding instances of the smaller relations.

The dependency-preservation property enables us to enforce any constraint on the original
relation by simply enforcing some constraints on each of the smaller relations. That is, we need
not perform joins of the smaller relations to check whether a constraint on the original relation is
violated.

Functional Dependencies:

A functional dependency (FD) is a kind of IC that generalizes the concept of a key.

Let R be a relation schema and let X and Y be nonempty sets of attributes in R. We say that an
instance r of R satisfies the FD X→Y if the following holds for every pair of tuples t1 and t2 in
r:

If t1.X = t2.X, then t1.Y = t2.Y.

X→Y is read as "X functionally determines Y", or simply as "X determines Y".
An FD X→Y says that if two tuples agree on the values in attributes X, they must also agree on
the values in attributes Y.

Ex: The FD AB→C is satisfied by the following instance:

A    B    C    D
a1   b1   c1   d1
a1   b1   c1   d2
a1   b2   c2   d1
a2   b1   c3   d1

Here, if we add a tuple <a1, b1, c2, d1> to the instance shown above, the resulting instance
would violate the FD, because the new tuple agrees with the first two tuples on AB but differs
from them on C.
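The definition can be transcribed directly into Python as a sketch, assuming an instance is a list of tuples and attributes are single letters as in the example. `satisfies_fd` is a hypothetical helper name; checking all pairs is quadratic and is meant to mirror the definition, not to be efficient.

```python
from itertools import combinations


def satisfies_fd(rows, attrs, lhs, rhs):
    """True if every pair of tuples that agrees on `lhs` also agrees on `rhs`.

    `attrs` names the columns of `rows`; `lhs` and `rhs` are strings of
    single-letter attribute names such as "AB".
    """
    lhs_idx = [attrs.index(a) for a in lhs]
    rhs_idx = [attrs.index(a) for a in rhs]
    for t1, t2 in combinations(rows, 2):
        agree_on_lhs = all(t1[i] == t2[i] for i in lhs_idx)
        differ_on_rhs = any(t1[i] != t2[i] for i in rhs_idx)
        if agree_on_lhs and differ_on_rhs:
            return False  # this pair is a counterexample to lhs -> rhs
    return True


ATTRS = "ABCD"
instance = [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b1", "c1", "d2"),
    ("a1", "b2", "c2", "d1"),
    ("a2", "b1", "c3", "d1"),
]

print(satisfies_fd(instance, ATTRS, "AB", "C"))                               # True
print(satisfies_fd(instance + [("a1", "b1", "c2", "d1")], ATTRS, "AB", "C"))  # False
```

Note that an instance can only show that an FD is violated; whether the FD holds on the schema is decided by the semantics, not by one instance.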

Reasoning about FDs:

Given a set of FDs over a relation schema R, typically several additional FDs hold over R
whenever all of the given FDs hold.

Ex: Consider Workers(ssn, name, lot, did, since)

Given the FDs ssn→did and did→lot, in any legal instance of Workers, if two tuples
have the same ssn value, they must have the same did value, and because they have the same did
value, they must also have the same lot value. Therefore, the FD ssn→lot also holds on
Workers.

Closure of a Set of FDs

The set of all FDs implied by a given set F of FDs is called the closure of F, denoted as F+. The
closure F+ can be computed by using the following rules. Let X, Y, and Z be sets of attributes
over a relation schema R:

• Reflexivity: If X is a superset of Y, then X→Y.

• Augmentation: If X→Y, then XZ→YZ for any Z.

• Transitivity: If X→Y and Y→Z, then X→Z.

• Union: If X→Y and X→Z, then X→YZ.

• Decomposition: If X→YZ, then X→Y and X→Z.

Reflexivity, augmentation and transitivity are Armstrong's Axioms; union and decomposition are
additional rules that can be derived from them.

Ex1:

Consider a relation schema ABC with FDs A→B and B→C.

From transitivity, we get A→C.

From augmentation, we get AC→BC, AB→AC, AB→CB.

Ex2:

Contracts(contractid, supplierid, projectid, deptid, partid, qty, value)

We denote the schema for Contracts as CSJDPQV.

The following are the given FDs:

i) C→ CSJDPQV.

ii) JP→C.

iii) SD→P.

Several additional FDs hold in the closure of the set of given FDs:

From JP→C and C→CSJDPQV, and transitivity, we infer JP→CSJDPQV.

From SD→P and augmentation, we infer SDJ→ JP.

From SDJ→JP, JP→CSJDPQV, and transitivity, we infer SDJ→CSJDPQV.

Note:
• In a trivial FD, the right side contains only attributes that also appear on the left side.
Using reflexivity, we can generate all trivial dependencies, which are of the form:

X→Y, where Y is a subset of X, X is a subset of ABC, and Y is a subset of ABC.

From augmentation, we get nontrivial dependencies such as AC→BC, AB→AC and AB→CB (see Ex1).

Attribute Closure

If we want to check whether a given dependency, say, X→Y, is in the closure F+, we can do so
efficiently without computing F+. We first compute the attribute closure X+ with respect to F,
which is the set of attributes A such that X→A can be inferred using Armstrong Axioms. The
algorithm for computing the attribute closure of a set X of attributes is shown below:

closure = X;
repeat until there is no change: {
    if there is an FD U→V in F such that U ⊆ closure,
    then set closure = closure ∪ V
}

Then X→Y is in F+ exactly when Y ⊆ X+.
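The closure algorithm above can be sketched in Python as follows, assuming each FD is written as a pair of attribute strings, e.g. ("SD", "P") for SD→P. The function name `attribute_closure` is an illustrative choice, not something defined in the notes.

```python
def attribute_closure(attrs, fds):
    """Compute X+, the set of attributes determined by `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs of attribute strings, e.g. ("SD", "P").
    """
    closure = set(attrs)
    changed = True
    while changed:  # "repeat until there is no change"
        changed = False
        for lhs, rhs in fds:
            # U -> V fires when U is a subset of the current closure.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# FDs of the Contracts example (schema CSJDPQV).
CONTRACTS_FDS = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]

print("".join(sorted(attribute_closure("SDJ", CONTRACTS_FDS))))  # CDJPQSV
print("".join(sorted(attribute_closure("SD", CONTRACTS_FDS))))   # DPS
```

Since (SDJ)+ contains every attribute, SDJ→CSJDPQV is in F+, matching the derivation in Ex2, while (SD)+ = SDP shows that SD alone determines only P.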

CHARACTERISTICS OF FUNCTIONAL DEPENDENCY:

(1) It deals with the 1-1 relationship between attributes, and rarely it also deals with 1-M.
(2) An FD must be defined on the schema, not on particular instances.
(3) An FD of interest must be non-trivial.
(4) In a trivial FD, the RHS is a subset of the LHS. Eg: ABC → BC
(5) In a non-trivial FD, at least one of the RHS attributes is not in the LHS. Eg: ABC → BD
(6) In a completely non-trivial FD, none of the RHS attributes is in the LHS. Eg: ABC → DE
Once the FDs are identified from the semantics, additional FDs can be derived from
the existing set.
Eg: If F1 is obtained from the semantics and F2 is derived from F1, then the total set of FDs is
F1 ∪ F2. The input for the normalization process should be F1 ∪ F2. F2 can be identified in two ways:
(1) By using inference rules.
(2) By using the closure set of attributes.
INFERENCE RULES:

(1) Reflexive: If B is a subset of A, then A always determines B: A → B
(2) Augmentation: If A → B, then AC → BC
(3) Transitive: If A → B and B → C, then A → C
(4) Union: It applies to FDs with a common LHS, i.e., if A → B and A → C, then A → BC
(5) Decomposition: If A → BC, then we can write it as A → B and A → C
(6) Composition: If A → B and C → D, then AC → BD
(7) Self-determination: A → A, B → B
1Q Find the additional FDs (F2) that can be derived from the following set F1 of FDs obtained from the semantics.

F1:
(1) A → B
(2) B → C
(3) C → D
(4) D → E
(5) D → H
(6) E → F
(7) F → G
(8) G → H

F2: A → C, D → EH, F → H

By the transitive rule, A → C from (1) & (2).

By the union rule, D → EH from (4) & (5).

By the transitive rule, F → H from (7) & (8).

CLOSURE SET OF ATTRIBUTES:

Algorithm used to identify the closure set of attributes:

(1) Let 'X' be the set of attributes whose closure is required; 'X' will grow into the closure.
(2) Repeatedly search for an FD whose LHS is contained in 'X', and add the RHS of that FD
to 'X' if it is not already present.
(3) Repeat step (2) as many times as necessary until no more attributes can be added to 'X'.
(4) The set 'X' obtained when no more attributes can be added is the closure set.
Applications of closure set of attributes:

(1) To identify the additional FDs.
(2) To identify the keys.
(3) To test the equivalence of two sets of FDs.
(4) To identify the irreducible set of FDs (also called the canonical form or standard form of
FDs).
2Q Consider a relation ABCDEFG with the FDs

A → B
BC → DE
AEG → G

Find (AC)+.

Ans: X = AC
= ACB (A → B)
= ABCDE (BC → DE)

(AC)+ = ABCDE

3Q For the relation R(ABCDE) with the FDs

A → BC
CD → E
B → D
E → A

Find B+, (AB)+ and (CD)+.

Ans: Find B+
X = B
= BD
B+ = BD

Find (AB)+
X = AB
= ABC
= ABCD
= ABCDE
(AB)+ = ABCDE

Find (CD)+
X = CD
= CDE
= ACDE
= ABCDE
(CD)+ = ABCDE

4Q For a relation R(ABCDEF) with the FDs

AB → C
BC → AD
D → E
CF → B

Find (AB)+.

Ans: X = AB
= ABC
= ABCD
= ABCDE
(AB)+ = ABCDE

(1) Identifying the additional FDs

To check whether an FD such as A → B can be derived from F1, compute A+ with respect to F1.
If A+ includes B, then A → B can be derived and belongs to F2.
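A short Python sketch of this test, assuming FDs are (lhs, rhs) pairs of attribute strings; `can_derive` is a hypothetical helper name. The FD set is F1 from Q5 below.

```python
def attribute_closure(attrs, fds):
    """X+ under `fds`, a list of (lhs, rhs) attribute-string pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def can_derive(fds, lhs, rhs):
    """lhs -> rhs follows from fds exactly when rhs is contained in lhs+."""
    return set(rhs) <= attribute_closure(lhs, fds)


F1 = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]

print(can_derive(F1, "D", "A"))   # False: D+ = DE
print(can_derive(F1, "AB", "D"))  # True:  (AB)+ = ABCDE
print(can_derive(F1, "AB", "F"))  # False: F is not in (AB)+
```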

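Q5 Check whether D → A, AB → D and AB → F can be derived from F1:

F1:
AB → C
BC → AD
D → E
CF → B

i. D → A
X = D
= DE
D+ = DE
D+ does not contain A, so D → A cannot be derived.
ii. AB → D
X = AB
= ABC
= ABCD
= ABCDE
(AB)+ = ABCDE contains D, so AB → D can be derived.
iii. AB → F
X = AB
= ABC
= ABCD
= ABCDE
(AB)+ = ABCDE does not contain F, so AB → F cannot be derived.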

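Q6 Check whether BCD → H and ABC → H can be derived from F1:

F1:
A → BC
CD → E
E → C
D → AEH
AEH → BD
DH → BC

Sol:
i. X = BCD
= BCDE
= BCDEH
= BCDEAH
(BCD)+ = ABCDEH contains H, so BCD → H can be derived.

ii. X = ABC
(ABC)+ = ABC does not contain H, so ABC → H cannot be derived.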
(2) Identification of keys by using the closure set of attributes

Types of keys:
(i) Primary key
(ii) Composite primary key
(iii) Candidate keys
(iv) Foreign keys
(v) Surrogate key
(vi) Super key
Max. no. of foreign keys = 1024 (this limit is DBMS-specific).

A key attribute: An attribute (or set of attributes) that is capable of identifying all other
attributes in a given table.

i) Primary key:
It is a unique-valued attribute in a table, used to enforce entity integrity and to
identify the rows in the table uniquely.
ii) Composite primary key:
Sometimes a single attribute is not sufficient to identify the rows in the table uniquely, so
we combine two or more attributes to identify the rows uniquely.

iii) Candidate keys:
Sometimes two or more independent attributes (or sets of attributes) can each identify the rows
uniquely. Eg: (vehicle no, vehicle engine no, purchase date)
Either the vehicle no or the vehicle engine no can be used as a key attribute, so they are called
candidate keys; one of the candidate keys is chosen as the primary key.
iv) Surrogate key:
Sometimes, even if we combine all the attributes in the table, they may not have unique
values.
To identify the rows uniquely, we use a system-generated key called a surrogate
key.
v) Foreign key:
It is used to enforce referential integrity. An attribute in a table is called a foreign key
attribute if it refers to the primary key of the same or a different table.
vi) Super key:
Any set of attributes whose closure includes all the attributes of the table; every key is a
super key, and so is any superset of a key.

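Q1 If R(ABCDEH) and the FDs are

A → BC
CD → E
E → C
D → AEH
AEH → BD
DH → BC

Find the keys.

Super key = ABCDEH
Now find A+ = ABC
E+ = EC
D+ = DAEH
= ABCDEH
So D is a key.
Similarly, (AEH)+ = AEHBD = ABCDEH, and no proper subset of AEH is a super key
(A+ = ABC, E+ = CE, H+ = H, (AE)+ = ABCE, (AH)+ = ABCH, (EH)+ = CEH), so AEH is also a key.
• If the closure of an LHS attribute, or of a combination of LHS attributes, includes all the
attributes in the table, then that attribute set is a (super) key of the table.
• A table can have two or more keys.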
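A brute-force sketch of key finding in Python, assuming single-letter attributes and FDs as (lhs, rhs) string pairs. `candidate_keys` is a hypothetical helper: it tries attribute sets in order of size and keeps those whose closure is the whole relation and that contain no smaller key. It is exponential in the number of attributes, so it only suits textbook-sized schemas.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under `fds`, a list of (lhs, rhs) attribute-string pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(relation, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    everything = set(relation)
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key: a super key, but not minimal
            if attribute_closure(candidate, fds) == everything:
                keys.append(candidate)
    return ["".join(sorted(key)) for key in keys]


Q1_FDS = [("A", "BC"), ("CD", "E"), ("E", "C"),
          ("D", "AEH"), ("AEH", "BD"), ("DH", "BC")]
print(candidate_keys("ABCDEH", Q1_FDS))  # ['D', 'AEH']
```

The same function reproduces the answers of Q2 and Q3 below when given their FDs.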
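Q2 Consider a relation R with five attributes ABCDE and the FDs

A → B
BC → E
ED → A

List all keys for R.

Sol: Super key: ABCDE

A+ = AB ✗
(BC)+ = BCE ✗
(ED)+ = ABDE ✗
(AB)+ = AB ✗
(AC)+ = ABCE ✗
(BD)+ = BD ✗
(ABC)+ = ABCE ✗
(BCD)+ = ABCDE ✓
(ACD)+ = ABCDE ✓
(CDE)+ = ABCDE ✓
Keys: ACD, BCD and CDE (C and D never appear on the RHS of any FD, so every key must contain them).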
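Q3 R(ABCDE) & FDs are

AB → C
CD → E
DE → B

List all keys for R.

Sol: Super key = ABCDE

(AB)+ = ABC
(CD)+ = BCDE
(DE)+ = BDE
(ABC)+ = ABC
(ABD)+ = ABCDE
(ABE)+ = ABCE
(ACD)+ = ABCDE
(ADE)+ = ABCDE
ABD, ACD and ADE are keys (A and D never appear on the RHS of any FD, so every key contains them).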

Q4 R(ABCDEFGHIJ) & FDs are

AB → C
A → DE
B → F
F → GH
D → IJ

List all keys for R.

(AB)+ = ABCDEFGHIJ, so the key is AB.

Note:

Sometimes not all the attributes of the table appear in the FDs.

Eg: If D → I were given instead of D → IJ, then J would not appear in any FD and
(AB)+ = ABCDEFGHI. The missing attributes must be attached to the closure, so the key for R
would be ABJ.

Q5 R(ABCDEFGHIJ) & FDs are

AB → C
BD → EF
AD → GH
A → I
H → J

List all keys for R.

Sol:

Super key = ABDH

(AB)+ = ABCI ✗
(BD)+ = BDEF ✗
(AD)+ = ADGHIJ ✗
(ABH)+ = ABCHIJ ✗
(ABD)+ = ABCDEFGHIJ ✓
Key = ABD
(3) To identify equivalence of sets of FDs

Different database designers may define different sets of FDs from the same
requirements. Two sets F and G are equivalent if every FD in G can be derived from F and
every FD in F can be derived from G.
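As a sketch in Python (FDs as (lhs, rhs) attribute-string pairs; `covers` and `equivalent` are hypothetical helper names), the equivalence test is two coverage checks, each reusing attribute closure. The sets F and G are those of Q1 below.

```python
def attribute_closure(attrs, fds):
    """X+ under `fds`, a list of (lhs, rhs) attribute-string pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def covers(f, g):
    """True if every FD in g can be derived from f."""
    return all(set(rhs) <= attribute_closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """F and G are equivalent when each one covers the other."""
    return covers(f, g) and covers(g, f)


F = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
G = [("A", "CD"), ("E", "AH")]
print(equivalent(F, G))              # True
print(equivalent(F, [("A", "CD")]))  # False: E -> AD and E -> H are lost
```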

Q1 Consider the following two sets of FDs

F =
A → C
AC → D
E → AD
E → H

G =
A → CD
E → AH

Find whether the two sets of FDs are equivalent.

Step 1: Take set F and check that every FD in G can be derived from F.
A → CD

A+ from F

X = A
= AC
= ACD
A → CD can be derived from F.
E → AH

E+ from F

X = E
= EAD
= EADH
= ACDEH
E → AH can be derived from F.

Step 2: Take set G and check that every FD in F can be derived from G.
A → C

A+ from G

X = A
= ACD
A → C can be derived from G.
AC → D
X = AC
= ACD
AC → D can be derived from G.
E → AD and E → H
X = E
= EAH
= EAHCD
E → AD and E → H can be derived from G.
F ≡ G, so G is preferable as it contains fewer FDs.

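Q2 Consider two sets of FDs on the attributes ABCDE

F =
B → CD
AD → E
B → A

G =
B → CDE
B → ABC
AD → E

Find whether they are equivalent or not.

Sol:

Step 1: Derive the FDs of G from F.

B → CDE, B → ABC

B+ from F

X = B
= BCDA
= ABCDE
So B → CDE and B → ABC are derivable from F, and AD → E is in F.
All FDs of G are derivable from F.
Step 2: Derive the FDs of F from G.

B → CD, B → A

B+ from G

X = B
= BCDE
= ABCDE
So B → CD and B → A are derivable from G, and AD → E is in G.
All FDs of F are derivable from G.
F ≡ G
F is preferable: after splitting each RHS into single attributes, F has fewer non-trivial
dependencies (4) than G (5).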
(4) To identify the irreducible form of FDs / canonical form
Once F1 is identified from the semantics and F2 is derived from F1, we get the total set of FDs, F.
Before moving on to the normalization process with F, F must be checked for redundant FDs and for
redundant attributes on the LHS of FDs. This is a four-step process.

Step 1: Have a single attribute on the RHS of every FD.

Step 2: Evaluate every FD obtained in step 1 for its necessity. If an FD is not necessary, remove it
from the list.

Step 3: Evaluate the necessity of the LHS attributes of the FDs obtained from step 2. If an
attribute is not necessary, remove it from the FD. If an LHS gets shorter, repeat step 2, because
some FD may have become redundant.

Step 4: Apply the union rule to FDs with a common LHS in the set obtained from step 3. Then
we get the irreducible set.
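A Python sketch of the irreducible-set (minimal cover) computation, assuming FDs are (lhs, rhs) attribute-string pairs; `minimal_cover` is a hypothetical helper name. Note one deliberate difference from the order of the steps above: this sketch follows the common textbook order in which extraneous LHS attributes are removed before redundant FDs, because in that order no FD can become redundant afterwards, so step 2 never has to be repeated.

```python
def attribute_closure(attrs, fds):
    """X+ under `fds`, a list of (lhs, rhs) pairs (lhs may be a set)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def minimal_cover(fds):
    """Return an irreducible cover of `fds` as a {lhs: rhs} dict of strings."""
    # Step 1: single attribute on every RHS; drop trivial FDs such as ABD -> A.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs if a not in lhs]
    g = list(dict.fromkeys(g))  # remove duplicates, keep order

    # Remove extraneous LHS attributes: b is unnecessary in X -> a
    # if a is already in (X - b)+.
    reduced = []
    for lhs, a in g:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and a in attribute_closure(lhs - {b}, g):
                lhs.discard(b)
        reduced.append((frozenset(lhs), a))
    g = list(dict.fromkeys(reduced))

    # Remove redundant FDs: X -> a is unnecessary if a is in X+ under the rest.
    for fd in list(g):
        rest = [h for h in g if h != fd]
        if fd[1] in attribute_closure(fd[0], rest):
            g = rest

    # Step 4: union rule for FDs with a common LHS.
    grouped = {}
    for lhs, a in g:
        key = "".join(sorted(lhs))
        grouped[key] = "".join(sorted(grouped.get(key, "") + a))
    return grouped


print(minimal_cover([("A", "B"), ("C", "B"), ("D", "ABC"), ("AC", "D")]))
# {'A': 'B', 'C': 'B', 'D': 'AC', 'AC': 'D'}
```

Applied to the FDs of Q2 below it returns A → C, C → B, and for Q3 it returns AD → CF, C → B, B → E.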

Q1 Find the irreducible set for the following FDs

F =

A → B
C → B
D → ABC
AC → D

Sol:

Step 1:

(1) A → B
(2) C → B
(3) D → A
(4) D → B
(5) D → C
(6) AC → D

Step 2:
Remove (1) & compute A+ from (2), (3), (4), (5), (6)
A+ = A
We need (1).
Remove (2) & compute C+ from (1), (3), (4), (5), (6)
C+ = C
We need (2).
Remove (3) & compute D+ from (1), (2), (4), (5), (6)
D+ = DBC
We need (3).
Remove (4) & compute D+ from (1), (2), (3), (5), (6)
D+ = ADCB
D → B can be removed.
Remove (5) & compute D+ from (1), (2), (3), (6)
D+ = ABD
We need (5).
Remove (6) & compute (AC)+ from (1), (2), (3), (5)
(AC)+ = ACB
We need (6).
Step 3:
A → B
C → B
D → A
D → C
AC → D
Remove A from AC → D, i.e., replace it by C → D and compare C+:
with C → D: C+ = CDAB; with AC → D: C+ = CB
The closures differ, so A cannot be removed.

Remove C from AC → D, i.e., replace it by A → D and compare A+:
with A → D: A+ = ADCB; with AC → D: A+ = AB
The closures differ, so C cannot be removed.

Step 4:

A → B
C → B
D → AC
AC → D

Therefore, this is the irreducible set of FDs.

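Q2 Consider a universal relation with attributes ABC and FDs

(1) AB → C
(2) C → B
(3) A → B
Find the irreducible set.
Sol:
Step 1: Every FD already has a single attribute on the RHS.
Step 2:
Remove (1) & compute (AB)+ from (2) & (3)
(AB)+ = AB
We need (1).
Remove (2) & compute C+ from (1) & (3)
C+ = C
We need (2).
Remove (3) & compute A+ from (1) & (2)
A+ = A
We need (3).
Step 3:
AB → C
C → B
A → B
Remove A from AB → C, i.e., replace it by B → C and compare B+:
with B → C: B+ = BC; with AB → C: B+ = B
The closures differ, so A cannot be removed.

Remove B from AB → C, i.e., replace it by A → C and compare A+:
with A → C: A+ = ACB; with AB → C: A+ = ABC
The closures are equal, so B can be removed.

The set is now A → C, C → B, A → B. Repeating step 2, A → B is now redundant, because it
follows from A → C and C → B by transitivity.

Step 4:

A → C
C → B

This is the irreducible set. (Applying the union rule without repeating step 2 would give
A → BC, C → B, which still contains the redundant FD A → B.)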

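Q3 FDs are

F =
ABD → AC
C → BE
AD → BF
B → E
Find the minimal set.
Step 1:
(1) ABD → A
(2) ABD → C
(3) C → B
(4) C → E
(5) AD → B
(6) AD → F
(7) B → E
Step 2:
Remove (1) & compute (ABD)+ from (2)-(7)
(ABD)+ = ABDCEF
(1) can be removed (it is trivial).
Remove (2) & compute (ABD)+ from (3)-(7)
(ABD)+ = ABDEF
We need (2).
Remove (3) & compute C+ from (2), (4)-(7)
C+ = CE
We need (3).
Remove (4) & compute C+ from (2), (3), (5)-(7)
C+ = BCE
(4) can be removed.
Remove (5) & compute (AD)+ from (2), (3), (6), (7)
(AD)+ = ADF
We need (5).
Remove (6) & compute (AD)+ from (2), (3), (5), (7)
(AD)+ = ADBCE
We need (6).
Remove (7) & compute B+ from (2), (3), (5), (6)
B+ = B
We need (7).
Step 3:
ABD → C
C → B
AD → B
AD → F
B → E

Remove A from ABD → C, i.e., replace it by BD → C and compare (BD)+:
with ABD → C: (BD)+ = BDE; with BD → C: (BD)+ = BDCE
The closures differ, so A is needed.
Remove B from ABD → C, i.e., replace it by AD → C and compare (AD)+:
with ABD → C: (AD)+ = ABFECD; with AD → C: (AD)+ = ADCFBE
The closures are equal, so B can be removed.
Remove D from ABD → C: (AB)+ = ABE does not contain C, so D is needed.

The set is now AD → C, C → B, AD → B, AD → F, B → E. Repeating step 2, AD → B is now
redundant (it follows from AD → C and C → B).

Step 4:
AD → CF
C → B
B → E
This is the minimal set.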

Types of functional dependencies:

(1) Partial F.D
(2) Transitive F.D
(3) Full F.D

1. Partial F.D: A dependency in which a non-key attribute depends on only a part of a composite
key.
R = ABCD
F: AB → C
B → D
Key: AB, but D depends only on B (a part of the key); therefore B → D is a partial dependency.

Under the following conditions, a table cannot have a partial F.D:

(1) If the primary key consists of a single attribute.
(2) If the table consists of only two attributes.
(3) If all the attributes in the table are part of the primary key.
2. Transitive F.D: A dependency in which there is a relationship among the non-key
attributes.

Eg: R = ABCD
F: AB → C
AB → D
C → D
Key = AB
C → D is a transitive dependency (AB → C → D), and it causes insertion, deletion & update
problems in the table.

Under the following circumstances, a table cannot have a transitive F.D:

(1) If the table consists of only two attributes.
(2) If all the attributes in the table are part of the primary key.

3. Full F.D: A dependency X → Y is considered a full F.D if the removal of any attribute
from X makes X → Y an invalid F.D.

Eg: AB → CD holds, but neither A → CD nor B → CD holds.
Therefore, AB → CD is a full F.D.

NORMALIZATION

• It is the process of reducing redundancy based on primary keys and FDs
OR
• It is a tool to validate or evaluate the logical database design with the help of rules which
are called Normal Forms. They are
1 NF
2 NF
3 NF
BCNF
4 NF
5 NF
DKNF
Moving down this list, the problem intensity reduces and the number of tables needed increases.

Points to Remember

• 1 NF is mandatory and the remaining normal forms are optional.
• If the tables are constructed from E-R diagrams, then 4 NF and 5 NF need not be applied to
them.
• In practice, normalization is applied up to 3NF and very rarely beyond that.
• 2 NF deals with partial dependencies and 3NF deals with transitive
dependencies.
First Normal Form (1NF):

• The cells of the table must have a single atomic value.
• Neither repeating groups nor arrays are allowed as values.
• All entries in any column (attribute) must be of the same kind.
• Each column must have a unique name, but the order of the columns in the table is
insignificant.
• No two rows in a table may be identical, and the order of rows is insignificant.

Second Normal Form (2 NF):

• A table is said to be in 2NF if it is already in 1NF and free from partial dependencies.
• Anomalies can occur when attributes are dependent on only part of a multi-attribute key.
• A relation is in second normal form when all non-key attributes are dependent on the
whole key.
• Any 1NF relation having a key with a single attribute is in second normal form.
Q1 Consider the universal relation R = ABCDEFGHIJ and the set of FDs
F: AB → C
A → DE
B → F
F → GH
D → IJ

What is the key for R? Decompose R into 2NF.

Sol: Key = AB, since (AB)+ = ABCDEFGHIJ, i.e., AB → CDEFGHIJ.

A+ = ADEIJ

B+ = BFGH

If there is a partial dependency, remove the partially dependent attributes from the original table
and place them in a separate table along with a copy of their determinant.

(a) Key = AB
(b) A+ = ADEIJ
B+ = BFGH
DEIJ depend only on A
FGH depend only on B
(c) R1 = ADEIJ
R2 = BFGH
R3 = ABC (the remaining attributes of R)

This is the required 2NF decomposition.
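The splitting rule used in these 2NF examples can be sketched in Python. The sketch assumes the chosen key is given and treats "non-key" as "not in this chosen key", as the examples do (strictly, 2NF concerns attributes that are not part of any candidate key). `decompose_2nf` is a hypothetical helper name; smaller parts of the key are tried first and each attribute is moved only once.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under `fds`, a list of (lhs, rhs) attribute-string pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def decompose_2nf(relation, key, fds):
    """Move attributes that depend on a proper part of `key` into their own
    tables, each with a copy of its determinant; the rest stays with the key."""
    tables, moved = [], set()
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependents = attribute_closure(part, fds) - set(key) - moved
            if dependents:
                tables.append(set(part) | dependents)
                moved |= dependents
    tables.append(set(relation) - moved)  # key plus fully dependent attributes
    return ["".join(sorted(t)) for t in tables]


Q1_FDS = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]
print(decompose_2nf("ABCDEFGHIJ", "AB", Q1_FDS))  # ['ADEIJ', 'BFGH', 'ABC']
```

With key ABD and the FDs of Q4 below, the same sketch yields AI, ABC, ADGHJ, BDEF and ABD, the tables reached there in two rounds.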
Q2 Consider the relation R = ABCDEF and the set of FDs
F: A → FC
C → D
B → E
Find the key and normalize into 2NF.
Sol:

(a) Key = AB
(b) A+ = ACDF
B+ = BE
(c) R1 = ACDF
R2 = BE
R3 = AB (the remaining attributes of R)

This is the required 2NF decomposition.

Q3 Consider the relation R = ABCDE. Find the key and normalize up to 2NF.

F: B → E
C → D
A → B
Sol: (a) Key = AC
(b) A+ = ABE
C+ = CD
(c) R1 = ABE
R2 = CD
R3 = AC (the remaining attributes of R)
This is the required 2NF decomposition.

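Q4 Consider R = ABCDEFGHIJ and the FDs
F: AB → C
BD → EF
AD → GH
A → I
H → J
Find the key and normalize up to 2NF.

Sol: (a) Key = ABD
(b) (AB)+ = ABCI
(BD)+ = BDEF
(AD)+ = ADGHIJ
(c) R1 = ABCI: A+ = AI, B+ = B, so A → I is partial: R11 = AI, R111 = ABC
R2 = BDEF: B+ = B, D+ = D, so R2 = BDEF stays as it is
R3 = ADGHIJ: A+ = AI, D+ = D, so R31 = AI, R311 = ADGHJ
R4 = ABD

The 2NF tables are AI, ABC, BDEF, ADGHJ and ABD.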

Third Normal Form (3 NF): A table is said to be in 3NF if it is already in 2NF and
is free from transitive dependencies.
• Anomalies can occur when a relation contains one or more transitive dependencies.
• A transitive dependency exists when A → B and B → C hold (so A → C), and B → A does not hold.
• A relation is in 3NF when it is in 2NF and has no transitive dependency.
• A relation is in 3NF when "all non-key attributes are dependent on the key, the whole
key and nothing but the key".
If there is a transitive dependency, remove the transitively dependent attribute from the 2NF table
and place it in a separate table along with a copy of its determinant.
Update anomalies can still occur in a 3NF relation R if

• R has multiple candidate keys,
• those candidate keys are composite, and
• the candidate keys overlap.
Q1 Consider the universal relation R = ABCDEFGHIJ and the set of FDs

F: AB → C
A → DE
B → F
F → GH
D → IJ
Decompose R into 3NF.

The 2NF tables are ADEIJ, BFGH and ABC.
ADEIJ has the transitive dependency D → IJ, so it is split into DIJ and ADE.
BFGH has the transitive dependency F → GH, so it is split into FGH and BF.
ABC has no transitive dependency.

R1 = DIJ
R2 = ADE
R3 = FGH
R4 = BF
R5 = ABC
It is in 3NF.

Q2 Consider the relation R = ABCDEF and the set of FDs

F: A → FC
C → D
B → E
Normalize R into 3NF.

The 2NF tables are ACDF, BE and AB.
ACDF has the transitive dependency C → D, so it is split into CD and ACF.

R1 = CD
R2 = ACF
R3 = BE
R4 = AB
It is in 3NF.

Q3 Consider the relation R = ABCDE. Find the key and normalize up to 3NF.

F: B → E
C → D
A → B

The 2NF tables are ABE, CD and AC.
ABE has the transitive dependency B → E, so it is split into BE and AB.

R1 = BE
R2 = AB
R3 = CD
R4 = AC
It is in 3NF.

Q4 Consider R = ABCDEFGHIJ and the FDs

F: AB → C
BD → EF
AD → GH
A → I
H → J
Normalize up to 3NF.

The 2NF tables are AI, ABC, BDEF, ADGHJ and ABD.
ADGHJ has the transitive dependency H → J, so it is split into HJ and ADGH.

R1 = AI
R2 = ABC
R3 = BDEF
R4 = HJ
R5 = ADGH
R6 = ABD
It is in 3NF.

Q5 (a) Give a set of FDs for the relation schema R(ABCD) with primary key AB under
which R is in 1NF but not in 2NF.
(b) Find FDs such that R is in 2NF but not in 3NF.
R = ABCD
Key = AB
Sol: (a) With either of these sets of FDs, the table cannot be in 2NF:
B → C, B → D   or   A → C, A → D
(b) With either of these FDs (in addition to AB → CD), the table is in 2NF but not in 3NF:
C → D   or   D → C
Note 1: In general, if there is an FD X → Y with X = A or X = B (a proper subset of the key AB)
and Y a non-key attribute, it violates 2NF.
Note 2: In general, if there is an FD X → Y where X is not a subset of the key AB (and not a super
key) and Y is a non-key attribute, it violates 3NF but not 2NF.

Note 3: An FD X → Y is allowed in 3NF (and also in 2NF) if X is a super key or Y is a
part of a key.
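Note 3 can be turned into a 3NF test. This Python sketch assumes FDs are (lhs, rhs) attribute-string pairs and that `fds` are the FDs of the whole relation R, in which case checking only the given FDs is sufficient. `violations_3nf` and `candidate_keys` are hypothetical helper names; an empty result means the relation is in 3NF.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under `fds`, a list of (lhs, rhs) attribute-string pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure is the whole relation (brute force)."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and attribute_closure(cand, fds) == set(relation):
                keys.append(cand)
    return keys


def violations_3nf(relation, fds):
    """(lhs, attribute) pairs where lhs is not a super key and the attribute is not prime."""
    prime = set().union(*candidate_keys(relation, fds))  # attributes in some key
    bad = []
    for lhs, rhs in fds:
        if attribute_closure(lhs, fds) == set(relation):
            continue  # lhs is a super key
        bad.extend((lhs, a) for a in rhs if a not in lhs and a not in prime)
    return bad


print(violations_3nf("ABCD", [("AB", "CD"), ("B", "C")]))  # [('B', 'C')]
print(violations_3nf("ABCD", [("AB", "CD"), ("C", "D")]))  # [('C', 'D')]
print(violations_3nf("ABCD", [("ABC", "D"), ("D", "A")]))  # []
```

The first two calls correspond to Q5 (a) and (b); the last one is a case where D → A is allowed in 3NF only because A is part of the key ABC.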

Lossless join property and dependency preserving property:

In a good database design, it is not sufficient to check whether the tables satisfy 2NF,
3NF & BCNF; we must also check whether the decomposed tables satisfy the following two
properties.
(1) Lossless join property (mandatory)
(2) Dependency preserving property (optional)

Lossless join Property:

A decomposition of R into R1, R2, R3, ..., Rn is said to be a lossless decomposition if the natural
join of all the projections of R gives back the original relation, i.e.

$R = \pi_{R_1}(R) \bowtie \pi_{R_2}(R) \bowtie \cdots \bowtie \pi_{R_n}(R)$

If $R \neq \pi_{R_1}(R) \bowtie \pi_{R_2}(R) \bowtie \cdots \bowtie \pi_{R_n}(R)$, then the
decomposition is said to be a lossy decomposition.

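Q:
R:
A    B    C
a1   b1   c1
a2   b2   c2
a3   b1   c3

R1 (projection of R on AB):
A    B
a1   b1
a2   b2
a3   b1

R2 (projection of R on BC):
B    C
b1   c1
b2   c2
b1   c3

R1 ⋈ R2:
A    B    C
a1   b1   c1
a1   b1   c3
a2   b2   c2
a3   b1   c1
a3   b1   c3

The join gives 5 rows instead of the original 3 (the rows <a1, b1, c3> and <a3, b1, c1> are
spurious), so this is a lossy decomposition.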
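The same projection-and-join check can be sketched in Python, assuming a relation is a set of rows stored as sorted (attribute, value) pairs; `as_rows`, `project` and `natural_join` are hypothetical helper names.

```python
def as_rows(attrs, tuples):
    """Convert positional tuples into canonical rows: sorted (attribute, value) pairs."""
    return {tuple(sorted(zip(attrs, t))) for t in tuples}


def project(relation, attrs):
    """Projection onto `attrs` with duplicate rows removed."""
    return {tuple((a, v) for a, v in row if a in attrs) for row in relation}


def natural_join(r1, r2):
    """Combine every pair of rows that agree on their common attributes."""
    result = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(tuple(sorted({**d1, **d2}.items())))
    return result


R = as_rows("ABC", [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b1", "c3")])
joined = natural_join(project(R, {"A", "B"}), project(R, {"B", "C"}))
print(len(R), len(joined))  # 3 5
print(R < joined)           # True: every original row survives, plus spurious rows
```

The join never loses original rows; a lossy decomposition shows up as extra, spurious rows, because the common column B is not unique in either projection.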

The above method is time-consuming and error-prone. Therefore, to check the lossless join
property of a decomposition into two relations R1 and R2, we use the following shortcut method:

If the common columns between the two relations have unique values (i.e., form a key) in at least
one of the relations, then the decomposition is lossless; otherwise it is a lossy
decomposition. In terms of FDs, the decomposition is lossless if

$R_1 \cap R_2 \rightarrow R_1$
or
$R_1 \cap R_2 \rightarrow R_2$

holds (i.e., is in F+).
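The shortcut for a two-way decomposition amounts to one closure computation. A Python sketch, assuming FDs are (lhs, rhs) attribute-string pairs; `lossless_binary` is a hypothetical helper name.

```python
def attribute_closure(attrs, fds):
    """X+ under `fds`, a list of (lhs, rhs) attribute-string pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def lossless_binary(r1, r2, fds):
    """Lossless iff the closure of the common attributes covers r1 or r2."""
    common = attribute_closure(set(r1) & set(r2), fds)
    return set(r1) <= common or set(r2) <= common


F = [("A", "B"), ("A", "C"), ("C", "D")]
print(lossless_binary("ABC", "CD", F))  # True:  C -> D, so C is a key of CD
print(lossless_binary("ABD", "BC", F))  # False: B+ = B is a key of neither
```

These two calls reproduce the example on dependency preservation that follows.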

Dependency preserving property:

A decomposition of R into R1, R2, R3, ..., Rn is said to be a dependency preserving decomposition
if $(F_1 \cup F_2 \cup \cdots \cup F_n)^+ = F^+$, where F1 is the set of FDs in F+ that involve only
attributes of R1, F2 is the set of FDs in F+ that involve only attributes of R2, and so on, and
F is the set of FDs in R.

Eg: Consider a table R with attributes ABCD and F = {A → B, A → C, C → D}, decomposed into
R1(ABC) and R2(CD). Find whether this decomposition satisfies the lossless join and dependency
preserving properties.

Sol: R1 (ABC)

R2 (CD)

R1 ∩ R2 = C, and C → D, so C is a key of R2: it is a lossless decomposition. It is also
dependency preserving, because every FD can be checked within one table:

A → B (R1)

A → C (R1)

C → D (R2)

If the table is decomposed as R1 (ABD), R2 (BC), it is a lossy decomposition, since R1 ∩ R2 = B
is not a key of either table.

It is also not dependency preserving: A → C and C → D cannot be checked within a single table.
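Computing F+ literally is expensive, so this Python sketch swaps in a standard polynomial-time test instead. For each X → Y in F it grows a set Z starting from X, using only what each Ri can see: Z := Z ∪ ((Z ∩ Ri)+ ∩ Ri). The FD is preserved exactly when Y ends up inside Z. FDs are assumed to be (lhs, rhs) attribute-string pairs, and `preserves_dependencies` is a hypothetical helper name.

```python
def attribute_closure(attrs, fds):
    """X+ under `fds`, a list of (lhs, rhs) attribute-string pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def preserves_dependencies(schemas, fds):
    """True if every FD in `fds` can be enforced through the tables in `schemas`."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in map(set, schemas):
                # What Ri alone can infer from the part of Z it contains.
                gained = attribute_closure(z & ri, fds) & ri
                if not gained <= z:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False  # this FD is lost by the decomposition
    return True


F = [("A", "B"), ("A", "C"), ("C", "D")]
print(preserves_dependencies(["ABC", "CD"], F))  # True
print(preserves_dependencies(["ABD", "BC"], F))  # False: A -> C is lost
```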

Boyce-Codd Normal Form (BCNF):

A table is said to be in BCNF if it is already in 3NF and every non-trivial F.D has a
candidate key (or super key) as its determinant.
OR
A table is said to be in BCNF if all determinants in the 3NF table are keys or super
keys.
• Anomalies can occur in a relation in 3NF if there are determinants in the relation that are
not candidate keys.
• A relation is in BCNF if every determinant is a candidate key.
• To test whether a relation is in BCNF, we identify all the determinants and make sure that
they are candidate keys.
The following conditions are not properly handled by 3NF:

(1) If the table has two or more composite candidate keys.
(2) If the candidate keys overlap.

3NF vs. BCNF:

1. 3NF concentrates on the primary key; BCNF concentrates on all candidate keys.
2. In 3NF, redundancy is high compared to BCNF; in BCNF, redundancy is low compared to 3NF.
3. A dependency-preserving decomposition into 3NF is always possible; a BCNF decomposition may
not preserve all F.D's.
4. A dependency X → Y is allowed in 3NF if X is a super key or Y is a part of a key; it is allowed
in BCNF only if X is a super key.
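Point 4 gives a direct BCNF test. This Python sketch assumes FDs are (lhs, rhs) attribute-string pairs over the whole relation; checking only the given FDs suffices in that case, whereas for a decomposed table the projected FDs would be needed. `violations_bcnf` is a hypothetical helper name.

```python
def attribute_closure(attrs, fds):
    """X+ under `fds`, a list of (lhs, rhs) attribute-string pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def violations_bcnf(relation, fds):
    """Non-trivial FDs whose LHS is not a super key of `relation`."""
    everything = set(relation)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)                     # skip trivial FDs
            and attribute_closure(lhs, fds) != everything]  # LHS is not a super key


print(violations_bcnf("ABCD", [("ABC", "D"), ("D", "A")]))               # [('D', 'A')]
print(violations_bcnf("ABCD", [("AB", "CD"), ("C", "A"), ("D", "B")]))  # [('C', 'A'), ('D', 'B')]
```

Both relations are in 3NF (the offending RHS attributes are part of a key), which is exactly the gap between 3NF and BCNF described in point 4.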

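Q:

(1) R = ABCD

A → D
C → A
B → C
Sol:
(a) Key = B
(b) At present ABCD is in 2NF (the key is a single attribute).
(c) B → ACD

(2) B → C

D → A

Sol:

(a) Key = BD
(b) R is in 1NF (B → C and D → A are partial dependencies).
(c) 2NF = BC, DA, BD
3NF = BC, DA, BD
(d) BCNF = BC, DA, BD
BC ⋈ BD = BCD, and BCD ⋈ DA = ABCD because the common attribute D is a key of DA.
Lossless decomposition.

(3) ABC → D
D → A
Sol:
(a) Keys = ABC, BCD
(b) Let key = ABC. R is in 3NF (in D → A, A is part of a key).
(c) BCNF: ABC → D is fine, since ABC is a super key,
but D → A violates BCNF, since D is not a super key.
Decomposing gives BCD and DA (but ABC will no longer be a key, and ABC → D is not preserved).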

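Q. R = ABCD

A → B

BC → D

A → C

Sol:

a). Key = A

b). R is in 2NF (the key is a single attribute).

c). BC → D is a transitive dependency, so 3NF = BCD, ABC

d). BCNF:
In ABC the determinant A is a key, and in BCD the determinant BC is a key,
so both tables are in BCNF.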
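Q). AB → C
AB → D

C → A

D → B

Sol:

a). Keys = AB, BC, CD, AD

b). Choose AB as the key; then <ABCD> is in 3NF (the RHS attributes of C → A and D → B are
parts of keys).

c). AB → CD: AB is a key ✓

C → A: C is not a super key ✗

D → B: D is not a super key ✗

BCNF decomposition: CA, DB, CD (but AB is absent, so AB → CD is not preserved)

Adding AB gives AB, CD, CA, DB.

OR

Keep the solution in 3NF.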

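Q). AB → CEFG

A → D

F → G

FB → H

HBC → ADEFG
FBC → ADE

Sol:

a). Keys = AB, HBC, FBC

b). Choose AB as the key for <ABCDEFGH>. A → D is a partial dependency:

2NF = AD, ABCEFGH

c). F → G is a transitive dependency in ABCEFGH:

3NF = AD, FG, ABCEFH

d). BCNF:

The determinants AB, HBC and FBC are keys, but A, F and FB are not.
In ABCEFH, FB → H violates BCNF, so it is split into FBH and ABCEF.
BCNF = FG, AD, FBH, ABCEF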
Q). R = ABCD, decomposed into R1 and R2 as follows.

(1) R1 = BC, R2 = AD

B → C
D → A
Sol:
a). Key = BD

b). R1 ∩ R2 = ∅

This is a lossy decomposition, so it is a bad decomposition:
it does not satisfy the lossless join property, although it satisfies the dependency preserving
property.

(2) AB → C, C → A, C → D; R1 = ACD, R2 = BC

a). Keys = AB, BC

b). R1 ∩ R2 = C

It is a lossless decomposition, as the common attribute C is a key of the first table
ACD.

(3) A → BC, C → AD; R1 = ABC, R2 = AD

a). Keys = A, C

b). R1 ∩ R2 = A

It is a lossless decomposition, as A is the common attribute and is a key of both tables.

(4) A → B, B → C, C → D; R1 = AB, R2 = ACD
Sol:
a). Key = A

b). R1 ∩ R2 = A

Lossless.

The FD B → C is not preserved.

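Q). A → B, B → C, C → D; R1 = AB, R2 = AD, R3 = CD
Sol:
AB ⋈ AD = ABD is lossless (A is a key of AB), but ABD ∩ CD = D, and D is not a key of either
ABD or CD. It is a lossy decomposition.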

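Q).

R = ABCDE

AB → DE
A → C
D → E
Sol:
(a) Key = AB
(b) 2NF = ABDE, AC
(c) 3NF = ABD, DE, AC
(d) BCNF: in ABD the determinant AB is a key, in DE the determinant D is a key, and in AC the
determinant A is a key, so ABD, DE and AC are in BCNF.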

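Q)

AB → CDE
C → A
D → E
Sol:

(a) Keys = AB, BC
(b) Choose AB as the key; there are no partial dependencies.
3NF = ABCD, DE (C → A is allowed in 3NF because A is part of a key)
BCNF = BCD, CA, DE (C → A violates BCNF in ABCD, since C is not a super key)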

Limitations of Normalization:

(1) It cannot detect redundancy among the tables.
(2) It may slow down the query retrieval process.
(3) The decomposed tables may not have real-world meaning, i.e., we cannot understand the
significance of the tables straight away.
Solutions:

• To avoid these problems, we selectively go for de-normalization, i.e., decomposed
relations are combined together again to improve performance and speed up
data retrieval.
Advantages:

(1) Database consistency can be achieved.
(2) It increases the speed (by eliminating duplicates).
(3) It reduces the disk space used.
(4) It maintains integrity.
Multi-Valued Dependencies (MVD)

The possible existence of multi-valued dependencies in a relation is due to first normal form
(1NF), which disallows an attribute in a tuple from having a set of values.

For example, if we have two multi-valued attributes in a relation, we have to repeat each value of
one of the attributes with every value of the other attribute, to ensure that tuples of a relation are
consistent. This type of constraint is referred to as a multi-valued dependency and results in data
redundancy.
Consider the Employee relation, which is not in 1NF, shown below:

EmpNum   EmpPhone                    EmpDegrees
111      {040-222222, 040-333333}    {BA, BSc}

The result of converting the above relation to 1NF is shown below:

EmpNum   EmpPhone     EmpDegrees
111      040-222222   BA
111      040-333333   BSc
111      040-222222   BSc
111      040-333333   BA

This relation records the EmpPhone and EmpDegrees details of employee 111. However, the
EmpDegrees of an employee are independent of EmpPhone. This constraint results in data
redundancy and is referred to as a multi-valued dependency.

Multi-Valued Dependency (MVD):

Represents a dependency between attributes (for example, A, B, and C) in a relation, such that
for each value of A there is a set of values for B and a set of values for C. However, the set of
values for B and C are independent of each other.

We represent an MVD between attributes A, B, and C in a relation using the following notation:

A →→ B

A →→ C
For example, we specify the MVD in the above Employee relation as follows:

EmpNum →→ EmpPhone

EmpNum →→ EmpDegrees

Formally, an MVD can be defined as:

Let R be a relation schema, let X and Y be disjoint subsets of R, and let Z = R − XY.

A relation R satisfies X →→ Y if the following holds for every legal instance r of R: for any two
tuples t1, t2 in r with t1(X) = t2(X), there exists a tuple t3 in r such that
t3(X) = t1(X), t3(Y) = t1(Y), and t3(Z) = t2(Z).

By symmetry, there exists a tuple t4 in r such that t4(X) = t1(X), t4(Y) = t2(Y), and t4(Z) = t1(Z).
X Y Z
x1 y1 z1 t1
x1 y2 z2 t2
x1 y1 z2 t3
x1 y2 z1 t4
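The formal definition can be checked mechanically on a given instance. The Python sketch below uses a hypothetical helper, `satisfies_mvd` (not from any library), which looks for the required t3 for every pair of tuples that agree on X. Iterating over ordered pairs also covers t4, because t4 is just t3 with the roles of t1 and t2 swapped. Note that an instance can only refute an MVD; it cannot prove that the MVD holds over the schema.

```python
from itertools import product

def satisfies_mvd(rows, attrs, X, Y):
    """Check whether the instance `rows` satisfies X ->-> Y.

    rows  : list of tuples, one value per attribute in `attrs`
    attrs : attribute names of R, in column order
    X, Y  : disjoint lists of attribute names; Z is R - XY
    """
    Z = [a for a in attrs if a not in X and a not in Y]
    idx = {a: i for i, a in enumerate(attrs)}

    def proj(t, cols):
        return tuple(t[idx[c]] for c in cols)

    # Index the instance by its (X, Y, Z) projections for fast lookup.
    present = {(proj(t, X), proj(t, Y), proj(t, Z)) for t in rows}
    for t1, t2 in product(rows, repeat=2):
        if proj(t1, X) == proj(t2, X):
            # A t3 with t1's X and Y values and t2's Z values must exist.
            if (proj(t1, X), proj(t1, Y), proj(t2, Z)) not in present:
                return False
    return True

attrs = ["X", "Y", "Z"]
table = [("x1", "y1", "z1"), ("x1", "y2", "z2"), ("x1", "y1", "z2"), ("x1", "y2", "z1")]
print(satisfies_mvd(table, attrs, ["X"], ["Y"]))      # True
print(satisfies_mvd(table[:3], attrs, ["X"], ["Y"]))  # False: t4 is missing
```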

The MVD X →→ Y says that the relationship between X and Y is independent of the
relationship between X and R − Y.
Inference rules for MVDs (extending Armstrong's axioms):

• MVD Complementation: If X →→ Y, then X →→ R − XY.

• MVD Augmentation: If X →→ Y and W ⊇ Z, then WX →→ YZ.

• MVD Transitivity: If X →→ Y and Y →→ Z, then X →→ (Z − Y).

Note: X →→ Y can be read as "X multi-determines Y".

Trivial MVD:

An MVD A →→ B in relation R is defined as being trivial if

a) B is a subset of A, or

b) A ∪ B = R.

An MVD is said to be non-trivial if neither a) nor b) is satisfied.

Fourth Normal Form (4NF):

A relation that is in Boyce-Codd normal form and contains no non-trivial MVDs.

Formally, let R be a relation schema and let X and Y be nonempty subsets of the attributes of R.
R is said to be in 4NF if, for every MVD X →→ Y that holds over R, one of the following is true:

• Y ⊆ X or XY = R, or

• X is a superkey.

Example:

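Consider the above Employee relation.

Identifying non-trivial MVD:

The Employee relation is not in 4NF because of the non-trivial MVD EmpNum →→ EmpPhone:
EmpPhone is not a subset of EmpNum, EmpNum ∪ EmpPhone ≠ R, and EmpNum is not a superkey
(several tuples share EmpNum 111).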
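The trivial-MVD and 4NF conditions above reduce to simple set tests. In the Python sketch below, `is_trivial_mvd` and `violates_4nf` are hypothetical helper names, and the candidate keys are passed in by hand: we assume, as the instance suggests, that no non-trivial FDs hold, so the only key of Employee is its whole attribute set.

```python
def is_trivial_mvd(X, Y, R):
    """X ->-> Y is trivial if Y is a subset of X or X union Y equals R."""
    X, Y, R = set(X), set(Y), set(R)
    return Y <= X or (X | Y) == R

def violates_4nf(X, Y, R, candidate_keys):
    """True if the MVD X ->-> Y is non-trivial and X is not a superkey of R."""
    is_superkey = any(set(key) <= set(X) for key in candidate_keys)
    return not is_trivial_mvd(X, Y, R) and not is_superkey

employee = {"EmpNum", "EmpPhone", "EmpDegrees"}
emp1 = {"EmpNum", "EmpPhone"}

# Assumption: no non-trivial FDs, so the whole schema is the only candidate key.
print(violates_4nf({"EmpNum"}, {"EmpPhone"}, employee, [employee]))  # True
print(violates_4nf({"EmpNum"}, {"EmpPhone"}, emp1, [emp1]))          # False (trivial in Emp1)
```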

Transformation into 4NF:

We decompose the Employee relation into Emp1 and Emp2 relations as shown below:

Emp1
EmpNum EmpPhone
111 040-222222
111 040-333333

Emp2
EmpNum EmpDegrees
111 BA
111 BSc

Both new relations are in 4NF because the Emp1 relation contains the trivial MVD
EmpNum→→EmpPhone, and the Emp2 relation contains the trivial MVD
EmpNum→→EmpDegrees.
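As a quick sanity check, the Python sketch below (sets of tuples stand in for the relations; the names are illustrative) joins Emp1 and Emp2 on EmpNum and confirms that the result is exactly the four-tuple 1NF Employee relation, so this decomposition loses no information.

```python
# Tuples from the decomposition above.
emp1 = {(111, "040-222222"), (111, "040-333333")}  # (EmpNum, EmpPhone)
emp2 = {(111, "BA"), (111, "BSc")}                 # (EmpNum, EmpDegrees)

# Natural join on the shared attribute EmpNum.
joined = {(n1, phone, degree)
          for (n1, phone) in emp1
          for (n2, degree) in emp2
          if n1 == n2}

# The 1NF Employee relation from earlier in this section.
employee = {
    (111, "040-222222", "BA"), (111, "040-333333", "BSc"),
    (111, "040-222222", "BSc"), (111, "040-333333", "BA"),
}
print(joined == employee)  # True
```

The two relations hold 4 two-value tuples instead of 4 three-value tuples, and adding a new degree now costs one tuple in Emp2 rather than one tuple per phone.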

Properties of Decomposition

Lossless-Join Decomposition:

Let R be a relation schema and let F be a set of FDs over R. A decomposition of R into two
schemas with attribute sets X and Y is said to be a lossless-join decomposition with respect to
F if, for every instance r of R that satisfies the dependencies in F, ∏X(r) ⋈ ∏Y(r) = r. In
other words, we can recover the original relation from the decomposed relations.

From the definition, it is easy to see that r is always a subset of the natural join of the
decomposed relations. If we take projections of a relation and recombine them using natural join,
we typically obtain some tuples that were not in the original relation.

Example:

By replacing the instance r shown in the figure with the instances ∏SP(r) and ∏PD(r), we lose some
information: joining the projections back produces the spurious tuples (s1, p1, d3) and
(s3, p1, d1), so we can no longer tell which tuples were in r.

Instance r
S P D
s1 p1 d1
s2 p2 d2
s3 p1 d3

∏SP(r)
S P
s1 p1
s2 p2
s3 p1

∏PD(r)
P D
p1 d1
p2 d2
p1 d3

∏SP(r) ⋈ ∏PD(r)
S P D
s1 p1 d1
s2 p2 d2
s3 p1 d3
s1 p1 d3
s3 p1 d1

Fig: Instances illustrating Lossy Decompositions
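The figure can be reproduced with a small Python sketch. `project` and `natural_join` are hypothetical helpers written for this illustration; they operate on sets of tuples with an explicit list of attribute names and are not a DBMS API.

```python
def project(rows, attrs, cols):
    """Projection: keep only `cols`; the set removes duplicate tuples."""
    idx = [attrs.index(c) for c in cols]
    return {tuple(t[i] for i in idx) for t in rows}

def natural_join(r1, a1, r2, a2):
    """Natural join of r1 (attributes a1) and r2 (attributes a2) on their shared attributes."""
    common = [a for a in a1 if a in a2]
    out_attrs = a1 + [a for a in a2 if a not in a1]
    result = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[a1.index(c)] == t2[a2.index(c)] for c in common):
                row = dict(zip(a1, t1))
                row.update(zip(a2, t2))
                result.add(tuple(row[a] for a in out_attrs))
    return result, out_attrs

attrs = ["S", "P", "D"]
r = {("s1", "p1", "d1"), ("s2", "p2", "d2"), ("s3", "p1", "d3")}
sp = project(r, attrs, ["S", "P"])
pd = project(r, attrs, ["P", "D"])
joined, join_attrs = natural_join(sp, ["S", "P"], pd, ["P", "D"])

print(join_attrs)           # ['S', 'P', 'D']
print(r <= joined)          # True: r is always contained in the join
print(sorted(joined - r))   # [('s1', 'p1', 'd3'), ('s3', 'p1', 'd1')]
```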

Theorem: Let R be a relation and F be a set of FDs that hold over R. The decomposition of R
into relations with attribute sets R1 and R2 is lossless if and only if F+ contains either the FD
R1 ∩ R2 → R1 (equivalently, R1 ∩ R2 → R1 − R2) or the FD R1 ∩ R2 → R2 (equivalently,
R1 ∩ R2 → R2 − R1).

Consider the Hourly_Emps relation. It has attributes SNLRWH, and the FD R→W causes a
violation of 3NF. We deal with this violation by decomposing the relation into SNLRH and RW.
Since R is common to both decomposed relations and R→W holds, this decomposition is
lossless-join.
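The theorem gives a practical test: compute the attribute closure (R1 ∩ R2)+ and check whether it contains all of R1 or all of R2. The Python sketch below uses the standard attribute-closure algorithm (a technique not spelled out in this section). FDs are written as (lhs, rhs) pairs of attribute sets, single-letter attribute names are assumed so that `set("SNLRH")` splits a schema into attributes, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(R1, R2, fds):
    """Lossless iff the closure of (R1 & R2) contains R1 or contains R2."""
    common_closure = closure(set(R1) & set(R2), fds)
    return set(R1) <= common_closure or set(R2) <= common_closure

# Hourly_Emps (SNLRWH) split into SNLRH and RW, using only the FD R -> W.
print(is_lossless_binary(set("SNLRH"), set("RW"), [(set("R"), set("W"))]))  # True

# The SPD example above, split into SP and PD with no FDs: lossy.
print(is_lossless_binary(set("SP"), set("PD"), []))  # False

# Contracts (discussed below) split into CSJDQV and SDP.
contracts_fds = [(set("C"), set("CSJDPQV")), (set("JP"), set("C")), (set("SD"), set("P"))]
print(is_lossless_binary(set("CSJDQV"), set("SDP"), contracts_fds))  # True
```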

Dependency-Preserving Decomposition:

Consider the Contracts relation with attributes CSJDPQV. The given FDs are C→CSJDPQV,
JP→C, and SD→P. Because SD is not a key, the dependency SD→P causes a violation of
BCNF.

We can decompose Contracts into relations with schemas CSJDQV and SDP to address this
violation. The decomposition is lossless-join. But there is one problem: enforcing the integrity
constraint JP→C now requires an expensive join of the two relations, because J and P no longer
appear together in either relation. We say that this decomposition is not dependency-preserving.

Let R be a relation schema that is decomposed into two schemas with attribute sets X and Y,
and let F be a set of FDs over R. The projection of F on X is the set of FDs in the closure F+ that
involve only attributes in X. We denote the projection of F on attributes X as FX. Note that a
dependency U→V in F+ is in FX only if all the attributes in U and V are in X.

The decomposition of relation schema R with FDs F into schemas with attribute sets X and Y is
dependency-preserving if (FX ∪ FY)+ = F+.
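Checking (FX ∪ FY)+ = F+ directly means computing F+, which can be exponentially large. A standard polynomial-time test found in several database textbooks (a swapped-in technique, not the method described in these notes) checks each FD U→V in F without building F+: starting from U, repeatedly add (result ∩ Ri)+ ∩ Ri for each decomposed schema Ri, then check whether V ends up in the result. Since (FX ∪ FY)+ is always contained in F+, it is enough to check that every FD of F is preserved. The sketch below is in Python with hypothetical function names and single-letter attributes.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves(fd, decomposition, fds):
    """True if U -> V follows from the union of the projections of F onto each Ri."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for Ri in decomposition:
            # Only what Ri can derive locally from attributes it contains.
            gained = closure(result & Ri, fds) & Ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

def is_dependency_preserving(decomposition, fds):
    return all(preserves(fd, decomposition, fds) for fd in fds)

# Contracts split into CSJDQV and SDP: JP -> C is lost.
contracts_fds = [(set("C"), set("CSJDPQV")), (set("JP"), set("C")), (set("SD"), set("P"))]
split = [set("CSJDQV"), set("SDP")]
print(preserves((set("JP"), set("C")), split, contracts_fds))  # False
print(is_dependency_preserving(split, contracts_fds))         # False

# The ABC example: A -> B, B -> C, C -> A split into AB and BC.
abc_fds = [(set("A"), set("B")), (set("B"), set("C")), (set("C"), set("A"))]
print(is_dependency_preserving([set("AB"), set("BC")], abc_fds))  # True
```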

Example:

Consider a relation R with attributes ABC that is decomposed into relations with attributes AB and
BC. The set of FDs over R includes A→B, B→C, and C→A.

The closure of F contains all dependencies in F plus A→C, B→A, and C→B. Consequently, FAB
contains A→B and B→A, and FBC contains B→C and C→B. Therefore, FAB ∪ FBC contains A→B,
B→C, B→A, and C→B. The closure of FAB ∪ FBC now includes C→A (which follows from C→B and
B→A). Thus, the decomposition preserves the dependency C→A.
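The projections in this example can also be computed mechanically: for every non-empty subset U of X, compute U+ and keep each attribute of X that U+ adds. The Python sketch below does this with hypothetical function names (it is exponential in |X|, which is fine for a three-attribute example) and then confirms that C→A follows from FAB ∪ FBC.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def project_fds(X, fds):
    """Non-trivial FDs U -> a in F+ with U a subset of X and a single attribute a in X."""
    projected = set()
    for k in range(1, len(X) + 1):
        for lhs in combinations(sorted(X), k):
            lhs_closure = closure(set(lhs), fds)
            for a in sorted(X):
                if a in lhs_closure and a not in lhs:
                    projected.add(("".join(lhs), a))
    return projected

F = [(set("A"), set("B")), (set("B"), set("C")), (set("C"), set("A"))]
F_AB = project_fds(set("AB"), F)
F_BC = project_fds(set("BC"), F)
print(sorted(F_AB))  # [('A', 'B'), ('B', 'A')]
print(sorted(F_BC))  # [('B', 'C'), ('C', 'B')]

# Convert back to (lhs set, rhs set) pairs and check that C -> A follows.
union = [(set(lhs), {a}) for lhs, a in F_AB | F_BC]
print("A" in closure({"C"}, union))  # True
```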
