15 05 Normalisasi

Normalisasi

Uploaded by

rommyaleka

Topic 5:

NORMALIZATION
for Relational Database

Ir. Endang Ripmiatin, MT

Program Studi Teknik Informatika


Universitas Al Azhar Indonesia
19 Nov 2015
Informal measures of quality for relation schema design

• Semantics of the attributes
• Reducing the redundant values in tuples
• Reducing the null values in tuples
• Disallowing the possibility of generating spurious tuples
1 Informal Design Guidelines for Relational Databases (1)

• What is relational database design?
  • The grouping of attributes to form "good" relation schemas
• We first discuss informal guidelines for good relational design
• Then we discuss formal concepts of functional dependencies and normal forms
  • 1NF (First Normal Form)
  • 2NF (Second Normal Form)
  • 3NF (Third Normal Form)
  • BCNF (Boyce-Codd Normal Form)
1.1 Semantics of the Relation Attributes

• GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)
  • Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.
  • Only foreign keys should be used to refer to other entities.
  • Entity and relationship attributes should be kept apart as much as possible.
• Bottom Line: Design a schema that can be explained easily, relation by relation. The semantics of attributes should be easy to interpret.
The COMPANY relational database schema

[Figure: the COMPANY schema; DLOCATION is marked as a multivalued attribute]
GUIDELINES in designing tables

• GUIDELINE 1: Semantics of the Relation Attributes: each tuple in a relation should represent one entity or relationship instance.
• GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, and update anomalies.
1.2 Redundant Information in Tuples and Update Anomalies

• Mixing attributes of multiple entities may cause problems
  • Information is stored redundantly, wasting storage
  • Problems with update anomalies
    • Insertion anomalies
    • Deletion anomalies
    • Modification anomalies
1.2 Redundant Information in Tuples and Update Anomalies

• GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, and update anomalies. If any anomalies remain, note them clearly so that the applications that update the database can take them into account.
EXAMPLE OF AN UPDATE ANOMALY (1)

• Consider the relation:
  • EMP_PROJ (EmpNumber, ProjNumber, EName, PName, No_hours)
• Modification Anomaly: Changing the name of project number P1 from “Billing” to “Customer-Accounting” requires the same update in the tuples of all 100 employees working on project P1. If any tuple is missed, the relation becomes inconsistent.
EXAMPLE OF AN UPDATE ANOMALY (2)

• Insert Anomaly: Cannot insert a project unless an employee is assigned to it.
  • Conversely, cannot insert an employee unless he/she is assigned to a project.
• Delete Anomaly: When a project is deleted, all the employees who work on that project are deleted as well. Alternatively, if an employee is the sole employee on a project, deleting that employee also deletes the corresponding project.
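
The modification and deletion anomalies above can be made concrete with a small sketch. The rows, employee names, and helper functions below are hypothetical and only follow the EMP_PROJ (EmpNumber, ProjNumber, EName, PName, No_hours) layout from the slide; they are not part of the original material.

```python
# Hypothetical EMP_PROJ rows: (EmpNumber, ProjNumber, EName, PName, No_hours)
emp_proj = [
    ("E1", "P1", "Ana", "Billing", 10),
    ("E2", "P1", "Budi", "Billing", 20),
    ("E3", "P2", "Citra", "Payroll", 15),
]

def rename_project(rows, proj_number, new_name):
    """Rename a project; return the new rows and how many rows had to change."""
    updated, touched = [], 0
    for emp, proj, ename, pname, hours in rows:
        if proj == proj_number:
            pname = new_name
            touched += 1
        updated.append((emp, proj, ename, pname, hours))
    return updated, touched

def delete_employee(rows, emp_number):
    """Delete every row that belongs to one employee."""
    return [row for row in rows if row[0] != emp_number]

def project_names(rows):
    """Project number -> project name, as recoverable from the stored rows."""
    return {proj: pname for _, proj, _, pname, _ in rows}

# Modification anomaly: one logical change rewrites one row per assigned employee.
renamed, touched = rename_project(emp_proj, "P1", "Customer-Accounting")
print(touched)

# Deletion anomaly: removing the sole employee on P2 also loses project P2.
after_delete = delete_employee(emp_proj, "E3")
print(project_names(after_delete))
```

In a design where project data lives in its own relation, the rename would touch a single row and deleting an employee would leave the project intact.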
Examples

Table Name: BARANG_PEMASOK (item and supplier)
Columns: item code, item name, selling price, supplier code, supplier name, city

kode_barang  nama_barang    harga_jual  kode_pemasok  nama_pemasok   kota
T-001        TV ABC 14"     600.000     P22           PT Citra Jaya  Bogor
T-002        TV ABC 21"     950.000     P22           PT Citra Jaya  Bogor
T-003        TV XYZ 18"     450.000     P11           PT Amerta      Bandung
T-004        TV Rhino 29"   1.750.000   P33           PT Kartika     Yogyakarta
T-005        TV Kirana 14"  475.000     P44           PT Nindya      Tangerang

• Explain:
– insert anomaly
– update anomaly
– delete anomaly
Two relation schemas suffering from update anomalies

Example States for EMP_DEPT and EMP_PROJ

IF53300535 Basis Data, Dec 2006
GUIDELINES in designing tables

• GUIDELINE 1: Semantics of the Relation Attributes: each tuple in a relation should represent one entity or relationship instance.
• GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, and update anomalies.
• GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible.
1.3 Null Values in Tuples

• GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible
  • Attributes that are frequently NULL could be placed in separate relations (with the primary key)
• Multiple interpretations for nulls:
  • attribute not applicable or invalid
  • attribute value unknown (may exist)
  • value known to exist, but unavailable
1.4 Spurious Tuples

• Bad designs for a relational database may result in erroneous results for certain JOIN operations
• The "lossless join" property is used to guarantee meaningful results for join operations
• GUIDELINE 4: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural join of any relations.
Poor design for EMP_PROJ relation

EMP_LOCS (ENAME, PLOCATION)
EMP_PROJ1 (SSN, PNUMBER, HOURS, PNAME, PLOCATION)

• Decomposing EMP_PROJ into EMP_LOCS and EMP_PROJ1 is undesirable because, when we JOIN them back using NATURAL JOIN, we do not get the correct original result.
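
The effect can be reproduced with a minimal sketch. The two EMP_PROJ rows below are invented for illustration (two employees on different projects at the same location), and `natural_join` and `project` are hypothetical helpers, not part of any library.

```python
def natural_join(r, s):
    """Natural join of two lists of dicts on all shared attribute names."""
    out = []
    for a in r:
        for b in s:
            shared = set(a) & set(b)
            if all(a[k] == b[k] for k in shared):
                out.append({**a, **b})
    return out

def project(rows, attrs):
    """Projection onto attrs with duplicate elimination."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]

# Hypothetical EMP_PROJ state: both projects are located in Bogor.
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 10, "ENAME": "Ana",
     "PNAME": "ProductX", "PLOCATION": "Bogor"},
    {"SSN": "222", "PNUMBER": 2, "HOURS": 20, "ENAME": "Budi",
     "PNAME": "ProductY", "PLOCATION": "Bogor"},
]

emp_locs = project(emp_proj, ["ENAME", "PLOCATION"])
emp_proj1 = project(emp_proj, ["SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"])

# The only shared attribute is PLOCATION, so every Bogor row pairs with every Bogor employee.
joined = natural_join(emp_proj1, emp_locs)
spurious = [row for row in joined if row not in emp_proj]
print(len(emp_proj), len(joined), len(spurious))
```

The join returns more tuples than the original relation held. The extra ones pair an employee name with a project that employee never worked on, which is exactly what "spurious" means here.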
Spurious Tuples (2)

• There are two important properties of decompositions:
  • (a) the non-additive (lossless) property of the corresponding join
  • (b) preservation of the functional dependencies
• Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.
GUIDELINES in designing tables

• GUIDELINE 1: Semantics of the Relation Attributes: each tuple in a relation should represent one entity or relationship instance.
• GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, and update anomalies.
• GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible.
• GUIDELINE 4: The relations should be designed to satisfy the lossless join condition.
2.1 Functional Dependencies (1)

• Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs
• FDs and keys are used to define normal forms for relations
• FDs are constraints that are derived from the meaning and interrelationships of the data attributes
• A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y
Functional Dependencies (2)

• X → Y holds if, whenever two tuples have the same value for X, they must have the same value for Y (X determines Y)
• For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y]
• X → Y in R specifies a constraint on all relation instances r(R)
• FDs are derived from the real-world constraints on the attributes
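
The tuple-pair condition translates directly into a check on one relation instance. The sketch below applies it to the BARANG_PEMASOK rows shown earlier; `fd_holds` is a hypothetical helper name. Note the limitation: a single instance can only refute an FD. Passing the check does not prove the FD, since an FD must hold on every legal instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True

# Rows taken from the BARANG_PEMASOK example table (price column omitted).
barang_pemasok = [
    {"kode_barang": "T-001", "kode_pemasok": "P22", "nama_pemasok": "PT Citra Jaya", "kota": "Bogor"},
    {"kode_barang": "T-002", "kode_pemasok": "P22", "nama_pemasok": "PT Citra Jaya", "kota": "Bogor"},
    {"kode_barang": "T-003", "kode_pemasok": "P11", "nama_pemasok": "PT Amerta", "kota": "Bandung"},
    {"kode_barang": "T-004", "kode_pemasok": "P33", "nama_pemasok": "PT Kartika", "kota": "Yogyakarta"},
    {"kode_barang": "T-005", "kode_pemasok": "P44", "nama_pemasok": "PT Nindya", "kota": "Tangerang"},
]

# Not contradicted by this instance: kode_pemasok -> {nama_pemasok, kota}
print(fd_holds(barang_pemasok, ["kode_pemasok"], ["nama_pemasok", "kota"]))
# Refuted by this instance: supplier P22 supplies two different items
print(fd_holds(barang_pemasok, ["kode_pemasok"], ["kode_barang"]))
```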
Examples of FD constraints (1)

• Social security number determines employee name
  SSN → ENAME
• Project number determines project name and location
  PNUMBER → {PNAME, PLOCATION}
• Employee SSN and project number determine the hours per week that the employee works on the project
  {SSN, PNUMBER} → HOURS
Examples of FD constraints (2)

• An FD is a property of the attributes in the schema R
• The constraint must hold on every relation instance r(R)
• If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K])
3 Normal Forms Based on Primary Keys

3.1 Normalization of Relations
3.2 Practical Use of Normal Forms
3.3 Definitions of Keys and Attributes Participating in Keys
3.4 First Normal Form
3.5 Second Normal Form
3.6 Third Normal Form
3.1 Normalization of Relations (1)

• Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations
• Normal form: A condition, stated in terms of the keys and FDs of a relation, used to certify whether a relation schema is in a particular normal form
Normalization of Relations (2)

• 2NF, 3NF, BCNF: based on keys and FDs of a relation schema
• 4NF: based on keys and multi-valued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs)
• Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation)
3.2 Practical Use of Normal Forms

• Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties
• The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
• Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF)
• Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form
3.3 Definitions of Keys and Attributes Participating in Keys (1)

• A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S, a subset of R, with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]
• A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
  • {SSN} is a key for EMPLOYEE
  • {SSN}, {SSN, ENAME}, {SSN, ENAME, DNUM} and any set of attributes that includes SSN are all superkeys.
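
One common way to test these definitions against a set of FDs, not stated in the slides, is the attribute closure: S is a superkey when the closure of S covers all of R, and a key when no attribute can be dropped. The sketch below assumes a simplified EMPLOYEE(SSN, ENAME, DNUM) schema with the single FD SSN → {ENAME, DNUM}; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """attrs is a superkey if it determines every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)

def is_key(attrs, schema, fds):
    """A key is a minimal superkey: dropping any attribute breaks it."""
    attrs = set(attrs)
    if not is_superkey(attrs, schema, fds):
        return False
    return not any(is_superkey(attrs - {a}, schema, fds) for a in attrs)

# Assumed schema and FD for illustration only.
employee = {"SSN", "ENAME", "DNUM"}
fds = [(("SSN",), ("ENAME", "DNUM"))]

print(is_key({"SSN"}, employee, fds))                # key
print(is_superkey({"SSN", "ENAME"}, employee, fds))  # superkey
print(is_key({"SSN", "ENAME"}, employee, fds))       # superkey, but not minimal
```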
Definitions of Keys and Attributes Participating in Keys (2)

• If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
• A prime attribute must be a member of some candidate key
• A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
• In WORKS_ON: SSN and PNUMBER are prime attributes, whereas other attributes are nonprime.
3.4 First Normal Form

• Disallows composite attributes, multivalued attributes, and nested relations, that is, attributes whose values for an individual tuple are non-atomic
• Considered to be part of the formal definition of a relation in the basic (flat) relational model
Normalization into 1NF

[Figure: DEPARTMENT_1NF; multivalued attributes are not allowed]
Normalization into 1NF

• Three main techniques to achieve 1NF in the situation shown in Figure 14.8:
  • Remove the attribute DLOCATIONS and place it in a separate relation DEPT_LOCATIONS along with the primary key DNUMBER of DEPARTMENT
  • Expand the key so that there will be a separate tuple in the original DEPARTMENT for each location of a DEPARTMENT. In this case the primary key becomes {DNUMBER, DLOCATION}
  • If a maximum number of values is known for the attribute (e.g. at most 3 locations), replace the DLOCATION attribute by three atomic attributes: DLOCATION1, DLOCATION2, DLOCATION3
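
The first technique can be sketched as follows. The department rows and city values are hypothetical, and `split_multivalued` is an illustrative helper; only the relation and attribute names (DEPARTMENT, DEPT_LOCATIONS, DNUMBER, DLOCATIONS, DLOCATION) come from the slide.

```python
def split_multivalued(rows, key, multi_attr, single_attr):
    """Move a multivalued attribute into its own relation keyed by (key, value)."""
    base, detail = [], []
    for row in rows:
        # The base relation keeps every attribute except the multivalued one.
        base.append({k: v for k, v in row.items() if k != multi_attr})
        # The new relation gets one atomic tuple per value.
        for value in row[multi_attr]:
            detail.append({key: row[key], single_attr: value})
    return base, detail

# Hypothetical unnormalized DEPARTMENT state with a multivalued DLOCATIONS.
department = [
    {"DNAME": "Research", "DNUMBER": 5, "DLOCATIONS": ["Bogor", "Bandung"]},
    {"DNAME": "Administration", "DNUMBER": 4, "DLOCATIONS": ["Jakarta"]},
]

department_1nf, dept_locations = split_multivalued(
    department, "DNUMBER", "DLOCATIONS", "DLOCATION"
)
for row in dept_locations:
    print(row)
```

The primary key of the new DEPT_LOCATIONS relation is the combination {DNUMBER, DLOCATION}. Compared with the second technique, this avoids repeating DNAME once per location.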
Normalizing nested relations into 1NF

• To normalize into 1NF:
  • Remove the nested relation attributes into a new relation
  • Propagate the primary key into the new relation

EXERCISE:
3.5 Second Normal Form (1)

• Uses the concepts of FDs and primary key
• Definitions:
  • Prime attribute: an attribute that is a member of the primary key K
  • Full functional dependency: an FD Y → Z where removal of any attribute from Y means the FD does not hold any more
• Examples:
  • {SSN, PNUMBER} → HOURS is a full FD since neither SSN → HOURS nor PNUMBER → HOURS holds
  • {SSN, PNUMBER} → ENAME is not a full FD (it is called a partial dependency) since SSN → ENAME also holds
Second Normal Form (2)

• Definition:
  • A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
• R can be decomposed into 2NF relations via the process of 2NF normalization

FD or not?

Normalization into 2NF

• Test for 2NF:
  • If the primary key is a single attribute, the test is not needed.
  • Otherwise, check that every non-prime attribute A in R is fully functionally dependent on the primary key
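
The 2NF test can be sketched by looking for partial dependencies: a non-prime attribute that is already determined by a proper subset of the primary key. The example below uses the FDs given earlier for EMP_PROJ (SSN → ENAME, PNUMBER → {PNAME, PLOCATION}, {SSN, PNUMBER} → HOURS). The function names and the closure-based approach are assumptions for illustration, and "prime" here follows the slide's primary-key-based definition.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(schema, primary_key, fds):
    """List (key_subset, attribute) pairs that violate 2NF."""
    key = sorted(primary_key)
    nonprime = sorted(set(schema) - set(key))
    found = []
    # Proper, non-empty subsets only; a single-attribute key yields none.
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            determined = closure(subset, fds)
            for attr in nonprime:
                if attr in determined:
                    found.append((subset, attr))
    return found

emp_proj = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
fds = [
    (("SSN",), ("ENAME",)),
    (("PNUMBER",), ("PNAME", "PLOCATION")),
    (("SSN", "PNUMBER"), ("HOURS",)),
]

violations = partial_dependencies(emp_proj, {"SSN", "PNUMBER"}, fds)
for subset, attr in violations:
    print(subset, "->", attr)
```

HOURS does not appear in the output because it depends on the whole key, so only ENAME, PNAME, and PLOCATION need to move into separate relations.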

EXERCISE
Summary of previous meeting: 1NF and 2NF

• 1NF: disallows composite attributes, multivalued attributes, and nested relations.
  • Eliminate composite attributes, multivalued attributes, and nested relations (attributes whose values for an individual tuple are non-atomic)
• 2NF: every non-prime attribute A in R is fully functionally dependent on the primary key
  • Eliminate partial dependencies
Normalization into 1NF

[Figure: DEPARTMENT_1NF; multivalued attributes are not allowed]
Normalization into 2NF

• {DNUMBER, DLOCATION} → DNAME is not a full FD (it is called a partial dependency) since DNUMBER → DNAME also holds
FD or not?
3.6 Third Normal Form (1)

• Definition:
  • Transitive functional dependency: an FD X → Z that can be derived from two FDs X → Y and Y → Z
• Examples:
  • SSN → DMGRSSN is a transitive FD since SSN → DNUMBER and DNUMBER → DMGRSSN hold
  • SSN → ENAME is non-transitive since there is no set of attributes X where SSN → X and X → ENAME
4. General Normal Form Definitions (For Multiple Keys) (1)

• The above definitions consider the primary key only
• The following more general definitions take into account relations with multiple candidate keys
• A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R
General Normal Form Definitions (2)

• Definition:
  • Superkey of relation schema R: a set of attributes S of R that contains a key of R
  • A relation schema R is in third normal form (3NF) if whenever an FD X → A holds in R, then either:
    • (a) X is a superkey of R, or
    • (b) A is a prime attribute of R
• NOTE: Boyce-Codd normal form disallows condition (b) above
5. BCNF (Boyce-Codd Normal Form)

• A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X → A holds in R, then X is a superkey of R
• Each normal form is strictly stronger than the previous one
  • Every 2NF relation is in 1NF
  • Every 3NF relation is in 2NF
  • Every BCNF relation is in 3NF
• There exist relations that are in 3NF but not in BCNF
• The goal is to have each relation in BCNF (or 3NF)
A relation TEACH that is in 3NF but not in BCNF
Achieving the BCNF by Decomposition (1)

• Two FDs exist in the relation TEACH:
  • FD1: {student, course} → instructor
  • FD2: instructor → course
• {student, course} is a candidate key for this relation, and the dependencies shown follow the pattern in Figure 14.12 (b). So this relation is in 3NF but not in BCNF
• A relation NOT in BCNF should be decomposed so as to meet this property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
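
The claim that TEACH is in 3NF but not in BCNF can be checked mechanically. The sketch below is a brute-force illustration under the two FDs above; the function names are hypothetical, and it inspects only the listed FDs rather than every FD they imply, which is a standard shortcut for testing a relation against its own FD set.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure covers the schema."""
    attrs, keys = sorted(schema), []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            # Smaller keys are found first, so any superset of one is skipped.
            if closure(s, fds) >= set(schema) and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def violations(schema, fds, form):
    """FDs X -> A that break the given normal form ('3NF' or 'BCNF')."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        for attr in sorted(set(rhs) - set(lhs)):      # ignore trivial parts
            if closure(lhs, fds) >= set(schema):      # (a) X is a superkey
                continue
            if form == "3NF" and attr in prime:       # (b) A is prime
                continue
            bad.append((tuple(lhs), attr))
    return bad

teach = {"student", "course", "instructor"}
fds = [
    (("student", "course"), ("instructor",)),   # FD1
    (("instructor",), ("course",)),             # FD2
]

print(candidate_keys(teach, fds))
print(violations(teach, fds, "3NF"))
print(violations(teach, fds, "BCNF"))
```

FD2 passes the 3NF test only through condition (b), because course is a prime attribute. BCNF removes that escape clause, so FD2 becomes the violation.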
Achieving the BCNF by Decomposition (2)

• Three possible decompositions for relation TEACH:
  • {student, instructor} and {student, course}
  • {course, instructor} and {course, student}
  • {instructor, course} and {instructor, student}
• All three decompositions will lose FD1. We have to settle for sacrificing the functional dependency preservation. But we cannot sacrifice the non-additivity property after decomposition.
• Out of the above three, only the 3rd decomposition will not generate spurious tuples after the join.
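
The three decompositions can be compared with the usual test for a binary decomposition, which the slides do not spell out: a split of R into R1 and R2 is non-additive if the common attributes functionally determine either R1 − R2 or R2 − R1. The sketch below applies that test to TEACH; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary non-additive join test on two attribute sets."""
    r1, r2 = set(r1), set(r2)
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined

fds = [
    (("student", "course"), ("instructor",)),   # FD1
    (("instructor",), ("course",)),             # FD2
]

decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]

results = [is_lossless(r1, r2, fds) for r1, r2 in decompositions]
print(results)
```

Only the third split shares the attribute instructor, which determines course through FD2, so it is the only one that passes.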
Exercise:

- Identify the FDs
- Illustrate the normalization process to 1NF, 2NF, 3NF and BCNF

staffNo  dentistName    patNo  patName        appointment date  appointment time  surgeryNo
S1011    Tony Smith     P100   Gillian White  12-Sep-08         10.00             S15
S1011    Tony Smith     P105   Jill Bell      12-Sep-08         12.00             S15
S1024    Helen Pearson  P108   Ian MacKay     12-Sep-08         10.00             S10
S1024    Helen Pearson  P108   Ian MacKay     14-Sep-08         14.00             S10
S1032    Robin Plevin   P105   Jill Bell      14-Sep-08         16.30             S15
S1032    Robin Plevin   P100   Gillian White  15-Sep-08         18.00             S13
• Unnormalized: not organized
• 1st NF: disallows composite attributes, multivalued attributes, and nested relations; attributes whose values for an individual tuple are non-atomic
• 2nd NF: every non-prime attribute A in R is fully functionally dependent on the primary key
• 3rd NF
