
UNIT 2 PART 1

Uploaded by

paarthchotani85

CHAPTER 3

INTRODUCTION TO DATABASE SYSTEM

❖ Characteristics of DBMS
❖ DBMS Architecture
❖ Components of DBMS
❖ ACID Properties
❖ People Interacting with the DBMS
❖ Advantages and Disadvantages of DBMS
❖ File System vs DBMS
❖ DBMS Models
❖ Codd’s Rule for Relational DBMS
❖ Normalization
INTRODUCTION TO DATABASE SYSTEM

Data are raw facts and statistics, either stored or flowing over a network. In this form they are
unorganized and unprocessed, and they are of little use until they are organized in some manner.
Data becomes information when it is processed, organized and presented in a predefined manner
that makes it meaningful and useful. Information is therefore created from data.

[Figure: Types of Data → Processing → Information]

Fig No. 1 What is Information?

Difference between Data and Information


BASIS | DATA | INFORMATION
Meaning | Data is raw facts and figures. | Information is the processed form of data.
What is it? | It is just text and numbers. | It is refined data.
Based on | Records and observations | Analysis
Form | Unorganized | Organized
Usefulness | Data does not help in decision making. | Information helps in decision making.
Dependency | Does not depend on information. | Without data, information cannot be produced.

A database is a collection of related data organized so that the data can be easily accessed,
managed and updated, and so that a computer can quickly find the desired information. Its main
purpose is to store data. In the early era of computing, data was collected and stored on
magnetic tapes. These tapes had disadvantages: they were bulky and required a lot of
maintenance. These problems were addressed by software-based Database Management Systems
(DBMS). The database is the collection of data, and the management system is the set of
programs that store and retrieve that data.

A Database Management System (DBMS) is software that allows users to define, create and
process the data in a database easily. A DBMS not only manages the database but also protects
the data from unauthorized access, and it maintains the consistency of the data when multiple
users work with it.
Some popular examples of DBMS:
MySQL
Oracle
SQL Server
IBM DB2
PostgreSQL

[Figure: Database + Management System = Database Management System (DBMS)]

Fig No. 2 What is DBMS?


3.1 Characteristics of Database Management System (DBMS)

[Figure: DBMS characteristics: data stored in tables, reduced redundancy, data consistency,
multiple-user and concurrent access, query language, security, ACID properties, backup and
recovery]

Fig No. 3 Characteristics of DBMS

• Data Stored in Tables: A DBMS stores data in tables, which are combinations of rows and
columns. To make the data more meaningful, the DBMS allows relationships to be defined
between tables.

• Reduced Redundancy: A DBMS reduces the unnecessary repetition of data in the database.
Without care, the same data ends up stored in multiple places. A DBMS supports
normalization, which divides the data so that repetition is kept to a minimum.

• Data Consistency: A DBMS keeps the data in the database consistent. This matters most
for live data that is continuously being updated and added to.

• Support for Multiple Users and Concurrent Access: A DBMS allows multiple users to work
on the database (insert, update and delete data) at the same time while still maintaining
data consistency.

• Query Language: A DBMS provides a user-friendly query language through which users
can easily retrieve, insert, delete and update the data in the database.

• Security: Security is one of the main characteristics of a DBMS. A database holds a large
amount of data. If that data is not secured, unauthorized persons can access and misuse
it. The DBMS therefore protects the data from unauthorized access by creating user
accounts with different access permissions, which restrict what each user can do.

• Support for ACID Properties: A DBMS supports the ACID (Atomicity, Consistency,
Isolation and Durability) properties. These ensure that the data remains correct while
transactions such as insert, update and delete are performed.

• Backup and Recovery: A database can fail in many ways. If the data cannot be recovered
after a failure, the company may suffer a big loss. The solution is to take backups of the
database so that it can be restored whenever needed. Every database should have this
capability.

3.2 DBMS Architecture

Database architecture describes how the database software used by an organization is designed.
It focuses on the design, development, implementation and maintenance of the programs that
store and retrieve information for the organization. The architecture of a DBMS can be seen as
1-tier, 2-tier or 3-tier.

In one-tier architecture, the database is directly available to the user for storing or accessing
data. Any modification is made directly on the DBMS itself, and no separate application is
provided for end users. It is mainly used for local application development, where programmers
communicate directly with the database for a quick response.
Fig No. 3 One Tier Architecture of DBMS
Source: https://medium.com/oceanize-geeks/concepts-of-database-architecture-dfdc558a93e4

Two-tier architecture places an application layer between the user and the DBMS. The
application communicates with the users, sends their requests to the database management
system and returns the response from the DBMS to the user. The application tier is entirely
independent of the database in terms of operation, design and programming.

Fig No. 4 Two Tier Architecture of DBMS


Source: https://medium.com/oceanize-geeks/concepts-of-database-architecture-dfdc558a93e4
Three-tier architecture separates its tiers from each other based on the complexity of the users
and how they use the data present in the database. It is the most widely used architecture for
designing a DBMS.
The three levels of the 3-tier DBMS architecture are:
Physical Level
Conceptual Level
External Level

Fig No. 5 Three Tier Architecture of DBMS


Source: https://www.tutorialride.com/dbms/three-level-architecture-of-dbms.htm

Physical Level: This level, also known as the internal level, describes the physical storage
structure of the data in the database and is the closest to physical storage. The internal schema
defines the various stored data types and uses a physical data model.
Conceptual Level: The conceptual level describes the structure of the whole database for a group
of users. It is also referred to as the data model. The conceptual schema represents the entire
content of the database and contains all the information needed to build the relevant external
records. It hides the internal details of physical storage.

External Level: The external level concerns the data as viewed by end users. It includes a number
of user views and is the closest to the user. Each external view describes the part of the
database that a particular user group requires and hides the rest of the database from that
group.
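
The split between the conceptual and external levels can be sketched with a table and a view.
This is only an illustration, assuming Python's built-in sqlite3 module; the student table, its
columns and the student_public view are hypothetical names that do not come from the text.

```python
import sqlite3

# Physical level: SQLite decides how the bytes are laid out; the user never sees it.
conn = sqlite3.connect(":memory:")

# Conceptual level: the full logical structure of the data (hypothetical table).
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, fee_due REAL)"
)
conn.execute("INSERT INTO student VALUES (1, 'A', 1500.0), (2, 'B', 0.0)")

# External level: a view that exposes only what one user group needs
# and hides the fee_due column from that group.
conn.execute("CREATE VIEW student_public AS SELECT roll_no, name FROM student")

public_rows = conn.execute(
    "SELECT * FROM student_public ORDER BY roll_no"
).fetchall()
print(public_rows)
```

A user who queries only student_public never learns that fee_due exists, which is the hiding
behaviour described for the external level.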

3.3 Components of DBMS

[Figure: Components of DBMS: hardware, software, data, procedures, database access language]

Fig No. 6 Components of DBMS

Hardware: Hardware means the physical components of the computer, such as the hard disk,
input/output devices and memory. When a DBMS runs on a computer, the hard disk stores the
software and the data, the keyboard is used to enter commands, and main memory (RAM) is
needed to execute them.
Software: This is the main component, as it is the program that controls everything. The DBMS
software understands the commands submitted to it and executes them against the database to
store and retrieve data.

Data: Data is the reason the DBMS exists. The DBMS is built to store data and make it usable as
required. The DBMS also stores metadata, which is data about the data, to better understand the
data it holds.

Procedures: Procedures are the general instructions for using the DBMS. They cover setting up
and installing the DBMS, logging in and out, managing databases, taking backups, generating
reports and so on.

Database Access Language: A database access language is a language designed for writing
commands to access, insert, update and delete the data stored in a database. A user writes
commands in the database access language and submits them to the DBMS, which translates and
executes them. Using the access language, a user can create new databases and tables, insert
data, fetch stored data, update data and delete data.
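
As a rough illustration of an access language in use, the sketch below submits SQL commands to
SQLite through Python's built-in sqlite3 module. The course table and its rows are hypothetical
and chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create a table, then insert, update, delete and fetch rows (hypothetical data).
conn.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO course (id, title) VALUES (?, ?)", (1, "BCA"))
conn.execute("INSERT INTO course (id, title) VALUES (?, ?)", (2, "BBA"))
conn.execute("UPDATE course SET title = ? WHERE id = ?", ("MCA", 2))
conn.execute("DELETE FROM course WHERE id = ?", (1,))

remaining = conn.execute("SELECT id, title FROM course").fetchall()
print(remaining)  # [(2, 'MCA')]
```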

3.4 ACID Properties

A transaction is a single logical unit of work that retrieves and updates data in the database.
To ensure the accuracy, completeness and integrity of the data, transactions must maintain
Atomicity, Consistency, Isolation and Durability, commonly known as the ACID properties.

Atomicity: This property states that either the entire transaction takes place or none of it
does. There is no midway: transactions do not occur partially. Atomicity involves the following
two operations.
• Abort: If a transaction aborts, then the changes it made to the database are not visible.
• Commit: If a transaction commits, then the changes it made to the database become
visible.
Consider the following transaction T consisting of T1 and T2: transfer of 200 from account A to
account B.

Before: A = 500, B = 700

Transaction T
T1 | T2
Read(A) | Read(B)
A = A - 200 | B = B + 200
Write(A) | Write(B)

After: A = 300, B = 900

T1 and T2 are the two tasks of transaction T. If the transaction fails after T1 completes but
before T2 completes (that is, after Write(A) but before Write(B)), then the amount has been
deducted from A but not added to B. This results in an inconsistent database state. Therefore,
the transaction must be executed in its entirety to ensure the correctness of the database state.
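
The abort and commit behaviour can be sketched as follows. This is a minimal illustration
using Python's built-in sqlite3 module. The account table, the transfer function and the
simulated failure are assumptions made for the example, not part of the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 500), ('B', 700)")
conn.commit()


def transfer(conn, amount, fail_midway=False):
    """Move `amount` from A to B as one transaction; roll back on failure."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,)
        )
        if fail_midway:
            # Simulated failure after Write(A) but before Write(B).
            raise RuntimeError("simulated failure")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,)
        )
        conn.commit()    # Commit: both changes become visible together.
    except RuntimeError:
        conn.rollback()  # Abort: the partial change to A is undone.


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


transfer(conn, 200, fail_midway=True)
after_failure = balances(conn)   # balances are unchanged

transfer(conn, 200)
after_success = balances(conn)   # 200 has moved from A to B
print(after_failure, after_success)
```

In both outcomes the total of the two balances stays at 1200, which is the invariant the
example in the text relies on.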

Consistency: The database must be in a consistent state before and after the transaction.
Consistency refers to the correctness of the database.
In the above example, the total amount must be the same before and after the transaction.
Total before T occurs = 500 + 700 = 1200.
Total after T occurs = 300 + 900 = 1200.
Therefore, the database is consistent.
Inconsistency occurs if T1 completes but T2 fails, leaving transaction T incomplete.

Isolation: In a database system, more than one transaction may execute simultaneously. This
property ensures that multiple transactions can occur concurrently without leading to an
inconsistent database state. Changes made by a transaction are not visible to other transactions
until that transaction has been committed.
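
A small sketch of this visibility rule, assuming Python's built-in sqlite3 module with SQLite's
default journal mode and two connections to the same temporary database file. The writer and
reader names and the account table are hypothetical.

```python
import os
import sqlite3
import tempfile

# Two independent connections to one database file (hypothetical setup).
path = os.path.join(tempfile.mkdtemp(), "bank.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO account VALUES ('A', 500)")
writer.commit()


def read_balance(conn):
    rows = conn.execute("SELECT balance FROM account WHERE name = 'A'").fetchall()
    return rows[0][0]


# The writer changes A inside a transaction that is not yet committed.
writer.execute("UPDATE account SET balance = balance - 200 WHERE name = 'A'")
seen_before_commit = read_balance(reader)  # the reader still sees the old value

writer.commit()
seen_after_commit = read_balance(reader)   # the change is visible only now

writer.close()
reader.close()
print(seen_before_commit, seen_after_commit)
```
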
Durability: The database should be durable enough to hold all its latest updates even if the
system fails or restarts. This property ensures that once a transaction has completed execution,
its updates are written to disk and persist even if a system failure occurs. These updates
become permanent and are stored in non-volatile memory, so the effects of the transaction are
never lost.
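
The sketch below approximates durability by closing and reopening a database file, which
stands in for a restart rather than a real crash. It assumes Python's built-in sqlite3 module.
The log table and its messages are hypothetical.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "durable.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE log (msg TEXT)")
conn.execute("INSERT INTO log VALUES ('committed')")
conn.commit()                                  # this update is now on disk

conn.execute("INSERT INTO log VALUES ('never committed')")
conn.close()                                   # closed without commit: change is lost

# Reopening the file plays the role of a restart.
reopened = sqlite3.connect(path)
rows = reopened.execute("SELECT msg FROM log").fetchall()
reopened.close()
print(rows)
```

Only the committed row survives the reopen, which matches the statement that the effects of a
completed transaction persist.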

3.5 People Interacting with the DBMS

A DBMS is used by various people for various reasons. Some are involved in designing the
database, some in retrieving useful data from it and some in backing it up.

[Figure: People interacting with the DBMS: database administrators (DBA), database designers,
end users, system analysts]

Fig No. 7 People Interacting with the DBMS

Database Administrators (DBA): Database administrators maintain the DBMS and are responsible
for administering the database. They also look after DBMS resources such as system licenses,
required software applications and tools, and hardware-related maintenance. The DBA can be a
single person or a group of people. The DBA is responsible for everything related to the
database: setting policies and strategies and providing technical support.
Some of the administrator's responsibilities are:
• Interacting with the users of the system to understand what kind of data is to be stored in
the DBMS and how it is likely to be used.
• Ensuring that unauthorized data access is not permitted, by granting appropriate
permissions to the different users.
• Ensuring that, if the system fails, users can continue to access as much of the uncorrupted
data as possible.
• Modifying the database to ensure adequate performance as user requirements change.

Database Designers: Database designers are the people who work on the design of the database.
Database development starts with requirement analysis, followed by a sound design process.
Designers decide what data should be kept and in what format, and they identify and design the
whole set of entities, relations, constraints and views. Designers also write the application
programs that use the database. These programs are written in programming languages such as
COBOL and Java or in fourth-generation languages, and they are built to fulfil the user
requirements. Retrieving information, creating new information and changing existing
information in the database are done through these application programs.

End Users: End users are the people who actually interact with the database system from the
terminal end. They use the developed applications and generally do not need to know about the
design and working of the database. Their main aim is to retrieve useful information from the
database. There are basically two types of end users:
• Casual Users: These users have good knowledge of the query language. They access data
by entering queries from the terminal end. They do not write programs, but they can
interact with the system by writing queries.

• Naïve Users: Any user who has no knowledge of databases falls into this category. Their
task is simply to use the developed application and get the desired results.

System Analyst: The system analyst is responsible for the design, structure and properties of the
database. All the requirements of the end users are handled by the system analyst, whose main
concerns are the feasibility and the economic and technical aspects of the DBMS.
3.6 Advantages and Disadvantages of DBMS

[Figure: Advantages of DBMS (data sharing, data security, controlled redundancy, enforced
standards, restricted unauthorised access, backup and recovery, concurrency control) and
disadvantages of DBMS (complexity, size, performance, failure impact, cost)]

Fig No. 8 Advantages and Disadvantages of DBMS

Advantages of DBMS
• Data Sharing: In a centralized DBMS, multiple applications can share the data.
• Improved Data Security: The more users access the data, the greater the risk of data
security breaches. Corporations invest considerable amounts of time, effort and money to
ensure that corporate data is used properly. A DBMS provides a framework for better
enforcement of data privacy and security policies.
• Controlling Redundancy: In a file system, files cannot be shared between different
applications, so each application maintains its own files, which causes redundancy in
the stored data. In a centralized database this kind of redundancy can be avoided.
• Enforced Standards: Because a DBMS is a central system, standards can be implemented
easily. Standardized data is very helpful when integrating or interchanging data.
File systems are independent, so standards cannot easily be enforced across
multiple independent applications.
• Restricted Unauthorized Access: In a shared DBMS, multiple users share the data, but not
all of them are authorized to access all the information in the database.
• Providing Backup and Recovery: A DBMS must provide facilities for recovering from
hardware or software failures. The backup and recovery subsystem of the DBMS is
responsible for recovery.
• Concurrency Control: A DBMS provides mechanisms that give multiple users concurrent
access to the data.

Disadvantages of DBMS

• Complexity: Database users must understand the functionality of the DBMS, because
failure to understand the system can lead to bad design decisions, which can have
serious consequences for an organization.
• Size: The complexity and breadth of functionality make the DBMS a large piece of
software that requires substantial amounts of memory to run efficiently.
• Performance: A DBMS is written for many applications rather than just one, which can
affect its performance.
• Higher Impact of a Failure: Because resources are centralized, all users and applications
depend on the DBMS, so the failure of any component can bring operations to a
halt.
• Cost of DBMS: The initial cost of a DBMS is very high, and it depends on the
environment and the functionality provided. There is also a recurring annual
maintenance cost.

3.7 File System versus DBMS

A file management system allows access to only a single file or table at a time. In a file
system, data is stored directly in a set of files. It contains flat files that have no relation
to other files.

Basis | File Management System | Database Management System
Meaning | A file system is a general, easy-to-use system for storing general files that require less security and fewer constraints. | A database management system is used when security constraints are high.
Data Redundancy | Data redundancy is higher in a file management system. | Data redundancy is lower in a database management system.
Data Inconsistency | Data inconsistency is higher in a file system. | Data inconsistency is lower in a database management system.
Centralization | Centralization is hard to achieve in a file management system. | Centralization is achieved in a database management system.
Position of the data | The user locates the physical address of the files to access data. | The user is unaware of the physical address where the data is stored.
Security | Security is low in a file management system. | Security is high in a database management system.
Type of data | A file management system stores unstructured data as isolated files/entities. | A database management system stores structured data with well-defined constraints and interrelations.

3.8 DBMS Models

A database model defines the logical design and structure of a database and determines how data
will be stored, accessed and updated in a DBMS. It also defines how data items are connected to
each other and how they are processed and stored inside the system.

[Figure: Types of DBMS models: hierarchical model, network model, entity-relationship model,
relational model]

Fig No. 9 DBMS Models


Hierarchical Model
In the hierarchical database model, data is arranged in a tree-like structure with a single root,
to which all the other data is linked. The hierarchy starts from the root and expands like a
tree, adding child nodes to parent nodes. A child node has only a single parent node. The data
is thus organized with a one-to-many relationship between two different types of data. For
example, one college can have many courses, many professors and many students.

College Database
    Under Graduate Courses
        BBA
        BCA
        B.Com(H)
    Post Graduate Courses
        MBA
        MCA

Fig No. 10 Hierarchical Model
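
The single-parent rule can be sketched in Python with a child-to-parent map built from the node
names in Fig No. 10. The parent_of dictionary and the two helper functions are illustrative
names, not part of any DBMS.

```python
# Child -> parent map: every node has exactly one parent; the root has None.
parent_of = {
    "College Database": None,
    "Under Graduate Courses": "College Database",
    "Post Graduate Courses": "College Database",
    "BBA": "Under Graduate Courses",
    "BCA": "Under Graduate Courses",
    "B.Com(H)": "Under Graduate Courses",
    "MBA": "Post Graduate Courses",
    "MCA": "Post Graduate Courses",
}


def path_from_root(node):
    """Follow parent links upward; there is only one path because parents are unique."""
    path = []
    while node is not None:
        path.append(node)
        node = parent_of[node]
    return list(reversed(path))


def children_of(node):
    """One parent can have many children (the one-to-many relationship)."""
    return [child for child, parent in parent_of.items() if parent == node]


print(path_from_root("MCA"))
print(children_of("Under Graduate Courses"))
```

Because a dictionary key can hold only one value, the structure itself prevents a node from
having two parents, which is the restriction the network model later removes.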


Advantages

• The hierarchical model is simple and easy to design.

• The hierarchical model provides data integrity because it is based on the parent-child
relationship, so there is always a link between these nodes.

Disadvantages

• The hierarchical model is easy to design but very difficult to implement.

• Operational anomalies are present in this model.

• The model is not flexible enough for the dynamic requirements of an organization.

• Deleting a parent node forces the deletion of its child nodes.

• If any change is made to the database structure, corresponding changes must be made
in all the application programs that access the data.

• Extra space is required for the storage of pointers.


Network Model
The network model expands on the hierarchical model by providing multiple paths among nodes,
i.e. a node can take part in more than one parent-child relationship. Hence this model allows
one-to-one, one-to-many and many-to-many relationships. With multiple paths in the data
structure, it eliminates some of the drawbacks of the hierarchical model, but the network model
is not widely used in practice.

[Figure: network of University, Mathematics Department, Computer Department, Library and
Computer Lab nodes]

Fig No. 11 Network Model
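
A rough Python sketch of a node with more than one parent follows. The node names come from
Fig No. 11, but the specific links between them are an assumption, since the figure's edges are
not recoverable from the text.

```python
from collections import defaultdict

# Owner -> member links. The exact links are assumed for illustration.
links = [
    ("University", "Mathematics Department"),
    ("University", "Computer Department"),
    ("Mathematics Department", "Library"),
    ("Computer Department", "Library"),
    ("Computer Department", "Computer Lab"),
]

parents_of = defaultdict(list)
children_of = defaultdict(list)
for owner, member in links:
    parents_of[member].append(owner)
    children_of[owner].append(member)

# Unlike the hierarchical model, a node may have several parents.
library_parents = sorted(parents_of["Library"])
print(library_parents)
```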

Advantages

• Simple: The network model is simple and easy to design.

• Capability to handle more relationship types: The network model can handle one-to-many
and many-to-many relationships.

• Easy data access: Data access is easier and more flexible than in the hierarchical
model.

• Data integrity: In the network model, no node exists without a link to other
nodes.

• Flexible: The network model is more flexible than the hierarchical model when the
database structure has to be modified.
Disadvantages

• System complexity: All the records are maintained using pointers, so the whole
database structure becomes very complex. It is quite complicated to maintain all the links
between the nodes, and the database is difficult to develop.

• Operational anomalies: Insertion, deletion and update operations on any data
require a large number of pointer adjustments.

• Absence of structural independence: Structural changes to the database are very difficult.
• Extra space: Extra memory is required for the storage of pointers.

• Time consuming: Operating and maintaining a network model is time consuming and
expensive for a large database.

Entity-Relationship Model
In the Entity-Relationship model (ER model), relationships are defined by dividing real-world
objects into entities and their characteristics. ER models represent the relationships among
entities in pictorial form, which makes them easier to understand. An ER model is defined on the
basis of:
• Entities and their attributes.
• Relationships among entities.
Entity: An entity is a real-world object that can be easily distinguished from other objects,
for example a student. All entities have attributes or properties that give them their identity.
Attributes: Entities are represented by means of their properties, called attributes. All
attributes have values. For example, in a school database a student is considered an entity,
with attributes such as name, age and class.
Relationship: The logical association among entities is called a relationship. Relationships are
mapped to entities in various ways. Mapping cardinalities define the number of associations
between two entities. The mapping cardinalities are:
One to One
One to Many
Many to One
Many to Many

[Figure: two entities, each with attributes, connected by a relationship]

Fig No. 12 Entity Relationship Model
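
Entities, attributes and a many-to-many relationship can be sketched in Python as below. The
Student and Course classes, their attributes and the enrollment pairs are hypothetical
examples in the spirit of the school database mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:          # entity with attributes roll_no, name, age
    roll_no: int
    name: str
    age: int


@dataclass(frozen=True)
class Course:           # entity with attributes code, title
    code: str
    title: str


students = [Student(1, "A", 20), Student(2, "B", 21)]
courses = [Course("BCA", "Computer Applications"), Course("BBA", "Business Administration")]

# Relationship "enrolls in" stored as (roll_no, course code) pairs.
# It is many-to-many: a student can take many courses and a course has many students.
enrollments = {(1, "BCA"), (1, "BBA"), (2, "BCA")}


def courses_of(roll_no):
    return sorted(code for r, code in enrollments if r == roll_no)


def students_in(code):
    return sorted(r for r, c in enrollments if c == code)


print(courses_of(1), students_in("BCA"))
```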

Advantages

• Easy to understand due to pictorial representation.


• Easy to design.

• ER model is easy to implement.

Disadvantages

• It is not always easy to differentiate between an entity and a relationship.

• The pictorial representation can sometimes be complex to understand.
• It is not easy to differentiate between the different types of attributes of an entity.

Relational Model
The relational data model is the basic data model, widely used around the world by database
designers for data storage and processing. A relation can be represented as a table with columns
and rows, where each column represents an attribute of an entity and each row represents a
record.
Relational Model Concepts
Attribute: Each column in a table. Attributes are the properties that define a relation, e.g.
Student_Rollno, NAME, etc.
Tables: In the relational model, relations are saved in table format. A table has two
properties, rows and columns: rows represent records and columns represent attributes.
Tuple: A single row of a table, which contains a single record.
Relation Schema: A relation schema represents the name of the relation together with its
attributes.
Degree: The total number of attributes in the relation is called the degree of the relation.
Cardinality: The total number of rows present in the table.
Column: A column represents the set of values for a specific attribute.
Relation Instance: A relation instance is a finite set of tuples in the RDBMS. Relation
instances never have duplicate tuples.
Relation Key: One or more attributes that uniquely identify each row are called the relation
key.
Attribute Domain: Every attribute has a predefined set of permitted values and scope, which is
known as the attribute domain.

Table (also called Relation)

Enroll_No | Stud_Name | Course | DOB
101 | A | BCA | 15/05/1980
102 | B | BBA | 19/11/1984
103 | C | BCA | 15/09/1977

Each row is a tuple and each column is an attribute.

Fig No. 13 Relational Model
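
Degree, cardinality and the no-duplicates property can be checked with a small Python sketch
that uses the rows of Fig No. 13. Modelling the relation instance as a Python set is an
illustrative choice, not how an RDBMS stores data.

```python
# Relation schema: the attribute names of the relation in Fig No. 13.
attributes = ("Enroll_No", "Stud_Name", "Course", "DOB")

# Relation instance: a set of tuples, so duplicates cannot occur.
tuples = {
    (101, "A", "BCA", "15/05/1980"),
    (102, "B", "BBA", "19/11/1984"),
    (103, "C", "BCA", "15/09/1977"),
}

degree = len(attributes)     # number of attributes (columns)
cardinality = len(tuples)    # number of tuples (rows)

# Adding an identical tuple again leaves the relation unchanged.
tuples.add((101, "A", "BCA", "15/05/1980"))
cardinality_after_duplicate = len(tuples)

# A column is the set of values taken by one attribute.
course_column = sorted({row[attributes.index("Course")] for row in tuples})
print(degree, cardinality, cardinality_after_duplicate, course_column)
```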


Advantages

• Easy to use and understand.

• Very flexible.

• Widely used.
• Provides excellent support for dynamic queries.

• Users need not consider issues such as storage structure and access strategy.
• Specific control and authorization can be implemented more easily.
• Data independence is achieved more easily with the normalized structure used in a
relational database.

Disadvantages

• For large databases, the performance in responding to queries can degrade.
• Indexes must be constructed, which adds processing overhead: the index must be
created and maintained along with the file records themselves.
• The file index must be searched before the actual file records are obtained,
which takes additional time.

3.9 Codd’s Rule for Relational DBMS

E. F. Codd was the computer scientist who invented the relational model for database
management. He proposed a set of rules that a database must obey in order to be regarded as a
true relational database. These rules can be applied to any database system that manages stored
data using only its relational capabilities.
Rule 1: Information Rule
All information stored in a database, whether user data or metadata, must be represented as
values in the cells of tables. Everything in a database must be stored in table format.
Rule 2: Guaranteed Access Rule
Every single data element is guaranteed to be logically accessible through a combination of
table name, primary key (row value) and attribute name (column value).
Rule 3: Systematic Treatment of NULL Values
A NULL value can have several meanings: missing data, unknown data, not applicable or no
value. The NULL values in a database must be given a systematic and uniform treatment.
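
One example of systematic NULL treatment in SQL is that NULL is never equal to anything,
including another NULL, so it must be tested with IS NULL. The sketch below assumes Python's
built-in sqlite3 module and a hypothetical emp table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT)")
# Employee 2 has no department yet, so dept is NULL (hypothetical data).
conn.execute("INSERT INTO emp VALUES (1, 'IT'), (2, NULL)")

# Comparing with "=" never matches a NULL.
eq_null = conn.execute("SELECT id FROM emp WHERE dept = NULL").fetchall()

# IS NULL is the uniform way to ask for missing values.
is_null = conn.execute("SELECT id FROM emp WHERE dept IS NULL").fetchall()
print(eq_null, is_null)
```
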
Rule 4: Active Online Catalog
The description of the complete database must be stored in an online catalog, known as the data
dictionary, which can be accessed by authorized users. The catalog must be governed by the same
rules as the rest of the database. Users can use the same query language to access the catalog
that they use to access the database itself.
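
SQLite offers one concrete instance of such a catalog: its schema table, sqlite_master, can be
queried with ordinary SQL. The sketch assumes Python's built-in sqlite3 module. The student
table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

# The catalog is itself a table and is read with the same SELECT syntax as user data.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(catalog)
```
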
Rule 5: Powerful and Well-Structured Language
A database can only be accessed using a language with a linear syntax that supports data
definition, data manipulation and transaction management operations. There must be one
well-structured language that provides all manner of access to the data stored in the database,
for example SQL. If the database allows access to the data without the use of this language,
then that is a violation.
Rule 6: View Updating Rule
All the views of a database, which can theoretically be updated, must also be updatable by the
system.
Rule 7: Relational Level Operation
A database must support Insert, Delete, and Update operations at each level of relations. This
must not be limited to a single row, that is, it must also support union, intersection and minus
operations to yield sets of data records.
Rule 8: Physical Data Independence
The data stored in a database must be independent of the applications that access the database.
Any change in the physical structure of a database must not have any impact on how the data is
being accessed by external applications.
Rule 9: Logical Data Independence
The logical data in a database must be independent of its user's view (application). Any change
in the logical data must not affect the applications using it. For example, if two tables are
merged or one is split into two different tables, there should be no impact or change on the
user application. This is one of the most difficult rules to apply.
Rule 10: Integrity Independence
A database must be independent of the application that uses it. All its integrity constraints can be
independently modified without the need of any change in the application. This rule makes a
database independent of the front-end application and its interface.
Rule 11: Distribution Independence
The end-user must not be able to see that the data is distributed over various locations. Users
should always get the impression that the data is located at one site only. This rule has been
regarded as the foundation of distributed database systems.
Rule 12: Non-Subversion Rule
If a system has an interface that provides access to low-level records, then the interface must not
be able to subvert the system and bypass security and integrity constraints.
Source: https://www.tutorialspoint.com/dbms/dbms_codds_rules.htm

3.10 Normalization

Database normalization is the process of organizing the data in the database. It is used to
minimize redundancy in a relation or set of relations and to eliminate undesirable
characteristics such as insertion, update and deletion anomalies. Normalization divides a larger
table into smaller tables and links them using relationships. Normalization is used mainly for
two purposes:
• Eliminating redundant (useless) data.
• Ensuring that data dependencies make sense, i.e. that data is logically stored.

Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized: insertion,
update and deletion anomalies. They can be explained with an example.
Example: At some point the employee table looks like this:
Emp_Id | Emp_name | Contact_No. | Dept | Dept_Id
101 | A | 1234523456 | IT | 1
102 | B | 9876876453 | Marketing | 2
101 | C | 7689626789 | Developer | 3
101 | B | 7894532765 | IT | 4

The above employee table is not normalized. The problems this causes are described below.
Update anomaly: The above table has three rows for the employee whose Emp_Id is 101, because
the employee belongs to different departments. To update the contact number of this employee,
we have to update it in all three rows, or the data will become inconsistent.
Insert anomaly: Suppose a new employee who is under training and not yet assigned to any
department joins the company. We would not be able to insert this employee into the table if
the Dept_Id field does not allow nulls.
Delete anomaly: Suppose the company closes the Marketing department. Deleting the rows whose
Dept_Id is 2 would also delete the information of employee B.
To overcome these anomalies we need to normalize the data.
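The update and delete anomalies can be sketched in a few lines of Python. This is only an illustration: the rows are copied from the table above, while the helper names update_contact and contact_numbers and the new contact number are hypothetical.

```python
# Rows copied from the unnormalized table above:
# (Emp_Id, Emp_name, Contact_No, Dept, Dept_Id)
rows = [
    (101, "A", "1234523456", "IT", 1),
    (102, "B", "9876876453", "Marketing", 2),
    (101, "C", "7689626789", "Developer", 3),
    (101, "B", "7894532765", "IT", 4),
]

def update_contact(table, emp_id, new_no, only_first=False):
    """Return a copy of table with Contact_No changed for emp_id.

    only_first=True simulates a careless update that touches one row only.
    """
    result, done = [], False
    for r in table:
        if r[0] == emp_id and not (only_first and done):
            r = (r[0], r[1], new_no, r[3], r[4])
            done = True
        result.append(r)
    return result

def contact_numbers(table, emp_id):
    """Distinct contact numbers stored for one employee id."""
    return {r[2] for r in table if r[0] == emp_id}

# Update anomaly: changing only one of the three rows leaves conflicting data.
careless = update_contact(rows, 101, "9999999999", only_first=True)
careful = update_contact(rows, 101, "9999999999")

# Delete anomaly: removing Dept_Id 2 removes every fact about employee 102.
after_close = [r for r in rows if r[4] != 2]
remaining_ids = {r[0] for r in after_close}
```

After the careless update, employee 101 has more than one contact number on record, while the careful update has to touch every matching row to keep the table consistent.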

Fig No. 14 Types of Normal Forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF)


Normal Form  Description
1NF          A relation is in 1NF if every attribute contains only atomic values.
2NF          A relation is in 2NF if it is in 1NF and all non-key attributes are fully
             functionally dependent on the primary key.
3NF          A relation is in 3NF if it is in 2NF and no transitive dependency exists.
4NF          A relation is in 4NF if it is in Boyce Codd normal form and has no multi-valued
             dependency.
5NF          A relation is in 5NF if it is in 4NF, contains no join dependency that is not
             implied by its candidate keys, and can be rejoined without loss.

First Normal Form (1NF)


The first normal form requires a few simple rules to be followed while designing a database:
A relation is in 1NF if every attribute contains only atomic values.
An attribute of a table cannot hold multiple values; it must be single-valued. First normal form
disallows multi-valued attributes, composite attributes, and their combinations.
The relation Departments below is not in 1NF because Dlocation is a multi-valued attribute.

Departments
Dname      DNumber  Dmgr_ssn  Dlocation
Research   2        123       Punjab, New Delhi, Bangalore
Developer  1        456       Bangalore
Analyst    3        789       New Delhi

Departments in 1NF (one row per location)
Dname      DNumber  Dmgr_ssn  Dlocation
Research   2        123       Punjab
Research   2        123       New Delhi
Research   2        123       Bangalore
Developer  1        456       Bangalore
Analyst    3        789       New Delhi
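The conversion to 1NF shown above can be sketched in Python as follows. The function name to_1nf is hypothetical, and the sketch assumes the multi-valued Dlocation is stored as a comma-separated string (place-name spellings are normalized here).

```python
# Departments relation from the example; Dlocation holds a comma-separated list.
departments = [
    ("Research", 2, 123, "Punjab,New Delhi,Bangalore"),
    ("Developer", 1, 456, "Bangalore"),
    ("Analyst", 3, 789, "New Delhi"),
]

def to_1nf(rows):
    """Emit one row per location so every Dlocation value is atomic."""
    flat = []
    for dname, dnumber, mgr_ssn, locations in rows:
        for loc in locations.split(","):
            flat.append((dname, dnumber, mgr_ssn, loc.strip()))
    return flat

departments_1nf = to_1nf(departments)
for row in departments_1nf:
    print(row)
```

Each output row now holds exactly one location, at the cost of repeating the other department columns once per location.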

Second Normal Form (2NF)


Second normal form (2NF) is based on the concept of full functional dependency. A functional
dependency X → Y is a full functional dependency if removal of any attribute A from X means that
the dependency no longer holds; that is, for any attribute A in X, (X - {A}) does not
functionally determine Y. So 2NF follows these rules:
• The relation must be in 1NF.
• All non-key attributes are fully functionally dependent on the primary key.

Relation before normalization (primary key {SSN, PNumber}):
SSN  PNumber  Hours  EName  PName  PLocation
FD 1: {SSN, PNumber} → Hours
FD 2: SSN → EName
FD 3: PNumber → {PName, PLocation}

2NF Normalization
SSN  PNumber  Hours          (FD 1)
SSN  EName                   (FD 2)
PNumber  PName  PLocation    (FD 3)
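A partial dependency can be detected mechanically by computing attribute closures of proper subsets of the key. The sketch below is one possible way to do this, not a prescribed algorithm; the function names are hypothetical, and the three dependencies are an assumption read off the attribute grouping in the diagram above ({SSN, PNumber} → Hours, SSN → EName, PNumber → {PName, PLocation}).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            dependents = closure(subset, fds) - set(key)
            if dependents:
                found[subset] = dependents
    return found

# Assumed functional dependencies for the example relation.
fds = [
    ({"SSN", "PNumber"}, {"Hours"}),
    ({"SSN"}, {"EName"}),
    ({"PNumber"}, {"PName", "PLocation"}),
]

partials = partial_dependencies({"SSN", "PNumber"}, fds)
print(partials)
```

Every entry reported here is a reason the relation fails 2NF, and each one suggests a separate table keyed by that subset of the original key.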

Third Normal Form (3NF)


Third normal form (3NF) is based on the concept of transitive dependency. A functional
dependency X → Y in a relation schema R is a transitive dependency if there is a set of
attributes Z that is neither a candidate key nor a subset of any key of R and both X → Z and
Z → Y hold. So 3NF follows this rule:
A relation is in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on the
primary key.
3NF is used to reduce data duplication and to achieve data integrity.
Equivalently, a relation is in third normal form if at least one of the following conditions holds
for every non-trivial functional dependency X → Y:
• X is a super key.
• Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Emp_dept
Ename  Ssn  Bdate  Address  Dnumber  Dname  Dmgr_ssn

In Emp_dept, Ssn determines Dnumber and Dnumber determines Dname and Dmgr_ssn, so Dname and
Dmgr_ssn depend on Ssn only transitively.

3NF Normalization
ED1: Ename  Ssn  Bdate  Address  Dnumber
ED2: Dnumber  Dname  Dmgr_ssn
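The two 3NF conditions (X is a super key, or Y consists of prime attributes) can be checked with a short Python sketch. The function names are hypothetical, and the functional dependencies for Emp_dept are assumptions made for illustration: Ssn determines the employee attributes and Dnumber, and Dnumber determines Dname and Dmgr_ssn.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violations_3nf(all_attrs, candidate_keys, fds):
    """Return the dependencies X -> Y that break both 3NF conditions."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(all_attrs)
        non_trivial = set(rhs) - set(lhs)
        if not is_superkey and not non_trivial <= prime:
            bad.append((lhs, rhs))
    return bad

# Assumed dependencies for the unnormalized Emp_dept relation.
emp_dept_attrs = {"Ename", "Ssn", "Bdate", "Address", "Dnumber", "Dname", "Dmgr_ssn"}
fd_emp = ({"Ssn"}, {"Ename", "Bdate", "Address", "Dnumber"})
fd_dept = ({"Dnumber"}, {"Dname", "Dmgr_ssn"})

bad = violations_3nf(emp_dept_attrs, [{"Ssn"}], [fd_emp, fd_dept])

# After decomposition each relation carries only its own dependency.
ed1_bad = violations_3nf({"Ename", "Ssn", "Bdate", "Address", "Dnumber"}, [{"Ssn"}], [fd_emp])
ed2_bad = violations_3nf({"Dnumber", "Dname", "Dmgr_ssn"}, [{"Dnumber"}], [fd_dept])
```

Under these assumptions only the dependency on Dnumber is reported for Emp_dept, and neither ED1 nor ED2 reports a violation.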

Boyce Codd normal form (BCNF)


BCNF is an advanced version of 3NF and is stricter than 3NF.
A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of
the table. In other words, the table must be in 3NF, and the left-hand side of every functional
dependency must be a super key.
Example: Let's assume there is a company where employees work in more than one department.
Employee table:
EMP_ID  EMP_COUNTRY  EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
264     India        Designing   D394       283
264     India        Testing     D394       300
364     UK           Stores      D283       232
364     UK           Developing  D283       549

In the above table the functional dependencies are as follows:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_ID nor EMP_DEPT alone is a super key, although
each appears on the left-hand side of a functional dependency.
To convert the given table into BCNF, we decompose it into three tables:
Emp_Country table:
EMP_ID  EMP_COUNTRY
264     India
364     UK

Emp_Dept table:
EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549

Emp_Dept_Mapping table:
EMP_ID  EMP_DEPT
264     Designing
264     Testing
364     Stores
364     Developing

Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
The decomposition is now in BCNF because the left-hand side of each functional dependency is a
key of the table in which it holds.
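The decomposition can be reproduced by projecting the Employee table onto the attributes of each functional dependency and then joining the pieces back. The sketch below is illustrative only: the rows come from the Employee table in this example, and the helper name project is hypothetical.

```python
# Rows of the Employee table from the example.
COLS = ("EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO")
employee = [
    (264, "India", "Designing", "D394", 283),
    (264, "India", "Testing", "D394", 300),
    (364, "UK", "Stores", "D283", 232),
    (364, "UK", "Developing", "D283", 549),
]

def project(rows, cols, keep):
    """Project rows onto the columns in keep, dropping duplicate tuples."""
    idx = [cols.index(c) for c in keep]
    seen = []
    for r in rows:
        t = tuple(r[i] for i in idx)
        if t not in seen:
            seen.append(t)
    return seen

emp_country = project(employee, COLS, ("EMP_ID", "EMP_COUNTRY"))
emp_dept = project(employee, COLS, ("EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"))
mapping = project(employee, COLS, ("EMP_ID", "EMP_DEPT"))

# Rejoin through the keys of the first two tables to confirm nothing was lost.
country_of = dict(emp_country)
dept_info = {dept: (dtype, dno) for dept, dtype, dno in emp_dept}
rejoined = [(e, country_of[e], d, *dept_info[d]) for e, d in mapping]
```

Because the projection removes duplicates, each employee's country is stored once, and the rejoined rows match the original Employee table.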

Fourth Normal form (4NF)


A relation is in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
A multi-valued dependency A →→ B exists if, for a single value of A, multiple values of B exist
and these values are independent of the other attributes of the relation.
Example
Student
STU_ID  COURSE     HOBBY
21      Computer   Dancing
21      Math       Singing
34      Chemistry  Dancing
74      Biology    Cricket
59      Physics    Hockey

The given Student table is in 3NF, but COURSE and HOBBY are two independent attributes.
Hence, there is no relationship between COURSE and HOBBY.
In the Student relation, the student with STU_ID 21 has two courses, Computer and Math, and
two hobbies, Dancing and Singing. So there are multi-valued dependencies on STU_ID, which lead
to unnecessary repetition of data.
To bring the above table into 4NF, we can decompose it into two tables:
Student_Course
STU_ID  COURSE
21      Computer
21      Math
34      Chemistry
74      Biology
59      Physics

Student_Hobby
STU_ID  HOBBY
21      Dancing
21      Singing
34      Dancing
74      Cricket
59      Hockey
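The effect of the decomposition can be sketched in Python. The rows are taken from the Student example; the variable names are hypothetical. Because COURSE and HOBBY are independent, joining the two smaller tables on STU_ID pairs every course of a student with every hobby of that student.

```python
# Rows of the Student relation from the example: (STU_ID, COURSE, HOBBY)
student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

# Project onto the two independent facts, removing duplicates.
student_course = sorted({(s, c) for s, c, _ in student})
student_hobby = sorted({(s, h) for s, _, h in student})

# Natural join on STU_ID: every course of a student meets every hobby.
joined = sorted(
    (s, c, h)
    for s, c in student_course
    for s2, h in student_hobby
    if s == s2
)
rows_for_21 = [r for r in joined if r[0] == 21]
print(rows_for_21)
```

For student 21 the join yields four course and hobby combinations, which is the repetition a single combined table would need to store in order to represent both independent facts completely. The two 4NF tables store each fact only once.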

Fifth Normal Form (5NF)


A relation is in 5NF if it is in 4NF and contains no join dependency that is not implied by its
candidate keys; any decomposition must be lossless.
5NF is satisfied when a relation cannot be decomposed into smaller relations any further without
losing information. 5NF is also known as Project-join normal form (PJ/NF).

Example
SUBJECT    LECTURER  SEMESTER
Computer   Anshika   Semester 1
Computer   John      Semester 1
Math       John      Semester 1
Math       Akash     Semester 2
Chemistry  Praveen   Semester 1

In the above table, John takes both the Computer and Math classes for Semester 1, but he does not
take the Math class for Semester 2. In this case, the combination of all three fields is required
to identify a valid row.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will teach it,
so Lecturer and Subject would have to be left as NULL. But all three columns together act as the
primary key, so the other two columns cannot be left blank. To bring the above table into 5NF,
we can decompose it into three relations P1, P2 and P3:
P1
SEMESTER    SUBJECT
Semester 1  Computer
Semester 1  Math
Semester 1  Chemistry
Semester 2  Math

P2
SUBJECT    LECTURER
Computer   Anshika
Computer   John
Math       John
Math       Akash
Chemistry  Praveen

P3
SEMESTER    LECTURER
Semester 1  Anshika
Semester 1  John
Semester 2  Akash
Semester 1  Praveen
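Whether the three-way decomposition is lossless can be checked by joining the projections back together. The sketch below uses the rows of the example table; the function names join_two and join_three are hypothetical.

```python
# Rows of the example relation: (SUBJECT, LECTURER, SEMESTER)
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# The three projections P1, P2 and P3 (sets drop duplicate rows).
p1 = {(sem, sub) for sub, lec, sem in original}
p2 = {(sub, lec) for sub, lec, sem in original}
p3 = {(sem, lec) for sub, lec, sem in original}

def join_two(p1, p2):
    """Join P1 and P2 on SUBJECT only."""
    return {(sub, lec, sem) for sem, sub in p1 for sub2, lec in p2 if sub == sub2}

def join_three(p1, p2, p3):
    """Join P1 and P2 on SUBJECT, then keep rows whose (SEMESTER, LECTURER) is in P3."""
    return {row for row in join_two(p1, p2) if (row[2], row[1]) in p3}

spurious = join_two(p1, p2) - original
recovered = join_three(p1, p2, p3)
print(sorted(spurious))
```

Joining only P1 and P2 produces rows that were never in the original table, and the third projection P3 filters them out, which is why all three relations are needed for this example.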
SUMMARY

• Data consists of raw, unprocessed facts and statistics, whether stored or flowing over a
network.
• When data is processed, organized and presented in a predefined manner so as to make it
useful, it becomes information.
• A database is a collection of related data organized in such a way that the data can be easily
accessed, managed and updated, and the computer can easily find the desired information. Its
main purpose is to store data.
• A Database Management System (DBMS) is software that allows users to create, define
and process the data in a database easily.
• Database architecture is used to design software that an organization can use for its
business with various computer programming languages.
• A transaction must maintain Atomicity, Consistency, Isolation and Durability, commonly
known as the ACID properties, in order to ensure the accuracy, completeness and
integrity of the data in the database.
• A File Management system is a DBMS that allows access to a single file or table at a
time.
• A database model defines the logical design and structure of a database and defines how
data will be stored, accessed and updated in a DBMS.
• Normalization is used to minimize redundancy in a relation or set of relations. It
is also used to eliminate undesirable characteristics such as insertion, update and
deletion anomalies.
REVIEW QUESTIONS

Q1. Define data and information along with examples.
Q2. Differentiate between data and information.
Q3. Explain the characteristics of DBMS.
Q4. Discuss the DBMS architecture in detail.
Q5. Explain the three tier architecture of DBMS in detail along with the diagram.
Q6. Explain the main components of DBMS.
Q7. What are the ACID properties of DBMS? Discuss in detail with an example.
Q8. Briefly discuss the role of the persons interacting with the DBMS software.
Q9. Enumerate the advantages and disadvantages of DBMS.
Q10. What is a file system? How does it differ from DBMS?
Q11. Explain briefly the various types of DBMS models.
Q12. Write short notes on
• Hierarchical Model
• Network Model
• Relational Model
• ER Model
Q13. Explain Codd's rules in DBMS.
Q14. What is normalization? What are its types?
Q15. Explain 1NF and 2NF along with examples.
Q16. What is BCNF? How does it differ from 3NF?
Q17. Explain 4NF and 5NF along with examples.
