UNIT 2 PART 1
INTRODUCTION TO DATABASE SYSTEM
❖ Characteristics of DBMS
❖ DBMS Architecture
❖ Components of DBMS
❖ ACID Properties
❖ People Interacting with the DBMS
❖ Advantages and Disadvantages of DBMS
❖ File System vs DBMS
❖ DBMS Models
❖ Codd’s Rule for Relational DBMS
❖ Normalization
INTRODUCTION TO DATABASE SYSTEM
Data consists of raw facts and statistics, either stored or flowing freely over a network. It is
unprocessed and unorganized, and it can be of little use until it is organized in some manner.
Data becomes information when it is processed, organized, and presented in a predefined manner
that makes it meaningful and useful. Information is therefore created from data.
Fig: Data, Processing and Information
A Database Management System (DBMS) is software that allows users to define, create, and
process the data in a database easily. A DBMS not only manages the database but also protects
the data from unauthorized access. It also maintains the consistency of the data when multiple
users work with it.
Some popular examples of DBMS:
MySQL
Oracle
SQL Server
IBM DB2
PostgreSQL
DBMS Characteristics
Fig: DBMS Characteristics (Data Stored in Tables, ACID Properties, Data Consistency)
• Data Stored in Tables: A database stores its data in tables. A table is a combination of
rows and columns. To make the data more meaningful, a DBMS allows relationships to be
defined between tables.
• Data Consistency: A DBMS maintains the consistency of the data in the database. This
matters most for live data that is continuously being updated and added to.
• Support for Multiple Users and Concurrent Access: A DBMS allows multiple users to work
on the database (update, insert, and delete data) at the same time and still maintains
data consistency.
• Query Language: A DBMS provides a user-friendly query language through which users
can easily retrieve, insert, delete, and update the data in the database.
• Security: Security is one of the main characteristics of a DBMS. A database holds a large
amount of data. If that data is not secured, unauthorized persons can access it and
misuse it. The DBMS therefore protects the data from unauthorized access by creating
user accounts with different access permissions, which restrict what each user can
access.
• Backup and Recovery: A database can fail as a whole. If that happens and no backup
exists, the data cannot be recovered and the company may suffer a heavy loss. The
solution is to take backups of the database so that it can be restored whenever needed.
Every database system should have this characteristic.
Database architecture describes how the software that an organization uses to store and retrieve
its business information is designed, developed, implemented, and maintained. The architecture
of a DBMS can be seen as 1-tier, 2-tier, or 3-tier.
In one-tier architecture, the database is directly available to the user, who stores and accesses
data on the DBMS itself. Any modification is made directly on the DBMS. No separate application
is provided for end users. It is mainly used for local application development, where programmers
communicate directly with the database for a quick response.
Fig No. 3 One Tier Architecture of DBMS
Source: https://medium.com/oceanize-geeks/concepts-of-database-architecture-dfdc558a93e4
Two-tier architecture places an application layer between the user and the DBMS. The application
communicates with the users, sends their requests to the database management system, and then
returns the response from the DBMS to the user. The application tier is entirely independent of
the database in terms of operation, design, and programming.
Physical Level: It describes the physical storage structure of the data in the database. It is also
known as the Internal Level. This level is very close to the physical storage of data. The internal
schema defines the various stored data types. It uses a physical data model.
Conceptual Level: The conceptual level describes the structure of the whole database for a group
of users. It is also called the logical level. The conceptual schema is a representation of the
entire content of the database. This schema contains all the information needed to build the
relevant external records. It hides the internal details of physical storage.
External Level: The external level is related to the data that is viewed by the end users. This
level includes a number of user views and is closest to the user. An external view describes the
segment of the database that is required by a particular user group and hides the rest of the
database from that user group.
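The three levels can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table `student`, its columns, and the view `student_public` are hypothetical names chosen for illustration. The base table stands in for the conceptual schema, the view stands in for one external view, and the physical level is left to SQLite itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # physical storage is managed by SQLite

# Conceptual level: the full logical structure of the (hypothetical) table.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, fee_due REAL)")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 1500.0)")

# External level: a view that exposes only part of the data to one user group.
conn.execute("CREATE VIEW student_public AS SELECT roll_no, name FROM student")

# The view hides the fee_due column from its users.
view_columns = [d[0] for d in conn.execute("SELECT * FROM student_public").description]
print(view_columns)
```

A user group that is given only `student_public` never sees `fee_due`, which is the hiding behavior described for the external level.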
Fig: Components of DBMS (Hardware, Software, Data, Procedures, Database Access Language)
Hardware: Hardware means any physical component of the computer, such as the hard disk,
input/output devices, and memory. When a DBMS runs on a computer, the hard disk is used to
store the software and the data, the keyboard is used to enter commands, and memory (RAM) is
needed to execute the commands.
Software: This is the main component, as it is the program that controls everything. The DBMS
software understands the commands that are executed on the database to retrieve the data.
Data: Data is the main component of the DBMS, because the DBMS is designed for the data. The
main reason for creating a DBMS is to store and use data as required. Metadata is data about the
data. It is information stored by the DBMS to better understand the data stored in it.
Procedures: Procedures are the general instructions for using the DBMS. They include how to set
up and install a DBMS, log in to and log out of the DBMS software, manage databases, take
backups, and generate reports.
Database Access Language: The Database Access Language is a DBMS language designed for
writing commands to access, insert, update, and delete data stored in a database. A user writes
commands in the Database Access Language and submits them to the DBMS, which translates
and executes them. Using the access language, a user can create new databases and tables,
insert data, fetch stored data, update data, and delete data.
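As an illustration of a database access language, the sketch below submits SQL commands to SQLite through Python's built-in sqlite3 module. SQL is used here as one common example of an access language; the table `book` and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Create a table, then insert, update, delete, and fetch data.
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, copies INTEGER)")
conn.execute("INSERT INTO book (id, title, copies) VALUES (1, 'DBMS Basics', 3)")
conn.execute("INSERT INTO book (id, title, copies) VALUES (2, 'SQL Primer', 5)")
conn.execute("UPDATE book SET copies = 4 WHERE id = 1")
conn.execute("DELETE FROM book WHERE id = 2")

rows = conn.execute("SELECT id, title, copies FROM book").fetchall()
print(rows)
```

Each command is only text until it is submitted; the DBMS translates and executes it, as described above.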
A transaction is a single logical unit of work that retrieves and updates data in the database.
A transaction must maintain Atomicity, Consistency, Isolation, and Durability, commonly known
as the ACID properties, in order to ensure the accuracy, completeness, and integrity of the data
in the database.
Atomicity: This property states that either the entire transaction takes place at once or it does
not happen at all. There is no midway, i.e. transactions do not occur partially. Atomicity involves
the following two operations.
• Abort: If a transaction aborts, then the changes made to the database by the transaction
are not visible.
• Commit: If a transaction commits, then the changes made to the database by the
transaction are visible.
Consider the following transaction T consisting of T1 and T2: Transfer of 500 from account A to
account B.
T1 and T2 are the two tasks of transaction T. If the transaction fails after completion of T1 but
before completion of T2 (that is, after write(A) but before write(B)), then the amount has been
deducted from A but not added to B. This results in an inconsistent database state. Therefore, the
transaction must be executed in its entirety in order to ensure the correctness of the database state.
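The abort and commit behavior can be sketched with Python's built-in sqlite3 module. This is an illustration only: the table `account`, the function `transfer`, the `fail_midway` flag, and the opening balances of 500 for A and 700 for B are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 700)])
conn.commit()

def transfer(amount, fail_midway=False):
    """Move `amount` from A to B as one transaction; undo everything on failure."""
    try:
        # T1: debit account A.
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        if fail_midway:
            raise RuntimeError("simulated failure between T1 and T2")
        # T2: credit account B.
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.commit()    # commit: both changes become permanent together
    except RuntimeError:
        conn.rollback()  # abort: the partial debit of A is undone

transfer(500, fail_midway=True)
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

After the simulated failure, the balances are unchanged, so the half-finished transfer leaves no trace. Calling `transfer(500)` without the flag applies both updates.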
Consistency: The database must be in a consistent state both before and after the transaction.
Consistency refers to the correctness of the database.
From the above example: the total amount before and after the transaction must be the same.
With A = 500 and B = 700 before the transfer of 500 from A to B:
Total before T occurs = 500 + 700 = 1200.
Total after T occurs = 0 + 1200 = 1200.
Therefore, the database is consistent.
Inconsistency occurs if T1 completes but T2 fails, because transaction T is then incomplete.
Isolation: In a database system, more than one transaction may be executed simultaneously and
in parallel. This property ensures that multiple transactions can occur concurrently without
leading to an inconsistent database state. Changes made by a transaction are not visible to other
transactions until that transaction has committed.
Durability: The database should be durable enough to hold all its latest updates even if the
system fails or restarts. This property ensures that once a transaction has completed execution,
its updates and modifications are written to disk and persist even if a system failure occurs.
These updates become permanent and are stored in non-volatile memory. The effects of the
transaction are thus never lost.
A DBMS is used by various users for various reasons. Some are involved in designing the
database, some in retrieving useful data from it, and some in backing it up.
Fig: People Interacting with DBMS
Database Administrators (DBA): Some users maintain the DBMS and are responsible for
administering the database. Administrators also look after DBMS resources such as system
licenses, the required software applications and tools, and hardware-related maintenance. The
DBA can be a single person or a group of people. The Database Administrator is responsible for
everything related to the database: setting policies and strategies and providing technical support.
Some of the administrator's responsibilities are:
• Interacting with the users of the system to understand what kind of data is to be stored in
the DBMS and how it is likely to be used.
• Ensuring that unauthorized data access is not permitted, by granting appropriate
permissions to the different users.
• Ensuring availability: if the system fails, users should be able to continue to access as
much of the uncorrupted data as possible.
• Modifying the database to ensure adequate performance as user requirements change.
Database Designers: They are the people who actually work on the design of the database.
Database development starts with requirement analysis, followed by a sound design process.
Designers keep a close watch on what data should be kept and in what format. They identify and
design the whole set of entities, relations, constraints, and views. Designers also write
application programs that use the database. These application programs are written in
programming languages such as COBOL, Java, or a fourth-generation language, and are built to
fulfill the user requirements. Retrieving information, creating new information, and changing
existing information in the database is done by these application programs.
End Users: End users are the persons who actually interact with the database system from the
terminal end. They use the developed applications and do not need any knowledge of the design
and working of the database. Their main motive is to retrieve useful information from the
database. There are basically two types of end users:
• Casual User: These users have good knowledge of the query language. Casual users access
data by entering different queries from the terminal end. They do not write programs, but
they can interact with the system by writing queries.
• Naïve User: Any user who does not have any knowledge of databases falls into this
category. Their task is just to use the developed application and get the desired results.
System Analyst: The system analyst is responsible for the design, structure, and properties of the
database. All the requirements of the end users are handled by the system analyst. The
feasibility, economic, and technical aspects of the DBMS are the main concerns of the system
analyst.
3.6 Advantages and Disadvantages of DBMS
Advantages of DBMS
• Data Sharing: In a centralized DBMS, multiple applications can share the data.
• Improved Data Security: The more users access the data, the greater the risk of data
security breaches. Corporations invest considerable amounts of time, effort, and money to
ensure that corporate data are used properly. A DBMS provides a framework for better
enforcement of data privacy and security policies.
• Controlling Redundancy: In a file system, files cannot be shared between different
applications, so each application maintains its own files, which causes redundancy in
the stored data. In a centralized database this kind of redundancy can be avoided.
• Enforced Standards: Since a DBMS is a central system, standards can be implemented
easily. Standardized data is very helpful during integration or interchange of data.
A file system is an independent system, so standards cannot be easily enforced across
multiple independent applications.
• Restricted Unauthorized Access: In a shared DBMS, multiple users share the data, but
not all of them are authorized to access all the information in the database.
• Providing Backup and Recovery: A DBMS must provide facilities for recovering from
hardware or software failures. The backup and recovery subsystem of the DBMS is
responsible for recovery.
• Concurrency Control: A DBMS provides mechanisms that give multiple users concurrent
access to the data.
Disadvantages of DBMS
• Complexity: Database users must understand the functionality of the DBMS, because
failure to understand the system can lead to bad design decisions, which can have
serious consequences for an organization.
• Size: The complexity and breadth of functionality make the DBMS a large piece of
software that requires substantial amounts of memory to run efficiently.
• Performance: A DBMS is written for many applications rather than just one, which
can reduce its performance for any single application.
• Higher impact of a failure: Because resources are centralized, all users and applications
depend on the DBMS, so the failure of any component can bring operations to a
halt.
• Cost of DBMS: The initial cost of a DBMS is very high, and it also depends on the
environment and the functionality provided by the DBMS. There is also a regular annual
maintenance cost.
A file management system allows access to single files or tables at a time. In a file system, data
is stored directly in a set of files. It contains flat files that have no relation to other files.
Basis | File Management System | Database Management System
Meaning | File System is a general, easy-to-use system to store general files which require less security and constraints. | Database management system is used when security constraints are high.
Data Redundancy | Data Redundancy is more in file management system. | Data Redundancy is less in database management system.
Data Inconsistency | Data Inconsistency is more in file system. | Data Inconsistency is less in database management system.
Centralization | Centralization is hard to get when it comes to File Management System. | Centralization is achieved in Database Management System.
Position of the data | User locates the physical address of the files to access data in File Management System. | In Database Management System, user is unaware of physical address where data is stored.
Security | Security is low in File Management System. | Security is high in Database Management System.
Type of data | File Management System stores unstructured data as isolated data files/entities. | Database Management System stores structured data which have well defined constraints and interrelation.
A database model defines the logical design and structure of a database, including how data is
stored, accessed, and updated in a DBMS. It also defines how data items are connected to each
other and how they are processed and stored inside the system.
Fig: Types of DBMS Models (Hierarchical Model, Network Model, Entity Relationship Model,
Relational Model)
Fig: Hierarchical model example (College Database; Under Graduate Courses, Post Graduate
Courses)
Disadvantages
• If any change is made to the database structure, the corresponding changes must also be
made in all the application programs that are used to access the data.
Fig: Network model example (University)
Advantages
• Easy data access: Data access is easier and more flexible in comparison with the
hierarchical model.
• Data integrity: In the network model, no node exists without a link to other
nodes.
• Flexible: The network model is more flexible than the hierarchical model when the
database structure has to be modified.
Disadvantages
• System complexity: All the records are maintained using pointers, and hence the whole
database structure becomes very complex. It is quite complicated to maintain all the links
between the nodes. It is difficult to design and develop.
• Operational Anomalies: The insertion, deletion, and updating of any data
require a large number of pointer adjustments.
• Absence of structural independence: Structural changes to the database are very difficult.
• Extra space: Extra memory is required for storage of pointers.
• Time consuming: Operation and maintenance of the network model is time consuming and
expensive for a large database.
Entity-Relationship Model
In the Entity-Relationship model (ER Model), a real-world object is described as an entity
together with its characteristics. ER models represent the relationships among entities in
pictorial form, which makes them easier to understand. The ER Model is defined on the basis
of:
• Entities and their attributes.
• Relationships among entities.
Entity: An entity can be defined as a real-world object that can be easily differentiated from
other objects, for example a student. All entities have some attributes or properties that give
them their identity.
Attributes: Entities are represented by means of their properties, called attributes. All attributes
have values. For example, in a school database, a student is considered an entity. A student has
various attributes such as name, age, and class.
Relationship: The logical association among entities is called a relationship. Relationships are
mapped with entities in various ways. Mapping cardinalities define the number of associations
between two entities. The mapping cardinalities are:
One to One
One to Many
Many to One
Many to Many
Fig: ER diagram with entities and their attributes
Advantages
Disadvantages
Relational Model
The relational data model is the primary data model and is widely used around the world by
database designers for data storage and processing. A relation can be represented as a table
with columns and rows, where each column represents an attribute of an entity and each row
represents a record (one entity).
Relational Model Concepts
Attribute: Each column in a table. Attributes are the properties that define a relation, e.g.,
Student_Rollno, NAME, etc.
Tables: In the relational model, relations are saved in table format, together with their
entities. A table has two properties: rows and columns. Rows represent records and
columns represent attributes.
Tuple: A single row of a table, which contains a single record.
Relation Schema: A relation schema represents the name of the relation with its attributes.
Degree: The total number of attributes in the relation is called the degree of the relation.
Cardinality: The total number of rows present in the table.
Column: A column represents the set of values for a specific attribute.
Relation instance: A relation instance is a finite set of tuples in the RDBMS. Relation
instances never have duplicate tuples.
Relation key: One or more attributes that uniquely identify each row are called the relation key.
Attribute domain: Every attribute has a predefined set and scope of permitted values, which is
known as the attribute domain.
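Degree and cardinality can be read off a concrete table. The sketch below uses Python's built-in sqlite3 module; the table `student`, its three columns, and its two rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Relation schema: student(student_rollno, name, age)
conn.execute("CREATE TABLE student (student_rollno INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [(1, "Ravi", 20), (2, "Meena", 21)])

# Degree: number of attributes (PRAGMA table_info returns one row per column).
degree = len(conn.execute("PRAGMA table_info(student)").fetchall())
# Cardinality: number of tuples currently in the relation.
cardinality = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(degree, cardinality)
```

Inserting another row changes the cardinality but not the degree; adding a column changes the degree.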
Advantages
• Widely used.
• Provides excellent support for dynamic queries.
• Users need not consider issues such as storage structure and access strategy.
• Security control and authorization can be implemented more easily.
• Data independence is achieved more easily with the normalized structure used in a
relational database.
Disadvantages
E. F. Codd was a computer scientist who invented the relational model for database
management. He proposed a set of rules that a database must obey in order to be regarded as a
true relational database. These rules can be applied to any database system that manages stored
data using only its relational capabilities.
Rule 1: Information Rule
All information stored in a database, whether user data or metadata, must be represented as
values stored in the cells of tables. Everything in a database must be stored in table format.
Rule 2: Guaranteed Access Rule
Every single data element is guaranteed to be accessible logically with: table-name + primary-
key (row value) + attribute-name (column value).
Rule 3: Systematic Treatment of NULL Values
NULL values can have several meanings: missing data, data that is not known, data that is not
applicable, or no value. NULL values in a database must be given systematic and uniform
treatment.
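A small sketch of uniform NULL treatment, using Python's built-in sqlite3 module as one example system. The table `employee` and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, phone TEXT)")
# Python's None is stored as SQL NULL: the phone of employee 2 is unknown.
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "12345"), (2, None)])

# A comparison with '=' never matches NULL; the uniform test is IS NULL.
equals_null = conn.execute("SELECT COUNT(*) FROM employee WHERE phone = NULL").fetchone()[0]
is_null = conn.execute("SELECT COUNT(*) FROM employee WHERE phone IS NULL").fetchone()[0]
print(equals_null, is_null)
```

NULL is handled the same way regardless of the column's data type, which is the systematic treatment the rule asks for.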
Rule 4: Active Online Catalog
The description of the complete database must be stored in an online catalog, known as the data
dictionary, which can be accessed by authorized users. The catalog must be governed by the same
rules as the rest of the database. Users can access the catalog with the same query language that
they use to access the database itself.
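SQLite offers a simple illustration of an online catalog: it records every table in a catalog table named `sqlite_master`, which is read with ordinary SELECT statements. The sketch uses Python's built-in sqlite3 module; the table `course` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT)")

# The catalog is itself a table, queried with the same language as user data.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(catalog)
```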
Rule 5: Powerful and Well-Structured Language
A database can only be accessed using a language with a linear syntax that supports data
definition, data manipulation, and transaction management operations. There must therefore be
one well-structured language that provides every manner of access to the data stored in the
database, for example SQL. If the database allows access to the data without the use of this
language, then that is a violation.
Rule 6: View Updating Rule
All the views of a database, which can theoretically be updated, must also be updatable by the
system.
Rule 7: Relational Level Operation
A database must support Insert, Delete, and Update operations at each level of relations. These
operations must not be limited to a single row; that is, the database must also support union,
intersection, and minus operations to yield sets of data records.
Rule 8: Physical Data Independence
The data stored in a database must be independent of the applications that access the database.
Any change in the physical structure of a database must not have any impact on how the data is
being accessed by external applications.
Rule 9: Logical Data Independence
The logical data in a database must be independent of its users' view (application). Any change
in the logical data must not affect the applications using it. For example, if two tables are
merged or one is split into two different tables, there should be no impact or change on the user
application. This is one of the most difficult rules to apply.
Rule 10: Integrity Independence
A database must be independent of the application that uses it. All its integrity constraints can be
independently modified without the need of any change in the application. This rule makes a
database independent of the front-end application and its interface.
Rule 11: Distribution Independence
The end-user must not be able to see that the data is distributed over various locations. Users
should always get the impression that the data is located at one site only. This rule has been
regarded as the foundation of distributed database systems.
Rule 12: Non-Subversion Rule
If a system has an interface that provides access to low-level records, then the interface must not
be able to subvert the system and bypass security and integrity constraints.
Source: https://www.tutorialspoint.com/dbms/dbms_codds_rules.htm
3.10 Normalization
Database normalization is the process of organizing the data in the database. Normalization is
used to minimize redundancy in a relation or set of relations. It is also used to eliminate
undesirable characteristics such as insertion, update, and deletion anomalies. Normalization
divides a larger table into smaller tables and links them using relationships. The normal forms
are used to reduce redundancy in the database tables. Normalization is used mainly for two
purposes:
• Eliminating redundant (useless) data.
• Ensuring data dependencies make sense, i.e. data is logically stored.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized: insertion,
update, and deletion anomalies. They can be explained with an example.
Example: At some point the employee table looks like this:
Emp_Id  Emp_name  Contact_No.  Dept       Dept_Id
101     A         1234523456   IT         1
102     B         9876876453   Marketing  2
101     C         7689626789   Developer  3
101     B         7894532765   IT         4
The above employee table is not normalized. We will now see the problems that arise when a
table is not normalized.
Update anomaly: In the above table there are three rows for the employee whose Emp_Id is 101,
because the employee belongs to different departments. If we want to update the Contact_No. of
employee 101, we have to make the same update in all three rows, or the data will become
inconsistent.
Insert anomaly: Suppose a new employee joins the company, is under training, and is not
currently assigned to any department. We would not be able to insert the data into the table
if the Dept_Id field does not allow nulls.
Delete anomaly: Suppose that at some point the company closes the Marketing department.
Deleting the rows that have Dept_Id 2 would also delete the information of employee B.
To overcome these anomalies we need to normalize the data.
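One possible way to normalize such data is sketched below with Python's built-in sqlite3 module. The three-table split (`employee`, `department`, `works_in`) and the sample rows are assumptions made for illustration, not the only valid design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical decomposition: one row per employee, one row per department,
# and a linking table for employee-department assignments.
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, contact_no TEXT);
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept TEXT);
CREATE TABLE works_in (
    emp_id INTEGER REFERENCES employee(emp_id),
    dept_id INTEGER REFERENCES department(dept_id)
);
INSERT INTO employee VALUES (101, 'A', '1234523456'), (102, 'B', '9876876453');
INSERT INTO department VALUES (1, 'IT'), (2, 'Marketing'), (3, 'Developer');
INSERT INTO works_in VALUES (101, 1), (101, 3), (102, 2);
""")

# Update: the contact number is stored in exactly one row.
updated = conn.execute(
    "UPDATE employee SET contact_no = '1111111111' WHERE emp_id = 101"
).rowcount
# Insert: a trainee can be stored without being assigned to a department.
conn.execute("INSERT INTO employee VALUES (103, 'C', '7689626789')")
# Delete: closing Marketing removes the department, not employee B.
conn.execute("DELETE FROM works_in WHERE dept_id = 2")
conn.execute("DELETE FROM department WHERE dept_id = 2")

remaining = [r[0] for r in conn.execute("SELECT emp_name FROM employee ORDER BY emp_id")]
print(updated, remaining)
```

In this layout each of the three anomalies disappears because every fact is stored in one place only.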
Fig: Types of Normal Forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF)
Departments
Dname      DNumber  Dmgr_ssn  Dlocation
Research   2        123       Punjab, New Delhi, Bangalore
Developer  1        456       Bangalore
Analyst    3        789       New Delhi
Dname      DNumber  Dmgr_ssn  Dlocation
Research   2        123       Punjab
Research   2        123       New Delhi
Research   2        123       Bangalore
Developer  1        456       Bangalore
Analyst    3        789       New Delhi
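The step from the first Departments table to the second can be sketched in Python. The tuple layout and variable names are illustrative; the data is the Departments example shown above.

```python
# Departments relation with a multi-valued Dlocation attribute (not in 1NF).
departments = [
    ("Research", 2, 123, "Punjab, New Delhi, Bangalore"),
    ("Developer", 1, 456, "Bangalore"),
    ("Analyst", 3, 789, "New Delhi"),
]

# 1NF: emit one row per (department, single location) pair,
# so that every Dlocation cell holds exactly one value.
departments_1nf = [
    (dname, dnumber, mgr_ssn, location.strip())
    for dname, dnumber, mgr_ssn, locations in departments
    for location in locations.split(",")
]

for row in departments_1nf:
    print(row)
```

The Research department expands into three rows, one per location, while the other two departments keep a single row each.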
Fig: 2NF Normalization (labels: FD 1, FD 2, FD 3)
Fig: 3NF Normalization (labels: FD 1, FD 2, FD 3, ED1, ED2)
Emp_Dept Table:
EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549
Emp_Dept_Mapping Table:
EMP_ID  EMP_DEPT
D394    283
D394    300
D283    232
D283    549
Functional dependencies:
Emp_Id → Emp_Country
Emp_Dept → {Dept_Type, Emp_Dept_No}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
This is now in BCNF, because the left-hand side of both functional dependencies is a key.
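The BCNF condition can be checked mechanically by computing attribute closures. The sketch below is a generic closure routine applied to the two functional dependencies listed above; the function names are illustrative, and the attribute set assumed for the first table (EMP_ID, EMP_COUNTRY) is inferred from the first dependency rather than taken from a table shown here.

```python
def closure(attributes, fds):
    """Return every attribute functionally determined by `attributes`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, the right side follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attributes, relation, fds):
    """True if `attributes` determines every attribute of `relation`."""
    return relation <= closure(attributes, fds)

fds = [
    (frozenset({"EMP_ID"}), frozenset({"EMP_COUNTRY"})),
    (frozenset({"EMP_DEPT"}), frozenset({"DEPT_TYPE", "EMP_DEPT_NO"})),
]
emp_country = {"EMP_ID", "EMP_COUNTRY"}               # first table (assumed attributes)
emp_dept = {"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}   # second table

print(is_superkey({"EMP_ID"}, emp_country, fds))
print(is_superkey({"EMP_DEPT"}, emp_dept, fds))
```

Each dependency's left-hand side is a key of the table it belongs to, which is the BCNF condition. Neither attribute alone is a key of the combined attribute set, which is why the data is split across tables.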
STUDENT
STU_ID  COURSE     HOBBY
21      Computer   Dancing
21      Math       Singing
34      Chemistry  Dancing
74      Biology    Cricket
59      Physics    Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent attributes.
Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and
two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads
to unnecessary repetition of data.
To bring the above table into 4NF, we can decompose it into two tables:
Student_Course
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
Student_Hobby
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
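The decomposition can be reproduced with Python sets. This is a sketch using the STUDENT rows from the example; the variable names are illustrative.

```python
student = {
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
}

# Project onto (STU_ID, COURSE) and (STU_ID, HOBBY); sets drop duplicate rows.
student_course = {(sid, course) for sid, course, _ in student}
student_hobby = {(sid, hobby) for sid, _, hobby in student}

# Natural join on STU_ID: every course of a student pairs with every hobby.
rejoined = {
    (sid, course, hobby)
    for sid, course in student_course
    for sid2, hobby in student_hobby
    if sid == sid2
}

print(len(student), len(rejoined))
print(sorted(rejoined - student))
```

Joining the two tables returns every original row. For student 21 it also lists the remaining course and hobby combinations, which is what the independence of COURSE and HOBBY implies; the two smaller tables record each course and each hobby only once.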
Example
SUBJECT LECTURER SEMESTER
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
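The idea behind 5NF is that the relation can be rebuilt by joining all of its projections. The sketch below joins the three projections listed above using Python sets. It assumes that the first block of rows is the (SEMESTER, SUBJECT) projection; the variable names are illustrative.

```python
# (SEMESTER, SUBJECT) projection, assumed to be the first block of rows above.
p1 = {("Semester 1", "Computer"), ("Semester 1", "Math"),
      ("Semester 1", "Chemistry"), ("Semester 2", "Math")}
# P2: (SUBJECT, LECTURER)
p2 = {("Computer", "Anshika"), ("Computer", "John"), ("Math", "John"),
      ("Math", "Akash"), ("Chemistry", "Praveen")}
# P3: (SEMESTER, LECTURER); a set keeps the repeated John row only once.
p3 = {("Semester 1", "Anshika"), ("Semester 1", "John"),
      ("Semester 2", "Akash"), ("Semester 1", "Praveen")}

# Join all three: keep a combination only if each of its pairs appears
# in the corresponding projection.
joined = sorted(
    (subject, lecturer, semester)
    for semester, subject in p1
    for subject2, lecturer in p2
    if subject == subject2 and (semester, lecturer) in p3
)

for row in joined:
    print(row)
```

Joining only two of the projections would produce extra rows such as Math taught by John in Semester 2; the third projection filters them out, which is why all three are needed.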
SUMMARY
• Data consists of raw facts and statistics, either stored or flowing freely over a network,
and is generally unprocessed.
• When data is processed, organized, and presented in a predefined manner so as to
make it useful, it becomes information.
• A database is a collection of related data organized in such a way that the data can be
easily accessed, managed, and updated, and the computer can easily find the desired
information. Its main purpose is to store data.
• A Database Management System (DBMS) is software that allows users to define, create,
and process the data in a database easily.
• Database architecture describes how the software that an organization uses to store and
retrieve its business information is designed, developed, implemented, and maintained.
• A transaction must maintain Atomicity, Consistency, Isolation, and Durability, commonly
known as the ACID properties, in order to ensure the accuracy, completeness, and
integrity of the data in the database.
• A file management system allows access to single files or tables at a
time.
• A database model defines the logical design and structure of a database and defines how
data is stored, accessed, and updated in a DBMS.
• Normalization is used to minimize redundancy in a relation or set of relations. It
is also used to eliminate undesirable characteristics such as insertion, update, and
deletion anomalies.
REVIEW QUESTIONS