DATABASE
Basic terminology:
Data:
◦ Data is raw, unprocessed facts and figures that have no context or purposeful meaning, whereas information is processed data that has meaning and is presented in a context.
◦ For example, a computer operator may enter 306.41, which is data, because we do not know why or in what context it is being used. However, if this number then appears on a bill to show that you owe a company NRs. 306.41 for goods received, then this data has changed into information, because it has acquired a context (it's a bill) and meaning.
Information
◦ Information is organized or classified data that has meaningful value for the receiver. Information is the processed data on which decisions and actions are based.
DATA VS INFORMATION
DATA PROCESSING
Data processing is the restructuring or reordering of data by people or machines to increase its usefulness and add value for a particular purpose.
A data processing system includes the resources, such as people, procedures, and devices, that are used to process data and produce the desired output.
Data processing also includes a complete check of the data's validity, accuracy, consistency, integrity, and relevance to the client's requirements.
It is a series of actions or operations that convert data into useful information.
It includes the conversion of raw data to machine-readable form, the flow of data through the CPU and memory to output devices, and the formatting or transformation of output.
It involves recording, analyzing, summarizing, calculating,
disseminating and sorting data.
Steps in data processing
◦ Preparation of source document
◦ Input of data
◦ Manipulation of data
◦ Output of information
◦ Data storage
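The five steps above can be sketched as a small Python pipeline. This is a minimal illustration, not a standard procedure: the sample entries, the `name,amount` line format, and the function names (`input_data`, `manipulate`, `output`, `store`) are all hypothetical.

```python
import json

# Step 1 (preparation of source document): hypothetical raw entries from a form.
SOURCE_DOCUMENT = ["Ram,306.41", "Sita,120.00", "Ram,50.59"]

def input_data(lines):
    """Step 2: convert raw text lines into machine-readable records."""
    records = []
    for line in lines:
        name, amount = line.split(",")
        records.append({"name": name, "amount": float(amount)})
    return records

def manipulate(records):
    """Step 3: summarize (total owed per customer) and sort by name."""
    totals = {}
    for r in records:
        totals[r["name"]] = totals.get(r["name"], 0.0) + r["amount"]
    return dict(sorted(totals.items()))

def output(totals):
    """Step 4: format the result as human-readable information."""
    return [f"{name} owes NRs. {amount:.2f}" for name, amount in totals.items()]

def store(totals, path):
    """Step 5: keep the processed result in a file for later use."""
    with open(path, "w") as f:
        json.dump(totals, f)

for line in output(manipulate(input_data(SOURCE_DOCUMENT))):
    print(line)
```

The bare number 306.41 only becomes information in step 4, when it is attached to a customer and a label.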
File Processing System (FPS)
File processing is the process of creating, storing, and accessing the contents of files. Files are used to store various documents. All files are grouped by category. File names are closely related to each other and arranged so that the files can be accessed easily. In a file processing system, anyone who needs to insert, delete, modify, store, or update data must know the entire hierarchy of the files.
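A minimal sketch of file-based processing, assuming a hypothetical `employees.csv` file whose column order (id, name, salary) is known only to the program itself. Updating a single record means reading and rewriting the whole file, and any other program must repeat the same knowledge of the path and layout.

```python
import csv
import os
import tempfile

# Hypothetical layout: nothing in the file describes its columns,
# so every program must agree on this order by convention.
FIELDS = ["id", "name", "salary"]

def write_employees(path, rows):
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

def update_salary(path, emp_id, new_salary):
    """Updating one record means reading and rewriting the whole file."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    for row in rows:
        if row[0] == emp_id:          # position 0 is 'id' by convention only
            row[2] = str(new_salary)  # position 2 is 'salary' by convention only
    write_employees(path, rows)

def read_employees(path):
    with open(path, newline="") as f:
        return [dict(zip(FIELDS, row)) for row in csv.reader(f)]

path = os.path.join(tempfile.mkdtemp(), "employees.csv")
write_employees(path, [["1", "Hari", "30000"], ["2", "Gita", "35000"]])
update_salary(path, "2", 40000)
print(read_employees(path))
```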
Database
• A collection of pieces of information that is organized and used in a computer-based management system.
• It is a collection of interrelated data about certain places, persons, or things, organized so that it can be easily accessed and processed by an application program.
• It is a collection of logically related data organized so as to facilitate easy access and processing.
Why do we need a database?
Very large amounts of information need to be managed.
Data independence and efficient access.
Data integrity and security.
Uniform data administration.
Concurrent access and recovery from crashes.
Replication control.
Reduced application development time.
Minimal duplication of data.
Database Management System
It consists of a collection of interrelated data and a set of application programs to access, update, manage, and retrieve data as requested.
It is a collection of interrelated data and a set of programs that manage or control the use of the data. It provides an interface between the database and its users.
COMPARISON BETWEEN FILE PROCESSING SYSTEM & DBMS
A DBMS coordinates both the physical and the logical access to the data, whereas a file-processing system coordinates only the physical access.
A DBMS reduces the amount of data duplication by ensuring that a physical piece of data is available to all programs authorized to have access to it, whereas data written by one program in an FPS may not be readable by another program.
A DBMS is designed to allow flexible access to data (i.e., queries), whereas an FPS is designed to allow predetermined access to data (i.e., compiled programs).
A DBMS is designed to coordinate multiple users accessing the same data at the same time. An FPS is usually designed to allow one or more programs to access different data files at the same time. In an FPS, a file can be accessed by two programs concurrently only if both programs have read-only access to the file.
Redundancy is controlled in a DBMS, but not in a file system.
Unauthorized access is restricted in a DBMS, but not in a file system.
A DBMS provides backup and recovery. When data is lost in a file system, it usually cannot be recovered.
A DBMS provides multiple user interfaces. In a file system, data is isolated in separate files.
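To illustrate flexible (query-based) access, the sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `employee` table and its rows are hypothetical. Each new question is one SQL query rather than a new compiled program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee (id, name, dept, salary) VALUES (?, ?, ?, ?)",
    [(1, "Hari", "Sales", 30000), (2, "Gita", "IT", 35000), (3, "Ramesh", "IT", 42000)],
)

# Two different questions, answered by the DBMS without writing
# a new file-handling program for each one.
it_staff = conn.execute(
    "SELECT name FROM employee WHERE dept = 'IT' ORDER BY name"
).fetchall()
avg_by_dept = conn.execute(
    "SELECT dept, AVG(salary) FROM employee GROUP BY dept ORDER BY dept"
).fetchall()
print(it_staff)     # [('Gita',), ('Ramesh',)]
print(avg_by_dept)  # [('IT', 38500.0), ('Sales', 30000.0)]
```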
Major Roles or Objectives of DBMS
Defines storage for large amounts of relevant data.
Provides suitable mechanisms for data access and data manipulation.
Maintains system integrity and applies the latest modifications instantly.
Provides safety and security measures that protect data from physical harm and unauthorized access.
Provides mechanisms for concurrent data sharing among users.
Advantages of DBMS
Control of data redundancy
Data consistency
Data security
Sharing of data
Improved data integrity
Disadvantages
Cost of DBMS
Cost of maintenance is high
Higher impact of failure
Cost of staff training
Complex to understand and implement
Database model
A data model describes the structure of a database.
◦ Database model: a collection of conceptual tools for describing data relationships, data semantics, and consistency constraints.
Database model: the method of organizing data and representing the logical relationships among data elements in a database.
Hierarchical Database model
This is one of the oldest types of database model.
In this model, data is represented in the form of records.
Each record (node) has multiple fields/attributes.
All records are arranged in the database in a tree-like structure.
The relationship between the records is called a parent-child relationship, in which any child record relates to only a single parent record.
It follows a one-to-many relationship.
Advantages:
◦ Conceptual simplicity
◦ Database security
◦ Data independence
◦ Database integrity
◦ Efficiency
Disadvantages:
◦ Complex implementation
◦ Difficult to manage
◦ Lacks structural independence
◦ Complex application programming and use
◦ Implementation limitations
◦ Lack of standards
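A hierarchical structure can be sketched as a tree of records in which each child is stored under exactly one parent. The `Record` class, the `find_path` helper, and the college data are hypothetical; real hierarchical DBMSs use their own storage formats and access languages.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """A node in the tree: a record with attribute values and child records."""
    name: str
    attrs: dict
    children: list = field(default_factory=list)

    def add_child(self, child):
        self.children.append(child)  # each child is stored under exactly one parent
        return child

def find_path(node, target, path=()):
    """Access starts at the root and walks down parent-to-child links."""
    path = path + (node.name,)
    if node.name == target:
        return path
    for child in node.children:
        found = find_path(child, target, path)
        if found:
            return found
    return None

college = Record("College", {"name": "ABC College"})
science = college.add_child(Record("Science", {"type": "department"}))
management = college.add_child(Record("Management", {"type": "department"}))
science.add_child(Record("Hari", {"type": "teacher"}))
management.add_child(Record("Gita", {"type": "teacher"}))

print(find_path(college, "Gita"))  # ('College', 'Management', 'Gita')
```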
Network database model:
It replaced the hierarchical database model because of some limitations of that model.
It consists of a collection of records that are interrelated with the help of relationships.
For example, if an employee belongs to two departments, the hierarchical database model cannot arrange the records properly, because a child record can have only one parent.
So the network database model emerged to arrange non-hierarchical data.
The structure of the database is more like a graph than a tree.
Each node (record) may have several parents.
It supports many-to-many relationships.
Advantages:
◦ Conceptual simplicity
◦ Handles more relationship types
◦ Data access flexibility
◦ Promotes database integrity
◦ Data independence
Disadvantages:
◦ Difficult for first-time users
◦ System complexity
◦ Lack of structural independence
◦ Difficulties with alterations of the database, because a change to the stored information can affect the entire database
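The employee-in-two-departments case can be sketched as a graph in which a member record may be linked to several owner (parent) records. The `links` list and the names are hypothetical; this shows only the many-to-many shape, not a real network DBMS's navigation commands.

```python
from collections import defaultdict

# Hypothetical links: each pair connects an owner (parent) record
# to a member (child) record.
links = [
    ("Sales", "Hari"),
    ("IT", "Hari"),   # Hari belongs to two departments: two parents
    ("IT", "Gita"),
]

members_of = defaultdict(list)  # owner -> members (parent -> children)
owners_of = defaultdict(list)   # member -> owners (child -> parents)
for owner, member in links:
    members_of[owner].append(member)
    owners_of[member].append(owner)

print(owners_of["Hari"])  # ['Sales', 'IT']
print(members_of["IT"])   # ['Hari', 'Gita']
```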
Relational database model:
In this model, data is organized in two-dimensional tables that contain multiple rows and columns, and relationships are maintained by storing a common field.
A row in a table represents a relationship among a set of values.
This model was introduced by E. F. Codd in 1970, and since then it has been the most widely used database model in the world.
The basic structure of data in the relational model is the table.
All the information related to a particular type is stored in the rows of that table.
In the relational model, tables are also known as relations.
We can design tables, normalize them to reduce data redundancy, and use Structured Query Language (SQL) or a program such as MS Access to access data in the tables.
All data elements within the database are viewed as being stored in the form of simple tables.
In this model, the data is organized into tables that contain multiple rows (tuples/records) and columns (attributes/fields). These tables are called relations.
Since a table is a collection of such relationships, it is referred to by the mathematical term relation, from which the relational database model derives its name.
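A minimal relational sketch with Python's `sqlite3`: two hypothetical tables share the common field `dept_id`, and a join uses that field to relate rows across the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER  -- common field that relates the two tables
    );
    INSERT INTO department VALUES (10, 'Sales'), (20, 'IT');
    INSERT INTO employee VALUES (1, 'Hari', 10), (2, 'Gita', 20), (3, 'Ramesh', 20);
""")

# Each row (tuple) is one record; the join follows the common field.
rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(rows)  # [('Hari', 'Sales'), ('Gita', 'IT'), ('Ramesh', 'IT')]
```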
Advantages:
◦ Structural independence
◦ Improved conceptual simplicity
◦ Easier database design, implementation, management, and use
◦ New technology: performance power and flexibility with multiple data requirement capabilities
◦ Data security (strong privacy)
◦ Powerful database management system
Disadvantages:
◦ Substantial hardware and system software overhead
◦ Can facilitate poor design and implementation
◦ May promote "islands of information" problems
Entity-Relationship model
This model is based on a perception of the real world as a collection of basic objects, called entities, the relationships among these objects, and the characteristics of each entity. It shows the relationships between different entities.
Entity
Entity set
Attributes
Relationship
◦ One-to-one
◦ One-to-many
◦ Many-to-many
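The relationship types listed above can be told apart by counting how many partners each entity has. The `cardinality` function below is a hypothetical helper that classifies a relationship set given as (A, B) pairs; it only describes the sample data it receives, whereas in design the cardinality is a rule decided from real-world meaning. It also reports many-to-one, which is simply one-to-many seen from the other side.

```python
from collections import Counter

def cardinality(pairs):
    """Classify a non-empty relationship set given as (entity_a, entity_b) pairs."""
    pairs = set(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    a_max = max(Counter(a for a, _ in pairs).values())  # most Bs linked to one A
    b_max = max(Counter(b for _, b in pairs).values())  # most As linked to one B
    if a_max == 1 and b_max == 1:
        return "one-to-one"
    if b_max == 1:
        return "one-to-many"   # an A may have many Bs; each B has one A
    if a_max == 1:
        return "many-to-one"   # one-to-many seen from the other side
    return "many-to-many"

print(cardinality([("Hari", "P-101"), ("Gita", "P-102")]))                 # one-to-one
print(cardinality([("IT", "Gita"), ("IT", "Ramesh"), ("Sales", "Hari")]))  # one-to-many
print(cardinality([("Hari", "DBMS"), ("Hari", "C"), ("Gita", "DBMS")]))    # many-to-many
```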
E-R diagram
A diagrammatic representation of entities, attributes, and relationships.
It shows the relationships among different objects.
It was developed to facilitate database design, and the simplicity and pictorial clarity of this diagramming technique have been a great help in designing databases.
It is built from the following components:
◦ Rectangle (for entity)
◦ Oval or ellipse (its attribute)
◦ Diamond (relationship)
◦ Line (connector)
Object-Oriented Database model
Object-oriented databases are also called object databases, and the systems that manage them are called object-oriented database management systems (OODBMS).
The data is represented and stored in the form of
objects.
In OOP, an entity is represented as an object and
objects are stored in memory.
Objects have members such as fields, properties,
and methods. Objects also have a life cycle that
includes the creation of an object, use of an
object, and deletion of an object.
OOP has three key characteristics: encapsulation,
inheritance, and polymorphism. Today, there are
many popular OOP languages such as C++, Java,
C#, Ruby, Python, JavaScript, and Perl.
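The object concepts above can be sketched in Python, one of the listed OOP languages. The `Person` and `Student` classes and the dictionary used as an object store are hypothetical; a real OODBMS persists objects to disk and provides its own query facilities, which this sketch does not attempt.

```python
class Person:
    def __init__(self, name):
        self._name = name        # encapsulated field, read through a property

    @property
    def name(self):
        return self._name

    def describe(self):          # method shared by all persons
        return f"Person {self._name}"

class Student(Person):           # inheritance: a Student is a Person
    def __init__(self, name, roll_no):
        super().__init__(name)
        self.roll_no = roll_no

    def describe(self):          # polymorphism: same call, different behaviour
        return f"Student {self.name} (roll {self.roll_no})"

# Hypothetical object store: whole objects kept under an object id
# instead of being split into table rows.
store = {}
store[1] = Person("Gita")        # creation
store[2] = Student("Hari", 17)
descriptions = [obj.describe() for obj in store.values()]  # use
del store[1]                     # deletion
print(descriptions)  # ['Person Gita', 'Student Hari (roll 17)']
print(list(store))   # [2]
```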
Terms used in RDBMS
File: a collection of records of the same type in an entity set.
Record: the group of attribute values, in a predetermined order, for one instance of an entity set.
Field: the value stored in an attribute.
Domain:
◦ The data pool from which data entered into a table is drawn.
◦ The set of possible values for a given attribute.
Tuple: a complete row (record) of the table.
Key
It is an attribute that is used to identify a particular record in a database.
A key may be composed of more than one attribute. Any attribute that is part of a key is known as a key attribute.
Keys in Database
Primary Key – A primary key is a column (field/attribute) or set of columns in a table that uniquely identifies tuples (rows) in that table.
Super Key – A super key is a set of one or more columns (attributes) that uniquely identifies rows in a table.
Candidate Key – A super key with no redundant attribute, i.e. a minimal super key, is known as a candidate key.
Alternate Key – Out of all candidate keys, only one is selected as the primary key; the remaining keys are known as alternate or secondary keys.
Composite Key – A key that consists of more than one attribute to uniquely identify rows (also known as records or tuples) in a table is called a composite key.
Foreign Key – Foreign keys are the columns of a table that point to the primary key of another table. They act as a cross-reference between tables.
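The key definitions can be checked against sample rows. The helpers `is_super_key` and `candidate_keys` below are hypothetical and only test uniqueness in the data they are given; real keys are chosen from the meaning of the data, not from a sample.

```python
from itertools import combinations

def is_super_key(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, all_attrs):
    """Minimal super keys: super keys with no redundant attribute."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # A superset of a key already found has a redundant attribute.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_super_key(rows, attrs):
                keys.append(attrs)
    return keys

students = [
    {"roll": 1, "email": "hari@example.com", "name": "Hari", "class": "XI"},
    {"roll": 2, "email": "gita@example.com", "name": "Gita", "class": "XI"},
    {"roll": 3, "email": "ram@example.com", "name": "Hari", "class": "XII"},
]
print(candidate_keys(students, ["roll", "email", "name", "class"]))
# [('roll',), ('email',), ('name', 'class')]
```

Here `roll` could be chosen as the primary key, leaving `email` and the composite key (`name`, `class`) as alternate keys. The (`name`, `class`) result holds only for these three sample rows.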
Entity, attribute and Relationship
Entity: an entity is the ‘thing’ or ‘object’ in the real world that is distinguishable from others. Represented by a rectangle.
◦ Entity set: a set of entities of the same type that share common properties or attributes.
Attribute: attributes are the properties or characteristics possessed by an entity. Represented by an ellipse.
Relationship: an association among several entities that represents meaningful dependencies between them. Represented by a diamond.
Database Administrator (DBA)
The DBA is the person with the most responsibility for the database in an organization and has sound knowledge of DBMS.
He/she is the overall administrator of the system.
He/she is responsible for designing, analyzing, evaluating, coordinating, and implementing the database system.
He/she has the maximum privileges (permissions to access the database) for accessing the database, setting up the system, and defining the roles of the employees who use the system.
Responsibilities of DBA:
The DBA defines data security, schemas, forms, reports, relationships, and user privileges.
The DBA is responsible for installing, monitoring, and upgrading the database server.
The DBA provides different facilities for data retrieval and for making reports as required.
The DBA is responsible for maintaining database security, the backup and recovery strategy, and documentation of data recovery.
The DBA supervises all activities in the system: addition, modification, and deletion of data in the database.
Database related terms
Data security: it relates to the protection of the database system from unauthorized access, modification or alteration, failure, loss, malicious attack, or destruction. It protects the database against data theft, privacy breaches, data corruption, etc.
Security measures at several levels:
◦ Physical level (protection from fire, water, heat, dust, power failure, wet places, and faulty electrical circuits, with attention to the amount of light and the height of the data store above the floor)
◦ Human level (protection from unauthorized access; only authorized persons should get access)
◦ Operating system level (protection from accidental loss of data; a fully reliable OS should be used)
◦ Network level (protection from viruses and hacking)
Data integrity constraints:
◦ An integrity constraint is a rule that restricts the values that may be present in the database.
◦ Data integrity refers to the validity and consistency of data in the database.
◦ Data integrity is maintained by using the following types of constraints (rules):
Domain integrity constraints: it defines the set or range of valid data values for a given data field. It also determines whether null values are allowed in the field.
Entity integrity constraints: it specifies that every row in a table has a unique identifier, known as the primary key value, which can never be null (blank). That is, no column that is part of the primary key may accept null values.
Referential integrity constraints: it applies to a relationship between two tables in a database. It ensures that the relationship between the primary key in the master table and the foreign key in the child table is always maintained.
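The three integrity constraints map onto SQL column constraints. This sketch uses SQLite through Python's `sqlite3`; the tables and the `try_insert` helper are hypothetical. Two SQLite-specific behaviours matter here: foreign keys are enforced only after `PRAGMA foreign_keys = ON`, and a non-INTEGER primary key needs an explicit `NOT NULL` to reject nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  TEXT NOT NULL PRIMARY KEY,              -- entity integrity
        age     INTEGER CHECK (age BETWEEN 18 AND 60),  -- domain integrity
        dept_id INTEGER REFERENCES department(dept_id)  -- referential integrity
    );
    INSERT INTO department VALUES (1, 'IT');
""")

def try_insert(sql):
    """Return 'accepted' or 'rejected' depending on the constraints."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = {
    "valid row": try_insert("INSERT INTO employee VALUES ('E1', 30, 1)"),
    "duplicate key": try_insert("INSERT INTO employee VALUES ('E1', 25, 1)"),
    "null key": try_insert("INSERT INTO employee VALUES (NULL, 25, 1)"),
    "age out of range": try_insert("INSERT INTO employee VALUES ('E2', 70, 1)"),
    "unknown department": try_insert("INSERT INTO employee VALUES ('E3', 25, 99)"),
}
for case, outcome in results.items():
    print(case, "->", outcome)  # only "valid row" is accepted
```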
Data dictionary
Data dictionary: A data dictionary is a file that contains metadata, that is, data about data. It is also called the system catalogue. It keeps information about the database system such as the location and size of the database, tables, records, fields, user information, the recovery system, etc.
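SQLite keeps its own data dictionary in the built-in `sqlite_master` table, and `PRAGMA table_info` returns column-level metadata. A minimal sketch with a hypothetical `student` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# sqlite_master: one row of metadata per table, index, view or trigger.
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info: one row per column
# (cid, name, declared type, notnull, default value, pk).
columns = conn.execute("PRAGMA table_info(student)").fetchall()

print(tables)                             # [('student',)]
print([(c[1], c[2]) for c in columns])    # [('roll', 'INTEGER'), ('name', 'TEXT')]
```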
Data Abstraction
Developers hide certain details of how the data are stored and maintained from users through several levels of abstraction and provide users with only an abstract view of the data. This process is known as Data Abstraction.
Levels of data abstraction
Physical level:
◦ It is the lowest level. It describes how the data are actually stored. At this level, low-level data structures are described in detail.
Logical level:
◦ It is the next higher level. It describes what data are stored in the database and what relationships exist among those data. The logical level of abstraction is used by the database administrator, who must decide what information is to be kept in the database.
View level:
◦ It is the highest level of data abstraction. It describes only part of the entire database. Users interact with the system through the view level.
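The view level can be sketched with an SQL view: the hypothetical `staff_directory` view exposes only names and departments, while the underlying `employee` table (the logical level) also holds salaries. The physical level, SQLite's on-disk pages and B-trees, stays hidden and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical level: the full table as the DBA defines it.
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'Hari', 'Sales', 30000), (2, 'Gita', 'IT', 35000);

    -- View level: a clerk sees names and departments only, not salaries.
    CREATE VIEW staff_directory AS SELECT name, dept FROM employee;
""")

directory = conn.execute("SELECT * FROM staff_directory ORDER BY name").fetchall()
print(directory)  # [('Gita', 'IT'), ('Hari', 'Sales')]
```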
Figure: Data Abstraction
Structured Query Language (SQL)
SQL was developed by IBM in the 1970s as a way to get information into and out of relational database management systems. It was first standardized by ANSI in 1986.
It is declarative in nature. That is, its commands state what result is wanted from a particular database rather than how to compute it. SQL commands are categorized as:
DDL, such as CREATE TABLE, ALTER TABLE, DROP TABLE, etc.
DML, such as SELECT, INSERT INTO, UPDATE, DELETE FROM, etc.
SQL is used to retrieve results from the database, and SQL queries are the most common way of doing so. The SQL sublanguages DML and DDL are very common in server-based database management systems. To use SQL, you create a table, insert data records into it, and then manipulate the records or run queries on them.
DDL (Data Definition Language): DDL is used by database designers and programmers to specify the content and structure of a table. It is used to define the physical characteristics of records. It includes commands that manipulate the structure of objects such as views, tables, and indexes.
DML (Data Manipulation Language): DML deals with the manipulation of records, such as retrieval, sorting, display, and deletion of data records. It helps users query tables and display reports. So it provides the techniques for processing the database.
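A short sketch of the DDL and DML commands listed above, run through Python's `sqlite3`; the `product` table and its rows are hypothetical. `ALTER TABLE` support differs between DBMSs, and SQLite allows only a limited set of changes such as `ADD COLUMN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change structure.
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("ALTER TABLE product ADD COLUMN stock INTEGER DEFAULT 0")

# DML: insert, update, delete and select records.
conn.executemany(
    "INSERT INTO product (name, price) VALUES (?, ?)",
    [("Pen", 20.0), ("Notebook", 80.0), ("Eraser", 10.0)],
)
conn.execute("UPDATE product SET stock = 50 WHERE name = 'Pen'")
conn.execute("DELETE FROM product WHERE name = 'Eraser'")
rows = conn.execute("SELECT name, price, stock FROM product ORDER BY price").fetchall()
print(rows)  # [('Pen', 20.0, 50), ('Notebook', 80.0, 0)]

# DDL again: remove the table entirely.
conn.execute("DROP TABLE product")
remaining = conn.execute(
    "SELECT count(*) FROM sqlite_master WHERE name = 'product'"
).fetchone()[0]
print(remaining)  # 0
```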
Types of database management systems
A distributed database is a database that consists of two or more files located at different sites, either on the same network or on entirely different networks.
Portions of the database are stored in multiple physical locations, and processing is distributed among multiple database nodes.
A distributed database management system (DDBMS) is a centralized software system that manages a distributed database as if it were stored in a single location.
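The idea of one logical database spread over several sites can be sketched with two separate in-memory SQLite databases standing in for two sites. The site names, the `customer` fragments, and the `query_all` coordinator are hypothetical; a real DDBMS also handles networking, replication, and distributed transactions, none of which appear here.

```python
import sqlite3

# Two hypothetical sites, each holding a fragment of the customer table.
sites = {
    "Kathmandu": sqlite3.connect(":memory:"),
    "Pokhara": sqlite3.connect(":memory:"),
}
for conn in sites.values():
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
sites["Kathmandu"].execute("INSERT INTO customer VALUES (1, 'Hari', 'Kathmandu')")
sites["Pokhara"].execute("INSERT INTO customer VALUES (2, 'Gita', 'Pokhara')")

def query_all(sql):
    """Run the same query at every site and merge the results, so the
    caller sees one logical table regardless of where rows are stored."""
    rows = []
    for conn in sites.values():
        rows.extend(conn.execute(sql).fetchall())
    return sorted(rows)

print(query_all("SELECT id, name, city FROM customer"))
# [(1, 'Hari', 'Kathmandu'), (2, 'Gita', 'Pokhara')]
```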
Centralized database system
The centralized database system consists of
a single processor together with its
associated data storage devices and other
peripherals.
It is physically confined to a single location.
Data can be accessed from multiple sites with the use of a computer network, while the database is maintained at the central site.
In this system, customers can access the centralized database and server from different locations using the network.
Normalization
The process of breaking a large table into a number of smaller tables such that redundancy is minimized and data are managed in a meaningful way.
The process of putting a database into a normal form to avoid undesirable problems such as repetition of information, inability to represent information, and loss of information.
Normalization is a database design process in which a complex database table is broken down into simpler, separate tables. It makes the data model more flexible and easier to maintain.
There are two goals of the normalization process: eliminating redundant data and ensuring that data dependencies make sense.
Need for Normalization
Dependencies between data are identified.
Redundancy in the database is minimized.
The data model is made more flexible and easier to maintain.
It improves database design.
It removes anomalies in database operations.
It avoids the loss of information.
First Normal Form (1NF)
For a table to be in the First Normal Form, it should follow these rules:
It should only have single (atomic) valued attributes/columns.
Values stored in a column should be of
the same domain.
All the columns in a table should have
unique names.
Example of Normalization
Primary key = item + color
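The example's original table is not reproduced here. The sketch below uses hypothetical items, colors, and prices to show the 1NF rule in action: a multi-valued `colors` cell is split into one row per color, after which item alone no longer identifies a row and the key becomes item + color.

```python
# Not in 1NF: the "colors" column holds several values in one cell.
unnormalized = [
    {"item": "T-shirt", "colors": ["red", "blue"], "price": 500},
    {"item": "Cap", "colors": ["black"], "price": 300},
]

def to_1nf(rows):
    """Give every color its own row so each column holds one atomic value."""
    return [
        {"item": r["item"], "color": color, "price": r["price"]}
        for r in rows
        for color in r["colors"]
    ]

rows_1nf = to_1nf(unnormalized)
for row in rows_1nf:
    print(row)
```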
Second Normal Form (2NF)
For a table to be in the Second Normal
Form,
It should be in the First Normal form.
And, it should not have Partial
Dependency.
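Consider a hypothetical 1NF table with key (item, color) in which price depends on item alone: that is a partial dependency. The sketch moves price into its own table keyed by item; the quantity column is an assumed example of an attribute that depends on the whole key.

```python
# Hypothetical 1NF rows: (item, color, price, quantity), key = (item, color).
rows_1nf = [
    ("T-shirt", "red", 500, 12),
    ("T-shirt", "blue", 500, 7),
    ("Cap", "black", 300, 20),
]

def to_2nf(rows):
    """Move the partially dependent column into its own table keyed by item."""
    item_price = {}       # item -> price (depends on part of the key only)
    item_color_qty = {}   # (item, color) -> quantity (depends on the whole key)
    for item, color, price, qty in rows:
        item_price[item] = price
        item_color_qty[(item, color)] = qty
    return item_price, item_color_qty

prices, stock = to_2nf(rows_1nf)
print(prices)  # {'T-shirt': 500, 'Cap': 300}
print(stock)   # {('T-shirt', 'red'): 12, ('T-shirt', 'blue'): 7, ('Cap', 'black'): 20}
```

Each price is now stored once per item instead of once per color.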
Third Normal Form (3NF)
A table is said to be in the Third Normal
Form when,
It is in the Second Normal form.
And, it doesn't have Transitive
Dependency.
A → B and B → C, hence A → C (C depends on A only through B).
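A transitive dependency can be sketched with a hypothetical student table: roll → dept and dept → hod, so hod depends on roll only through dept. Moving dept → hod into its own table removes the transitive dependency and stores each head of department once.

```python
# Hypothetical 2NF rows: (roll, name, dept, hod), key = roll.
students = [
    (1, "Hari", "Computer", "Dr. Sharma"),
    (2, "Gita", "Computer", "Dr. Sharma"),
    (3, "Ram", "Civil", "Dr. Thapa"),
]

def to_3nf(rows):
    """Split the transitive dependency dept -> hod into its own table."""
    student_table = [(roll, name, dept) for roll, name, dept, _ in rows]
    dept_table = {}
    for _, _, dept, hod in rows:
        dept_table[dept] = hod  # HOD stored once per department
    return student_table, dept_table

student_table, dept_table = to_3nf(students)
print(student_table)
print(dept_table)  # {'Computer': 'Dr. Sharma', 'Civil': 'Dr. Thapa'}
```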