Distributed Database: Concepts and Design
Distributed Databases
A distributed database is a collection of multiple interconnected databases that are spread physically across various locations and communicate via a computer network.
It is a single logical database that is physically divided among networked computers.
Distributed DBMS: a centralized software system that supports and manipulates distributed databases.
It ensures that data modified at any site is universally updated.
Concepts
Distributed Database.
A logically interrelated collection of shared data
(and a description of this data), physically
distributed over a computer network.
Distributed DBMS.
Software system that permits the management of
the distributed database and makes the
distribution transparent to users.
Concepts (continued)
Users access the distributed database
via applications, which are classified as
those that do not require data from
other sites (local applications) and
those that do require data from other
sites (global applications). We require
a DDBMS to have at least one global
application.
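As a minimal sketch, the local/global distinction can be expressed in Python, assuming a hypothetical catalog `FRAGMENT_SITE` that records which site stores each fragment, and modelling an application simply as the set of fragments it reads. The fragment names and the `classify` function are illustrative assumptions. An application is local if every fragment it needs is stored at the site where it runs, and global otherwise.

```python
# Hypothetical catalog: fragment name -> site that stores it (sample data).
FRAGMENT_SITE = {
    "Staff_B003": "Bahrdar",
    "Staff_B005": "Mekelle",
    "Staff_B007": "Adama",
}


def classify(app_site: str, fragments_needed: set[str]) -> str:
    """Return 'local' if every needed fragment is stored at app_site, else 'global'."""
    remote = {f for f in fragments_needed if FRAGMENT_SITE[f] != app_site}
    return "global" if remote else "local"


print(classify("Bahrdar", {"Staff_B003"}))                # local
print(classify("Bahrdar", {"Staff_B003", "Staff_B005"}))  # global
```

In these terms, a system only counts as a DDBMS if at least one of its applications is classified as global.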
Summary
Collection of logically-related shared data.
Data split into fragments.
Fragments may be replicated.
Fragments/replicas allocated to sites.
Sites linked by a communications network.
Data at each site is under control of a DBMS.
DBMSs handle local applications autonomously.
Each DBMS participates in at least one global
application.
Distributed Processing and Distributed Database
[Figure: sites at Bahrdar, Mekelle, and Adama]
Distributed Database System (DDBS) Concept
[Figure: a separate database at each of the Bahrdar, Mekelle, and Adama sites]
• There must be different databases as opposed to a central database.
Features of Distributed database
Data is stored at a number of sites
Sites are interconnected by a network
DDB is logically a single database
DDBMS has full functionality of DBMS.
Advantages of DDBMSs
Organizational Structure
Many organizations are naturally distributed
over several locations.
Shareability and Local Autonomy
The users at one site can access data stored at
other sites. Data can be placed at the site close
to the users who normally use that data. In this
way, users have local control of the data and
they can consequently establish and enforce
local policies regarding the use of this data.
A global database administrator (DBA) is
responsible for the entire system.
Advantages of DDBMSs
Improved Availability
In a centralized DBMS, a computer failure
terminates the operations of the DBMS.
However, a failure at one site of a DDBMS, or a
failure of a communication link making some
sites inaccessible, does not make the entire
system inoperable. Distributed DBMSs are
designed to continue to function despite such
failures. If a single node fails, the system may
be able to reroute the failed node’s requests to
another site.
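The rerouting idea can be sketched in Python, assuming a hypothetical `replicas` list that names, in order of preference, every site able to serve a request, and a set of sites currently known to be down. The `route` function and `SiteUnavailable` exception are illustrative names, not part of any real DDBMS.

```python
class SiteUnavailable(Exception):
    """Raised when no site that can serve the request is reachable."""


def route(replicas: list[str], down: set[str]) -> str:
    """Send the request to the first site in `replicas` that is not down.

    The preference order (e.g. nearest site first) is an assumption
    of this sketch.
    """
    for site in replicas:
        if site not in down:
            return site
    raise SiteUnavailable("all candidate sites are down")


print(route(["Mekelle", "Adama"], down=set()))        # Mekelle
print(route(["Mekelle", "Adama"], down={"Mekelle"}))  # Adama
```

Rerouting only helps if another site holds the data the request needs, which is why availability goes hand in hand with replication.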
Advantages of DDBMSs
Improved Reliability
As data may be replicated so that it exists at
more than one site, the failure of a node or a
communication link does not necessarily make
the data inaccessible.
Improved Performance
Since each site handles only a part of the
entire database, there may not be the same
contention for CPU and I/O services as
characterized by a centralized DBMS.
Advantages of DDBMSs
Economics
Cost savings occur where databases are geographically remote and the applications require access to distributed data. In such cases, because transmitting data across the network is relatively expensive compared with local access, it may be much more economical to partition the application and perform the processing locally at each site.
It is also much more cost-effective to add workstations to a network than to upgrade a mainframe system.
Advantages of DDBMSs
Modular Growth
In a distributed environment, it is much easier
to handle expansion. New sites can be added to
the network without affecting the operations
of other sites. This flexibility allows an
organization to expand relatively easily.
Increasing database size can usually be handled
by adding processing and storage power to the
network. In a centralized DBMS, growth may
entail changes to both hardware (the
procurement of a more powerful system) and
software (the procurement of a more powerful
or more configurable DBMS).
Disadvantages of DDBMSs
Complexity
Cost
Security
Integrity Control More Difficult
Lack of Standards
Lack of Experience
Database Design More Complex
Types of DDBMS
Homogeneous DDBMS
Heterogeneous DDBMS
Homogeneous DDBMS
All sites use same DBMS product.
Much easier to design and manage.
Share a common global schema
Each site gives up part of its autonomy in terms of the right to change its schema or software.
The approach provides incremental growth, making the addition of a new site to the DDBMS easy, and allows increased performance by exploiting the parallel processing capability of multiple sites.
Heterogeneous DDBMS
Sites may run different DBMS products, with
possibly different underlying data models.
Occurs when sites have implemented their own
databases and integration is considered later.
Translations required to allow for:
Different hardware and different DBMS
products.
Heterogeneous DDBMS
For example, relations in the relational data model
are mapped to records and sets in the network
model.
It is also necessary to translate the query
language used (for example, SQL SELECT
statements are mapped to the network FIND and
GET statements).
A typical solution is to use gateways, which convert the language and model of each different DBMS into the language and model of the relational system.
Distributed Database Design
Three key issues:
Fragmentation.
Allocation
Replication
Distributed Database Design
Fragmentation
Relation may be divided into a number of sub-relations, which are then distributed.
Allocation
Each fragment is stored at site with "optimal"
distribution.
Replication
Copy of fragment may be maintained at several sites.
Fragmentation
Definition and allocation of fragments
carried out strategically to achieve:
Locality of Reference
Improved Reliability and Availability
Improved Performance
Balanced Storage Capacities and Costs
Minimal Communication Costs.
Horizontal Fragmentation
$P_1 = \sigma_{type='House'}(Rent)$
$P_2 = \sigma_{type='Flat'}(Rent)$
[Figure: a relation divided into fragments ("Divide") and reconstructed from them ("Join")]
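Horizontal fragmentation can be illustrated with a small Python sketch. The rows of `Rent` are made-up sample data, and every attribute except `type` is an assumption; each fragment is a selection on `type`, and the relation is rebuilt as the union of its fragments. The sketch also checks completeness (every row lands in some fragment) and disjointness (no row lands in two), the properties usually required of a correct fragmentation.

```python
# Made-up sample rows for the Rent relation; only the `type` attribute and
# its values 'House' and 'Flat' come from the fragment definitions above.
Rent = [
    {"propertyNo": "PA14", "type": "House", "rooms": 6},
    {"propertyNo": "PG4", "type": "Flat", "rooms": 3},
    {"propertyNo": "PG16", "type": "Flat", "rooms": 4},
    {"propertyNo": "PL94", "type": "Flat", "rooms": 4},
]


def select(relation, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def keys(fragment):
    """Primary-key values (assumed to be propertyNo) of a fragment's rows."""
    return {row["propertyNo"] for row in fragment}


# P1 = sigma_{type='House'}(Rent) and P2 = sigma_{type='Flat'}(Rent)
P1 = select(Rent, lambda r: r["type"] == "House")
P2 = select(Rent, lambda r: r["type"] == "Flat")

# Completeness: holds here because every sample row is a House or a Flat.
assert keys(P1) | keys(P2) == keys(Rent)
# Disjointness: no row appears in both fragments.
assert keys(P1).isdisjoint(keys(P2))
# Reconstruction: Rent = P1 UNION P2 (concatenation suffices because the
# fragments are disjoint).
rebuilt = sorted(P1 + P2, key=lambda r: r["propertyNo"])
assert rebuilt == sorted(Rent, key=lambda r: r["propertyNo"])

print(len(P1), len(P2))  # 1 3
```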
Mixed Fragmentation
Example - Mixed Fragmentation
$S_{21} = \sigma_{branchNo='B003'}(S_2)$
$S_{22} = \sigma_{branchNo='B005'}(S_2)$
$S_{23} = \sigma_{branchNo='B007'}(S_2)$
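Mixed fragmentation first fragments a relation vertically (projection) and then fragments the result horizontally (selection). Because S2 is not defined in this section, the Python sketch below assumes that S2 is a vertical fragment of a `Staff` relation holding `staffNo`, `fName`, `lName` and `branchNo`; the sample rows are invented. Each of S21, S22 and S23 is then a selection on `branchNo`.

```python
# Invented sample rows for a Staff relation (assumption, not from the text).
Staff = [
    {"staffNo": "SG37", "fName": "Ann", "lName": "Beech", "branchNo": "B003", "salary": 12000},
    {"staffNo": "SL21", "fName": "John", "lName": "White", "branchNo": "B005", "salary": 30000},
    {"staffNo": "SA9", "fName": "Mary", "lName": "Howe", "branchNo": "B007", "salary": 9000},
    {"staffNo": "SG14", "fName": "David", "lName": "Ford", "branchNo": "B003", "salary": 18000},
]


def project(relation, attributes):
    """Relational projection, used here for vertical fragmentation."""
    return [{a: row[a] for a in attributes} for row in relation]


def select(relation, predicate):
    """Relational selection, used here for horizontal fragmentation."""
    return [row for row in relation if predicate(row)]


# Assumed vertical fragment S2 = pi_{staffNo, fName, lName, branchNo}(Staff).
# The key staffNo is kept so the fragment can later be joined back.
S2 = project(Staff, ["staffNo", "fName", "lName", "branchNo"])

# Horizontal fragments of S2, one per branch.
S21 = select(S2, lambda r: r["branchNo"] == "B003")
S22 = select(S2, lambda r: r["branchNo"] == "B005")
S23 = select(S2, lambda r: r["branchNo"] == "B007")

print([r["staffNo"] for r in S21])  # ['SG37', 'SG14']
```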
Transparencies in a DDBMS
Distribution Transparency
Fragmentation Transparency
Location Transparency
Replication Transparency
Local Mapping Transparency
Naming Transparency
Comparison of Strategies for Data Distribution
Transparency
Transparency hides implementation details from the user, making the use of the distributed database equivalent to that of a centralized database. We can identify four main types of transparency in a DDBMS:
distribution transparency;
transaction transparency;
performance transparency;
DBMS transparency.
Distribution Transparency
Distribution transparency allows the user
to perceive the database as a single, logical
entity. If a DDBMS exhibits distribution
transparency, then the user does not need
to know the data is fragmented
(fragmentation transparency) or the
location of data items (location
transparency).
Fragmentation transparency
Fragmentation transparency is the highest level of distribution transparency.
If fragmentation transparency is provided
by the DDBMS, then the user does not
need to know that the data is fragmented.
As a result, database accesses are based
on the global schema, so the user does not
need to specify fragment names or data
locations.
Location transparency
Location transparency is the middle level of distribution transparency. With location transparency, the user must know how the data has been fragmented but still does not have to know the location of the data. A query under location transparency must therefore name the fragments it uses, but not the sites at which they are stored.
Naming transparency
As a corollary to the above distribution transparencies, we have naming transparency. As in a centralized database, each item in a distributed database must have a unique name. Therefore, the DDBMS must ensure that no two sites create a database object with the same name. One solution to this problem is to create a central name server, which has the responsibility for ensuring the uniqueness of all names in the system. However, this approach results in:
loss of some local autonomy;
performance problems, if the central site becomes a bottleneck;
low availability: if the central site fails, the remaining sites cannot create any new database objects.
Naming transparency
An alternative solution is to prefix an object with
the identifier of the site that created it.
For example, the relation Branch created
at site S1 might be named S1.Branch.
Similarly, we need to be able to identify
each fragment and each of its copies. Thus,
copy 2 of fragment 3 of the Branch
relation created at site S1 might be
referred to as S1.Branch.F3.C2.
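The site-prefix naming scheme can be sketched in Python. The `ObjectName` type and the `parse_name` function are hypothetical; the sketch only shows that names of the form S1.Branch.F3.C2 can be built and parsed mechanically, and that two sites creating a relation called Branch end up with different global names without consulting a central name server.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectName:
    """Hypothetical structured form of a site-prefixed object name."""
    site: str                       # identifier of the creating site, e.g. "S1"
    relation: str                   # relation name, e.g. "Branch"
    fragment: Optional[int] = None  # fragment number, if any
    copy: Optional[int] = None      # copy (replica) number, if any

    def __str__(self) -> str:
        parts = [self.site, self.relation]
        if self.fragment is not None:
            parts.append(f"F{self.fragment}")
        if self.copy is not None:
            parts.append(f"C{self.copy}")
        return ".".join(parts)


def parse_name(text: str) -> ObjectName:
    """Parse 'S1.Branch' or 'S1.Branch.F3.C2' back into an ObjectName."""
    site, relation, *rest = text.split(".")
    fragment = copy = None
    for part in rest:
        if part.startswith("F"):
            fragment = int(part[1:])
        elif part.startswith("C"):
            copy = int(part[1:])
        else:
            raise ValueError(f"unrecognised name component: {part!r}")
    return ObjectName(site, relation, fragment, copy)


name = ObjectName("S1", "Branch", fragment=3, copy=2)
print(name)                                   # S1.Branch.F3.C2
print(parse_name("S1.Branch.F3.C2") == name)  # True
```

The price of this approach is that the site prefix leaks location into the name, so the DDBMS still has to map user-facing names onto these global names to preserve transparency.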
Replication transparency
Closely related to location transparency is
replication transparency, which means that
the user is unaware of the replication of
fragments. Replication transparency is
implied by location transparency. However,
it is possible for a system not to have
location transparency but to have
replication transparency.
Local mapping transparency
This is the lowest level of distribution transparency.
With local mapping transparency, the user
needs to specify both fragment names and
the location of data items, taking into
consideration any replication that may
exist.
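The three levels of distribution transparency differ in how much of the data placement a query must spell out. The Python sketch below uses a hypothetical catalog for a `Staff` relation; the fragment names, sites and `resolve_*` functions are illustrative assumptions, not the interface of any real DDBMS. Each function returns the (fragment, site) pairs the system would have to visit.

```python
Placement = tuple[str, str]  # (fragment name, site name)

# Hypothetical catalog: global relation -> fragment -> sites holding a copy.
CATALOG = {
    "Staff": {
        "S21": ["Bahrdar"],
        "S22": ["Mekelle", "Adama"],  # S22 is replicated at two sites
        "S23": ["Adama"],
    }
}


def resolve_fragmentation_transparent(relation: str) -> list[Placement]:
    """User names only the global relation; the DDBMS picks fragments and sites.

    Choosing the first listed copy is a deliberately naive policy.
    """
    return [(frag, sites[0]) for frag, sites in CATALOG[relation].items()]


def resolve_location_transparent(relation: str, fragments: list[str]) -> list[Placement]:
    """User names the fragments; the DDBMS still chooses a site for each."""
    return [(frag, CATALOG[relation][frag][0]) for frag in fragments]


def resolve_local_mapping(relation: str, placements: list[Placement]) -> list[Placement]:
    """User names fragments and sites; the DDBMS only checks that they exist."""
    for frag, site in placements:
        if site not in CATALOG[relation][frag]:
            raise LookupError(f"{frag} is not stored at {site}")
    return placements


print(resolve_fragmentation_transparent("Staff"))
# [('S21', 'Bahrdar'), ('S22', 'Mekelle'), ('S23', 'Adama')]
print(resolve_location_transparent("Staff", ["S22"]))     # [('S22', 'Mekelle')]
print(resolve_local_mapping("Staff", [("S22", "Adama")]))  # [('S22', 'Adama')]
```

Moving down the levels shifts work from the DDBMS to the user: at the local mapping level the user must even pick which copy of a replicated fragment to read.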
Distribution Transparency
Distribution transparency allows user to
perceive database as single, logical entity.
If DDBMS exhibits distribution
transparency, user does not need to know:
data is fragmented (fragmentation
transparency),
location of data items (location transparency),
otherwise, this is called local mapping transparency.
Typically, the costs associated with a distributed request include:
I/O cost;
CPU cost;
communication cost.
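The three cost components can be combined into a single weighted cost, with communication cost modelled as a fixed cost per message plus a cost per byte transmitted. In the Python sketch below, all unit costs and both candidate strategies are made-up numbers, chosen only to show that the strategy doing less CPU and I/O work can still lose once network traffic is counted.

```python
from dataclasses import dataclass

# Illustrative unit costs only; real values depend on hardware and network.
C_IO = 1.0      # cost per disk I/O
C_CPU = 0.01    # cost per unit of CPU work
C_MSG = 5.0     # fixed cost per message sent
C_BYTE = 0.001  # cost per byte transmitted


@dataclass
class Strategy:
    name: str
    ios: int
    cpu_units: int
    messages: int
    bytes_sent: int

    def cost(self) -> float:
        io_cost = C_IO * self.ios
        cpu_cost = C_CPU * self.cpu_units
        comm_cost = C_MSG * self.messages + C_BYTE * self.bytes_sent
        return io_cost + cpu_cost + comm_cost


# Two hypothetical ways of answering the same distributed request.
ship_whole_relation = Strategy(
    "ship relation", ios=100, cpu_units=1_000, messages=2, bytes_sent=1_000_000
)
filter_at_source = Strategy(
    "filter at source", ios=120, cpu_units=5_000, messages=2, bytes_sent=10_000
)

best = min([ship_whole_relation, filter_at_source], key=Strategy.cost)
print(best.name)  # filter at source
```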
Data Allocation
Four alternative strategies regarding
placement of data:
Centralized
Partitioned (or Fragmented)
Complete Replication
Selective Replication
Data Allocation
Centralized
Consists of single database and DBMS stored
at one site with users distributed across the
network.
Partitioned
Database partitioned into disjoint fragments,
each fragment assigned to one site.
Data Allocation
Complete Replication
Consists of maintaining complete copy of
database at each site.
Selective Replication
Combination of partitioning, replication, and
centralization.
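The four allocation strategies can be contrasted by the fragment-to-site map each one produces. In this Python sketch, the fragments F1 to F3, the assignment of fragments to sites, and the particular selective-replication choice are all hypothetical; the point is only which sites end up holding which data, and how many copies must be kept up to date.

```python
SITES = ["Bahrdar", "Mekelle", "Adama"]
FRAGMENTS = ["F1", "F2", "F3"]  # hypothetical fragments


def centralized(home: str) -> dict[str, list[str]]:
    """Whole database (every fragment) at a single site."""
    return {f: [home] for f in FRAGMENTS}


def partitioned(owner: dict[str, str]) -> dict[str, list[str]]:
    """Disjoint fragments, each assigned to exactly one site."""
    return {f: [owner[f]] for f in FRAGMENTS}


def complete_replication() -> dict[str, list[str]]:
    """A complete copy of the database at every site."""
    return {f: list(SITES) for f in FRAGMENTS}


def selective_replication(copies: dict[str, list[str]]) -> dict[str, list[str]]:
    """Some fragments replicated, others stored at a single site."""
    return {f: list(copies[f]) for f in FRAGMENTS}


def storage_copies(allocation: dict[str, list[str]]) -> int:
    """Total number of fragment copies that must be stored and kept consistent."""
    return sum(len(sites) for sites in allocation.values())


plans = {
    "centralized": centralized("Bahrdar"),
    "partitioned": partitioned({"F1": "Bahrdar", "F2": "Mekelle", "F3": "Adama"}),
    "complete replication": complete_replication(),
    "selective replication": selective_replication(
        {"F1": ["Bahrdar", "Mekelle"], "F2": ["Mekelle"], "F3": ["Bahrdar"]}
    ),
}
for name, plan in plans.items():
    print(f"{name}: {storage_copies(plan)} copies")
```

Centralized and partitioned allocation both store three copies here, but only partitioning places data near its users; complete replication stores nine, which maximizes availability at the cost of propagating every update to all of them.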
Data Replication
Data replication refers to the storage of
data copies at multiple sites served by a
computer network.
Fragment copies can be stored at several
sites to serve specific information
requirements.
The existence of fragment copies can
enhance data availability and response time,
reducing communication and total query
costs.
Figure 10.20
Data Replication
Mutual Consistency Rule
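The mutual consistency rule requires that all copies of a replicated fragment be identical. One simple way to honour it is to apply every update to all copies before reporting success (a write-all approach, shown here as one option rather than as the method this course prescribes). The Python sketch below keeps hypothetical in-memory replicas and rejects an update when any copy is unreachable, so the copies can never diverge.

```python
class Replica:
    """In-memory stand-in for one site's copy of a fragment (hypothetical)."""

    def __init__(self, site: str):
        self.site = site
        self.up = True
        self.data: dict[str, int] = {}


def write_all(replicas: list[Replica], key: str, value: int) -> bool:
    """Apply the update to every copy, or to none if any copy is down."""
    if not all(r.up for r in replicas):
        return False  # rejecting the write keeps all copies identical
    for r in replicas:
        r.data[key] = value
    return True


def mutually_consistent(replicas: list[Replica]) -> bool:
    """Check the rule: all copies hold exactly the same data."""
    first = replicas[0].data
    return all(r.data == first for r in replicas[1:])


copies = [Replica("Bahrdar"), Replica("Mekelle"), Replica("Adama")]
write_all(copies, "B003", 100)
copies[2].up = False
print(write_all(copies, "B003", 200))  # False: Adama is down
print(mutually_consistent(copies))     # True
```

The cost of this strictness is availability: a single failed site blocks all updates to the fragment.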
Commit Protocols
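The best-known commit protocol for distributed transactions is two-phase commit (2PC). In the first (voting) phase the coordinator asks every participating site whether it can commit; in the second (decision) phase it tells all sites to commit only if every site voted yes, and to abort otherwise. The Python sketch below models participants as hypothetical in-memory objects and omits the logging, timeouts and recovery steps that a real implementation needs.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical participant site in a distributed transaction."""

    def __init__(self, site: str, can_commit: bool = True):
        self.site = site
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> Vote:
        # Phase 1: a YES vote promises the site can commit if told to.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.YES if self.can_commit else Vote.NO

    def finish(self, commit: bool) -> None:
        # Phase 2: apply the coordinator's global decision.
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Coordinator: commit only if every participant votes YES."""
    votes = [p.prepare() for p in participants]  # phase 1: voting
    decision = all(v is Vote.YES for v in votes)
    for p in participants:                        # phase 2: decision
        p.finish(decision)
    return decision


sites = [Participant("Bahrdar"), Participant("Mekelle", can_commit=False)]
print(two_phase_commit(sites))   # False
print([p.state for p in sites])  # ['aborted', 'aborted']
```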
Performance Transparency
RECOVERY IN DDBS
- More complicated than in a centralized system, because of failures specific to a distributed DB.
- Log managers
Problems: it is difficult to know which of the following has occurred:
- failure of a site;
- failure of a link;
- loss of messages.
If the server is down, a new server is elected. But what about network partitioning?
[Figure: original server's link; newly elected server]
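Network partitioning is what makes electing a new server hard: if the network splits, each side may believe the server is down and elect its own, leaving two servers. A common safeguard, sketched below in Python under simplified assumptions, is to allow a group of sites to elect a server only if it contains a strict majority of all sites, with the highest-numbered reachable site winning (loosely in the style of the bully algorithm). The `elect` function, the site numbers and the partition shown are hypothetical.

```python
from typing import Optional


def elect(reachable: set[int], all_sites: set[int]) -> Optional[int]:
    """Elect the highest-numbered site in `reachable`, but only with a majority.

    `reachable` is the group of sites that can still talk to each other.
    Without a strict majority the group refuses to elect, so the two sides
    of a partition can never both elect a server.
    """
    if len(reachable) * 2 <= len(all_sites):
        return None  # minority side: wait instead of electing
    return max(reachable)


ALL = {1, 2, 3, 4, 5}
# Server 5 fails and the network splits the rest into {1, 2, 3} and {4}.
print(elect({1, 2, 3}, ALL), elect({4}, ALL))  # 3 None
```

The trade-off is that sites on the minority side cannot proceed until the partition heals.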