Chapter-7 Distributed Database Systems
Outline
Distributed Database Concepts
Data Fragmentation, Replication and Allocation
Types of Distributed Database Systems
Query Processing
Concurrency Control
[Figure: Centralized DB vs. Distributed DB]
physically.
Communication Cost: cost to send and
For EMPLOYEE we have the following information
1. 10,000 records
2. each record is 100 bytes long
For DEPARTMENT we have the following information
1. 100 records
2. each record is 35 bytes long
Distributed Database System 29
There are three ways of executing this query:
1. Transfer DEPARTMENT and EMPLOYEE to S3 and perform
the join there: needs transfer of
10,000*100 + 100*35 = 1,003,500 bytes.
2. Transfer DEPARTMENT to S1 and perform the join there;
the result has 40*10,000 = 400,000 bytes and is transferred
to S3. We need 3,500 + 400,000 = 403,500 bytes to be
transferred.
3. Transfer EMPLOYEE to S2 and perform the join there;
the result has 40*10,000 = 400,000 bytes and is transferred
to S3. We need 1,000,000 + 400,000 = 1,400,000 bytes
to be transferred.
Then one can select the strategy that will reduce the
data transfer cost for this specific query.
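The transfer costs can be recomputed with a short script. This is a minimal sketch, not part of the original material: the names are illustrative, and it assumes, as the figures above imply, that EMPLOYEE is stored at S1, DEPARTMENT at S2, and that the join result has 10,000 records of 40 bytes each.

```python
# Sizes taken from the example; all names here are illustrative.
EMPLOYEE_BYTES = 10_000 * 100    # 10,000 records of 100 bytes
DEPARTMENT_BYTES = 100 * 35      # 100 records of 35 bytes
RESULT_BYTES = 10_000 * 40       # join result: 10,000 records of 40 bytes


def transfer_costs():
    """Return the bytes sent over the network for each candidate strategy."""
    return {
        "ship both relations to S3": EMPLOYEE_BYTES + DEPARTMENT_BYTES,
        "ship DEPARTMENT to S1, result to S3": DEPARTMENT_BYTES + RESULT_BYTES,
        "ship EMPLOYEE to S2, result to S3": EMPLOYEE_BYTES + RESULT_BYTES,
    }


costs = transfer_costs()
cheapest = min(costs, key=costs.get)  # strategy with the fewest bytes moved
for name, size in costs.items():
    print(f"{name}: {size:,} bytes")
print("cheapest:", cheapest)
```

Shipping the small DEPARTMENT relation and then the result moves 403,500 bytes, the least of the three.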
Other steps of optimization may also be included to
make the processing more efficient by reducing the
Transaction Management
A transaction is a logical unit of work consisting
of one or more operations executed by a single
user.
A transaction begins with the user's first
executable query statement and ends when it
is committed or rolled back.
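The commit/rollback boundary can be tried out with Python's built-in sqlite3 module. This is only a sketch on a local, in-memory database, not a distributed one; the dept table layout and the values 'Bole' and 'Arada' are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, loc TEXT)")
conn.execute("INSERT INTO dept VALUES (5, 'Bole')")
conn.commit()  # the transaction opened by the INSERT ends here


def current_loc():
    """Read the location currently stored for department 5."""
    return conn.execute("SELECT loc FROM dept WHERE deptno = 5").fetchone()[0]


# In sqlite3's default mode a transaction is opened implicitly
# before the first data-modifying statement.
conn.execute("UPDATE dept SET loc = 'Arada' WHERE deptno = 5")
conn.rollback()  # the transaction ends; the update is discarded
after_rollback = current_loc()

conn.execute("UPDATE dept SET loc = 'Arada' WHERE deptno = 5")
conn.commit()  # the transaction ends; the update becomes permanent
after_commit = current_loc()

print(after_rollback, after_commit)
```

After the rollback the row still holds its old value; after the commit it holds the new one.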
A Distributed Transaction is a transaction
that includes one or more statements that,
individually or as a group, update data on two
or more distinct nodes of a distributed
database.
Representation of Query in Distributed Database
SQL Statement: SELECT * FROM dept@sales.midroc.telecom.et;
Object: dept
Database: sales
Domain: midroc.telecom.et
There are two types of transactions in a DDBMS that access data at other
sites:
1. Remote Transaction: contains only statements that access a single
remote node.
A remote query statement is a query that selects information from one
or more remote tables, all of which reside at the same remote node or
site.
For example, the following query accesses data from the dept table in
the Addis schema (the site) of the remote sales database:
SELECT * FROM Addis.dept@sales.midroc.telecom.et;
A remote update statement is an update that modifies data in one or
more tables, all of which are collocated at the same remote node.
For example, the following statement updates the dept table in the Addis
schema of the remote sales database:
UPDATE Addis.dept@sales.midroc.telecom.et
SET loc = 'Arada'
WHERE BranchNo = 5;
2. Distributed Transaction: contains statements that access
more than one node.
A distributed query statement retrieves information from two
or more nodes.
If all statements of a transaction reference only a single
remote node, the transaction is remote, not distributed.
A database must guarantee that all statements in a
transaction, distributed or non-distributed, either commit or
roll back as a unit.
For example, the following query accesses data from the
local database as well as the remote sales database:
SELECT ename, dname
FROM Awassa.emp AW, Addis.dept@sales.midroc.telecom.et AD
WHERE AW.deptno = AD.deptno;
{Employee data is stored in Awassa and Sales data is stored
in Addis; there is an employee responsible for each sale}
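The remote/distributed distinction depends only on which nodes a transaction's statements touch. A minimal sketch of that rule, under assumptions: the function name is hypothetical, the node names are borrowed from the examples, and the "local" case is added for completeness.

```python
def classify_transaction(statement_nodes, local_node):
    """Classify a transaction by the nodes its statements reference.

    statement_nodes: one collection of node names per statement.
    local_node: the node where the transaction is issued.
    """
    touched = set()
    for nodes in statement_nodes:
        touched.update(nodes)
    if touched <= {local_node}:
        return "local"        # nothing outside the issuing node is accessed
    if len(touched) == 1:
        return "remote"       # every statement uses the same single remote node
    return "distributed"      # more than one node is involved


# Two statements that both go to the remote sales database.
print(classify_transaction([{"sales"}, {"sales"}], local_node="Awassa"))
# A join between a local table and a remote table.
print(classify_transaction([{"Awassa", "sales"}], local_node="Awassa"))
```

The first call is classified as remote and the second as distributed, matching the definitions above.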
Remote query
select client_nm
from [email protected];
Remote Update
update [email protected]
set stu_id = '242'
where stu_id = '200';
Distributed query
select project_name, student_nm
from [email protected] i, student s
where s.stu_id = i.stu_id;
Distributed Update
update [email protected]
set stu_id = '242'
where stu_id = '200';
update student
set stu_id = '242'
where stu_id = '200';
commit;
Concurrency Control
There are various techniques used for
concurrency control in centralized database
systems.
The techniques in distributed database
systems are similar to the centralized
approach, with additional implementation
requirements or modifications.
The main difference or the change that should
be incorporated is the way the lock manager
is implemented and how it functions.
There are different schemes for concurrency
control in a DDBS:
1) Non-Replicated Scheme
No data is replicated in the system
All sites maintain a local lock manager (local lock
and unlock)
If site Si needs a lock on data at site Sj, it sends a message
to the lock manager of site Sj, and the locking is
handled by site Sj
All locking and unlocking is handled by the local lock
manager of the site where the data object resides
Is simple to implement
Needs three message transfers:
To request a lock
To notify grant of lock
To request unlock
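The non-replicated scheme can be sketched as one lock manager per site plus a message counter. This is a simplified illustration under assumptions: class and method names are hypothetical, only exclusive locks are modelled, every request is treated as coming from another site, and a denied request returns False instead of waiting.

```python
class LocalLockManager:
    """Lock manager for the data items stored at one site (exclusive locks only)."""

    def __init__(self, site):
        self.site = site
        self.holder = {}  # data item -> transaction currently holding the lock

    def lock(self, item, txn):
        if item in self.holder and self.holder[item] != txn:
            return False  # held by another transaction
        self.holder[item] = txn
        return True

    def unlock(self, item, txn):
        if self.holder.get(item) == txn:
            del self.holder[item]


class NonReplicatedSystem:
    """Each data item lives at exactly one site, which also manages its lock."""

    def __init__(self, placement):
        self.placement = placement  # data item -> the single site storing it
        self.managers = {s: LocalLockManager(s) for s in set(placement.values())}
        self.messages = 0

    def request_lock(self, item, txn):
        manager = self.managers[self.placement[item]]
        self.messages += 1          # message: request a lock
        granted = manager.lock(item, txn)
        if granted:
            self.messages += 1      # message: notify grant of lock
        return granted

    def release_lock(self, item, txn):
        self.messages += 1          # message: request unlock
        self.managers[self.placement[item]].unlock(item, txn)


system = NonReplicatedSystem({"x": "S1", "y": "S2"})
system.request_lock("y", "T1")   # handled by the lock manager at S2
system.release_lock("y", "T1")
print(system.messages)
```

One lock/unlock cycle on a single item accounts for the three message transfers listed above.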
2) Single Coordinator Approach
The system chooses a single lock manager that
resides at one of the sites (Si)
All lock and unlock requests are made at site Si,
where the lock manager resides
Is simple to implement
Needs two message transfers:
To request a lock
To request unlock
Simple deadlock handling
Could be a bottleneck since all requests are handled
at one site
Is vulnerable if the site with the lock manager
fails
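The bottleneck and the single point of failure can be seen in a small sketch. The class, exception and attribute names are hypothetical, only exclusive locks are modelled, and a failed site is simulated with a flag.

```python
class CoordinatorDown(Exception):
    """Raised when the site hosting the single lock manager is unreachable."""


class SingleCoordinator:
    """One lock manager, at one site, for every data item in the system."""

    def __init__(self, site):
        self.site = site            # the one site (Si) hosting the lock manager
        self.alive = True
        self.holder = {}            # data item -> transaction holding the lock
        self.requests_handled = 0   # every request in the system lands here

    def _receive(self):
        if not self.alive:
            raise CoordinatorDown(f"lock manager at {self.site} is down")
        self.requests_handled += 1

    def lock(self, item, txn):
        self._receive()
        if self.holder.get(item, txn) != txn:
            return False            # held by another transaction
        self.holder[item] = txn
        return True

    def unlock(self, item, txn):
        self._receive()
        if self.holder.get(item) == txn:
            del self.holder[item]


coordinator = SingleCoordinator("S1")
coordinator.lock("x", "T1")
coordinator.unlock("x", "T1")
coordinator.alive = False           # simulate failure of site S1
try:
    coordinator.lock("y", "T2")
except CoordinatorDown as err:
    print("no locks can be granted:", err)
```

Because every request passes through one object, its counter grows with total system load, and once the hosting site is down no transaction anywhere can obtain a lock.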
There are three varieties of the two-phase
locking (2PL) protocol in the DDBMS environment.
Implementing basic 2PL in a distributed
system assumes that data is distributed
across multiple machines.
Centralized 2PL:
A single site is responsible for granting and
releasing locks
Each site's transaction manager communicates
with this centralized lock manager and with its
own local data manager
Has only one lock manager for the entire DDBS.
Primary 2PL:
Each replicated data item is assigned a primary copy;
the lock manager at the site of that primary copy is responsible
for granting and releasing locks (distributed locking), and
updates are propagated as soon as possible to the slave copies
Distributes the lock managers over a number of sites
Distributed 2PL:
Assumes data is completely replicated
The schedulers (lock managers) at each site are
responsible for granting and releasing locks as well as
forwarding operations to the local data manager.
Distributes the lock manager to every site in the DDBS.
Has more communication overhead than the others.
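The difference between the three variants comes down to which lock managers must be contacted for a data item. A rough sketch of that placement rule, under assumptions: the function and parameter names are hypothetical, and for distributed 2PL it assumes full replication where an update locks the copy at every site.

```python
def lock_sites_for_update(variant, item, all_sites,
                          central_site=None, primary_site_of=None):
    """Return the sites whose lock managers are contacted to lock `item` for an update."""
    if variant == "centralized":
        return [central_site]              # one lock manager for the whole DDBS
    if variant == "primary":
        return [primary_site_of[item]]     # site holding the item's primary copy
    if variant == "distributed":
        # Assumption: full replication, and an update locks every copy.
        return list(all_sites)
    raise ValueError(f"unknown 2PL variant: {variant}")


sites = ["S1", "S2", "S3"]
primary = {"x": "S2", "y": "S3"}
for variant in ("centralized", "primary", "distributed"):
    contacted = lock_sites_for_update(variant, "x", sites,
                                      central_site="S1", primary_site_of=primary)
    print(variant, contacted)
```

Under these assumptions, centralized and primary 2PL each contact one lock manager per item, while distributed 2PL contacts all of them, which is where its extra communication overhead comes from.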