0% found this document useful (0 votes)
14 views

04_Distributed DBMSs - Concepts and Design

Uploaded by

kelvinmakala2015
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views

04_Distributed DBMSs - Concepts and Design

Uploaded by

kelvinmakala2015
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 72

Distributed DBMSs - Concepts and Design

1
Chapter 24 - Objectives

• Concepts.
• Advantages and disadvantages of
distributed databases.
• Functions and architecture for a
DDBMS.
• Distributed database design.
• Levels of transparency.
• Comparison criteria for DDBMSs.
2
Concepts

Distributed Database
A logically interrelated collection of
shared data (and a description of this
data), physically distributed over a
computer network.

Distributed DBMS
Software system that permits the
management of the distributed database
and makes the distribution transparent to
users.
3
Concepts

• Collection of logically-related shared data.


• Data split into fragments.
• Fragments may be replicated.
• Fragments/replicas allocated to sites.
• Sites linked by a communications network.
• Data at each site is under control of a DBMS.
• DBMSs handle local applications
autonomously.
• Each DBMS participates in at least one
global application.
4
Distributed DBMS

5
Distributed Processing

A centralized database that can be


accessed over a computer network.

6
Parallel DBMS
A DBMS running across multiple processors
and disks designed to execute operations in
parallel, whenever possible, to improve
performance.
• Based on premise that single processor
systems can no longer meet requirements
for cost-effective scalability, reliability, and
performance.
• Parallel DBMSs link multiple, smaller
machines to achieve same throughput as
single, larger machine, with greater
scalability and reliability.
7
Parallel DBMS

• Main architectures for parallel DBMSs


are:
– Shared memory,
– Shared disk,
– Shared nothing.

8
Parallel DBMS

(a) shared
memory

(b) shared disk

(c) shared
nothing

9
Advantages of DDBMSs

• Reflects organizational structure


• Improved shareability and local
autonomy
• Improved availability
• Improved reliability
• Improved performance
• Economics
• Modular growth
10
Disadvantages of DDBMSs

• Complexity
• Cost
• Security
• Integrity control more difficult
• Lack of standards
• Lack of experience
• Database design more complex

11
Types of DDBMS

• Homogeneous DDBMS
• Heterogeneous DDBMS

12
Homogeneous DDBMS

• All sites use same DBMS product.


• Much easier to design and manage.
• Approach provides incremental
growth and allows increased
performance.

13
Heterogeneous DDBMS

• Sites may run different DBMS products,


with possibly different underlying data
models.
• Occurs when sites have implemented their
own databases and integration is
considered later.
• Translations required to allow for:
– Different hardware.
– Different DBMS products.
– Different hardware and different DBMS
products.
• Typical solution is to use gateways.
14
Open Database Access and Interoperability

• Open Group formed a Working Group to


provide specifications that will create a
database infrastructure environment where
there is:
– Common SQL API that allows client applications to be
written that do not need to know vendor of DBMS they are
accessing.
– Common database protocol that enables DBMS from one
vendor to communicate directly with DBMS from another
vendor without the need for a gateway.
– A common network protocol that allows communications
between different DBMSs.

15
Open Database Access and Interoperability

• Most ambitious goal is to find a way


to enable transaction to span DBMSs
from different vendors without use of
a gateway.
• Group has now evolved into DBIOP
Consortium and are working in
version 3 of DRDA (Distributed
Relational Database Architecture)
standard.
16
Multidatabase System (MDBS)

DDBMS in which each site maintains


complete autonomy.
• DBMS that resides transparently on top of
existing database and file systems and
presents a single database to its users.
• Allows users to access and share data
without requiring physical database
integration.
• Unfederated MDBS (no local users) and
federated MDBS.
17
Overview of Networking

Network - Interconnected collection


of autonomous computers, capable of
exchanging information.
• Local Area Network (LAN) intended for
connecting computers at same site.
• Wide Area Network (WAN) used when
computers or LANs need to be connected over
long distances.
• WAN relatively slow and less reliable than
LANs. DDBMS using LAN provides much
faster response time than one using WAN.
18
Overview of Networking

19
Functions of a DDBMS

• Expect DDBMS to have at least the


functionality of a DBMS.
• Also to have following functionality:
– Extended communication services.
– Extended Data Dictionary.
– Distributed query processing.
– Extended concurrency control.
– Extended recovery services.

20
Reference Architecture for DDBMS
• Due to diversity, no accepted
architecture equivalent to ANSI/SPARC
3-level architecture.
• A reference architecture consists of:
– Set of global external schemas.
– Global conceptual schema (GCS).
– Fragmentation schema and allocation schema.
– Set of schemas for each local DBMS conforming
to 3-level ANSI/SPARC.
• Some levels may be missing,
depending on levels of transparency
supported.
21
Reference Architecture for DDBMS

22
Reference Architecture for MDBS

• In DDBMS, GCS is union of all local


conceptual schemas.
• In FMDBS, GCS is subset of local
conceptual schemas (LCS), consisting
of data that each local system agrees
to share.
• GCS of tightly coupled system
involves integration of either parts of
LCSs or local external schemas.
• FMDBS with no GCS is called loosely
coupled. 23
Reference Architecture for Tightly-Coupled
FMDBS

24
Components of a DDBMS

25
Distributed Database Design

• Three key issues:


– Fragmentation,
– Allocation,
– Replication.

26
Distributed Database Design

Fragmentation
Relation may be divided into a number of
sub-relations, which are then distributed.
Allocation
Each fragment is stored at site with
“optimal” distribution.
Replication
Copy of fragment may be maintained at
several sites.

27
Fragmentation

• Definition and allocation of fragments


carried out strategically to achieve:
– Locality of Reference.
– Improved Reliability and Availability.
– Improved Performance.
– Balanced Storage Capacities and Costs.
– Minimal Communication Costs.
• Involves analyzing most important
applications, based on
quantitative/qualitative information.
28
Fragmentation
• Quantitative information may include:
– frequency with which an application is
run;
– site from which an application is run;
– performance criteria for transactions and
applications.

• Qualitative information may include


transactions that are executed by
application, type of access (read or
write), and predicates of read
operations.
29
Data Allocation

• Four alternative strategies regarding


placement of data:
– Centralized,
– Partitioned (or Fragmented),
– Complete Replication,
– Selective Replication.

30
Data Allocation
Centralized: Consists of single database and
DBMS stored at one site with users
distributed across the network.
Partitioned: Database partitioned into disjoint
fragments, each fragment assigned to one
site.
Complete Replication: Consists of
maintaining complete copy of database at
each site.
Selective Replication: Combination of
partitioning, replication, and centralization.
31
Comparison of Strategies for Data
Distribution

32
Why Fragment?

• Usage
– Applications work with views rather than
entire relations.
• Efficiency
– Data is stored close to where it is most
frequently used.
– Data that is not needed by local
applications is not stored.

33
Why Fragment?

• Parallelism
– With fragments as unit of distribution,
transaction can be divided into several
subqueries that operate on fragments.
• Security
– Data not required by local applications is
not stored and so not available to
unauthorized users.

34
Why Fragment?

• Disadvantages
– Performance,
– Integrity.

35
Correctness of Fragmentation

• Three correctness rules:


– Completeness,
– Reconstruction,
– Disjointness.

36
Correctness of Fragmentation
Completeness
If relation R is decomposed into fragments R1,
R2, ... Rn, each data item that can be found in R
must appear in at least one fragment.

Reconstruction
• Must be possible to define a relational
operation that will reconstruct R from the
fragments.
• Reconstruction for horizontal fragmentation
is Union operation and Join for vertical .

37
Correctness of Fragmentation

Disjointness
• If data item di appears in fragment Ri, then it
should not appear in any other fragment.
• Exception: vertical fragmentation, where
primary key attributes must be repeated to
allow reconstruction.
• For horizontal fragmentation, data item is a
tuple.
• For vertical fragmentation, data item is an
attribute.
38
Types of Fragmentation

• Four types of fragmentation:


– Horizontal,
– Vertical,
– Mixed,
– Derived.

• Other possibility is no fragmentation:


– If relation is small and not updated
frequently, may be better not to fragment
relation. 39
Horizontal and Vertical Fragmentation

40
Mixed Fragmentation

41
Horizontal Fragmentation

• Consists of a subset of the tuples of a


relation.
• Defined using Selection operation of
relational algebra:
p(R)

• For example:
P1 =  type=‘House’(PropertyForRent)
P2 =  type=‘Flat’(PropertyForRent)
42
Horizontal Fragmentation

• This strategy is determined by looking at


predicates used by transactions.
• Involves finding set of minimal (complete
and relevant) predicates.
• Set of predicates is complete, if and only if,
any two tuples in same fragment are
referenced with same probability by any
application.
• Predicate is relevant if there is at least one
application that accesses fragments
differently. 43
Vertical Fragmentation

• Consists of a subset of attributes of a


relation.
• Defined using Projection operation of
relational algebra:
a1, ... ,an(R)
• For example:
S1 = staffNo, position, sex, DOB, salary(Staff)
S2 = staffNo, fName, lName, branchNo(Staff)
• Determined by establishing affinity of
one attribute to another.
44
Mixed Fragmentation

• Consists of a horizontal fragment that


is vertically fragmented, or a vertical
fragment that is horizontally
fragmented.
• Defined using Selection and Projection
operations of relational algebra:
 p(a1, ... ,an(R)) or
a1, ... ,an(σp(R))
45
Example - Mixed Fragmentation

S1 = staffNo, position, sex, DOB, salary(Staff)


S2 = staffNo, fName, lName, branchNo(Staff)

S21 =  branchNo=‘B003’(S2)
S22 =  branchNo=‘B005’(S2)
S23 =  branchNo=‘B007’(S2)

46
Derived Horizontal Fragmentation

• A horizontal fragment that is based on


horizontal fragmentation of a parent
relation.
• Ensures that fragments that are
frequently joined together are at same
site.
• Defined using Semijoin operation of
relational algebra:

Ri = R F Si, 1iw 47
Example - Derived Horizontal Fragmentation

S3 =  branchNo=‘B003’(Staff)
S4 =  branchNo=‘B005’(Staff)
S5 =  branchNo=‘B007’(Staff)

Could use derived fragmentation for


Property:

Pi = PropertyForRent branchNo Si, 3  i  5

48
Derived Horizontal Fragmentation

• If relation contains more than one


foreign key, need to select one as
parent.
• Choice can be based on
fragmentation used most frequently
or fragmentation with better join
characteristics.

49
Distributed Database Design Methodology

1. Use normal methodology to produce a design for


the global relations.
2. Examine topology of system to determine where
databases will be located.
3. Analyze most important transactions and identify
appropriateness of horizontal/vertical
fragmentation.
4. Decide which relations are not to be fragmented.
5. Examine relations on 1 side of relationships and
determine a suitable fragmentation schema.
Relations on many side may be suitable for
derived fragmentation.
50
Transparencies in a DDBMS

• Distribution Transparency
– Fragmentation Transparency
– Location Transparency
– Replication Transparency
– Local Mapping Transparency
– Naming Transparency

51
Transparencies in a DDBMS

• Transaction Transparency
– Concurrency Transparency
– Failure Transparency

• Performance Transparency
– DBMS Transparency

• DBMS Transparency

52
Distribution Transparency
• Distribution transparency allows user to
perceive database as single, logical entity.
• If DDBMS exhibits distribution transparency,
user does not need to know:
– data is fragmented (fragmentation transparency),
– location of data items (location transparency),
– otherwise call this local mapping transparency.
• With replication transparency, user is
unaware of replication of fragments .

53
Naming Transparency

• Each item in a DDB must have a unique


name.
• DDBMS must ensure that no two sites
create a database object with same name.
• One solution is to create central name
server. However, this results in:
– loss of some local autonomy;
– central site may become a bottleneck;
– low availability; if the central site fails, remaining
sites cannot create any new objects.

54
Naming Transparency

• Alternative solution - prefix object with


identifier of site that created it.
• For example, Branch created at site S1 might be
named S1.BRANCH.
• Also need to identify each fragment and its
copies.
• Thus, copy 2 of fragment 3 of Branch created at
site S1 might be referred to as
S1.BRANCH.F3.C2.
• However, this results in loss of distribution
transparency.
55
Naming Transparency

• An approach that resolves these


problems uses aliases for each
database object.
• Thus, S1.BRANCH.F3.C2 might be
known as LocalBranch by user at site
S1.
• DDBMS has task of mapping an alias to
appropriate database object.

56
Transaction Transparency

• Ensures that all distributed transactions


maintain distributed database’s integrity and
consistency.
• Distributed transaction accesses data stored
at more than one location.
• Each transaction is divided into number of
subtransactions, one for each site that has to
be accessed.
• DDBMS must ensure the indivisibility of both
the global transaction and each of the
subtransactions. 57
Example - Distributed Transaction

• T prints out names of all staff, using


schema defined above as S1, S2, S21,
S22, and S23. Define three
subtransactions TS3, TS5, and TS7 to
represent agents at sites 3, 5, and 7.

58
Concurrency Transparency

• All transactions must execute independently and


be logically consistent with results obtained if
transactions executed one at a time, in some
arbitrary serial order.
• Same fundamental principles as for centralized
DBMS.
• DDBMS must ensure both global and local
transactions do not interfere with each other.
• Similarly, DDBMS must ensure consistency of all
subtransactions of global transaction.

59
Classification of Transactions

• In IBM’s Distributed Relational


Database Architecture (DRDA), four
types of transactions:
– Remote request
– Remote unit of work
– Distributed unit of work
– Distributed request.

60
Classification of Transactions

61
Concurrency Transparency

• Replication makes concurrency more


complex.
• If a copy of a replicated data item is
updated, update must be propagated to all
copies.
• Could propagate changes as part of original
transaction, making it an atomic operation.
• However, if one site holding copy is not
reachable, then transaction is delayed until
site is reachable.

62
Concurrency Transparency

• Could limit update propagation to


only those sites currently available.
Remaining sites updated when they
become available again.
• Could allow updates to copies to
happen asynchronously, sometime
after the original update. Delay in
regaining consistency may range
from a few seconds to several hours.
63
Failure Transparency
• DDBMS must ensure atomicity and
durability of global transaction.
• Means ensuring that subtransactions of
global transaction either all commit or all
abort.
• Thus, DDBMS must synchronize global
transaction to ensure that all
subtransactions have completed
successfully before recording a final
COMMIT for global transaction.
• Must do this in presence of site and network
failures.

64
Performance Transparency

• DDBMS must perform as if it were a


centralized DBMS.
– DDBMS should not suffer any
performance degradation due to
distributed architecture.
– DDBMS should determine most cost-
effective strategy to execute a request.

65
Performance Transparency

• Distributed Query Processor (DQP)


maps data request into ordered
sequence of operations on local
databases.
• Must consider fragmentation,
replication, and allocation schemas.
• DQP has to decide:
– which fragment to access;
– which copy of a fragment to use;
– which location to use.
66
Performance Transparency

• DQP produces execution strategy


optimized with respect to some cost
function.
• Typically, costs associated with a
distributed request include:
– I/O cost;
– CPU cost;
– communication cost.
67
Performance Transparency - Example
Property(propNo, city) 10000 records in London
Client(clientNo,maxPrice) 100000 records in
Glasgow
Viewing(propNo, clientNo) 1000000 records in
London

SELECT p.propNo
FROM Property p INNER JOIN
(Client c INNER JOIN Viewing v ON c.clientNo =
v.clientNo)
ON p.propNo = v.propNo
WHERE p.city=‘Aberdeen’ AND c.maxPrice > 200000;
68
Performance Transparency - Example

Assume:
• Each tuple in each relation is 100
characters long.
• 10 renters with maximum price
greater than £200,000.
• 100 000 viewings for properties in
Aberdeen.
• Computation time negligible
compared to communication time. 69
Performance Transparency - Example

70
Date’s 12 Rules for a DDBMS

0. Fundamental Principle
To the user, a distributed system should look
exactly like a nondistributed system.
1. Local Autonomy
2. No Reliance on a Central Site
3. Continuous Operation
4. Location Independence
5. Fragmentation Independence
6. Replication Independence

71
Date’s 12 Rules for a DDBMS

7. Distributed Query Processing


8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence

• Last four rules are ideals.


72

You might also like