04_Distributed DBMSs - Concepts and Design
Chapter 24 - Objectives
• Concepts.
• Advantages and disadvantages of
distributed databases.
• Functions and architecture for a
DDBMS.
• Distributed database design.
• Levels of transparency.
• Comparison criteria for DDBMSs.
Concepts
Distributed Database
A logically interrelated collection of
shared data (and a description of this
data), physically distributed over a
computer network.
Distributed DBMS
Software system that permits the
management of the distributed database
and makes the distribution transparent to
users.
Distributed Processing
Parallel DBMS
A DBMS running across multiple processors
and disks designed to execute operations in
parallel, whenever possible, to improve
performance.
• Based on premise that single processor
systems can no longer meet requirements
for cost-effective scalability, reliability, and
performance.
• Parallel DBMSs link multiple, smaller
machines to achieve same throughput as
single, larger machine, with greater
scalability and reliability.
Parallel DBMS
Figure: parallel database architectures, (a) shared memory, (c) shared nothing.
Advantages of DDBMSs
Disadvantages of DDBMSs
• Complexity
• Cost
• Security
• Integrity control more difficult
• Lack of standards
• Lack of experience
• Database design more complex
Types of DDBMS
• Homogeneous DDBMS
• Heterogeneous DDBMS
Homogeneous DDBMS
Heterogeneous DDBMS
Open Database Access and Interoperability
Functions of a DDBMS
Reference Architecture for DDBMS
• Due to diversity, no accepted
architecture equivalent to ANSI/SPARC
3-level architecture.
• A reference architecture consists of:
– Set of global external schemas.
– Global conceptual schema (GCS).
– Fragmentation schema and allocation schema.
– Set of schemas for each local DBMS conforming
to 3-level ANSI/SPARC.
• Some levels may be missing,
depending on levels of transparency
supported.
Reference Architecture for MDBS
Components of a DDBMS
Distributed Database Design
Fragmentation
Relation may be divided into a number of
sub-relations, which are then distributed.
Allocation
Each fragment is stored at site with
“optimal” distribution.
Replication
Copy of fragment may be maintained at
several sites.
Fragmentation
Data Allocation
Centralized: Consists of single database and
DBMS stored at one site with users
distributed across the network.
Partitioned: Database partitioned into disjoint
fragments, each fragment assigned to one
site.
Complete Replication: Consists of
maintaining complete copy of database at
each site.
Selective Replication: Combination of
partitioning, replication, and centralization.
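The four allocation strategies above can be sketched as mappings from fragments to the sites that store them. This is only an illustration: the fragment names, site names, and the round-robin placement rule are hypothetical, and the selective strategy is simplified to "replicate some fragments, partition the rest".

```python
# Hypothetical fragments and sites, for illustration only.
FRAGMENTS = ["P1", "P2", "P3"]
SITES = ["London", "Glasgow", "Aberdeen"]


def centralized(fragments, sites):
    """Everything is stored at a single site (here, the first one)."""
    return {f: [sites[0]] for f in fragments}


def partitioned(fragments, sites):
    """Disjoint fragments, each assigned to exactly one site (round-robin)."""
    return {f: [sites[i % len(sites)]] for i, f in enumerate(fragments)}


def complete_replication(fragments, sites):
    """Every site holds a copy of every fragment."""
    return {f: list(sites) for f in fragments}


def selective_replication(fragments, sites, replicated):
    """Replicate only the fragments named in `replicated`; partition the rest."""
    placement = partitioned(fragments, sites)
    for f in replicated:
        placement[f] = list(sites)
    return placement


if __name__ == "__main__":
    print(selective_replication(FRAGMENTS, SITES, replicated=["P1"]))
```

In practice the choice of which fragments to replicate would depend on how often each is read and updated at each site, which this sketch does not model.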
Comparison of Strategies for Data Distribution
Why Fragment?
• Usage
– Applications work with views rather than
entire relations.
• Efficiency
– Data is stored close to where it is most
frequently used.
– Data that is not needed by local
applications is not stored.
Why Fragment?
• Parallelism
– With fragments as unit of distribution,
transaction can be divided into several
subqueries that operate on fragments.
• Security
– Data not required by local applications is
not stored and so not available to
unauthorized users.
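The parallelism point can be illustrated with a small sketch: one query is split into per-fragment subqueries that run concurrently, and the partial results are merged. The fragment contents, the `rent` attribute, and the use of threads to stand in for separate sites are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments of a property relation; each list stands in for one site.
FRAGMENTS = {
    "P1": [{"propertyNo": "PA14", "rent": 650}, {"propertyNo": "PG21", "rent": 400}],
    "P2": [{"propertyNo": "PG4", "rent": 550}, {"propertyNo": "PL94", "rent": 400}],
}


def subquery(rows):
    """Subquery evaluated against a single fragment: properties with rent > 500."""
    return [row["propertyNo"] for row in rows if row["rent"] > 500]


def run_query(fragments):
    """Run the subquery on every fragment concurrently and merge the results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(subquery, fragments.values()))
    return sorted(no for part in partials for no in part)


if __name__ == "__main__":
    print(run_query(FRAGMENTS))
```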
Why Fragment?
• Disadvantages
– Performance,
– Integrity.
Correctness of Fragmentation
Completeness
If relation R is decomposed into fragments R1,
R2, ..., Rn, each data item that can be found in R
must appear in at least one fragment.
Reconstruction
• Must be possible to define a relational
operation that will reconstruct R from the
fragments.
• Reconstruction uses the Union operation for
horizontal fragmentation and the Join operation
for vertical fragmentation.
Correctness of Fragmentation
Disjointness
• If data item di appears in fragment Ri, then it
should not appear in any other fragment.
• Exception: vertical fragmentation, where
primary key attributes must be repeated to
allow reconstruction.
• For horizontal fragmentation, data item is a
tuple.
• For vertical fragmentation, data item is an
attribute.
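The three correctness rules can be checked mechanically for a horizontal fragmentation, where the data item is a tuple. The sketch below is a minimal illustration; the sample rows and the function names are hypothetical.

```python
def is_complete(relation, fragments):
    """Completeness: every tuple of the relation appears in at least one fragment."""
    covered = set().union(*fragments)
    return set(relation) <= covered


def is_disjoint(fragments):
    """Disjointness: no tuple appears in more than one fragment."""
    seen = set()
    for fragment in fragments:
        rows = set(fragment)
        if seen & rows:
            return False
        seen |= rows
    return True


def reconstructs(relation, fragments):
    """Reconstruction: the union of the horizontal fragments equals the relation."""
    return set().union(*fragments) == set(relation)


# Hypothetical (propertyNo, type) rows, fragmented by property type.
PROPERTY_FOR_RENT = [("PA14", "House"), ("PL94", "Flat"), ("PG4", "Flat")]
P1 = [row for row in PROPERTY_FOR_RENT if row[1] == "House"]
P2 = [row for row in PROPERTY_FOR_RENT if row[1] == "Flat"]

if __name__ == "__main__":
    print(is_complete(PROPERTY_FOR_RENT, [P1, P2]),
          is_disjoint([P1, P2]),
          reconstructs(PROPERTY_FOR_RENT, [P1, P2]))
```

For vertical fragmentation the same idea applies with attributes as data items, except that primary key attributes are expected to repeat and reconstruction uses a join rather than a union.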
Types of Fragmentation
Mixed Fragmentation
Horizontal Fragmentation
• For example:
$P_1 = \sigma_{\text{type}=\text{'House'}}(\text{PropertyForRent})$
$P_2 = \sigma_{\text{type}=\text{'Flat'}}(\text{PropertyForRent})$
Horizontal Fragmentation
$S_{21} = \sigma_{\text{branchNo}=\text{'B003'}}(S_2)$
$S_{22} = \sigma_{\text{branchNo}=\text{'B005'}}(S_2)$
$S_{23} = \sigma_{\text{branchNo}=\text{'B007'}}(S_2)$
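The fragments S21, S22 and S23 are selections on S2, which is not defined in the text above. The sketch below assumes S2 is a vertical fragment (a projection) of Staff that keeps the primary key staffNo; the attribute lists and the sample rows are hypothetical.

```python
# Hypothetical Staff rows, for illustration only.
STAFF = [
    {"staffNo": "SL21", "fName": "John", "position": "Manager", "salary": 30000, "branchNo": "B005"},
    {"staffNo": "SG37", "fName": "Ann", "position": "Assistant", "salary": 12000, "branchNo": "B003"},
    {"staffNo": "SG14", "fName": "David", "position": "Supervisor", "salary": 18000, "branchNo": "B003"},
    {"staffNo": "SA9", "fName": "Mary", "position": "Assistant", "salary": 9000, "branchNo": "B007"},
]


def project(rows, attrs):
    """Projection: keep only the named attributes of each row."""
    return [{a: row[a] for a in attrs} for row in rows]


def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


# Assumed vertical fragments; both repeat the primary key staffNo.
S1 = project(STAFF, ["staffNo", "position", "salary"])
S2 = project(STAFF, ["staffNo", "fName", "branchNo"])

# Horizontal fragments of S2, one per branch.
S21 = select(S2, lambda r: r["branchNo"] == "B003")
S22 = select(S2, lambda r: r["branchNo"] == "B005")
S23 = select(S2, lambda r: r["branchNo"] == "B007")

# Reconstruction: union the horizontal fragments, then join with S1 on staffNo.
s2_by_key = {r["staffNo"]: r for r in S21 + S22 + S23}
REBUILT = [{**r, **s2_by_key[r["staffNo"]]} for r in S1]

if __name__ == "__main__":
    print(REBUILT == STAFF)
```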
Derived Horizontal Fragmentation
$R_i = R \ltimes_F S_i, \quad 1 \le i \le w$
Example - Derived Horizontal Fragmentation
$S_3 = \sigma_{\text{branchNo}=\text{'B003'}}(\text{Staff})$
$S_4 = \sigma_{\text{branchNo}=\text{'B005'}}(\text{Staff})$
$S_5 = \sigma_{\text{branchNo}=\text{'B007'}}(\text{Staff})$
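A derived horizontal fragment of a child relation is obtained by a semijoin with each fragment of the parent. The sketch below assumes, for illustration, that the child relation is PropertyForRent and that the join attribute is staffNo; neither is stated above, and the sample rows are hypothetical.

```python
# Hypothetical parent (Staff) and child (PropertyForRent) rows.
STAFF = [
    {"staffNo": "SL21", "branchNo": "B005"},
    {"staffNo": "SG37", "branchNo": "B003"},
    {"staffNo": "SG14", "branchNo": "B003"},
    {"staffNo": "SA9", "branchNo": "B007"},
]
PROPERTY_FOR_RENT = [
    {"propertyNo": "PA14", "staffNo": "SA9"},
    {"propertyNo": "PL94", "staffNo": "SL21"},
    {"propertyNo": "PG4", "staffNo": "SG14"},
    {"propertyNo": "PG36", "staffNo": "SG37"},
]


def semijoin(r, s, attr):
    """Rows of r that match at least one row of s on attr (R semijoin S)."""
    keys = {row[attr] for row in s}
    return [row for row in r if row[attr] in keys]


# Primary horizontal fragments of the parent, one per branch.
S3 = [row for row in STAFF if row["branchNo"] == "B003"]
S4 = [row for row in STAFF if row["branchNo"] == "B005"]
S5 = [row for row in STAFF if row["branchNo"] == "B007"]

# Derived horizontal fragments of the child (assumed join attribute: staffNo).
P3 = semijoin(PROPERTY_FOR_RENT, S3, "staffNo")
P4 = semijoin(PROPERTY_FOR_RENT, S4, "staffNo")
P5 = semijoin(PROPERTY_FOR_RENT, S5, "staffNo")

if __name__ == "__main__":
    for name, fragment in (("P3", P3), ("P4", P4), ("P5", P5)):
        print(name, [row["propertyNo"] for row in fragment])
```

Each property ends up in the fragment that corresponds to the branch of the member of staff who manages it, so a property and its manager can be stored at the same site.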
Distributed Database Design Methodology
Transparencies in a DDBMS
• Distribution Transparency
– Fragmentation Transparency
– Location Transparency
– Replication Transparency
– Local Mapping Transparency
– Naming Transparency
Transparencies in a DDBMS
• Transaction Transparency
– Concurrency Transparency
– Failure Transparency
• Performance Transparency
• DBMS Transparency
Distribution Transparency
• Distribution transparency allows the user to
perceive the database as a single, logical entity.
• If the DDBMS exhibits distribution transparency,
the user does not need to know:
– that the data is fragmented (fragmentation transparency),
– the location of data items (location transparency).
• If the user must specify both fragment names and
the location of data items, this is called local
mapping transparency.
• With replication transparency, the user is
unaware of the replication of fragments.
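The difference between writing a query with and without fragmentation transparency can be sketched with a view that reconstructs the global relation from its fragments. This is only an analogy: a single in-memory SQLite database stands in for the distributed system, and the tables P1 and P2, their columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two horizontal fragments (houses and flats), standing in for two sites.
    CREATE TABLE P1 (propertyNo TEXT PRIMARY KEY, type TEXT, city TEXT);
    CREATE TABLE P2 (propertyNo TEXT PRIMARY KEY, type TEXT, city TEXT);
    INSERT INTO P1 VALUES ('PA14', 'House', 'Aberdeen');
    INSERT INTO P2 VALUES ('PL94', 'Flat', 'London'), ('PG4', 'Flat', 'Glasgow');

    -- Global relation reconstructed as the union of the fragments.
    CREATE VIEW PropertyForRent AS
        SELECT * FROM P1 UNION ALL SELECT * FROM P2;
""")

# Without fragmentation transparency the user names every fragment.
fragment_aware = conn.execute(
    "SELECT propertyNo FROM P1 WHERE city = 'Glasgow' "
    "UNION ALL "
    "SELECT propertyNo FROM P2 WHERE city = 'Glasgow'"
).fetchall()

# With fragmentation transparency the user queries the global relation only.
transparent = conn.execute(
    "SELECT propertyNo FROM PropertyForRent WHERE city = 'Glasgow'"
).fetchall()

if __name__ == "__main__":
    print(fragment_aware == transparent)
```

Both queries return the same rows; the second one keeps working unchanged if the fragmentation is later redefined, provided the view is updated to match.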
Naming Transparency
Transaction Transparency
Concurrency Transparency
Classification of Transactions
Performance Transparency
SELECT p.propNo
FROM Property p INNER JOIN
(Client c INNER JOIN Viewing v ON c.clientNo = v.clientNo)
ON p.propNo = v.propNo
WHERE p.city = 'Aberdeen' AND c.maxPrice > 200000;
Performance Transparency - Example
Assume:
• Each tuple in each relation is 100
characters long.
• 10 renters with maximum price
greater than £200,000.
• 100,000 viewings for properties in
Aberdeen.
• Computation time negligible
compared to communication time.
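Under these assumptions the cost of a strategy is dominated by how many tuples it ships between sites. The sketch below estimates transfer time as a fixed delay per message plus the time to transmit the bits. The 100-character tuple size comes from the assumptions above; the 8 bits per character, the 10,000 bits/second transmission rate, and the 1-second access delay are assumed values chosen for illustration and are not given in this section.

```python
TUPLE_CHARS = 100            # from the assumptions above
BITS_PER_CHAR = 8            # assumption
RATE_BITS_PER_SEC = 10_000   # assumption: transmission rate
ACCESS_DELAY_SEC = 1.0       # assumption: fixed delay per message


def transfer_time(num_tuples, num_messages=1):
    """Seconds to ship num_tuples tuples using num_messages messages."""
    bits = num_tuples * TUPLE_CHARS * BITS_PER_CHAR
    return num_messages * ACCESS_DELAY_SEC + bits / RATE_BITS_PER_SEC


# Ship only the 10 qualifying client tuples in one message.
ship_clients = transfer_time(10)

# Ship all 100,000 Aberdeen viewing tuples in one message.
ship_viewings = transfer_time(100_000)

if __name__ == "__main__":
    print(f"10 client tuples:  {ship_clients:.1f} s")
    print(f"100,000 viewings:  {ship_viewings:.1f} s ({ship_viewings / 3600:.2f} h)")
```

With these assumed figures, shipping the small intermediate result takes under two seconds while shipping the viewings takes over two hours, which is why the choice of execution strategy matters far more than local computation time.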
Date’s 12 Rules for a DDBMS
0. Fundamental Principle
To the user, a distributed system should look
exactly like a nondistributed system.
1. Local Autonomy
2. No Reliance on a Central Site
3. Continuous Operation
4. Location Independence
5. Fragmentation Independence
6. Replication Independence
Date’s 12 Rules for a DDBMS