Lecture 3 - Concurrency Control and Fault Tolerance

This document discusses concurrency control and transaction management techniques for distributed systems. It covers transactions, schedules, serializability, and various concurrency control protocols including lock-based, graph-based, and timestamp-based approaches. Lock-based concurrency control uses locks on data items to prevent conflicting access and ensure serializability. The document discusses binary locks that can be in one of two states - locked or unlocked - to control concurrent access to data. Maintaining serializability is the ultimate goal of concurrency control techniques.

Uploaded by Sibgha Israr


CS-423 Parallel and Distributed Computing

Lecture 3
Concurrency Control and Fault Tolerance
Agenda
• Transactions
• Transaction Properties
• The Transaction Model
• Types of Transactions
• Schedule
• Concurrency Problems
• Serializability
• Lock based Concurrency Control
• Two Phase Locking Protocol
• Graph based Concurrency Control
• Timestamp based Concurrency Control
• Recovery with Concurrent Transactions
Transactions
• A sequence of separate requests to a server that must be atomic, in the
sense that:
  • They are free from interference by operations of other clients
  • They have the all-or-nothing property

• Example
  Transaction T:
  a.withdraw(100)
  b.deposit(100)
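The all-or-nothing property of transaction T can be sketched as below. This is a minimal illustration, not a real DBMS API: the `Account` class and the snapshot-based rollback are assumptions made for the example.

```python
# Minimal sketch of atomicity: either both withdraw(100) and deposit(100)
# take effect, or neither does. (Account and the snapshot rollback are
# hypothetical, for illustration only.)

class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

    def deposit(self, amount):
        self.balance += amount

def transfer(a, b, amount):
    """Transaction T: a.withdraw(amount); b.deposit(amount) -- atomically."""
    snapshot = (a.balance, b.balance)    # remember the old values
    try:
        a.withdraw(amount)
        b.deposit(amount)                # both operations succeeded: commit
    except Exception:
        a.balance, b.balance = snapshot  # abort: restore the old values
        raise
```

If the withdraw succeeds but the deposit would fail (or vice versa), the snapshot restore plays the role of ABORT_TRANSACTION, leaving both balances unchanged.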
Transaction Properties
• Atomicity, Consistency, Isolation, Durability (the ACID properties)
The Transaction Model
• Examples of primitives for transactions.

Primitive Description

BEGIN_TRANSACTION Mark the start of a transaction

END_TRANSACTION Terminate the transaction and try to commit

ABORT_TRANSACTION Kill the transaction and restore the old values

READ Read data from a file, a table, or otherwise

WRITE Write data to a file, a table, or otherwise


The Transaction Model
(a)                        (b)
BEGIN_TRANSACTION          BEGIN_TRANSACTION
reserve Lhr -> Kar;        reserve Lhr -> Kar;
reserve Kar -> Isb;        reserve Kar -> Isb;
reserve Isb -> Muree;      reserve Isb -> Muree; full =>
END_TRANSACTION            ABORT_TRANSACTION

a) Transaction to reserve three flights commits


b) Transaction aborts when third flight is unavailable
Abstracting Transactions
• The only operations that matter in a transaction are the data
items the transaction reads or writes.

• Unless we see a commit or abort, we assume that T1 and T2 are not yet
finished.
Types of Transactions
• Flat Transaction
  • A simple type of transaction that satisfies the ACID properties.
• Nested Transaction (a)
  • A transaction constructed from a number of sub-transactions that
    operates on multiple data items on multiple machines.
• Distributed Transaction (b)
  • A number of flat sub-transactions that operate on data distributed
    across multiple machines.
Transaction in Distributed System
• A transaction is an execution of a sequence of client
requests transforming the server data from one
consistent state to another.

• A distributed transaction is a set of operations on data that is
performed across two or more data repositories (especially databases).
It is typically coordinated across separate nodes connected by a
network, but may also span multiple databases on a single server.
Schedule
• A schedule is a series of operations from one or more transactions.

• The sequence of operations that take place is called a schedule.


• This schedule shows a partial execution of T1 and T2.
Schedule
• A schedule can be of two types: Serial Schedule, Concurrent Schedule
Serial Schedule:
• When one transaction completely executes before starting another
transaction, the schedule is called a serial schedule.
• A serial schedule is always consistent. e.g.; If a schedule S has debit
transaction T1 and credit transaction T2, possible serial schedules are T1
followed by T2 (T1T2) or T2 followed by T1 (T2T1). A serial schedule
has low throughput and less resource utilization.
• Any serial schedule is an acceptable schedule.
• Any schedule that is guaranteed to produce the same result as a serial
schedule is an acceptable schedule.
Schedule
Concurrent Schedule:
• When operations of a transaction are interleaved with operations of other
transactions in a schedule, the schedule is called a concurrent schedule.
• e.g.; the schedule of debit and credit transactions shown in Table 1
(next slide) is concurrent.

• But concurrency can lead to inconsistency in the database.
Concurrent Schedule

S: R1(A), R2(A), OP1(A), OP2(A), W1(A), W2(A)


Advantages of Concurrency
• In general, concurrency means that more than one transaction can run on a
system at the same time.
• The advantages of a concurrent system are:
• Waiting Time: the time a process spends in the ready state before it gets
the CPU to execute. Concurrency leads to less waiting time.
• Response Time: the time taken to get the first response from the CPU.
Concurrency leads to less response time.
• Resource Utilization: the fraction of system resources in use. Multiple
transactions can run in parallel in a system, so concurrency leads to
higher resource utilization.
• Efficiency: the amount of output produced for a given input. Concurrency
leads to more efficiency.
Concurrency Problems
• (Five example schedules illustrating concurrency problems were shown as
figures on slides 1–5; the figures are not reproduced here.)
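One classic concurrency problem, the lost update, can be reproduced from the interleaved schedule notation used later in this lecture, S: R1(A), R2(A), OP1(A), OP2(A), W1(A), W2(A). The sketch below assumes T1 adds 10 and T2 subtracts 5; those operations are illustrative, not from the slides.

```python
# Simulating the lost update problem for the schedule
# S: R1(A), R2(A), OP1(A), OP2(A), W1(A), W2(A)
# Assume T1 adds 10 to A and T2 subtracts 5; starting from A = 100,
# any serial order gives 105, but this interleaving loses T1's update.

db = {"A": 100}

a1 = db["A"]      # R1(A)  -> T1 reads 100
a2 = db["A"]      # R2(A)  -> T2 also reads 100, before T1 writes
a1 = a1 + 10      # OP1(A) -> T1 computes 110
a2 = a2 - 5       # OP2(A) -> T2 computes 95
db["A"] = a1      # W1(A)  -> A = 110
db["A"] = a2      # W2(A)  -> A = 95; T1's update is overwritten (lost)

print(db["A"])    # 95, not the serial result 105
```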
Serializability
Schedule Equivalence
• Two schedules are guaranteed to have the same results if they read the
same values and write values in the same order.

• S1 and S2 are equivalent: T2 reads X before T1 writes it.


• S1 and S3 are not necessarily the same. In S3, T2 reads X after T1 writes, so the
value read may be different.
Serializability
Conflict Ordering
• Two operations are said to be conflicting if all of the following
conditions are satisfied:
• They belong to different transactions
• They operate on the same data item
• At least one of them is a write operation
• The pair (R1(A), W2(A)) is conflicting because the operations belong to
two different transactions, operate on the same data item A, and one of
them is a write operation.
• Similarly, the pairs (W1(A), W2(A)) and (W1(A), R2(A)) are conflicting.
• On the other hand, the pair (R1(A), W2(B)) is non-conflicting because the
operations act on different data items.
• Similarly, the pair (W1(A), W2(B)) is non-conflicting.
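The three conditions above translate directly into a small predicate. This is a sketch; the tuple encoding `(txn, action, item)` for an operation like R1(A) is an assumption made for the example.

```python
# Helper implementing the three conflict conditions. An operation is a
# tuple (txn, action, item); e.g. (1, "R", "A") stands for R1(A).

def conflicting(op1, op2):
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return (t1 != t2                # they belong to different transactions
            and x1 == x2            # they operate on the same data item
            and "W" in (a1, a2))    # at least one of them is a write

print(conflicting((1, "R", "A"), (2, "W", "A")))  # True
print(conflicting((1, "R", "A"), (2, "W", "B")))  # False: different items
print(conflicting((1, "R", "A"), (1, "W", "A")))  # False: same transaction
```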
Serializability
Conflict Equivalent
• If two schedules S1 and S2 have the same ordering for all conflicting
operations, then they will have the same result.
• Such schedules are called conflict equivalent.

• Conflict equivalent schedules will always have the same result.


Serializability
• A schedule is called view serializable if it is view-equivalent to a
serial schedule (one with no overlapping transactions).

• If a schedule S1 is (conflict) equivalent to a serial schedule S,


then we say that S1 is serializable.

• By definition the results are serializable if there exists a serial order of


the execution of the transactions that yields the same result as the
actual executions.
Serializability
(a)                    (b)                    (c)
BEGIN_TRANSACTION      BEGIN_TRANSACTION      BEGIN_TRANSACTION
x = 0;                 x = 0;                 x = 0;
x = x + 1;             x = x + 2;             x = x + 3;
END_TRANSACTION        END_TRANSACTION        END_TRANSACTION

(d)
Schedule 1: x = 0; x = x + 1; x = 0; x = x + 2; x = 0; x = x + 3;   Legal
Schedule 2: x = 0; x = 0; x = x + 1; x = x + 2; x = 0; x = x + 3;   Legal
Schedule 3: x = 0; x = 0; x = x + 1; x = 0; x = x + 2; x = x + 3;   Illegal

• a) – c) Three transactions T1, T2, and T3


• d) Possible schedules
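Conflict serializability can be tested mechanically by building a precedence graph over the conflicting operation pairs and checking it for cycles. The sketch below uses the same `(txn, action, item)` tuple encoding as before; it is an illustrative implementation, not the definitive algorithm from the slides.

```python
# Sketch: test conflict serializability by building a precedence graph
# (edge Ti -> Tj when an op of Ti conflicts with a later op of Tj) and
# checking whether that graph is acyclic.

def conflict_serializable(schedule):
    edges = set()
    txns = {t for t, _, _ in schedule}
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.add((ti, tj))
    # Cycle check: repeatedly peel off nodes with no incoming edge.
    remaining = set(txns)
    while remaining:
        sources = [t for t in remaining
                   if not any(u in remaining and v == t for u, v in edges)]
        if not sources:
            return False        # a cycle remains -> not serializable
        remaining -= set(sources)
    return True

# T2 reads A before T1 writes it: only edge T2 -> T1, so serializable.
print(conflict_serializable([(2, "R", "A"), (1, "W", "A")]))   # True
# Lost-update interleaving: edges both ways between T1 and T2 -> cycle.
print(conflict_serializable([(1, "R", "A"), (2, "R", "A"),
                             (1, "W", "A"), (2, "W", "A")]))   # False
```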
ULTIMATE GOAL
• To achieve Serializability
• To ensure Concurrency Control
Transaction Management - Concurrency
• Transaction management is about making sure that when database
operations change data, they do not cause problems.
• Problems may occur because of:
• Concurrency: multiple transactions may need to read or write the same
data, and transactions may abort for various reasons
• Unforeseen problems: software or hardware crashes; especially problematic
is loss of state, i.e., loss of all data currently in memory

• Concurrency is defined as the ability for multiple tasks to access


shared data simultaneously.
Concurrency Control
• Lock based Concurrency Control
• Graph based Concurrency Control
• Timestamp based Concurrency Control
Lock based Concurrency Control
• A Lock is a variable assigned to any data item in order to keep track of the
status of that data item so that isolation and non-interference is ensured
during concurrent transactions.

• Lock exists to prevent two or more users from performing any change on the
same data item at the very same time. Therefore, it is correct to interpret this
technique as a means of synchronizing access.
• In layman’s terms, this may be further simplified to the metaphorical ‘lock’
that is put on a data item so that no other user may unlock the ability to
perform any update.
Lock based Concurrency Control

• If another user/session attempts an update on a locked item, it will be
met with a LOCK WAIT state or otherwise be stalled until access to the
data item is unlocked. In some situations, if the stall exceeds a time
limit, the session is terminated and an error statement is returned.
Lock based Concurrency Control
Binary Lock
• A lock is fundamentally a variable which
holds a value.
• A binary lock is a variable capable of
holding only 2 possible values, i.e., a 1
(depicting a locked state) or a 0
(depicting an unlocked state).
• This lock is usually associated with every data item in the database
(maybe at table level, row level, or even the entire database level).
Lock based Concurrency Control
Binary Lock
+ Effectively mutually exclusive and
establish isolation perfectly.
+ Demand less from the system, since it must only keep a record of the
locked items. This record is kept by the lock manager subsystem, a
feature of all DBMSs today.
- Binary locks are highly restrictive.
- They do not even permit reading of the
contents of item X. As a result, they are not
used commercially.
- Do not guarantee Serializability.
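A binary lock is simple enough to sketch in a few lines. This is a toy illustration of the two-state lock variable described above (the `lock_table` dict and non-blocking return values are assumptions for the example, not a real lock manager):

```python
# Minimal binary-lock sketch: each data item maps to 1 (locked) or
# 0 (unlocked). Note the restriction the slide mentions: while an item is
# locked, no other access is permitted -- not even a read.

lock_table = {}

def lock_item(x):
    if lock_table.get(x, 0) == 1:
        return False        # item is locked: the caller must wait
    lock_table[x] = 1
    return True

def unlock_item(x):
    lock_table[x] = 0

print(lock_item("X"))   # True  -- acquired
print(lock_item("X"))   # False -- refused, even for a read
unlock_item("X")
print(lock_item("X"))   # True  -- available again after unlock
```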
Lock based Concurrency Control
Shared or Exclusive Locks

• Shared locks permit other transactions to make read queries, since a
READ query is non-conflicting.
• However, if a transaction demands a write query on item X, then that
transaction must be given exclusive access to item X.
Lock based Concurrency Control
Shared or Exclusive Locks
• Multi-mode lock: Read/Write Locks
• SHARED-LOCKED –
Data can only be read when a shared lock
is applied. Data cannot be written.
• EXCLUSIVE-LOCKED –
Data can be read as well as written when
an exclusive lock is applied.
• UNLOCKED –
Once a transaction has completed its
read or update operations, no lock is held
and the data item is unlocked. In this
state, the item may be accessed by any
queued transactions.
Lock based Concurrency Control
Shared or Exclusive Locks
• The most popular way of implementing these locks is by introducing a LOCK-
TABLE which keeps track of the number of read-locks on the data items and the
transactions with write-locks on different items.

• Write-locked: logically supposed to have no concurrent reads, on account
of the fact that the lock is exclusive.
• Read-locked: may be shared by multiple transactions.
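The LOCK-TABLE idea can be sketched as a dictionary mapping each item to its lock mode and the set of holding transactions. This is an illustrative sketch, not a production lock manager; the entry layout and non-blocking `True`/`False` returns are assumptions.

```python
# Sketch of a LOCK-TABLE for shared/exclusive locks: for each item we
# track the mode ("S" or "X") and which transactions hold the lock.

lock_table = {}   # item -> {"mode": "S" or "X", "holders": set of txn ids}

def read_lock(txn, item):
    entry = lock_table.get(item)
    if entry is None:
        lock_table[item] = {"mode": "S", "holders": {txn}}
        return True
    if entry["mode"] == "S":              # read-locked: sharable
        entry["holders"].add(txn)
        return True
    return entry["holders"] == {txn}      # write-locked: only the owner

def write_lock(txn, item):
    # Granted only if the item is free, or held solely by txn (upgrade).
    entry = lock_table.get(item)
    if entry is None or entry["holders"] == {txn}:
        lock_table[item] = {"mode": "X", "holders": {txn}}
        return True
    return False                          # exclusive access is unavailable

def unlock(txn, item):
    entry = lock_table.get(item)
    if entry:
        entry["holders"].discard(txn)
        if not entry["holders"]:
            del lock_table[item]          # last holder gone: item unlocked

print(read_lock(1, "A"))    # True
print(read_lock(2, "A"))    # True  -- read locks are shared
print(write_lock(3, "A"))   # False -- a write lock must be exclusive
```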
Lock based Concurrency Control
Shared or Exclusive Locks
• The point here is that an item may be unlocked only if:
• the ‘write’ operation terminates or is completed
• all ‘read’ operations terminate or are completed

• Here are a few rules that Shared/Exclusive Locks must obey:


1. A transaction T MUST issue the unlock(X) operation after all read and write
operations have finished.
2. A transaction T may NOT issue a read_lock(X) or write_lock(X) operation on an
item which already has a read or write-lock issued to itself.
3. A transaction T is NOT allowed to issue the unlock(X) operation unless it has been
issued with a read_lock(X) or write_lock(X) operation.
Lock based Concurrency Control
Shared or Exclusive Locks
- Do not guarantee serializability of schedules on their own. A separate
protocol must be followed to ensure this.
- Commercially not optimized for speedy transactions; not the best solution
due to lock contention issues.
- Performance overhead is not negligible.
Two Phase Locking Protocol (2PL)
1. Growing Phase: new locks on data items may be acquired but none can be
released. The point at which the transaction holds all its locks is called
the lock point; once the transaction releases its first lock, it enters the
Shrinking phase.
2. Shrinking Phase: existing locks may be released but no new locks can be
acquired.
Cascading Rollback in 2PL
• Cascading rollback means if one transaction rollback then other transactions dependent
on it should also roll back.
• If there are 3 transactions T1, T2, and T3. If T1 Rollback, T2 and T3 also have to rollback,
causing cascading Rollback. So cascading Rollback is possible in a two-phase locking
protocol.
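The two-phase rule itself, no acquire after any release, can be enforced with a single flag. A minimal sketch (the `TwoPhaseTxn` class is hypothetical, introduced only to illustrate the rule):

```python
# Sketch of the 2PL rule: once a transaction has released any lock
# (entered the shrinking phase), it may not acquire new locks.

class TwoPhaseTxn:
    def __init__(self):
        self.locks = set()
        self.shrinking = False

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        self.locks.add(item)

    def release(self, item):
        self.shrinking = True    # the first release ends the growing phase
        self.locks.discard(item)

t = TwoPhaseTxn()
t.acquire("A")
t.acquire("B")      # growing phase: locks may be acquired
t.release("A")      # shrinking phase begins
try:
    t.acquire("C")  # illegal under 2PL
except RuntimeError as e:
    print(e)
```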
Deadlock in 2PL

• Step 3: T1 cannot obtain the exclusive lock on R2, as that lock was
already obtained by transaction T2 in step 1.
• Also in step 3, the exclusive lock on R1 cannot be obtained by T2, as T1
has already acquired the exclusive lock on R1.
• So, if we draw a wait-for graph, we can see that T1 waits for T2 and T2
waits for T1. This creates a cycle; the waiting never ends, which is a
deadlock.
Categories of 2PL
Strict two-phase locking protocol
• The transaction can release a shared lock after the lock point.
• The transaction cannot release any exclusive lock until the transaction
commits.
• Because exclusive locks are held until commit, other transactions cannot
read uncommitted data, so strict 2PL avoids cascading rollbacks and yields
cascadeless schedules.

• Deadlocks are still possible.


Categories of 2PL
Rigorous two-phase locking protocol
• The transaction cannot release either type of lock before it finishes,
i.e., neither shared locks nor exclusive locks.
• All locks (shared and exclusive) are released only after the transaction
commits.
• Serializability is guaranteed in the rigorous two-phase locking protocol.
• Freedom from deadlock is not guaranteed in the rigorous two-phase locking
protocol.

• Makes implementation easier.


Categories of 2PL
Conservative two-phase locking protocol
• The transaction must lock all the data items it requires before the
transaction begins.
• If any of the required data items is not available for locking, the
transaction locks none of them and waits.
• The read and write sets must be known before the transaction begins,
which is normally not possible.
• The conservative two-phase locking protocol is deadlock-free.
• The conservative two-phase locking protocol does not ensure a strict
schedule.
Graph based Concurrency Control
• Goals for a lock-based protocol: avoid deadlocks and ensure a strict
schedule.
• Strict schedules are possible by following strict or rigorous 2-PL.
• Deadlocks can be avoided with conservative 2-PL, but it cannot be used
practically.
• Graph-based protocols are used as an alternative to 2-PL.
• The tree-based protocol is a simple implementation of a graph-based
protocol.
Graph based Concurrency Control
Tree Based Protocol

• A partial order on the database items determines a tree-like structure.
• Only exclusive locks are allowed.
• The first lock by Ti may be on any data item. Subsequently, a data item Q
can be locked by Ti only if the parent of Q is currently locked by Ti.
• Data items can be unlocked at any time.
Graph based Concurrency Control
Tree Based Protocol
+ Ensures conflict serializability and a deadlock-free schedule.
+ We need not hold locks as long as in the 2-PL protocol, since items can
be unlocked at any time, thus increasing concurrency.

- Unnecessary locking overhead may sometimes occur.
- Cascading rollbacks are still a problem: there is no rule about when an
unlock operation may occur, so this problem persists for this protocol.
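The tree-protocol locking rule can be sketched directly. The sample item tree and the `TreeTxn` class below are assumptions made for illustration:

```python
# Sketch of the tree-protocol rule: after its first lock, a transaction
# may lock item Q only if it currently holds a lock on Q's parent.

parent = {"B": "A", "C": "A", "D": "B", "E": "B"}   # a sample item tree

class TreeTxn:
    def __init__(self):
        self.held = set()
        self.first = True

    def lock(self, q):
        if self.first:
            self.first = False       # the first lock may be on any item
        elif parent.get(q) not in self.held:
            raise RuntimeError(f"parent of {q} not locked")
        self.held.add(q)

    def unlock(self, q):             # items may be unlocked at any time
        self.held.discard(q)

t = TreeTxn()
t.lock("B")       # first lock: anywhere in the tree
t.lock("D")       # allowed: parent B is currently held
try:
    t.lock("C")   # refused: parent A is not held
except RuntimeError as e:
    print(e)
```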
Timestamp based Concurrency Control
• It uses the System Time or Logical Counter as a timestamp to serialize the
execution of concurrent transactions.
• The timestamp-based protocol ensures that all conflicting read and write
operations are executed in timestamp order.
• The older transaction is always given priority in this method. It uses
the system time to determine the timestamp of the transaction.
• Timestamp-based protocols manage conflicts as soon as an operation is
created.
+ Schedules are serializable just like 2PL protocols
+ No waiting for the transaction, which eliminates the possibility of deadlocks!
- Starvation is possible if the same transaction is restarted and continually
aborted
Timestamp based Concurrency Control
• Keeps track of which data items have been read and written
• At the point of committing it checks all other transactions to see if any of its
items have been changed since it has started
• If so Abort
• If not Committed
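The basic timestamp-ordering checks can be sketched with per-item read and write timestamps. This is a minimal sketch of the classic rules, assuming `read_ts`/`write_ts` tables and non-blocking `True`/`False` returns (a `False` means the requesting transaction would be aborted and restarted):

```python
# Sketch of basic timestamp ordering: each item records the largest
# timestamps that have read and written it; an operation arriving "too
# late" causes the requesting transaction to be rejected (restarted).

read_ts = {}    # item -> largest timestamp that has read it
write_ts = {}   # item -> largest timestamp that has written it

def to_read(ts, item):
    if ts < write_ts.get(item, 0):
        return False        # a younger txn already overwrote the item
    read_ts[item] = max(read_ts.get(item, 0), ts)
    return True

def to_write(ts, item):
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        return False        # a younger txn already read or wrote the item
    write_ts[item] = ts
    return True

print(to_write(5, "A"))   # True:  T(ts=5) writes A
print(to_read(3, "A"))    # False: older T(ts=3) arrives too late to read
print(to_read(7, "A"))    # True:  younger reader is fine
print(to_write(6, "A"))   # False: A was already read by ts=7
```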
Concurrency Control & Recovery

• Concurrency Control
• Provide correct and highly available access to data in the presence of
concurrent access by large and diverse user populations
• Recovery
• Ensures database is fault tolerant, and not corrupted by software, system or
media failure
• 7x24 access to mission critical data
• Existence of CC&R allows applications to be written without explicit
concern for concurrency and fault tolerance

Recovery with Concurrent Transactions
Interaction with Concurrency Control
• The recovery scheme depends greatly on the concurrency control scheme
that is used. So, to rollback a failed transaction, we must undo the updates
performed by the transaction.

Transaction Rollback
• In this scheme, a failed transaction is rolled back by using the log.
• The system scans the log backward for the failed transaction; for every
log record found, the system restores the data item to its old value.
Recovery with Concurrent Transactions
Checkpoints
• Checkpointing is the process of saving a snapshot of the application's
state so that it can restart from that point in case of failure.
• A checkpoint is a point in time at which a record is written onto the
database from the buffers.
• Checkpoints shorten the recovery process.
• When a checkpoint is reached, the transactions up to that point have been
applied to the database, and the log up to that point can be discarded.
The log is then filled with the steps of subsequent transactions until the
next checkpoint, and so on.
• The checkpoint declares a point before which the DBMS was in a consistent
state and all transactions were committed.
Recovery with Concurrent Transactions
Restart Recovery
• When the system recovers from a crash, it constructs two lists.
• The undo-list consists of transactions to be undone,
• The redo-list consists of transactions to be redone.
• Initially, they are both empty.
• The system scans the log backward, examining each record, until it finds the
first <checkpoint> record.
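The backward scan that builds the two lists can be sketched as below. This is a simplified illustration (a real scheme also handles transactions that were still active at the checkpoint); the log-record tuples are assumptions made for the example:

```python
# Sketch of restart recovery's list construction: scan the log backward
# until the first <checkpoint>, putting committed transactions on the
# redo-list and uncommitted ones on the undo-list.

def build_lists(log):
    undo, redo = [], []
    for record in reversed(log):
        if record == "<checkpoint>":
            break                     # stop at the first checkpoint record
        kind, txn = record
        if kind == "commit":
            redo.append(txn)          # committed: its updates are redone
        elif kind == "start" and txn not in redo:
            undo.append(txn)          # started but never committed: undone
    return undo, redo

log = ["<checkpoint>",
       ("start", "T1"), ("start", "T2"),
       ("commit", "T1")]              # T2 never committed before the crash

undo, redo = build_lists(log)
print(undo)   # ['T2']
print(redo)   # ['T1']
```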