Unit V
Foundation of Distributed Concurrency Control
• In order to analyze the correctness of a distributed concurrency control method, we need a formal model.
• Serializability in a Centralized Database
A transaction accesses the database by issuing read and write primitives. Let Ri(X) and Wi(X) denote a read and a write operation issued by transaction Ti on data item X. A schedule is a sequence of operations performed by transactions.
Ex :- S1 : Ri(X) Rj(X) Wi(Y) Rk(Y) Wi(X)
• Two transactions Ti and Tj execute serially in a schedule S if the last operation of Ti precedes the first operation of Tj (or vice versa).
• A schedule is serial if no transactions execute concurrently in it.
Ex :- S2 : Ri(X) Wi(X) Ri(Y) Rj(X) Wj(Y) Rk(Y) Wk(X)
Serial (S2) : Ti Tj Tk
We do not want to force transactions to execute serially; they may execute concurrently, provided their execution is correct (serializable).
A schedule is serializable if it is computationally equivalent to a serial schedule.
• In order to check serializability we need conditions to decide whether two schedules are equivalent.
• The following are the conditions to ensure two schedules are equivalent:
– Each read operation reads the data item values which are produced by the same write operations in both schedules.
– The final write operation on each data item is the same in both schedules.
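The first of these conditions can be checked mechanically: for each read, record which write produced the value it sees, and compare the resulting mappings of the two schedules. Below is a minimal sketch (not from the text); the tuple encoding of operations is an assumption made for illustration.

```python
# Check the "reads-from" condition: each read must see the value produced
# by the same write operation in both schedules.
# A schedule is a list of (op, txn, item); ("R", "i", "X") stands for Ri(X).

def reads_from(schedule):
    """Map each read to the transaction whose write it reads (None = initial value)."""
    last_writer = {}                    # item -> txn of the latest write so far
    mapping = []
    for op, txn, item in schedule:
        if op == "R":
            mapping.append((txn, item, last_writer.get(item)))
        else:                           # a "W" operation
            last_writer[item] = txn
    return sorted(mapping)

def same_reads_from(s1, s2):
    return reads_from(s1) == reads_from(s2)

# S1 from the text: Ri(X) Rj(X) Wi(Y) Rk(Y) Wi(X)
S1 = [("R","i","X"), ("R","j","X"), ("W","i","Y"), ("R","k","Y"), ("W","i","X")]
# Swapping the two initial reads changes nothing that any read observes:
S1b = [("R","j","X"), ("R","i","X"), ("W","i","Y"), ("R","k","Y"), ("W","i","X")]
print(same_reads_from(S1, S1b))   # True
```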
• Serializability in a Distributed Database
In a distributed database, each transaction performs operations at several sites. We can ensure that all local schedules are serializable, but this is not sufficient for distributed transactions.
Ex :- S1 (at site 1) : Ri(X) Wi(X) Rj(X) Wj(X)
S2 (at site 2) : Rj(X) Wj(X) Ri(X) Wi(X)
Each local schedule is serializable, but S1 serializes Ti before Tj while S2 serializes Tj before Ti.
• In order to get serializability of distributed transactions, a stronger condition is required.
• The execution of transactions T1, T2, …, Tn is correct if:
– Each local schedule Sk is serializable.
– There exists a total ordering of T1, T2, …, Tn such that, if Ti < Tj in the total ordering, then there is a serial schedule Sk' such that Sk is equivalent to Sk' and Ti < Tj in Serial(Sk'), for each site k where both transactions have executed.
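The total-ordering condition can be tested by merging the per-site serialization orders into one precedence relation and checking it for cycles; a cycle means no global total order exists. This is an illustrative sketch, not from the text.

```python
# Merge the local serialization orders ("Ti before Tj" at each site) and
# check whether a single global total order is consistent with all of them:
# it exists iff the merged precedence graph is acyclic.

def has_global_order(local_orders):
    """local_orders: one transaction sequence per site, e.g. ["Ti", "Tj"]."""
    edges = set()
    for order in local_orders:
        for a, b in zip(order, order[1:]):
            edges.add((a, b))
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
        succ.setdefault(b, set())
    # DFS cycle detection over the merged graph
    state = {n: "new" for n in succ}
    def cyclic(n):
        state[n] = "open"
        for m in succ[n]:
            if state[m] == "open" or (state[m] == "new" and cyclic(m)):
                return True
        state[n] = "done"
        return False
    return not any(state[n] == "new" and cyclic(n) for n in succ)

# The example above: site 1 serializes Ti before Tj, site 2 the opposite,
# so the execution is not globally serializable.
print(has_global_order([["Ti", "Tj"], ["Tj", "Ti"]]))   # False
print(has_global_order([["Ti", "Tj"], ["Ti", "Tj"]]))   # True
```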
2‐Phase‐Locking as a Distributed Concurrency Control
• If all distributed transactions are 2-phase locked, then all local schedules are serializable.
• If a distributed transaction is 2-phase locked, then its subtransactions at different sites, taken separately, are 2-phase locked.
• If transactions T1, T2, …, Tn are 2-phase locked, conflicting local serialization orders such as the one above cannot occur.
• But since the transactions are all 2-phase locked, none of them releases any lock before obtaining all the locks it needs.
• This can lead to a deadlock situation.
• One of the transactions will then be aborted by the deadlock resolution algorithm.
• Two-phase locking ensures that all executions are serializable, but it does not allow all serializable executions.
• Example :- consider transactions Ti and Tj performing fund transfers. The execution E may be like this:
S1 : Ri(x) Wi(x) Rj(x) Wj(x)
S2 : Ri(y) Wi(y) Rj(y) Wj(y)
• The execution E is not allowed by 2-phase locking, because Ti will not release the lock on x until it obtains the lock on y.
• Also, as per the 2-phase commitment protocol, both write locks on x and y are maintained until the end of the transaction.
Time & Timestamps in Distributed Databases
• In a distributed system, it is required to know whether event A at some site happened before event B at a different site.
• Many concurrency control and deadlock prevention algorithms need this kind of information.
• The determination of an ordering of events consists in assigning to each event A which occurs in the distributed system a timestamp TS(A) having the following properties:
– TS(A) uniquely identifies A.
– If A occurred before B, then TS(A) < TS(B).
• The "occurred before" relation, denoted →, can be extended to a distributed environment by using the following rules:
– If A and B are two events at the same site and A occurred before B, then A → B.
– If event A consists in sending a message and event B consists in receiving the same message, then A → B.
– If A → B and B → C, then A → C.
• We call two events A and B pseudo-simultaneous if neither A → B nor B → A (see the figure).
[Figure: events A and B at site 1, events D and E at site 2; message M1 is exchanged between the sites, making some pairs of events pseudo-simultaneous.]
We consider now the generation of timestamps.
The first condition, uniqueness, can be easily satisfied in a distributed system: it is sufficient that each site appends its unique site identifier to the value of a local counter.
However, synchronization between the counters at different sites would be difficult.
To solve this, the counters of the sites can be kept approximately aligned by simply including in each message the value of the counter of the sending site. If a site receives a message with a timestamp value TS which is greater than its current counter, it increments its counter to TS + 1.
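These two rules (tick on every local event; on receipt, jump past the sender's counter) are exactly a Lamport-style logical clock. A minimal sketch, with the (counter, site) pair providing the uniqueness property:

```python
# A Lamport-style site clock: local events increment the counter, and a
# received timestamp greater than the counter pushes it to TS + 1.

class SiteClock:
    def __init__(self, site_id):
        self.site_id = site_id        # appended to the counter for uniqueness
        self.counter = 0

    def event(self):
        """Timestamp a local event (including the sending of a message)."""
        self.counter += 1
        return (self.counter, self.site_id)

    def receive(self, ts):
        """Timestamp the receipt of a message carrying timestamp ts."""
        counter, _ = ts
        self.counter = max(self.counter, counter) + 1
        return (self.counter, self.site_id)

s1, s2 = SiteClock(1), SiteClock(2)
send_ts = s1.event()            # site 1 sends a message: (1, 1)
recv_ts = s2.receive(send_ts)   # site 2 receives it:     (2, 2)
print(send_ts < recv_ts)        # True: the send is ordered before the receive
```

Comparing the (counter, site) tuples lexicographically yields a total order consistent with the "occurred before" relation.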
Distributed Deadlocks
• The detection and resolution of deadlocks is an important activity in a DBMS.
• Deadlock detection in distributed databases must handle circular waits that involve several sites.
• We use the distributed wait-for graph (DWFG) and local wait-for graphs (LWFG) for detecting deadlocks.
[Figure: distributed wait-for graph (DWFG) showing a distributed deadlock among agents T1A1 and T2A1 at site 1 and T1A2 and T2A2 at site 2.]
The notation TiAj refers to agent Aj of transaction Ti. A directed edge from an agent TiAj to an agent TrAs means that TiAj is blocked and waiting for TrAs.
• There are two reasons for an agent waiting for another:
– Agent TiAj waits for agent TrAj to release a resource which it needs. In this case Ti and Tr are different transactions and the agents are at the same site. This is indicated by a continuous edge.
– Agent TiAj waits for agent TiAs to perform some required function. In this case the two agents belong to the same transaction and are at different sites. This is indicated through input and output ports.
[Figure: local wait-for graph (LWFG) at site 1 with agents T1A1, T1A2, T2A1, T2A2 and the input and output ports connecting them to other sites.]
A local wait-for graph (LWFG) is the portion of the DWFG consisting only of the nodes and edges at a single site, extended with an indication of the nodes which wait for, or are awaited by, agents at other sites (the input and output ports).
• A deadlock is local if it is caused by a cycle in an LWFG.
• A deadlock is distributed if it is caused by a cycle in the DWFG which is not contained in any LWFG.
• Deadlock resolution involves the selection of one or more transactions to be aborted and restarted.
• The redundancy present in distributed databases increases the probability of deadlocks.
• There are the following methods to deal with deadlocks in distributed databases:
– Deadlock detection using centralized or hierarchical control
– Distributed deadlock detection
– Distributed deadlock prevention
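Whatever the method, detection ultimately reduces to finding a cycle in a wait-for graph. A small DFS sketch (the dict encoding of the graph is an assumption for illustration):

```python
# A deadlock corresponds to a cycle in the wait-for graph.
# wfg maps each transaction to the set of transactions it waits for.

def find_cycle(wfg):
    """Return one cycle as a list of nodes, or None if the graph is acyclic."""
    visited, stack = set(), []
    def dfs(n):
        if n in stack:                    # back edge: the cycle starts at n
            return stack[stack.index(n):]
        if n in visited:
            return None
        visited.add(n)
        stack.append(n)
        for m in wfg.get(n, ()):
            cycle = dfs(m)
            if cycle:
                return cycle
        stack.pop()
        return None
    for node in list(wfg):
        cycle = dfs(node)
        if cycle:
            return cycle
    return None

# T1 and T2 wait for each other: a deadlock.
print(find_cycle({"T1": {"T2"}, "T2": {"T1"}}))   # ['T1', 'T2']
```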
Deadlock Detection using Centralized or Hierarchical Controllers
• In this method a selected site is chosen at which a centralized deadlock detector is run.
• It builds the DWFG and checks it for cycles.
• The deadlock detector receives information from all other sites.
• At each site there is a local deadlock detector whose responsibility is to determine all potential global deadlocks at its site.
[Figure: LWFG at site 1 with agents T1A1, T1A3, T2A1, T3A1, T3A2, T4A1, and T5A1; the potential global deadlock cycle at site 1 involves T1A1, T1A3, T4A1, and T3A2.]
• The global deadlock detector collects the messages related to potential global deadlock cycles from each site, connects the partial information to build the DWFG, determines the cycles, and selects the transactions to be aborted.
• Centralized deadlock detection is simple but has two main drawbacks:
– The site at which the detector runs may fail.
– A large amount of wait-for information must be transmitted to the central site (high communication cost).
• It is very common that a deadlock involves only a few sites which are close to each other.
• In this case, they can discover the deadlock without communicating with the central site.
• We can use hierarchical controllers to reduce the communication cost.
• In this method a tree of deadlock detectors is built.
• The leaves of the tree are local deadlock detectors (LDDs).
• The nonleaf nodes are nonlocal deadlock detectors (NLDDs).
[Figure: a hierarchy of deadlock detectors with NLDD0 at the root, NLDD1 and NLDD2 below it, and local deadlock detectors LDD1–LDD5, one per site, at the leaves.]
Distributed Deadlock Detection
• In distributed deadlock detection, there is no distinction between local and nonlocal deadlock detectors.
• Each site has the same responsibility. Sites exchange information about waiting transactions in order to determine global deadlocks.
• Potential deadlock cycles are detected by each site.
• All the output and input ports are collected into a single node, called the external node (EX).
[Figure: LWFGs extended with the EX node — site 1: EX → T1 → T2 → EX; site 2: EX → T2 → T1 → EX. Message (EX, T2, T1) is sent from site 2 to site 1.]
• In centralized deadlock detection all potential deadlock cycles are sent to one designated site, but in this case there is no such site.
• The idea used by the distributed deadlock detection algorithm consists in transmitting the potential deadlock information along the deadlock cycle itself.
• Ex :- in the previous figure, the local deadlock detector at site 1 has detected a potential deadlock cycle consisting of T1, T2, and EX.
• Site 1 can choose to send this cycle to:
– The site where there is an agent of T1 waiting for T1 at site 1 (backward along the cycle).
– The site where there is an agent of T2 for which T2 at site 1 is waiting (forward along the cycle).
• It is not necessary to transmit in both directions; the forward direction alone is sufficient.
• But if all sites transmit their potential deadlock cycles forward along the cycle, then more information is transmitted than required.
• This may lead to discovering the same deadlock twice.
• To avoid this, the algorithm uses the following rule:
The potential deadlock cycle is transmitted only if the transaction ID of the transaction for which EX waits is greater than the transaction ID of the transaction waiting for EX.
Ex :- in the previous example, only site 2 transmits its potential deadlock cycle.
• This algorithm works by successive iterations.
• At each iteration, the local deadlock detector at each site performs the following actions:
1. Build the LWFG using local information (including EX).
2. For each message which has been received, perform the following modifications of the LWFG:
– For each transaction in the message, add it to the LWFG if it does not already exist.
– For each transaction in the message, starting with EX, create an edge to the next transaction in the message.
3. Find the cycles not involving EX in the LWFG. Each such cycle indicates the existence of a deadlock.
4. Find the cycles involving EX. These cycles are potential distributed deadlocks, to be transmitted according to the rule above.
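The iteration above can be sketched as follows. The message format (a path of transactions starting at EX) and the graph encoding are assumptions made for illustration; the transmission rule is the one stated in the text.

```python
# One iteration of the distributed detector at a single site:
# merge received cycle strings into the LWFG, then look for real deadlocks
# (cycles without EX) and potential ones (cycles through EX) to forward.

EX = "EX"

def merge_message(lwfg, message):
    """Step 2: a message is a path such as (EX, 'T2', 'T1')."""
    for node in message:
        lwfg.setdefault(node, set())
    for a, b in zip(message, message[1:]):
        lwfg[a].add(b)

def cycles(lwfg):
    """Steps 3-4: enumerate cycles by DFS over simple paths."""
    found = []
    def dfs(n, path):
        for m in lwfg.get(n, ()):
            if m in path:
                found.append(path[path.index(m):])
            else:
                dfs(m, path + [m])
    for n in list(lwfg):
        dfs(n, [n])
    return found

def should_forward(cycle):
    """Forward a potential cycle only if the transaction EX waits for has a
    greater ID than the transaction waiting for EX."""
    i = cycle.index(EX)
    return cycle[(i + 1) % len(cycle)] > cycle[i - 1]

# Site 1's LWFG (EX -> T1 -> T2 -> EX) plus the message (EX, T2, T1):
lwfg = {EX: {"T1"}, "T1": {"T2"}, "T2": {EX}}
merge_message(lwfg, (EX, "T2", "T1"))
real = [c for c in cycles(lwfg) if EX not in c]
print(any(set(c) == {"T1", "T2"} for c in real))   # True: deadlock found
```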
Distributed Deadlock Prevention
• With this method, a transaction is aborted and restarted if there is a risk that a deadlock might occur.
• If transaction T1 requests a resource which is held by T2, then a "preventive test" is applied; if the test indicates a risk of deadlock, T1 is not allowed to enter the wait state: either T1 is aborted and restarted (nonpreemptive method) or T2 is aborted (preemptive method).
• Nonpreemptive method (wait-die) :- is based on timestamps as follows:
If Ti requests a lock on a data item which is already locked by Tj, then Ti is permitted to wait only if Ti is older than Tj. If Ti is younger than Tj, then Ti is aborted and restarted.
• Preemptive method (wound-wait) :- is the opposite of the previous one:
If Ti requests a lock on a data item which is already locked by Tj, then Ti is permitted to wait only if Ti is younger than Tj; otherwise Tj is aborted and the lock is granted to Ti.
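The two preventive tests can be written down directly, with a smaller timestamp meaning an older transaction. Wait-die and wound-wait are the standard names for the nonpreemptive and preemptive schemes; the return strings are illustrative.

```python
# Preventive tests applied when a requester asks for a lock held by another
# transaction. Smaller timestamp = older transaction.

def wait_die(ts_requester, ts_holder):
    """Nonpreemptive (wait-die): the requester may wait only if it is older."""
    return "wait" if ts_requester < ts_holder else "abort requester"

def wound_wait(ts_requester, ts_holder):
    """Preemptive (wound-wait): the requester may wait only if it is younger."""
    return "wait" if ts_requester > ts_holder else "abort holder"

# Ti (timestamp 5) requests a lock held by Tj (timestamp 9): Ti is older.
print(wait_die(5, 9))     # wait
print(wound_wait(5, 9))   # abort holder
```

In both schemes the older transaction always prevails eventually, which rules out cyclic waiting.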
Concurrency Control Based on Timestamps
• This concurrency control mechanism allows a transaction to read or write a data item x only if x had been last written by an older transaction; otherwise it rejects the operation and restarts the transaction.
The Basic Timestamp Mechanism
• The basic timestamp mechanism applies the following rules:
1. Each transaction receives a timestamp when it is initiated at its site of origin.
2. Each read or write operation which is required by a transaction has the timestamp of the transaction.
3. For each data item x, the largest timestamp of a read operation on x, RTM(x), and the largest timestamp of a write operation on x, WTM(x), are recorded.
4. Let TS be the timestamp of a read operation on data item x. If TS < WTM(x), the read operation is rejected and the issuing transaction is restarted with a new timestamp; otherwise the read is executed and RTM(x) is set to max(RTM(x), TS).
5. Let TS be the timestamp of a write operation on data item x. If TS < RTM(x) or TS < WTM(x), the write operation is rejected and the issuing transaction is restarted with a new timestamp; otherwise the write is executed and WTM(x) is set to TS.
• Rules 4 and 5 ensure that conflicting operations are executed in timestamp order at all sites; hence the timestamp order is a total order satisfying the condition of Proposition 8.1, and the executions produced by this mechanism are correct.
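Rules 4 and 5 amount to a pair of comparisons against RTM(x) and WTM(x). A minimal sketch (the dict-of-pairs state is an illustrative encoding):

```python
# Basic timestamp rules: reject a read that arrives after a younger write,
# and a write that arrives after a younger read or write.
# tm maps each item to [RTM, WTM], both initially 0.

def read(ts, x, tm):
    rtm, wtm = tm.setdefault(x, [0, 0])
    if ts < wtm:
        return "restart"             # rule 4: x was written by a younger txn
    tm[x][0] = max(rtm, ts)
    return "ok"

def write(ts, x, tm):
    rtm, wtm = tm.setdefault(x, [0, 0])
    if ts < rtm or ts < wtm:
        return "restart"             # rule 5: a younger txn already read/wrote x
    tm[x][1] = ts
    return "ok"

# The schedule S1 of the example with TS(Ti)=1 < TS(Tj)=2 is accepted:
tm = {}
print([read(1, "x", tm), write(1, "x", tm), read(2, "x", tm), write(2, "x", tm)])
# ['ok', 'ok', 'ok', 'ok']
```

Reversing the order, i.e. processing Rj(x) before Wi(x), makes rule 5 reject Wi(x), as the example below the rules notes.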
EXAMPLE :-
Consider the concurrent execution E of Example 8.1, which is repeated here for convenience:
S1 : Ri(x) Wi(x) Rj(x) Wj(x)
S2 : Ri(y) Wi(y) Rj(y) Wj(y)
• In the above example, operations written one below the other start at the same time.
• This execution cannot be produced by the 2-phase-locking mechanism.
• With the basic timestamp mechanism this execution is accepted if TS(Ti) < TS(Tj), because at site 1, after the execution of Wi(x), the operations Rj(x) and Wj(x) carry a greater timestamp and are therefore executed.
• Similar considerations apply also to site 2.
• This appears to be an advantage of the timestamp mechanism.
• However, if Rj(x) were processed at site 1 before Wi(x), then Wi(x) would be rejected by rule 5 and Ti would be aborted and restarted.
• The same would happen if Rj(y) were processed at site 2 before Wi(y).
• An interesting feature of the timestamp mechanism is that it is deadlock-free, because transactions are never blocked.
• If a transaction cannot execute an operation, it is restarted.
• The basic rules which have been described above are sufficient to ensure the serializability of transactions.
• However, they need to be integrated with 2-phase commitment to ensure atomicity.
• With the timestamp mechanism we need a different solution: instead of exclusive locks we use prewrites.
• Prewrites are issued by transactions instead of write operations; they are buffered and not applied directly to the database.
• Only when the transaction is committed are the corresponding write operations applied to the database.
• Integration of the basic timestamp method and 2-phase commitment:
The above rules 4 and 5 are substituted by the following rules 4, 5, and 6.
4. Let TS be the timestamp of a prewrite operation Pi on data item x. If TS < RTM(x) or TS < WTM(x), then the operation is rejected and the issuing transaction is restarted; otherwise, the prewrite Pi and its timestamp TS are buffered.
5. Let TS be the timestamp of a read operation Ri on data item x.
– If TS < WTM(x), the operation is rejected.
– If TS > WTM(x), then Ri is executed only if there is no prewrite operation P(x) pending on data item x having a timestamp TS(P) < TS.
– If there are one or more such pending prewrite operations, Ri is buffered until they are executed.
6. Let TS be the timestamp of a write operation Wi on data item x. This operation is never rejected; however, it is possibly buffered if there is a prewrite operation with a timestamp TS(P) < TS, for the same reason which has been stated for buffering read operations. Wi will be executed and eliminated from the buffer when all prewrites with smaller timestamps have been eliminated from the buffer.
The "ignore obsolete write" rule: rule 5 of the basic timestamp mechanism can be modified in the following way:
• If the timestamp of a write operation Wi(x) is smaller than the write timestamp WTM(x) of the data item x, it is possible to ignore the operation, instead of rejecting it and restarting the issuing transaction.
The Conservative Timestamp Method
• The main disadvantage of the basic timestamp method is the great number of restarts which it causes.
• Conservative timestamping is a method which eliminates restarts by buffering younger operations until all older conflicting operations have been executed, so that operations are never rejected and transactions are never restarted.
• In order to execute a buffered operation, it is necessary to know when there are no more older conflicting operations that can arrive.
• The conservative timestamp method is based on the following requirements and rules:
1. Each transaction is executed at one site only and does not activate remote programs. It can only issue read or write requests to remote sites.
2. A site i must receive all the read requests from a different site j in timestamp order. Similarly, site i must receive all the write requests from a different site j in timestamp order. If TS(Ti) < TS(Tj), it is sufficient to wait to send Rj until Ri has been sent over the network.
3. Because of requirement 2, once site i has received a request from every other site, it knows that no older requests can arrive from any site. The concurrency controller at site i then behaves in the following way:
a. For a read operation R that arrives at site i: if there is some write operation W buffered at site i such that TS(R) > TS(W), then R is buffered until these operations are executed, else R is executed.
b. For a write operation W that arrives at site i: if there is some read operation R buffered at site i such that TS(W) > TS(R), or there is some write operation W' buffered at site i such that TS(W) > TS(W'), then W is buffered until these operations are executed, else W is executed.
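Rules (a) and (b) boil down to one test: an arriving operation waits while any buffered conflicting operation carries a smaller timestamp. A sketch under that reading; the tuple encoding of operations is an assumption.

```python
# Conservative buffering: execute an arriving operation only if no buffered
# conflicting operation (read/write or write/write) has a smaller timestamp.

def conflicts(kind_a, kind_b):
    return kind_a == "W" or kind_b == "W"     # only R/R pairs do not conflict

def arrive(op, buffered):
    kind, ts = op
    for bkind, bts in buffered:
        if bts < ts and conflicts(kind, bkind):
            buffered.append(op)               # an older operation must go first
            return "buffer"
    return "execute"

buffered = [("W", 3)]                 # a write with timestamp 3 is waiting
print(arrive(("R", 5), buffered))     # buffer: an older write is pending
print(arrive(("R", 2), buffered))     # execute: it precedes every buffered op
```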
• Conservative timestamping suffers from the following problems:
• If one site never sends an operation to some other site, then the assumption stated at the beginning of point 3 does not hold.
• This problem can be eliminated by requiring that each site periodically send timestamped "null" operations to each other site.
• Caution must be taken in the implementation of conservative timestamping in order to avoid deadlocks.
Consider an example of a possible deadlock situation:
Ti, executed at site 1: Ri(x), execute, Wi(y)
Tj, executed at site 2: Rj(y), execute, Wj(x)
(a) Operations requested by transactions Ti and Tj
Site 1 (stores item x): Ri(x) buffered, waiting for a write from site 2.
Site 2 (stores item y): Rj(y) buffered, waiting for a write from site 1.
(b) A possible deadlock situation
• This shows the situation that exists after both read operations have been issued: both sites have buffered these reads, because they are waiting for a write operation from the other site.
• Assume that no null operation is sent while there are still transactions pending, because it is expected that these transactions will issue some useful operation.
• Both transactions are then blocked and will never terminate; the deadlock must be resolved, for instance by performing deadlock detection by timeout.
• The above example shows that the fact that Ti waits for a write from Tj and Tj waits for a write from Ti is not due to an explicit rule of the mechanism but seems more to be an undesired side effect.
OPTIMISTIC METHODS FOR DISTRIBUTED CONCURRENCY CONTROL
• The basic idea of the optimistic method is the following: instead of suspending or rejecting conflicting operations, like 2-phase locking and timestamping, always execute a transaction to completion.
• At the end of the transaction, a validation test is performed; if it fails, the transaction is restarted.
• The validation test verifies if the execution of the transaction is serializable.
• In order to perform the test, some information about the execution of the transaction must be retained until the validation is performed.
• The optimistic approach is based on the assumption that conflicts are rare and therefore most transactions will pass the test.
• By processing operations without concurrency control overhead, a transaction is not delayed during its execution.
• Each transaction is considered to consist of three phases:
1. The read phase :- During this phase a transaction reads data items from the database, performs computations, and determines the new values for the data items of its write-set; however, these values are not written to the database. Note that the read phase contains almost the whole execution of the transaction.
2. The validation phase :- During this phase a test is performed to see whether the application of the updates to the database which have been computed by the transaction would cause a loss of consistency.
3. The write phase :- During this phase the updates are applied to the database, if the validation phase has succeeded.
• Timestamps are used for performing the validation phase. We have two algorithms:
– Validation using timestamps on data items and transactions
– Validation using only transaction timestamps

Validation Using Timestamps on Data Items and Transactions
• This algorithm assumes a fully redundant database, where a copy of each data item is stored at each site.
• During the execution of the transaction (read phase), the updates are written into an update list.
• The validation phase consists of checking that the updates can be applied at all sites.
• Each transaction receives a unique timestamp when it starts execution, and each copy of each data item in the database carries the timestamp of the last transaction which wrote it.
• A restrictive assumption of this algorithm is that the read-set contains the write-set.
• At the end of the read phase, the transaction produces an update list containing the following elements:
1. The data items of the read-set with their timestamps.
2. The new values of the data items of its write-set.
3. The timestamp of the transaction itself.
• During the validation phase, the update list is sent to every site.
• Each site votes on the update list and communicates its vote to the site of origin.
• If the site of origin receives a majority of yes votes, it decides to commit the transaction and communicates this decision to all other sites.
Validation Using Only Transaction Timestamps
• Each transaction receives a timestamp during its execution, as in a centralized database.
• Let TNC (transaction number counter) denote the global counter for assigning timestamps.
• The validation condition for a transaction Tj with timestamp TS(Tj) requires that, for all transactions Ti with a smaller timestamp, one of the following conditions holds:
Condition 1 : Ti completes its write phase before Tj starts its read phase.
Condition 2 : The write-set of Ti does not intersect the read-set of Tj, and Ti completes its write phase before Tj starts its write phase.
Condition 3 : The write-set of Ti does not intersect the read-set or the write-set of Tj, and Ti completes its read phase before Tj completes its read phase.
To transform the above conditions into an algorithm, the following information is required:
1. The read- and write-sets of Tj.
2. The value of TNC when Tj started; this is called START(Tj).
3. The value of TNC when Tj finished its read phase; this is called FINISH(Tj).
START(Tj) and FINISH(Tj) are therefore two local variables of Tj containing timestamp values that are needed for performing validation, and they are discarded after Tj is terminated. TS(Tj) is assigned to Tj only after the write phase, if validation succeeds.
TNC is incremented only when the definitive timestamp is assigned, so that subsequent transactions will receive a greater timestamp.
The meaning of START(Tj), FINISH(Tj), and TS(Tj) is shown in fig (a) below:

Read Phase | Validation Phase | Write Phase
START(Tj) | FINISH(Tj) | TS(Tj)
Deriving an algorithm for the validation of transaction Tj can be done from the following observations:
1. For all transactions Ti such that TS(Ti) < START(Tj), nothing has to be checked; condition 1 is satisfied. Therefore the validation algorithm has to consider only the transactions Ti such that TS(Ti) > START(Tj).
(b) Condition (1): TS(Ti) < START(Tj)
2. The transactions Ti which have terminated their write phase during the read phase of Tj are identified by the validation algorithm by checking whether START(Tj) < TS(Ti) < FINISH(Tj). For all these transactions, condition 2 is checked:
Write-set of Ti ∩ Read-set of Tj = Φ
(c) Condition (2): START(Tj) < TS(Ti) < FINISH(Tj)
3. The transactions Ti such that FINISH(Ti) < FINISH(Tj) are identified by keeping track of all transactions which have not yet terminated execution, called active transactions. For all these transactions, condition 3 is checked:
Write-set of Ti ∩ (Write-set of Tj ∪ Read-set of Tj) = Φ
(d) Condition (3): FINISH(Ti) < FINISH(Tj)
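The three observations translate into a validation routine almost directly. A sketch (the dict-based transaction records are an illustrative encoding):

```python
# Validate Tj against every Ti with TS(Ti) > START(Tj), applying condition 2
# to transactions that finished writing during Tj's read phase and
# condition 3 to transactions that were still active (TS not yet assigned).

def validate(tj, others):
    for ti in others:
        if ti["TS"] is not None and ti["TS"] < tj["START"]:
            continue                                   # condition 1 holds
        if ti["TS"] is not None and ti["TS"] < tj["FINISH"]:
            # condition 2: Ti's writes must not touch Tj's read-set
            if ti["write_set"] & tj["read_set"]:
                return False
        else:
            # condition 3: Ti's writes must not touch Tj's read- or write-set
            if ti["write_set"] & (tj["read_set"] | tj["write_set"]):
                return False
    return True

ti = {"TS": 5, "write_set": {"x"}}
tj = {"START": 3, "FINISH": 8, "read_set": {"x"}, "write_set": {"y"}}
print(validate(tj, [ti]))   # False: Ti wrote x during Tj's read phase

tj2 = {"START": 6, "FINISH": 8, "read_set": {"z"}, "write_set": {"y"}}
print(validate(tj2, [ti]))  # True: Ti finished before Tj2 started
```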