Unit VI
Reliability
Reliability is defined as “a measure of the success with which the system conforms to some authoritative specification of its behavior”.
When the behavior deviates from that which is
Basic Concept
• The reliability can be divided into two parts.
– Application Dependent.
– Application Independent.
• The Application Independent specification of reliability consists in requiring that transactions maintain atomicity, durability, serializability & isolation properties.
• The Application Dependent part consists of requiring that transactions fulfill the general system’s specifications.
• We emphasize two aspects of reliability :
– Correctness.
• Example :‐ Consider the DD consisting of two sites 1 & site 1 (the coordinator) to site 2
There are two possible strategies to handle the problem.
– The first considers the correctness requirement by keeping X2 locked until failure is repaired.
– The second maximizes the availability at the risk of
Following are the problems when we try to design a reliable distributed database system.
• Commitment of transaction :‐ If we use 2‐Phase commitment protocol, we lose availability.
We can use different protocols which allow a transaction to terminate properly even in the presence of failures. These are called Termination Protocols.
• Multiple copies of data & robustness of concurrency control :‐
• Determining the state of the network :‐
• Detection & resolution of inconsistencies :‐
• Checkpoints & Cold restart :‐
Nonblocking Commitment Protocols
• A commitment protocol is called blocking if the occurrence of some kinds of failures forces some of the participating sites to wait until the failure is repaired.
• A transaction which cannot be terminated at a site is called pending at this site.
• The 2‐Phase commitment protocol is blocking if the coordinator fails & some participant has at the same time declared itself ready to commit.
[Figure: state diagrams of the Coordinator and of the Participant. States: I, U, R, A, C. Messages: PM, RM, AAM, ACM, CM. Local conditions: ua, tm.]
Notes
– One kind of arrow marks transitions which are due to an exchange of messages.
– The other kind of arrow marks unilateral transitions (unilateral abort or timeout).
– α / β labels a transition; α is the incoming message or local condition.
• If a state diagram of this kind is used for analyzing reliability aspects of a protocol, care must be taken in assuming that transitions from one state to another are atomic.
• For example, consider a transition from state X to state Y with input I & output O.
• The following behavior is assumed.
1. The input message I is received.
Nonblocking Commitment Protocols with Site Failures
• We are interested in designing a termination protocol for the 2‐Phase Commitment protocol which allows the transaction to be terminated at all operational sites, when a failure of the coordinator site occurs.
• This is possible only in these two cases
1. At least one of the participants has received the command.
The 3‐phase commitment protocol
[Figure: State diagram for the 3-Phase-Commitment Protocol. New States: PC = Prepared-to-Commit]
• This new protocol eliminates the blocking problem of the 2-phase-commitment protocol because
1. If one of the operational participants has received the command and the command was ABORT, then the operational participants can abort the transaction
2. If one of the operational participants has received the command and the command was ENTER-PREPARED-STATE, then all the operational participants can commit the transaction
3. If none of the operational participants has received the ENTER-PREPARED-STATE command, we have the case which can not be terminated for a 2PC
Termination protocols for 3‐phase‐commitment
• The design of termination protocols is based on the following property.
• If at least one operational participant has not entered the Prepared‐to‐Commit state, then the transaction can be safely aborted.
• If at least one operational participant has entered the Prepared‐to‐Commit state, then the transaction can be safely committed.
• Since the above conditions are not mutually exclusive, in several cases the termination protocol can decide whether to commit or abort.
• The simplest termination protocol is the centralized, nonprogressive protocol.
• First a new coordinator is elected by the operational participants.
• The new coordinator behaves as follows.
1. If the new coordinator is in the Prepared‐to‐Commit state, it issues to all operational participants the command to enter this state as well; when it has received all the OK messages, it issues the COMMIT command
2. If the new coordinator is in the commit state, i.e., it has committed the transaction, it issues the COMMIT command to all the participants
3. If the new coordinator is in the abort state, it issues the ABORT command to all the participants
4. Otherwise, the new coordinator orders all participants to go back to a state previous to the Prepared‐to‐Commit, and after it has
• This protocol is similar to 3‐Phase‐Commitment protocol.
• In case of failure of the new coordinator, the same termination protocol can be reentered by the remaining operational sites by electing a new coordinator.
• Disadvantage :‐ It is nonprogressive.
• There are several ways in which a new coordinator can be selected.
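The behavior of the new coordinator in the centralized termination protocol can be sketched as a small decision function. This is only an illustration under stated assumptions: the state names, the function `terminate`, and the command name `LEAVE-PREPARED-STATE` are hypothetical, and the final ABORT in the fourth case is an assumption, since the rule is cut off in the text above. Waiting for the OK messages is not modeled.

```python
from enum import Enum


class State(Enum):
    UNCERTAIN = "uncertain"            # has not reached Prepared-to-Commit
    PREPARED_TO_COMMIT = "pc"
    COMMITTED = "committed"
    ABORTED = "aborted"


def terminate(coordinator_state, participants):
    """Return the commands the new coordinator issues, in order.

    Each command is a pair (command name, sorted list of target sites).
    """
    targets = sorted(participants)
    if coordinator_state is State.PREPARED_TO_COMMIT:
        # Rule 1: bring every operational participant to Prepared-to-Commit,
        # then (after all OK messages, not modeled here) commit.
        return [("ENTER-PREPARED-STATE", targets), ("COMMIT", targets)]
    if coordinator_state is State.COMMITTED:
        # Rule 2: the transaction is already committed locally.
        return [("COMMIT", targets)]
    if coordinator_state is State.ABORTED:
        # Rule 3: the transaction is already aborted locally.
        return [("ABORT", targets)]
    # Rule 4: order participants back to a state before Prepared-to-Commit.
    # ASSUMPTION: the rule is truncated in the text; aborting afterwards
    # is this sketch's reading, and the command name is hypothetical.
    return [("LEAVE-PREPARED-STATE", targets), ("ABORT", targets)]
```

For example, a new coordinator that is itself in the Prepared‐to‐Commit state ends by issuing COMMIT, while one that is still uncertain ends by issuing ABORT.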
Restart Protocols for 3‐Phase‐Commitment
• A restart protocol is executed by a site when it recovers from a failure.
• In the case of 2‐Phase‐Commitment, the restart protocol requires accessing remote recovery information, if the participant failed while it was in ready state.
• With 3‐Phase‐Commitment & termination protocol, the restart procedure will have to access remote recovery information if the participant has completed the first phase, independently of whether it has reached the prepared‐to‐commit state or not, because at restart it is not
Commitment Protocols & Network Partitions
Existence of nonblocking protocols for partitions
• The problem of the existence of nonblocking protocols in case of partition can be addressed by considering a different problem : the existence of protocols which allow independent recovery in case of site failures.
• Suppose that we can build the protocol such that if one site, say site2, fails, then
1. The other site, site1, terminates the transaction
2. Site2 at restart terminates the transaction correctly without requiring any additional information from site1
• The modified protocol is based on the following assumptions:
1. A site discovers that another site is down by not receiving a required message within a given timeout
2. A message can be lost only because of a site failure
3. Each site receives a message, changes state, and sends the required answer as an atomic
Protocols which can deal with partitions
Primary approach:
• If the 2PC protocol is used together with a primary site approach, then it is possible to terminate all the transactions of the group of the primary site, if and only if the coordinators of all pending transactions belong to this group
Majority approach and quorum‐based protocols
The basic rules of a quorum‐based protocol are:
1. Each site i has associated with it a number of votes Vi, Vi being a positive integer.
2. Let V indicate the sum of the votes of all sites of the
• A centralized termination protocol for the quorum‐based 3PC has the following structure:
1. A new coordinator is elected
2. The coordinator collects state information and acts according to the following rules :
a. If at least one site has committed (aborted), send a COMMIT (ABORT) command to the other sites
b. If the number of votes of sites which have reached the prepared‐to‐commit state is greater than or equal to Vc, send a COMMIT command.
c. If the number of votes of sites in the prepare‐to‐abort state reaches the abort quorum, send an ABORT command
d. If the number of votes of sites which have reached the prepare‐to‐commit state plus the number of votes of uncertain sites is greater than or equal to Vc, send a PREPARE‐TO‐COMMIT command to uncertain sites and wait for condition 2b to occur
e. If the number of votes of sites which have reached the prepare‐to‐abort state plus the number of votes of uncertain sites is greater than or equal to Va, send a PREPARE‐TO‐ABORT command and wait for condition 2c to occur
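Rules 2a to 2e can be sketched as one decision function evaluated by the new coordinator over the sites it can reach. This is a minimal sketch, not a definitive implementation: the function name `quorum_decision`, the state strings, and the `"WAIT"` result (returned when no rule applies) are assumptions of this example; `vc` and `va` stand for the commit and abort quorums Vc and Va used in the rules.

```python
def quorum_decision(states, votes, vc, va):
    """Decide the next command from the states of the reachable sites.

    states: dict site -> one of "committed", "aborted",
            "prepared-to-commit", "prepared-to-abort", "uncertain"
    votes:  dict site -> positive integer number of votes (Vi)
    vc, va: commit quorum and abort quorum
    """
    def total(*wanted):
        # Sum the votes of reachable sites whose state is in `wanted`.
        return sum(votes[s] for s, st in states.items() if st in wanted)

    # Rule a: some site has already committed or aborted.
    if "committed" in states.values():
        return "COMMIT"
    if "aborted" in states.values():
        return "ABORT"
    # Rule b: commit quorum already reached.
    if total("prepared-to-commit") >= vc:
        return "COMMIT"
    # Rule c: abort quorum already reached.
    if total("prepared-to-abort") >= va:
        return "ABORT"
    # Rule d: commit quorum reachable with the help of uncertain sites.
    if total("prepared-to-commit", "uncertain") >= vc:
        return "PREPARE-TO-COMMIT"
    # Rule e: abort quorum reachable with the help of uncertain sites.
    if total("prepared-to-abort", "uncertain") >= va:
        return "PREPARE-TO-ABORT"
    # ASSUMPTION: no rule applies, so this group cannot terminate yet.
    return "WAIT"
```

With one vote per site and Vc = Va = 3, a group holding one prepared‐to‐commit site and two uncertain sites gets PREPARE‐TO‐COMMIT, while a group of only two uncertain sites cannot make progress.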
Reliability & Concurrency Control
• Suppose that there is a failure.
• How can we maximize the number of transactions which are executed during this failure by the operational part of the system?
Nonredundant Databases
• If the database is nonredundant, then it is very simple to determine which transactions can be executed.
Redundant Databases
• There are two reasons to have redundancy
– To increase locality of reads.
– To increase availability & reliability of system.
• We have seen three main approaches to concurrency control based on 2‐PL
– Write‐locks‐all
– Majority locking
– Primary copy locking.
Example :‐ Consider a distributed database consisting of
     Group 1   Group 2   Group 3
A)   1         2, 3      ‐‐‐
B)   2         1, 3      ‐‐‐
C)   3         1, 2      ‐‐‐
D)   1         2         3
• Write‐locks‐all.
• Weighted majority locking.
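How the three locking approaches behave under a partition can be illustrated with a small check of whether a group of sites is able to write a data item. This is a sketch under assumptions that are not in the slides: one copy of the item at each of sites 1, 2 and 3, one vote per copy (unweighted majority), and a hypothetical function name `can_write`.

```python
def can_write(group, copies, scheme, primary=None):
    """Tell whether the sites in `group` can write an item during a partition.

    group:   sites that can still communicate with each other
    copies:  sites holding a copy of the item
    scheme:  "write-locks-all", "majority" or "primary-copy"
    primary: site of the primary copy (used by "primary-copy" only)
    """
    reachable = set(group) & set(copies)
    if scheme == "write-locks-all":
        # A write must lock every copy.
        return reachable == set(copies)
    if scheme == "majority":
        # A write must lock a strict majority of the copies (one vote each).
        return len(reachable) > len(copies) / 2
    if scheme == "primary-copy":
        # A write must lock the primary copy.
        return primary in reachable
    raise ValueError(f"unknown scheme: {scheme}")
```

For partition A) in the table (group {1} and group {2, 3}), no group can write under write‐locks‐all, only group {2, 3} can write under majority locking, and only the group holding the primary copy can write under primary copy locking.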
Determining a Consistent View of the Network
• There are two aspects for this.
– Monitoring the state of the network.
– Propagating new state information to all sites consistently.
• We can use timeouts in the algorithm to discover if a site is down.
• But use of timeouts may lead to an inconsistent view of the network.
• We assume that a generalized networkwide mechanism is built such that all higher‐level programs are provided with the following facilities.
1. There is at each site a state table containing an entry for each site. The entry can be up or down.
2. Any program can set a “watch” on any site, so that it receives an interrupt when a site changes state.
• A site considers up only those sites with which it can communicate; therefore all crashed sites, and all sites which belong to a different group in case of partitions, are considered down.
• We will consider separately the problem of monitoring & propagating state information.
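The two facilities assumed above (a state table per site, and a “watch” that interrupts a program when a site changes state) can be sketched as follows. The class name `StateTable` and its methods are hypothetical, and the interrupt is modeled as a plain callback; this is an illustration, not the mechanism of any particular system.

```python
class StateTable:
    """Per-site table recording, for each site, whether it is up or down."""

    def __init__(self, sites):
        # Every site is assumed to be up initially.
        self.state = {site: "up" for site in sites}
        self.watchers = {site: [] for site in sites}

    def watch(self, site, callback):
        """Register a callback invoked when `site` changes state."""
        self.watchers[site].append(callback)

    def set_state(self, site, new_state):
        """Record a new state and notify watchers only on a real change."""
        if self.state[site] == new_state:
            return
        self.state[site] = new_state
        for callback in self.watchers[site]:
            callback(site, new_state)
```

A program would call `watch` once for each site it depends on, and the monitoring function would call `set_state` whenever it detects that a site went down or came back up.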
Monitoring the State of the Network
• Generally the basic mechanism for deciding whether a site is up or down is to request a message from it & wait for a timeout.
• Let us call the requesting site the controller & the other site the controlled site.
• In a monitoring algorithm, instead of having the controller request messages from the controlled site, it is easier to have the controlled site send an I‐AM‐UP message periodically to the controller.
• Using this mechanism for detecting whether a site is up or down, the problem consists of assigning controllers to
• A possible solution is to assign circular ordering to the sites and to assign to each site the function of controller of its predecessor.
• In absence of failures, each site periodically sends an I‐AM‐UP message to its successor & checks that the I‐AM‐UP message from its predecessor arrives in time.
• If the I‐AM‐UP message from the predecessor does not arrive in time, then the controller assumes that the controlled site has failed, updates the state table & broadcasts the updated state table to all other sites.
• If the predecessor of the site is down then the site has
[Figure: circular ordering of sites]
Sites:   K-3   K-2    K-1    K
States:  UP    DOWN   DOWN   UP
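The assignment of controllers in the circular ordering can be sketched as a search for the nearest predecessor that is still up. This reading is an assumption: the sentence about a down predecessor is cut off above, and the figure (K up, K-1 and K-2 down, K-3 up) is taken here to mean that site K then controls site K-3. The function name `controlled_site` and the list representation are hypothetical.

```python
def controlled_site(site, states):
    """Return the site that `site` must control, or None if no other site is up.

    states: list of "up"/"down" indexed by site number, in circular order,
            so the predecessor of site i is site (i - 1) modulo len(states).
    """
    n = len(states)
    for step in range(1, n):
        candidate = (site - step) % n
        if states[candidate] == "up":
            # Nearest predecessor that is up (ASSUMPTION, see lead-in).
            return candidate
    return None
```

With `states = ["up", "down", "down", "up"]` standing for sites K-3 to K, the last site controls the first one, skipping the two sites that are down.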
Broadcasting a New State
• Each time the monitor function detects a state change, it broadcasts the new state table so that all sites of the same group have the same state table.
• Since this function could be activated by several sites in parallel, some mechanism is needed to control interference.
• A possible mechanism is to attach a globally
Detection & Resolution of Inconsistency
• When a partition of the network occurs, transactions should be run in at most one group of sites if we want to preserve consistency of the database.
• But in some applications transactions are allowed to run in all partitions where there is at least one copy of the necessary data, to get more availability.
• When a failure is repaired, one can try to eliminate inconsistency.
• To do this it is necessary first to discover which portions of the data have become inconsistent (Detection of inconsistency) & then to assign these portions a value which is most reasonable (Resolution of inconsistency).
Detection of Inconsistency
• Let us assume that during a partition, transactions have been executed in two or more groups of sites & independent updates may have been performed on different copies of the same fragment.
• The general approach consisting of comparing the contents of copies to check that they are identical or not is inefficient & incorrect.
• A correct approach to the detection of inconsistencies can be based on version numbers.
• During normal operation all copies are master copies & mutually consistent.
• For each copy an Original version number & Current version number are maintained.
• Initially the Original version number is set to 0 & the Current version number is set to 1.
• Each time an update is performed on the copy, only the current version number is incremented.
• When a partition occurs, the original version number of each isolated copy is set to the value of its current version number.
• The original version number records the current version number of the isolated copy before any “partitioned updates” are performed on it.
• The original version number is not altered until the
• Example :‐ Let us consider copies x1, x2 & x3 of data item x, stored at three different sites.
• Let V1, V2 & V3 be their version numbers, written as (original, current).
• Initially all copies are consistently updated.
• Assume that one update is performed, so
V1 = (0,2) V2 = (0,2) V3 = (0,2)
• Now a partition occurs separating x3 from the other two copies.
• Let x1 & x2 be the master copies.
• Suppose that only master copies are updated
V1 = (0,3) V2 = (0,3) V3 = (2,2)
• After repair it is possible to see that x3 has not been modified, since its current & original version numbers are the same.
• In this case, no inconsistency occurred & it is sufficient to perform the updates on x3.
• Now suppose that only x3 is updated during partition
V1 = (0,2) V2 = (0,2) V3 = (2,3)
• Since the original version number of x3 is equal to the current version number of x1 & x2, the master copies have not been updated.
• If there are no other copies then we can apply to the master copies the updates of x3.
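The comparison made after the repair can be sketched as a function over two (original, current) version pairs, one for a master copy and one for the isolated copy. The function name `compare` and the returned strings are hypothetical, and the last case (both sides updated during the partition) is this sketch's assumption about when an inconsistency is reported; the two other cases follow the example above.

```python
def compare(master, isolated):
    """Classify the situation after a partition is repaired.

    master, isolated: (original, current) version number pairs.
    The isolated copy's original number records the current version
    that all copies shared when the partition occurred.
    """
    _, m_cur = master
    i_orig, i_cur = isolated
    isolated_changed = i_cur != i_orig   # partitioned updates on the isolated copy
    master_changed = m_cur != i_orig     # updates on the master copies
    if not isolated_changed and not master_changed:
        return "consistent"
    if not isolated_changed:
        return "apply master updates to isolated copy"
    if not master_changed:
        return "apply isolated updates to master copies"
    # ASSUMPTION: both sides were updated independently.
    return "inconsistent"
```

Applied to the example, `compare((0, 3), (2, 2))` reports that the master updates must be performed on x3, and `compare((0, 2), (2, 3))` reports that the updates of x3 can be applied to the master copies.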
Checkpoints & Cold Restart
• Cold restart is required after some catastrophic failure which has caused the loss of log information on stable storage.
• In DDB, cold restart is difficult because if one site has to establish an earlier state, then all other sites also have to establish an earlier state.
• The recovery process is global, affecting all sites of the database.
• A consistent global state C is characterized by the following properties.
– For each transaction T, C contains the updates performed by all subtransactions of T at any site or it does not contain any of them.
– If a transaction T is contained in C, then all conflicting transactions which have preceded T in the serialization