5/8/2009
1
Introduction
Chapter 1
Definition of a Distributed System (1)
A distributed system is:
A collection of independent computers that appears to its users as a single coherent system.
Definition of a Distributed System (2)
1.1
A distributed system organized as middleware.
Note that the middleware layer extends over multiple machines.
Goals of a Distributed System
1. Connecting users to resources (making it easy for users to access remote resources and to share them with other users in a controlled way).
2. Openness (the system offers services according to standard rules that describe the syntax and semantics of those services).
3. Scalability.
Transparency in a Distributed System
Transparency   Description
Access         Hide differences in data representation and how a resource is accessed
Location       Hide where a resource is located
Migration      Hide that a resource may move to another location
Relocation     Hide that a resource may be moved to another location while in use
Replication    Hide that a resource is replicated
Concurrency    Hide that a resource may be shared by several competitive users
Failure        Hide the failure and recovery of a resource
Persistence    Hide whether a (software) resource is in memory or on disk
Different forms of transparency in a distributed system.
Scalability Problems
Concept                  Example
Centralized services     A single server for all users
Centralized data         A single on-line telephone book
Centralized algorithms   Doing routing based on complete information
Scaling Techniques (1)
1.4
The difference between letting:
a) a server or
b) a client check forms as they are being filled
Scaling Techniques (2)
1.5
Hardware Concepts
1.6
Different basic organizations and memories in distributed computer systems
Multiprocessors (1)
1.7
Multiprocessors (2)
1.8
a) A crossbar switch
b) An omega switching network
Homogeneous Multicomputer Systems
1-9
a) Grid
Software Concepts
System      Description                                                                          Main goal
DOS         Tightly-coupled operating system for multiprocessors and homogeneous multicomputers  Hide and manage hardware resources
NOS         Loosely-coupled operating system for heterogeneous multicomputers (LAN and WAN)      Offer local services to remote clients
Middleware  Additional layer atop of NOS implementing general-purpose services                   Provide distribution transparency
An overview of
• DOS (Distributed Operating Systems)
• NOS (Network Operating Systems)
• Middleware
Uniprocessor Operating Systems
1.11
Multiprocessor Operating Systems (1)
monitor Counter {
private:
    int count = 0;
public:
    int value() { return count; }
    void incr() { count = count + 1; }
    void decr() { count = count - 1; }
}
A monitor to protect an integer against concurrent access.
Multiprocessor Operating Systems (2)
monitor Counter {
private:
    int count = 0;
    int blocked_procs = 0;
    condition unblocked;
public:
    int value() { return count; }
    void incr() {
        if (blocked_procs == 0)
            count = count + 1;
        else
            signal(unblocked);
    }
    void decr() {
        if (count == 0) {
            blocked_procs = blocked_procs + 1;
            wait(unblocked);
            blocked_procs = blocked_procs - 1;
        }
        else
            count = count - 1;
    }
}
A monitor to protect an integer against concurrent access, but blocking a process.
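The blocking monitor can be approximated in Python with `threading.Condition`. This is only a sketch: the class and method names mirror the slide, but the implementation is an assumption. Unlike the slide's hand-over scheme, `incr()` here always increments and then wakes one waiter, which rechecks the count before decrementing.

```python
import threading


class Counter:
    """Monitor-style counter: decr() blocks while the count is zero."""

    def __init__(self):
        self._cond = threading.Condition()  # one lock plus a condition variable
        self._count = 0

    def value(self):
        with self._cond:
            return self._count

    def incr(self):
        with self._cond:
            self._count += 1
            self._cond.notify()  # wake one blocked decr(), if any

    def decr(self):
        with self._cond:
            while self._count == 0:  # recheck the condition after every wakeup
                self._cond.wait()
            self._count -= 1


def demo():
    """Hypothetical usage: one thread blocks in decr() until incr() is called."""
    counter = Counter()
    worker = threading.Thread(target=counter.decr)  # blocks because count is 0
    worker.start()
    counter.incr()  # unblocks the worker
    worker.join(timeout=5)
    return counter.value(), worker.is_alive()
```

Calling `demo()` starts a thread that blocks in `decr()` and is released by the subsequent `incr()`.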
Multicomputer Operating Systems (1)
1.14
General structure of a multicomputer operating system
Multicomputer Operating Systems (2)
1.15
Multicomputer Operating Systems (3)
Synchronization point                  Send buffer  Reliable comm. guaranteed?
Block sender until buffer not full     Yes          Not necessary
Block sender until message sent        No           Not necessary
Block sender until message received    No           Necessary
Block sender until message delivered   No           Necessary
Relation between blocking, buffering, and reliable communications.
Distributed Shared Memory Systems (1)
a) Pages of address space distributed among four machines
b) Situation after CPU 1 references page 10
Distributed Shared Memory Systems (2)
1.18
False sharing of a page between two independent processes.
Network Operating System (1)
1-19
Network Operating System (2)
1-20
Two clients and a server in a network operating system.
Network Operating System (3)
1.21
Positioning Middleware
1-22
General structure of a distributed system as middleware.
Middleware and Openness
1.23
Comparison between Systems
Item                     Distributed OS (multiproc.)  Distributed OS (multicomp.)  Network OS  Middleware-based OS
Degree of transparency   Very high                    High                         Low         High
Same OS on all nodes     Yes                          Yes                          No          No
Number of copies of OS   1                            N                            N           N
Basis for communication  Shared memory                Messages                     Files       Model specific
Resource management      Global, central              Global, distributed          Per node    Per node
Scalability              No                           Moderately                   Yes         Varies
Openness                 Closed                       Closed                       Open        Open
A comparison between multiprocessor operating systems, multicomputer operating systems, network operating systems, and middleware-based distributed systems.
Clients and Servers
1.25
An Example Client and Server (1)
The header.h file used by the client and server.
An Example Client and Server (2)
An Example Client and Server (3)
1-27 b
A client using the server to copy a file.
Processing Level
1-28
Multitiered Architectures (1)
1-29
Alternative client-server organizations (a) – (e).
Multitiered Architectures (2)
1-30
Modern Architectures
1-31
Communication
Layered Protocols (1)
Layered Protocols (2)
Data Link Layer
Client-Server TCP
a) Normal operation of TCP.
Middleware Protocols
Conventional Procedure Call
a) Parameter passing in a local procedure call: the stack before the call to read
Client and Server Stubs
Steps of a Remote Procedure Call
1. Client procedure calls client stub in normal way
2. Client stub builds message, calls local OS
3. Client's OS sends message to remote OS
4. Remote OS gives message to server stub
5. Server stub unpacks parameters, calls server
6. Server does work, returns result to the stub
7. Server stub packs it in message, calls local OS
8. Server's OS sends message to client's OS
9. Client's OS gives message to client stub
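The steps above can be traced in a small in-process simulation. This is a sketch under assumptions: the procedure `add`, the JSON message format, and the `transport` function (standing in for both operating systems and the network) are all hypothetical and chosen only for illustration.

```python
import json


def add(a, b):
    """The server procedure (a made-up example)."""
    return a + b


PROCEDURES = {"add": add}  # procedures the server exports


def server_stub(request_bytes):
    """Steps 5-7: unpack the parameters, call the server, pack the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def transport(request_bytes):
    """Steps 3-4 and 8-9: stands in for the two operating systems and the network."""
    return server_stub(request_bytes)


def client_stub(proc, *args):
    """Step 2: build the message and hand it over; then unpack the reply."""
    request_bytes = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
    reply_bytes = transport(request_bytes)
    return json.loads(reply_bytes.decode("utf-8"))["result"]


def remote_add(a, b):
    """Step 1: the client calls this like an ordinary local procedure."""
    return client_stub("add", a, b)
```

From the caller's point of view, `remote_add(2, 3)` looks like a local call; all marshalling happens inside the stubs.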
Passing Value Parameters (1)
Passing Value Parameters (2)
a) Original message on the Pentium
b) The message after receipt on the SPARC
Parameter Specification and Stub Generation
a) A procedure
Doors
Asynchronous RPC (1)
a) The interconnection between client and server in a traditional RPC
Asynchronous RPC (2)
Writing a Client and a Server
Binding a Client to a Server
Distributed Objects
Binding a Client to an Object
a) Example with implicit binding using only global references
b) Example with explicit binding using global and local references
Distr_object* obj_ref;        // Declare a systemwide object reference
obj_ref = …;                  // Initialize the reference to a distributed object
obj_ref->do_something();      // Implicitly bind and invoke a method
(a)
Distr_object obj_ref;         // Declare a systemwide object reference
Local_object* obj_ptr;        // Declare a pointer to local objects
obj_ref = …;                  // Initialize the reference to a distributed object
obj_ptr = bind(obj_ref);      // Explicitly bind and obtain a pointer to the local proxy
obj_ptr->do_something();      // Invoke a method on the local proxy
(b)
Parameter Passing
The DCE Distributed-Object Model
a) Distributed dynamic objects in DCE.
Persistence and Synchronicity in Communication (1)
General organization of a communication system in which hosts are connected through a network
Persistence and Synchronicity in Communication (2)
Persistence and Synchronicity in Communication (3)
a) Persistent asynchronous communication
Persistence and Synchronicity in Communication (4)
c) Transient asynchronous communication
Persistence and Synchronicity in Communication (5)
e) Delivery-based transient synchronous communication at message delivery
Berkeley Sockets (1)
Socket primitives for TCP/IP.
Primitive Meaning
Socket Create a new communication endpoint
Bind Attach a local address to a socket
Listen Announce willingness to accept connections
Accept Block caller until a connection request arrives
Connect Actively attempt to establish a connection
Send Send some data over the connection
Receive Receive some data over the connection
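A minimal sketch of how these primitives fit together, using Python's standard `socket` module on the loopback interface. The function names `run_echo_once` and `echo_roundtrip` are made up for this example, and error handling is omitted.

```python
import socket
import threading


def run_echo_once(listener):
    """Server side: accept one connection and echo back what it receives."""
    conn, _addr = listener.accept()  # Accept: block until a client connects
    with conn:
        data = conn.recv(1024)       # Receive
        conn.sendall(data)           # Send


def echo_roundtrip(payload):
    """Run a one-shot echo server and a client against it; return the reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # Socket
    listener.bind(("127.0.0.1", 0))  # Bind: port 0 lets the OS pick a free port
    listener.listen(1)               # Listen
    port = listener.getsockname()[1]
    server = threading.Thread(target=run_echo_once, args=(listener,))
    server.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)    # Socket
    client.connect(("127.0.0.1", port))  # Connect
    client.sendall(payload)              # Send
    reply = client.recv(1024)            # Receive
    client.close()
    server.join()
    listener.close()
    return reply
```

The server executes Socket, Bind, Listen, and Accept in that order, while the client only needs Socket and Connect before it can send.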
Berkeley Sockets (2)
The Message-Passing Interface (MPI)
Some of the most intuitive message-passing primitives of MPI.
Primitive Meaning
MPI_bsend Append outgoing message to a local send buffer
MPI_send Send a message and wait until copied to local or remote buffer
MPI_ssend Send a message and wait until receipt starts
MPI_sendrecv Send a message and wait for reply
MPI_isend Pass reference to outgoing message, and continue
MPI_issend Pass reference to outgoing message, and wait until receipt starts
MPI_recv Receive a message; block if there are none
Message-Queuing Model (1)
Message-Queuing Model (2)
Basic interface to a queue in a message-queuing system.
Primitive Meaning
Put Append a message to a specified queue
Get Block until the specified queue is nonempty, and remove the first message
Poll Check a specified queue for messages, and remove the first. Never block.
Notify Install a handler to be called when a message is put into the specified queue
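The four primitives can be sketched on top of Python's `queue.Queue`. The class name `MessageQueues` and the choice to pass the queue name to the handler are assumptions for illustration. Real message-queuing systems also provide persistence, which this in-memory sketch does not.

```python
import queue


class MessageQueues:
    """In-memory sketch of the Put / Get / Poll / Notify interface."""

    def __init__(self):
        self._queues = {}    # queue name -> queue.Queue
        self._handlers = {}  # queue name -> callback

    def _q(self, name):
        return self._queues.setdefault(name, queue.Queue())

    def put(self, name, message):
        """Append a message to the specified queue."""
        self._q(name).put(message)
        handler = self._handlers.get(name)
        if handler is not None:
            handler(name)  # tell the installed handler that a message arrived

    def get(self, name):
        """Block until the queue is nonempty, then remove the first message."""
        return self._q(name).get()

    def poll(self, name):
        """Remove the first message if there is one; never block."""
        try:
            return self._q(name).get_nowait()
        except queue.Empty:
            return None

    def notify(self, name, handler):
        """Install a handler to be called when a message is put into the queue."""
        self._handlers[name] = handler
```

For example, after `mq.put("orders", "m1")`, a call to `mq.get("orders")` returns `"m1"` immediately, whereas `get` on an empty queue would block.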
General Architecture of a Message-Queuing System (1)
General Architecture of a Message-Queuing System (2)
Message Brokers
The general organization of a message broker in a message-queuing system.
Example: IBM MQSeries
Channels
Some attributes associated with message channel agents.
Attribute Description
Transport type Determines the transport protocol to be used
FIFO delivery Indicates that messages are to be delivered in the order they are sent
Message length Maximum length of a single message
Setup retry count Specifies maximum number of retries to start up the remote MCA
Message Transfer (1)
Message Transfer (2)
Primitives available in an IBM MQSeries MQI
Primitive Description
MQopen Open a (possibly remote) queue
MQclose Close a queue
MQput Put a message into an opened queue
Data Stream (1)
Data Stream (2)
Data Stream (3)
Specifying QoS (1)
A flow specification.
Characteristics of the Input
• Maximum data unit size (bytes)
• Token bucket rate (bytes/sec)
• Token bucket size (bytes)
• Maximum transmission rate (bytes/sec)
Service Required
• Loss sensitivity (bytes)
• Loss interval (μsec)
• Burst loss sensitivity (data units)
• Minimum delay noticed (μsec)
• Maximum delay variation (μsec)
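The token bucket rate and token bucket size in the flow specification refer to the token bucket traffic-shaping scheme. The following is a minimal sketch of that scheme, assuming byte-sized tokens and a caller-supplied clock; the class name and the `allow` method are hypothetical.

```python
class TokenBucket:
    """Token bucket: tokens accumulate at `rate` up to `size`; sending spends them."""

    def __init__(self, rate, size):
        self.rate = rate    # token bucket rate (bytes/sec)
        self.size = size    # token bucket size (bytes)
        self.tokens = size  # assumption: the bucket starts full
        self.last = 0.0     # time of the previous call (seconds)

    def allow(self, nbytes, now):
        """Return True and spend tokens if nbytes may be sent at time `now`."""
        elapsed = now - self.last
        self.tokens = min(self.size, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

With `rate=100` and `size=200`, a 200-byte burst is accepted at time 0, but a further 50 bytes must wait until enough tokens have accumulated again.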
Specifying QoS (2)
Setting Up a Stream
Synchronization Mechanisms (1)
Synchronization Mechanisms (2)
Processes
Thread Usage in Nondistributed Systems
Thread Implementation
Multithreaded Servers (1)
Multithreaded Servers (2)
Three ways to construct a server.
Model Characteristics
Threads Parallelism, blocking system calls
Single-threaded process No parallelism, blocking system calls
The X-Window System
Client-Side Software for Distribution Transparency
Servers: General Design Issues
a) Client-to-server binding using a daemon as in DCE
Object Adapter (1)
Organization of an object server
Object Adapter (2)
The header.h file used by the adapter and any program that calls an adapter.
/* Definitions needed by caller of adapter and adapter */
#define TRUE 1
#define MAX_DATA 65536

/* Definition of general message format */
struct message {
    long source;        /* sender's identity */
    long object_id;     /* identifier for the requested object */
    long method_id;     /* identifier for the requested method */
    unsigned size;      /* total bytes in list of parameters */
    char **data;        /* parameters as sequence of bytes */
};

/* General definition of operation to be called at skeleton of object */
typedef void (*METHOD_CALL)(unsigned, char*, unsigned*, char**);
Object Adapter (3)
The thread.h file used by the adapter for using threads.
typedef struct thread THREAD;    /* hidden definition of a thread */

/* Create a thread by giving a pointer to a function that defines the actual */
/* behavior of the thread, along with a thread identifier */
THREAD *CREATE_THREAD(void (*body)(long tid), long thread_id);

void get_msg(unsigned *size, char **data);
void put_msg(THREAD *receiver, unsigned size, char **data);
Object Adapter (4)
The main part of an adapter that implements a thread-per-object
Reasons for Migrating Code
Models for Code Migration
Migration and Local Resources (1)
Three types of process-to-resource binding
1. Binding by identifier (strongest binding): the process requires precisely the referenced resource.
Example: a process uses a URL to refer to a web site, or refers to an FTP server by its Internet address.
2. Binding by value (weaker binding): only the value of the resource is needed. Execution of the process is unaffected if another resource provides the same value.
Example: a program that relies on standard libraries, such as those for programming in C or Java. Such libraries are normally available locally, although their location in the local file system may differ between sites.
3. Binding by type (weakest binding): the process needs only a resource of a specific type.
Migration and Local Resources (2)
Three types of resource-to-machine binding
1. Unattached resources: easily moved between machines (typically data or files associated with the program).
2. Fastened resources: can be moved or copied. Example: local databases and complete web sites.
3. Fixed resources: bound to a specific machine or environment and cannot be moved. Fixed resources are often local devices.
Migration and Local Resources (3)
Actions to be taken with respect to the references to local resources when migrating code to another machine.
Process-to-resource binding (rows) versus resource-to-machine binding (columns):
               Unattached       Fastened         Fixed
By identifier  MV (or GR)       GR (or MV)       GR
By value       CP (or MV, GR)   GR (or CP)       GR
By type        RB (or GR, CP)   RB (or GR, CP)   RB (or GR)
GR: Establish a global systemwide reference
MV: Move the resource
CP: Copy the value of the resource
RB: Rebind the process to a locally available resource
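The action table can be encoded as a simple lookup. This sketch only transcribes the table; the key names and the function `migration_actions` are made up for illustration, and the first entry of each list is the primary action, followed by the alternatives given in parentheses.

```python
# GR: establish a global systemwide reference, MV: move the resource,
# CP: copy the value of the resource, RB: rebind to a locally available resource.
ACTIONS = {
    ("identifier", "unattached"): ["MV", "GR"],
    ("identifier", "fastened"):   ["GR", "MV"],
    ("identifier", "fixed"):      ["GR"],
    ("value", "unattached"):      ["CP", "MV", "GR"],
    ("value", "fastened"):        ["GR", "CP"],
    ("value", "fixed"):           ["GR"],
    ("type", "unattached"):       ["RB", "GR", "CP"],
    ("type", "fastened"):         ["RB", "GR", "CP"],
    ("type", "fixed"):            ["RB", "GR"],
}


def migration_actions(process_binding, machine_binding):
    """Return the primary action first, then the alternatives from the table."""
    return ACTIONS[(process_binding, machine_binding)]
```

For instance, a resource bound by identifier and fixed to its machine can only be reached through a global reference, so `migration_actions("identifier", "fixed")` yields `["GR"]`.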
Migration in Heterogeneous Systems
The principle of maintaining a migration stack to support migration of an execution segment in a heterogeneous environment
Overview of Code Migration in D'Agents (1)
A simple example of a Tcl agent in D'Agents submitting a script to a remote machine (adapted from [gray.r95])
proc factorial n {
    if {$n <= 1} { return 1 }                  ;# fac(1) = 1
    expr {$n * [factorial [expr {$n - 1}]]}    ;# fac(n) = n * fac(n - 1)
}
set number …     ;# tells which factorial to compute
set machine …    ;# identify the target machine
agent_submit $machine -procs factorial -vars number -script {factorial $number}
Overview of Code Migration in D'Agents (2)
An example of a Tcl agent in D'Agents migrating to different machines where it executes the UNIX who command (adapted from [gray.r95])
proc all_users machines {
    set list ""                 ;# Create an initially empty list
    foreach m $machines {       ;# Consider all hosts in the set of given machines
        agent_jump $m           ;# Jump to each host
        set users [exec who]    ;# Execute the who command
        append list $users      ;# Append the results to the list
    }
    return $list                ;# Return the complete list when done
}
set machines …      ;# Initialize the set of machines to jump to
set this_machine    ;# Set to the host that starts the agent
# Create a migrating agent by submitting the script to this machine, from where
# it will jump to all the others in $machines.
agent_submit $this_machine -procs all_users -vars machines -script { all_users $machines }
Implementation Issues (1)
Implementation Issues (2)
The parts comprising the state of an agent in D'Agents.
Status Description
Global interpreter variables Variables needed by the interpreter of an agent
Global system variables Return codes, error codes, error strings, etc.
Global program variables User-defined global variables in a program
Procedure definitions Definitions of scripts to be executed by an agent
Stack of commands Stack of commands currently being executed
Software Agents in Distributed Systems
Some important properties by which different types of agents can be distinguished.
Property Common to all agents? Description
Autonomous Yes Can act on its own
Reactive Yes Responds in a timely manner to changes in its environment
Proactive Yes Initiates actions that affect its environment
Communicative Yes Can exchange information with users and other agents
Continuous No Has a relatively long lifespan
Mobile No Can migrate from one site to another
Agent Technology
Agent Communication Languages (1)
Examples of different message types in the FIPA ACL [fipa98-acl], giving the purpose of a message, along with the description of the actual message content.
Message purpose Description Message Content
INFORM Inform that a given proposition is true Proposition
QUERY-IF Query whether a given proposition is true Proposition
QUERY-REF Query for a given object Expression
CFP Ask for a proposal Proposal specifics
PROPOSE Provide a proposal Proposal
ACCEPT-PROPOSAL Tell that a given proposal is accepted Proposal ID
REJECT-PROPOSAL Tell that a given proposal is rejected Proposal ID
REQUEST Request that an action be performed Action specification
Agent Communication Languages (2)
A simple example of a FIPA ACL message sent between two agents using Prolog to express genealogy information.
Naming
Name Spaces (1)
Name Spaces (2)
The general organization of the UNIX file system
Linking and Mounting (1)
Linking and Mounting (2)
Linking and Mounting (3)
Name Space Distribution (1)
Name Space Distribution (2)
A comparison between name servers for implementing nodes from a large-scale name space partitioned into a global layer, an administrational layer, and a managerial layer.
Item Global Administrational Managerial
Geographical scale of network Worldwide Organization Department
Total number of nodes Few Many Vast numbers
Responsiveness to lookups Seconds Milliseconds Immediate
Update propagation Lazy Immediate Immediate
Number of replicas Many None or few None
Implementation of Name Resolution (1)
Implementation of Name Resolution (2)
Implementation of Name Resolution (3)
Recursive name resolution of <nl, vu, cs, ftp>. Name servers cache intermediate results for subsequent lookups.
Server for node  Should resolve  Looks up  Passes to child  Receives and caches  Returns to requester
cs               <ftp>           #<ftp>    --               --                   #<ftp>
vu               <cs,ftp>        #<cs>     <ftp>            #<ftp>               #<cs>, #<cs,ftp>
nl               <vu,cs,ftp>     #<vu>     <cs,ftp>         #<cs>, #<cs,ftp>     #<vu>, #<vu,cs>, #<vu,cs,ftp>
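Recursive resolution with caching can be sketched as follows. This is an illustration under assumptions: the `NameServer` class, the symbolic `#<...>` address strings, and the per-path cache are made up here, and a real name server would cache the individual results it receives from its child rather than whole answers.

```python
def addr(path):
    """Symbolic address of the node reached by `path`, e.g. '#<cs,ftp>'."""
    return "#<" + ",".join(path) + ">"


class NameServer:
    def __init__(self):
        self.children = {}  # label -> NameServer (None for a leaf such as a host)
        self.cache = {}     # requested path -> answer returned earlier
        self.forwarded = 0  # number of requests passed on to a child server

    def resolve(self, path):
        """Resolve `path` below this node; return {sub-path: address}."""
        path = tuple(path)
        if path in self.cache:           # answered before: no forwarding needed
            return self.cache[path]
        head, rest = path[0], path[1:]
        if head not in self.children:
            raise KeyError(head)
        results = {(head,): addr((head,))}  # "Looks up"
        if rest:
            self.forwarded += 1             # "Passes to child"
            for sub in self.children[head].resolve(rest):
                results[(head,) + sub] = addr((head,) + sub)
        self.cache[path] = results          # "Receives and caches"
        return results                      # "Returns to requester"


def build_example():
    """Build the nl -> vu -> cs -> ftp chain used in the table."""
    nl, vu, cs = NameServer(), NameServer(), NameServer()
    nl.children["vu"] = vu
    vu.children["cs"] = cs
    cs.children["ftp"] = None
    return nl, vu, cs
```

Resolving `("vu", "cs", "ftp")` at the server for `nl` forwards `("cs", "ftp")` to `vu`, which forwards `("ftp",)` to `cs`. A second lookup of the same name is answered from the cache without any forwarding.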
Implementation of Name Resolution (4)
The DNS Name Space
The most important types of resource records forming the contents of nodes in the DNS name space.
Type of record Associated entity Description
SOA Zone Holds information on the represented zone
A Host Contains an IP address of the host this node represents
MX Domain Refers to a mail server to handle mail addressed to this node
SRV Domain Refers to a server handling a specific service
NS Zone Refers to a name server that implements the represented zone
CNAME Node Symbolic link with the primary name of the represented node
PTR Host Contains the canonical name of a host
HINFO Host Holds information on the host this node represents
DNS Implementation (1)
An excerpt from the DNS database for the zone
DNS Implementation (2)
Part of the description for the vu.nl domain which contains the cs.vu.nl domain.
Name Record type Record value
cs.vu.nl NS solo.cs.vu.nl
The X.500 Name Space (1)
A simple example of an X.500 directory entry using X.500 naming conventions.
Attribute Abbr. Value
Country C NL
Locality L Amsterdam
Organization O Vrije Universiteit
OrganizationalUnit OU Math. & Comp. Sc.
CommonName CN Main server
Mail_Servers -- 130.37.24.6, 192.31.231,192.31.231.66
FTP_Server -- 130.37.21.11
The X.500 Name Space (2)
The X.500 Name Space (3)
Two directory entries having Host_Name as RDN.
Attribute Value Attribute Value
Country NL Country NL
Locality Amsterdam Locality Amsterdam
Organization Vrije Universiteit Organization Vrije Universiteit
OrganizationalUnit Math. & Comp. Sc. OrganizationalUnit Math. & Comp. Sc.
CommonName Main server CommonName Main server
Host_Name star Host_Name zephyr
Naming versus Locating Entities
a) Direct, single level mapping between names and addresses.
Forwarding Pointers (1)
Forwarding Pointers (2)
Home-Based Approaches
Hierarchical Approaches (1)
Hierarchical Approaches (2)
Hierarchical Approaches (3)
Hierarchical Approaches (4)
a) An insert request is forwarded to the first node that knows about entity E.
Pointer Caches (1)
Pointer Caches (2)
Scalability Issues
The scalability issues related to uniformly placing subnodes of a
The Problem of Unreferenced Objects
Reference Counting (1)
Reference Counting (2)
a) Copying a reference to another process and incrementing the counter too late
Advanced Reference Counting (1)
a) The initial assignment of weights in weighted reference counting
Advanced Reference Counting (2)
Advanced Reference Counting (3)
Advanced Reference Counting (4)
Tracing in Groups (1)
Tracing in Groups (2)
Tracing in Groups (3)
Synchronization
Clock Synchronization
Physical Clocks (1)
Physical Clocks (2)
TAI seconds are of constant length, unlike solar seconds. Leap seconds are introduced when
Clock Synchronization Algorithms
Cristian's Algorithm
The Berkeley Algorithm
a) The time daemon asks all the other machines for their clock values
b) The machines answer
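The slide lists only the first two steps. In the usual third step the time daemon averages the reported clock values and tells each machine how to adjust its clock. A minimal sketch of that computation follows; the function name and the sample times (in minutes) are assumptions for illustration.

```python
def berkeley_adjustments(daemon_time, other_times):
    """Return the adjustment for the daemon first, then for each other machine.

    All clocks, including the daemon's own, are averaged; each machine is told
    the difference between that average and its own reported value.
    """
    clocks = [daemon_time] + list(other_times)
    average = sum(clocks) / len(clocks)
    return [average - clock for clock in clocks]
```

With the daemon at 180 and two machines at 170 and 205, the average is 185, so the adjustments are +5, +15, and -20.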
Lamport Timestamps
a) Three processes, each with its own clock. The clocks run at different rates.
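Lamport's correction rule for clocks that run at different rates can be sketched in a few lines. The class below is an illustrative assumption, using an integer counter that ticks once per event.

```python
class LamportClock:
    """Logical clock: local events tick; receipts jump past the sender's timestamp."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """A local event advances the clock by one."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned value is the message timestamp."""
        return self.tick()

    def receive(self, timestamp):
        """On receipt, move the clock ahead of both its own value and the timestamp."""
        self.time = max(self.time, timestamp) + 1
        return self.time
```

If a message stamped 2 arrives at a process whose clock still reads 0, the receiver's clock jumps to 3, so the receive event is ordered after the send.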
Example: Totally-Ordered Multicasting
Global State (1)
a)
A consistent cut
Global State (2)
Global State (3)
b) Process Q receives a marker for the first time and records its local state
c) Q records all incoming messages
The Bully Algorithm (1)
The bully election algorithm
• Process 4 holds an election
• Processes 5 and 6 respond, telling 4 to stop
The Bully Algorithm (2)
d) Process 6 tells 5 to stop
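The election steps above can be simulated with a short recursive function. This is a sketch under assumptions: process identifiers are integers, `alive` is the set of processes that still respond, and message passing is replaced by direct function calls.

```python
def bully_election(initiator, alive, all_ids):
    """Return the process that ends up as coordinator.

    The initiator sends ELECTION to every higher-numbered process. If none of
    them responds, the initiator wins. Otherwise the responders take over; it
    is enough to follow one of them, since the outcome is the same.
    """
    responders = [p for p in all_ids if p > initiator and p in alive]
    if not responders:
        return initiator
    return bully_election(min(responders), alive, all_ids)
```

Assuming eight processes numbered 0 to 7 with process 7 crashed, an election started by process 4 is taken over by 5 and then 6, which becomes the new coordinator.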
A Ring Algorithm
Mutual Exclusion:
A Centralized Algorithm
a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted
b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.
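A coordinator for this scheme needs only the current holder and a queue of waiting requests. The sketch below is an assumption-laden illustration: the class and method names are made up, and the release step (granting the region to the next waiting process) is the usual third step, which the slide text does not list.

```python
from collections import deque


class Coordinator:
    """Centralized mutual exclusion: one holder at a time, others queued."""

    def __init__(self):
        self.holder = None      # process currently in the critical region
        self.waiting = deque()  # requests that have not been answered yet

    def request(self, pid):
        """Return True if permission is granted; False means 'no reply yet'."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid):
        """Holder leaves the region; return the process granted next, if any."""
        if pid != self.holder:
            raise ValueError("only the holder can release the region")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

Process 1's request is granted at once, process 2's request is queued without a reply, and process 2 receives its grant when process 1 releases the region.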
A Distributed Algorithm
a) Two processes want to enter the same critical region at the same moment.
b) Process 0 has the lowest timestamp, so it wins.
A Token Ring Algorithm
a) An unordered group of processes on a network.
Comparison
A comparison of three mutual exclusion algorithms.
Algorithm    Messages per entry/exit  Delay before entry (in message times)  Problems
Centralized  3                        2                                      Coordinator crash
Distributed  2 (n - 1)                2 (n - 1)                              Crash of any process
The Transaction Model (1)
The Transaction Model (2)
Examples of primitives for transactions.
Primitive Description
BEGIN_TRANSACTION Mark the start of a transaction
END_TRANSACTION Terminate the transaction and try to commit
ABORT_TRANSACTION Kill the transaction and restore the old values
READ Read data from a file, a table, or otherwise
The Transaction Model (3)
a) Transaction to reserve three flights commits
b) Transaction aborts when third flight is unavailable
BEGIN_TRANSACTION
    reserve WP -> JFK;
    reserve JFK -> Nairobi;
    reserve Nairobi -> Malindi;
END_TRANSACTION
(a)
BEGIN_TRANSACTION
    reserve WP -> JFK;
    reserve JFK -> Nairobi;
    reserve Nairobi -> Malindi full => ABORT_TRANSACTION
(b)
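The all-or-nothing behavior of the flight reservation can be sketched with a private copy of the seat counts. The function name and the `seats` dictionary are hypothetical; a real system would also need concurrency control and durable storage.

```python
def reserve_all(legs, seats):
    """Reserve one seat on every leg, or change nothing at all.

    `seats` maps a leg name to its number of free seats and is only
    modified if the whole transaction commits.
    """
    tentative = dict(seats)          # BEGIN_TRANSACTION: work on a private copy
    for leg in legs:
        if tentative.get(leg, 0) == 0:
            return False             # ABORT_TRANSACTION: the copy is discarded
        tentative[leg] -= 1
    seats.update(tentative)          # END_TRANSACTION: commit all changes
    return True
```

If the last leg is full, the earlier two reservations never become visible, which is exactly case (b).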
Distributed Transactions
a) A nested transaction
Private Workspace
a) The file index and disk blocks for a three-block file
b) The situation after a transaction has modified block 0 and appended block 3
Writeahead Log
a) A transaction
b) – d) The log before each statement is executed
x = 0;
y = 0;
BEGIN_TRANSACTION;
    x = x + 1;
    y = y + 2;
    x = y * y;
END_TRANSACTION;
(a)
Log
[x = 0/1]
(b)
Log
[x = 0/1]
[y = 0/2]
(c)
Log
[x = 0/1]
[y = 0/2]
[x = 1/4]
(d)
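The log entries above record the old and new value of each variable, which is what makes rollback possible. A minimal sketch, assuming an in-memory dictionary as the data store and hypothetical function names:

```python
def run_with_log(state, statements):
    """Execute statements, writing an (variable, old, new) record before each change."""
    log = []
    for var, compute in statements:
        old = state[var]
        new = compute(state)
        log.append((var, old, new))  # write the log record first ...
        state[var] = new             # ... and only then change the data
    return log


def rollback(state, log):
    """Undo a transaction by restoring old values, newest record first."""
    for var, old, _new in reversed(log):
        state[var] = old


# The transaction from the slide: x = x + 1; y = y + 2; x = y * y
TRANSACTION = [
    ("x", lambda s: s["x"] + 1),
    ("y", lambda s: s["y"] + 2),
    ("x", lambda s: s["y"] * s["y"]),
]
```

Running `TRANSACTION` on `{"x": 0, "y": 0}` produces the three log records shown on the slide, and `rollback` restores both variables to 0.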
Concurrency Control (1)
Concurrency Control (2)
Serializability
a) – c) Three transactions T1, T2, and T3
d) Possible schedules
BEGIN_TRANSACTION
    x = 0;
    x = x + 1;
END_TRANSACTION
(a)
BEGIN_TRANSACTION
    x = 0;
    x = x + 2;
END_TRANSACTION
(b)
BEGIN_TRANSACTION
    x = 0;
    x = x + 3;
END_TRANSACTION
(c)
Schedule 1  x = 0; x = x + 1; x = 0; x = x + 2; x = 0; x = x + 3;  Legal
Schedule 2  x = 0; x = 0; x = x + 1; x = x + 2; x = 0; x = x + 3;  Legal
Schedule 3  x = 0; x = 0; x = x + 1; x = 0; x = x + 2; x = x + 3;  Illegal
(d)
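For this particular example, a schedule is legal when it leaves x with a value that some serial execution of the three transactions could also produce. The sketch below checks exactly that; comparing final values is a simplification that works here and is not a general serializability test. All names are made up for illustration.

```python
from itertools import permutations

# Each transaction sets x to 0 and then adds its own constant.
T1 = [("set", 0), ("add", 1)]
T2 = [("set", 0), ("add", 2)]
T3 = [("set", 0), ("add", 3)]


def run(schedule):
    """Execute a list of operations on x and return the final value."""
    x = None
    for op, value in schedule:
        x = value if op == "set" else x + value
    return x


def serial_results():
    """Final values of x over all six serial orders of T1, T2, T3."""
    return {run(a + b + c) for a, b, c in permutations([T1, T2, T3])}


SCHEDULE_1 = [("set", 0), ("add", 1), ("set", 0), ("add", 2), ("set", 0), ("add", 3)]
SCHEDULE_2 = [("set", 0), ("set", 0), ("add", 1), ("add", 2), ("set", 0), ("add", 3)]
SCHEDULE_3 = [("set", 0), ("set", 0), ("add", 1), ("set", 0), ("add", 2), ("add", 3)]
```

Schedules 1 and 2 both end with x = 3, which a serial run can produce, whereas schedule 3 ends with x = 5, which no serial order yields.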
Two-Phase Locking (1)
Two-Phase Locking (2)
Pessimistic Timestamp Ordering
Consistency and Replication
Object Replication (1)
How to protect the object against simultaneous access by multiple clients?
a) A remote object capable of handling concurrent invocations on its own.
Consistency problem when replicating a shared remote object without taking any special measures regarding the handling of concurrent invocations (example: the synchronization problem with a replicated bank-account database).
a) A distributed system for replication-aware distributed objects.
Data-Centric Consistency Models
Strict Consistency
Behavior of two processes, operating on the same data item.
• A strictly consistent store.
• A store that is not strictly consistent.
Linearizability and Sequential Consistency (1)
a) A sequentially consistent data store.
b) A data store that is not sequentially consistent.
Sequential consistency: the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.
Linearizability: the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program. In addition, if $ts_{OP1}(x) < ts_{OP2}(y)$, then operation OP1(x) should precede OP2(y) in this sequence.
Linearizability and Sequential Consistency (2)
Three concurrently executing processes.
Process P1        Process P2        Process P3
x = 1;            y = 1;            z = 1;
print (y, z);     print (x, z);     print (x, y);
Linearizability and Sequential Consistency (3)
Four valid execution sequences for the processes of the
previous slide. The vertical axis is time.
x = 1;
print (y, z);
y = 1;
print (x, z);
z = 1;
print (x, y);
Prints: 001011
Signature: 001011
(a)
x = 1;
y = 1;
print (x, z);
print (y, z);
z = 1;
print (x, y);
Prints: 101011
Signature: 101011
(b)
y = 1;
z = 1;
print (x, y);
print (x, z);
x = 1;
print (y, z);
Prints: 010111
Signature: 110101
(c)
y = 1;
x = 1;
z = 1;
print (x, z);
print (y, z);
print (x, y);
Prints: 111111
Signature: 111111
(d)
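The signatures can be checked by enumerating every interleaving that respects each process's program order. The sketch below assumes that all variables start at 0 and that the signature is the output of P1, P2, and P3 concatenated in that order; the function names are made up.

```python
PROGRAMS = {
    "P1": [("write", "x"), ("print", "yz")],
    "P2": [("write", "y"), ("print", "xz")],
    "P3": [("write", "z"), ("print", "xy")],
}


def interleavings(remaining):
    """Yield every ordering of statements that keeps each process's program order."""
    if all(not stmts for stmts in remaining.values()):
        yield []
        return
    for pid, stmts in remaining.items():
        if stmts:
            rest = dict(remaining)
            rest[pid] = stmts[1:]
            for tail in interleavings(rest):
                yield [(pid, stmts[0])] + tail


def signature(order):
    """Run one interleaving and concatenate the output of P1, P2, and P3."""
    memory = {"x": 0, "y": 0, "z": 0}
    output = {"P1": "", "P2": "", "P3": ""}
    for pid, (kind, arg) in order:
        if kind == "write":
            memory[arg] = 1
        else:
            output[pid] += "".join(str(memory[v]) for v in arg)
    return output["P1"] + output["P2"] + output["P3"]


def valid_signatures():
    """All signatures that some sequentially consistent execution can produce."""
    return {signature(order) for order in interleavings(PROGRAMS)}
```

There are 90 such interleavings. The four signatures shown on the slide are among the results, while a signature such as 000000 cannot occur because at least one write always precedes every print.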
Causal Consistency (1)
Necessary condition:
Writes that are potentially causally related must be seen by all processes in the same order. Concurrent