Enhanced Mutual Serializability - Consistent Online Backup in Transactional File Systems

serialized after t_b (tx is an “after” transaction) and f ile_k has been copied to backup, since f ile_kis not accessed by any other user transaction after its deletion, themutual serializability protocol extends comfortably to ensure a serializableπ_C^h_b^h.

File Creation :

Creation of a file,f ilek(creat(f ilek, dj) for a regular file andmkdir(f ilek, dj) for a directory) under a directoryd_j requires an entry of the file name and its unique identifier (inode number in this case) in the directory d_j and allocation of an inode and its initialization for the created filef ile_k. Let the corresponding mapping offilecreation be{w_x(dj), wx(f ile_k)}.

Now, does mutual serializability extend naturally to support creation of files and hence a growing file system? The answer is no as suggested by the following example: As before, symbol “<” is used to denote “precedes” in any schedule and when the backup transaction is involved the symbol is “ << ”. Say, t_x << t_b. Hence t_b has read d_j before f ile_k was created. Now, let there be another transactionty inπ_C^h_b^h which accesses the newly created f ile_k and another file f ile_l, where f ile_l has not been read by t_b. Since t_y accesses f ile_k which was created by t_x, t_x < t_y holds. As both f ile_k and f ile_l have not been read by t_b, mutual serializability between t_b and ty should be ty << t_b. But, ty << t_b cannot hold because t_b will never readf ile_k, as it has been created “af ter t_b.

The following section delves into the issue of capturing a consistent file system backup in the presence of file create operations. We revisit the mutual serializability protocol and modify it to handle filecreates.

4.4. ENHANCED MUTUAL SERIALIZABILITY 41

operations based on which user transactions are serialized, now also includes, creat node- read,read-creat node,creat node-write and write-creat node conflicts. Aread by t_b conflicts with a creat node operation of a user transaction on the same file.

The mutual serializability definition now needs to be modified. Again, we redefinemu- tually serializable with the only difference from its original definition being the set of access operations which now includecreat node.

Mutually serializable: Let concurrently executing transactions t_x, t_y ∈ (T^h ∪ t_b) access(read,write or creat node) into the set of files Fx, Fy ∈ F^h respectively. Given, F_x∩F_y 6=∅,t_x and t_y aremutually serializable if

• t_x and t_y are both read-only transaction OR

• every pair of access operations ax(f ilei), ay(f ilei) where f ilei ∈ (Fx ∩Fy) and <

a_x, a_y > ∈ {< r_x, r_y >,< r_x, w_y >,< w_x, r_y >, < w_x, w_y >,< creat node_x, r_y >,<

r_x, creat node_y >,< creat node_x, w_y >,< w_x, creat node_y >}, is ordered either as

– {a_x(f ile_i)< a_y(f ile_i)|∀f_i∈(F_x∩F_y), OR

– {a_y(f ile_i)< a_x(f ile_i)|∀f_i∈(F_x∩F_y)

We see that while defining the modifiedmutually serializablerelationship between concurrently executing transaction pairst_xand t_y, ift_x<< t_y then the following sequence of operations cannot occur: wx(f ilei)<< creat nodey(f ilei) and rx(f ilei)<< creat nodey(f ilei).

Similarly if t_y << t_x, the following sequence of operations cannot occur: w_y(f ile_i) <<

creat node_x(f ile_i) and r_y(f ile_i) << creat node_x(f ile_i). It is simply because, a file cannot be read or written into before it is created. Given these basic definitions we observe that, π_Chb^hcan be serialized by keepingt_b mutually serializable with eacht_x∈T^hsimultaneously, provided there is noty ∈T^h such thattb << ty and creat nodey(f ilek) is in πy if there is a t_x∈T^h such thatt_xreads or writes intof ile_kandt_x << t_b. Thus, this gives rise to a modified version of the protocolmutual serializability. We state and prove that we get serializable schedules by using the modifiedmutual serializability protocol in Theorem 4.4.1. The proof of Theorem 4.4.1 is similar to the proof of Theorem 3.3.1, once we show that Lemma 3.3.1 continues to hold with the set of file access operations now including creat node and set of conflicting operations extended to include the conflicts with thecreat nodeaccess operation.

Theorem 4.4.1. Givenπ_Ch is a serializable schedule, π_Chb^h is serializable if ∀t_x∈T^h the following hold:

1. t_b, tx are mutually serializable and

2. if tx<< t_b, then there is no ty ∈T^h such that (a) t_b << ty, and

(b) there is acreat nodey(f ilek) in πy

Proof. We prove Theorem 4.4.1 by first showing that Lemma 3.3.1 continues to hold when condition 2 is added. Thus, we first show that the following statement holds:

A: t_b is mutually serializable with every t_x ∈ T^h, if t_b << t₁ < t₂ < ... < t_m, then tb << tm.

Since t_b is mutually serializable with tm, then either t_b << tm or tm << t_b. We need to show that only the former relationship is possible, given A. Let us consider the partial relationshiptb << t1< t2.

Let t1,t2 access (read ,write or creat node) the non-empty set of filesF1 and F2 respectively. Ast₁ < t₂, (F₁^TF₂)6=∅. There must exist at least one file on which the operations of t1 and t2 conflict. Without loss of generality, let this be a single file f1.

So, a1(f1)< a2(f1) must be in the schedule, where

< a1, a2 >∈ {< r₁, w2>, < w1, r2 >, < w1, w2>, < creat node1, r2 >, < creat node1, w2 >}.

(4.1) We need to consider two separate cases:

Case I:< a1, a2 >∈ {< r₁, w2 >, < w1, r2>, < w1, w2 >}

By definition,tbreads every file in the file system that has not been created during backup (and so f₁ also since it has not been created by t₁ ort₂; the case of another transactiont_z such thattb<< tz << t1 andtz createsf1, is covered by Case II below). Hencetb reads file f₁. Also by mutual serializability , a read byt_b conflicts with any access (read or write) by a user transaction to the same file.

Hence, given the relationship tb << t1,rb(f1)<< a1(f1) must be in the schedule, where

< r_b, a₁ >∈ {< r_b, w₁ >, < r_b, r₁ >}. (4.2) From 4.2 we get r_b(f₁) << a₁(f₁). From 4.1 we get a₁(f₁) < a₂(f₁). Therefore, by 4.1, 4.2 and the definition of mutual serializability, r_b(f₁) << a₂(f₁) holds. a₂(f₁) << r_b(f₁) cannot hold ast_b reads every file at most once.

Thus,

if t_b<< t₁ < t₂ then t_b << t₂. (4.3)

4.4. ENHANCED MUTUAL SERIALIZABILITY 43

Case II:< a1, a2 >∈ {< creat node₁, r2 >, < creat node1, w2 >}.

In this case, r_b(f₁) will not exist in the schedule ast_b << t₁ and file f₁ has been created by t1 which is after tb. Now, if t2 only accesses f1, then Fb intersect F2 is null and so by definition ofmutual serializability,t_bismutually serializablewitht2as there is no dependency betweent_b and t₂. So t_b<< t₂ holds.

If Fb intersectF2 is not null, and is say f2, then depending when tb reads f2,tb << t2

or t₂ << t_b will hold. But if t₂ << t_b holds, then it violates the second condition of the theorem wheret_x=t₂, and t_y =t₁. Therefore, even in this case,t_b << t₂ must hold.

It is to be noted that the arguments will continue to hold if F1 intersectF2 consists of more than one file and the conflicting operations are from any of the pairs of conflicting operations. Either Case I or Case II or both will have to hold, and the same arguments will apply.

It can be easily seen that the same arguments can be used to prove by induction that statementAcontinues to hold.

Having shown this, we now prove the theorem:

We use mathematical induction to show that SG(π_Chb^h) is acyclic and hence π_Chb^h is a serialized schedule.

We proceed with the knowledge thatπ_C^h is a serialized schedule and hence there cannot exist a cycle in SG(π_Chb^h) involving only transactions from the setT^h.

Base case 1: Let us consider a single transaction sayt₁∈T^h and the backup transac- tiont_b. Given thatt_b andt1 ismutually serializable, a cycle in SG(π_Chb^h) involving just the two transactionst_b and t₁, is not possible.

Base case 2: Let us now consider any two transactions{t₁, t₂} ∈T^h and t_b.

For a cycle to hold,tb must be part of the cycle. There is no cycle involving t1 andt2 as it is assumed that the operations oft₁ and t₂ are interleaved to form a serializable schedule.

Let there be a cycle,t_b << t₁< t₂ << t_b.

Lett1,t2access (read, write, creat node) the non-empty set of filesF1andF2respectively.

Ast₁ < t₂, then (F₁^TF₂) 6=∅. There must exist at least one file on which the operations of t₁ and t₂ conflict. Without loss of generality, let this be a single file f₁. So, a₁(f₁) <

a2(f1) must be in the schedule, where < a1, a2 >∈ {< r₁, w2 >, < w1, r2 >, < w1, w2 >, <

creat node₁, r₂ >, < creat node₁, w₂ >}.

We need to consider two separate cases:

Case I:< a1, a2 >∈ {< r₁, w2>, < w1, r2 >, < w1, w2 >,}

Now, by the definition oft_b,t_breads every file inF^hbat most once (and it readsf₁since it has not been created byt₁ ort₂; the case of another transactiont_z such thatt_b<< t_z << t₁

and tz creates f1, is covered by Case II below).

So rb(f1)occurs only once in the schedule π_C^h_b^h. (4.4) By definition of mutual serializability, sincet_b << t1,t_b reads f1 beforet1, i.er_b(f1)<<

a₁(f₁) is part of the schedule π_Chb^h.

Again by definition of mutual serializability, sincet₂ << t_b, t_b reads f₁ after t₂ and so a2(f1)<< rb(f1) is part of the schedule πCb.

But this implies, rb(f1) << a1(f1) < a2(f1) << rb(f1) is part of the schedule π_C^h_b^h. But, this is not possible by 4.4 above.

Therefore, there cannot be any cycle involving any two transactions and t_b. Case II: < a1, a2>∈ {< creat node₁, r2>, < creat node1, w2>}.

In this case, rb(f1) will not exist in the schedule as tb<< t1 and file f1 has been created by t₁ which is after t_b. Now, if t₂ only accesses f₁, then F_b intersect F₂ is null and so by definition ofmutual serializability tbismutually serializable witht2 as there is no dependency betweent_b andt₂. Sot_b << t₂ holds and there is no cycle involving two transactions andt_b. If F_b intersect F2 is not null, let this be a single file f2. For a cycle to exist t2 << t_b, but, this violates the second condition of the theorem wheret_x=t₂, andt_y =t₁. Therefore, even in this case, there is no cycle involving two transactions and t_b.

It is to be noted that the arguments will continue to hold if F1 intersect F2 consists of more than one file and the conflicting operations are from any of the pairs of conflicting operations. Either Case I or Case II or both will have to hold, and the same arguments will apply.

Induction hypothesis: Let there be m transactions, t1, t2, ..tm and the backup trans- actiont_b. Let there be no cycles present.

Induction step: Let there be m+1 transactionst₁, t₂, ...t_m, t_m+1and the backup trans- actiontb. For a cycle to exist, due to the induction hypothesis, all transactions must partic- ipate.

So, t_b<< t1 < t2.... < tm < tm+1 << t_b.

But by Aabove,t_b << tm and so the cyclet_b << tm< tm+1<< t_b must also exist. But a cycle involving two transaction andt_b is not possible as proved in the Base case.

Hence, there can be no cycle with m+1 transactions. So, t_b << t₁ < t₂.... < t_m <

tm+1 << tb is not possible.

Thus, Theorem 4.4.1 follows.

Dalam dokumen Consistent Online Backup in Transactional File Systems (Halaman 56-61)