
Consistent Online Backup in Transactional File Systems


Academic year: 2023


A noticeable performance improvement was recorded when using performance-enhancing heuristics, redirecting the backup program to less active parts of the file system when detecting conflicts. The primary goal of this thesis is to investigate the problem and come up with an online backup solution that provides a consistent view of the file system being backed up.

Approach taken in the thesis

This leads us to assume a transactional file system as a way to specify consistency requirements while addressing the issue of a consistent online backup. Backing up the file system involves copying all the namespace information (directory contents) along with the data.

Contributions of the thesis

The approach taken in this thesis to back up a consistent file system is similar in concept to the approach taken by Pu. Finally, we have implemented a backup scheme in a transactional file system and shown that an efficient solution is feasible.

Organization of the thesis

  • Backup Storage Media
  • Data Repository Models And Management
  • Logical and Physical Backup Strategies
  • Backup Dataset Optimization and Security
  • Backup Storage Format

Dump [3], even though it reads data at the disk level, is considered a hybrid approach because it collects files and folders to back up with knowledge of the file system structure. The goal of our current study is to gain a consistent view of the file system, even though it is constantly changing.

Types of Data Consistency

Backup techniques that maintain file system-level consistency in the backup copy ensure the integrity of file system structures, including metadata, directory entries, and file data. A point-in-time consistent backup image represents the state of a file system at a single moment in time [21, 17].

Traditional Backup Techniques

Tar

Dump

Issues in Using Traditional Backup Techniques for Online Backup

In scenarios where a file shrinks or grows while it is being backed up, an incorrect file length may be written into the header of the file. Now, if the move happens during the scan phase of a multi-pass backup program like dump, file D may appear in two places or not at all in the backed-up copy of the file system.
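The move anomaly described above can be reproduced with a toy model. The sketch below is illustrative only: the dictionary-based "file system", the function name `scan_with_concurrent_move`, and its parameters are assumptions made for this example and are not taken from dump or from the thesis.

```python
def scan_with_concurrent_move(fs, scan_order, move, move_after):
    """Scan directories one by one while a single move happens mid-scan.

    fs          -- dict mapping directory name -> set of file names (toy model)
    scan_order  -- order in which the backup scans the directories
    move        -- (file_name, source_dir, destination_dir)
    move_after  -- number of directories already scanned when the move occurs
    """
    seen = []
    for index, directory in enumerate(scan_order):
        if index == move_after:
            name, src, dst = move
            fs[src].remove(name)
            fs[dst].add(name)
        seen.extend((directory, name) for name in sorted(fs[directory]))
    return seen


def fresh_fs():
    # File D starts in directory A; directory B is empty.
    return {"A": {"D"}, "B": set()}


# The source directory is scanned before the move, the destination after it.
duplicated = scan_with_concurrent_move(fresh_fs(), ["A", "B"], ("D", "A", "B"), 1)

# The destination is scanned before the move, the source after it.
missing = scan_with_concurrent_move(fresh_fs(), ["B", "A"], ("D", "A", "B"), 1)
```

In the first run D is recorded under both directories; in the second it is not recorded at all, even though it existed in the file system at every instant.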

Figure 2.1: File System State

Online Backup

Online File System Backup

The snapshot feature of the fast file system is implemented with the support of the soft update mechanism, which ensures the consistency of the snapshot. Incremental physical backup of the file system is facilitated by the temporal ordering of data on disk.

Online Database Backup

The serialization of the backup transaction with respect to the regular transactions is the core of the mentioned scheme. File system backup involves backing up the entire namespace information (directory contents) along with the data. To study the theoretical aspects of consistent online backup of the file system, we consider in our formal model a flat file system and a file system interface that provides transactional semantics.

Since read-only user transactions do not make any changes, they do not conflict with the backup transaction.

Basic Terminology

A pair of access operations ax(fi), ay(fj) ∈ πC is said to be conflicting if they satisfy the following properties. A schedule πC is a serializable schedule if it is conflict equivalent to some serial schedule of the set T. Formally, a schedule πC is a serializable schedule if every pair of conflicting operations {ax(fi), ay(fi)} in πC is ordered as.
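The standard textbook test for conflict serializability builds a precedence graph from the conflicting pairs and checks it for cycles. The sketch below shows that test, not the thesis's own formalism; the tuple encoding `(transaction, operation, file)` and the function names are assumptions made for this illustration.

```python
from itertools import combinations


def precedence_edges(schedule):
    """Return edges (tx, ty) where an operation of tx conflicts with and precedes one of ty.

    Two operations conflict when they belong to different transactions,
    access the same file, and at least one of them is a write ("w").
    """
    edges = set()
    for (_, (tx, opx, fx)), (_, (ty, opy, fy)) in combinations(enumerate(schedule), 2):
        if tx != ty and fx == fy and "w" in (opx, opy):
            edges.add((tx, ty))
    return edges


def is_conflict_serializable(schedule):
    """A schedule is conflict serializable iff its precedence graph is acyclic."""
    edges = precedence_edges(schedule)
    nodes = {tx for tx, _, _ in schedule}
    indegree = {node: 0 for node in nodes}
    for _, target in edges:
        indegree[target] += 1
    ready = [node for node in nodes if indegree[node] == 0]
    removed = 0
    while ready:  # Kahn's topological sort
        node = ready.pop()
        removed += 1
        for source, target in edges:
            if source == node:
                indegree[target] -= 1
                if indegree[target] == 0:
                    ready.append(target)
    return removed == len(nodes)


# The backup tb reads f1, a user transaction t2 then writes f1 and f2, and tb reads f2.
interleaved = [("tb", "r", "f1"), ("t2", "w", "f1"), ("t2", "w", "f2"), ("tb", "r", "f2")]
# The same operations with tb running entirely before t2.
serial = [("tb", "r", "f1"), ("tb", "r", "f2"), ("t2", "w", "f1"), ("t2", "w", "f2")]
```

The interleaved schedule yields the edges tb → t2 and t2 → tb, a cycle, so the backup would capture the old f1 together with the new f2.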

Mutual Serializability

A read and write Operation

A hierarchical file system, like most practical file systems, does not consist of a fixed set of files throughout its lifetime. Each file in a file system has a unique file identifier, which identifies the file descriptor or inode of the file. A backup transaction plan for a hierarchical file system also consists of a sequence of read operations for each file in the file system, where a file is either a regular file or a directory.

Additionally, as with flat file systems, each file in the file system hierarchy is read at most once by a backup transaction.

The Formal Model

Mapping of Hierarchical File System Operations

The Backup Transaction’s read Operation

Operations Maintaining a Fixed Data Set

The link() function inserts the reference (new link) for fi into dj2, and the link count for the file is incremented by one through the update of fi's inode. Of course, it can be the same as the new one in cases where the name of the file is changed rather than the full path to the file. For a regular file rename, neither the file nor its inode is changed in many implementations.
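Both effects can be observed from user space on a POSIX-like system. The sketch below is a small illustration that assumes the temporary directory lives on a file system supporting hard links; the directory and file names are made up for the example.

```python
import os
import tempfile


def link_and_rename_demo():
    """Return (nlink_before, nlink_after_link, inode_before, inode_after_rename)."""
    with tempfile.TemporaryDirectory() as root:
        d1 = os.path.join(root, "d1")
        d2 = os.path.join(root, "d2")
        os.mkdir(d1)
        os.mkdir(d2)

        path = os.path.join(d1, "f")
        with open(path, "w") as handle:
            handle.write("data")
        before = os.stat(path)

        # link(): a new directory entry in d2 that refers to the same inode.
        os.link(path, os.path.join(d2, "f_link"))
        after_link = os.stat(path)

        # rename(): only directory entries change; the inode is expected to stay the same.
        new_path = os.path.join(d1, "g")
        os.rename(path, new_path)
        after_rename = os.stat(new_path)

        return before.st_nlink, after_link.st_nlink, before.st_ino, after_rename.st_ino


nlink_before, nlink_after, inode_before, inode_after = link_and_rename_demo()
```

On typical Linux file systems the link count goes from 1 to 2 and the inode number is unchanged by the rename, which is why these operations can be modeled as writes to the affected directories and, for link(), to the file's inode.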

As already mentioned, reading and writing a file's metadata is a read and write to the file itself if the file and its metadata are considered a single entity.

Operations that Grow/Shrink the Data Set

Most inode operations can now be mapped directly to either a read or a write of the file, and we will only look at the mapping of a few inode operations here. Now, mutual serializability naturally extends to support the creation of files and thus a growing file system. Since neither filek nor filel has been read by tb, mutual serializability between tb and ty requires ty << tb.

The next section takes a closer look at the issue of creating a consistent file system backup in the presence of file creation operations.

Enhanced Mutual Serializability

In this case, rb(f1) will not exist in the schedule, as tb << t1 and the file f1 was created by t1, which is after tb. But if t2 << tb holds, then it violates the second condition of the theorem, where tx = t2 and ty = t1. In this case rb(f1) will not exist in the schedule because tb << t1 and the file f1 was created by t1, which is after tb.

For a cycle to exist, t2 << tb must hold, but this violates the second condition of the theorem, where tx = t2 and ty = t1.

Path Lookups

Realizable Schedule

  • Transactional Model
  • Module Architecture and Implementation
  • Transactional Handling of File Operations
  • Application Programmers Interface to TxnFS
  • Contributions and Limitations of TxnFS

The atomicity property requires that all or none of the changes made by a transaction be applied to the file system. We use the term deferred write to refer to the technique of writing dirty data to the file system only at commit time. The log manager facilitates the implementation of the strict 2PL concurrency control protocol by performing deferred write operations from the log to the file system after the transaction has committed and the log file is forced to disk.
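The deferred-write idea can be sketched in a few lines. This is a simplified illustration, not the TxnFS log manager: the class `DeferredWriteTxn`, the in-memory dictionary standing in for the file system, and the `committed` flag standing in for forcing the log to disk are all assumptions, and locking (strict 2PL) is left out.

```python
class DeferredWriteTxn:
    """Buffer writes in a private log and apply them only at commit time."""

    def __init__(self, fs):
        self.fs = fs            # dict: file name -> contents (toy file system)
        self.log = []           # ordered (name, data) records of dirty data
        self.committed = False

    def write(self, name, data):
        # Nothing reaches the file system yet; the change is only logged.
        self.log.append((name, data))

    def read(self, name):
        # A transaction sees its own latest uncommitted write first.
        for logged_name, data in reversed(self.log):
            if logged_name == name:
                return data
        return self.fs.get(name)

    def commit(self):
        # Stand-in for forcing the log to disk before applying it.
        self.committed = True
        for name, data in self.log:
            self.fs[name] = data
        self.log.clear()

    def abort(self):
        # Dropping the log undoes everything, because nothing was applied.
        self.log.clear()


fs = {"a": "old"}
aborted = DeferredWriteTxn(fs)
aborted.write("a", "new")
aborted.write("b", "created")
seen_inside = aborted.read("a")
aborted.abort()
state_after_abort = dict(fs)

applied = DeferredWriteTxn(fs)
applied.write("a", "new")
applied.write("b", "created")
applied.commit()
```

Because the file system is untouched until commit, an abort needs no undo step, which is what gives the all-or-nothing behavior.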

Only upon successful locking of the parent directory is the creation request processed by the Update manager, which instructs the Log manager to log the call and requests an in-place creation of the file in the child file system at the time of capturing.

Figure 5.1: Top Level TxnFS Architecture

Implementation of Consistent Online Backup

  • Conceptual Overview
  • Implementation Details
  • Uniformly Random Pattern (global)
  • Spatial Locality Access Pattern (local)

Upon subsequent successful locking of a file, the user transaction verifies that it is still mutually serializable with respect to the backup transaction by comparing the file's read bit with its own before-after bit. Thus, the backup utility "sets" the sticky bit after successfully reading a file to the backup media. In the current section, we show that the implemented mutual serializability protocol ensures backup consistency and is therefore correct.
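The bit comparison described above can be sketched as follows. This is a hypothetical model, not the TxnFS code: the classes `File` and `UserTxn`, the field names, and the choice to fix the before-after bit at the transaction's first access are assumptions made for the illustration, and locking is omitted.

```python
class File:
    def __init__(self, name):
        self.name = name
        self.read_bit = False   # set once the backup has copied this file


def backup_read(file):
    """The backup marks a file after copying it to the backup media."""
    file.read_bit = True


class UserTxn:
    def __init__(self):
        # None until the first access; afterwards False means "before the
        # backup" and True means "after the backup".
        self.after_backup = None

    def access(self, file):
        """Return True if this access keeps the transaction mutually serializable."""
        if self.after_backup is None:
            self.after_backup = file.read_bit
            return True
        return file.read_bit == self.after_backup


f1, f2 = File("f1"), File("f2")
backup_read(f1)                 # the backup has copied f1 but not yet f2

straddling = UserTxn()
first_ok = straddling.access(f1)    # places the transaction after the backup
second_ok = straddling.access(f2)   # f2 is still unread: conflict

one_sided = UserTxn()
both_ok = one_sided.access(f2) and one_sided.access(f2)
```

A transaction that touches only files on one side of the backup's progress passes every check; one that straddles the boundary is detected at its first access on the other side.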

The backup represents the file system after the “move” and all three files are clearly copied after being written.

Table 5.1: Establishing Mutual Serializability.

Experimental Setup

Simulation Results and Performance Analysis

Conflict Percentage under different workloads

In the current section, the simulation results are presented and analyzed in terms of the percentage of user transactions that conflict with the backup transaction under the different workloads. See Figure 6.1 and the accompanying Table 6.1 for the percentage of transactions that conflict with the backup transaction in each of the different workloads. This deduction is confirmed when we see a sharp drop in the percentage of transactions that conflict with the backup transaction in the rest of the workload sets, all of which are modeled to show locality of access.

Simulation results for the 50%share-hot-cold workload show that 6% of user transactions conflict with the backup transaction.

Table 6.1: Change in Conflicts Between The MS disabled and MS enabled Run.

Backup Time

Another interesting and encouraging result is evident from the simulation results of the stat workloads (both 50%share-stat and 0%share-stat). In terms of sharing, the 50%share-stat (or 0%share-stat) workload is similar to the 50%share (or 0%share) workload, but it has a much higher percentage of stat() file system calls and therefore accesses. This explains the shorter backup time in both the MS-enabled and MS-disabled runs of the stat workloads compared to the 50%share (or 0%share) workload.

Contention occurs only when the backup accesses current "hot" regions of the file system hierarchy and.

Figure 6.2: Duration Of Backup For Each Set Of Workload

Overall Throughput

This large difference is due to more than 50% of the user transactions colliding with the backup in the global workload, as seen in Figure 6.1. For example, throughput drops from 7.5% in the 0%share workload to 2.3% in the 50%share workload under MS enabled. This analysis is corroborated by the decrease in throughput with increased sharing in the corresponding runs under MS disabled.

We also observe an increase in the overhead (the difference between the MS disabled and MS enabled runs) of taking a consistent backup as inter-transaction sharing increases.

Performance Improvement

Algorithm and Implementation

Now, when a mutual serializability conflict is detected, the backup transaction is redirected to another part of the file system tree. In practice, this is implemented by having the backup transaction switch to another stack to guide its traversal. The point it had reached in the subtree from which it was just redirected is stored on top of the stack corresponding to that subtree, and thus the backup transaction is never "lost". The backup transaction completes reading a subtree in depth-first fashion before moving to the next subtree if it is not redirected before completion.

When the backup transaction is redirected, it traverses the subtree whose stack is next in the circular list.
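The redirection scheme can be sketched with one stack per top-level subtree, kept in a circular list. The code below is an illustrative model under stated assumptions, not the thesis implementation: the nested-dict tree, the function `backup_with_redirection`, and the `conflicts_once` set (paths that conflict only on their first visit, so the sketch always terminates) are hypothetical, and only regular files are recorded in the output.

```python
from collections import deque


def backup_with_redirection(tree, conflicts_once=()):
    """Return the order in which files are read.

    tree           -- dict: top-level name -> subtree (dict for a directory, None for a file)
    conflicts_once -- paths whose first visit is treated as a conflict
    """
    pending = set(conflicts_once)
    # One depth-first stack per top-level subtree, arranged as a circular list.
    stacks = deque([(name, subtree)] for name, subtree in tree.items())
    order = []
    while stacks:
        stack = stacks[0]
        if not stack:
            stacks.popleft()            # this subtree has been read completely
            continue
        path, node = stack.pop()
        if path in pending:
            pending.discard(path)
            stack.append((path, node))  # remember the point that was reached
            stacks.rotate(-1)           # redirect to the next subtree's stack
            continue
        if node is None:
            order.append(path)          # regular file: copy it to the backup
        else:
            # Directory: push children so they are popped in sorted order.
            for child in sorted(node, reverse=True):
                stack.append((f"{path}/{child}", node[child]))
    return order


tree = {"a": {"x": None, "y": None}, "b": {"z": None}}
plain = backup_with_redirection(tree)
redirected = backup_with_redirection(tree, conflicts_once={"a/y"})
```

With a conflict on a/y, the traversal leaves subtree a, reads subtree b, and later resumes at a/y from the saved stack top, so no file is skipped or read twice.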

Simulation Result and Performance Analysis

The reduction in the percentage of conflicts in 0%share-hot-cold, even though there were no further conflicts with concurrent user transactions in the same place, is due to the backup transaction being redirected to a place much further away from the current one, thus avoiding conflicts that may arise in nearby places. See Figure 6.5, where we compare the backup time for the MS enabled (no heuristics), MS disabled, and MS enabled (heuristics) runs. We see a reduction in backup time (by 2.6% for the 50%share-hot-cold workload and 2.1% for 0%share-hot-cold) when comparing the MS enabled (heuristics) run with the MS enabled (no heuristics) run.

In addition, the backup time is also reduced by reducing conflicts, thereby contributing to increased throughput.

Figure 6.4: Conflict Percentage On Applying Heuristics To Improve Performance

Discussion

But a backup can be inconsistent if it is taken arbitrarily while the file system is active. This thesis investigated the problem of consistent online backup in a file system that supports transactions. On this transactional file system platform, the backup transaction was implemented and mutual serializability was established.

More work is needed to formally demonstrate this and implement this scheme in an existing file system such as ext4.

File System State

Inconsistent File System State In Backup

Top Level TxnFS Architecture

Modular Architecture of TxnFS

Percentage Of Transactions Conflicting In Each Set Of Workload

Duration Of Backup For Each Set Of Workload

Throughput Of User Transaction(Measured for the duration the backup was

Conflict Percentage On Applying Heuristics To Improve Performance

Backup Time On Applying Heuristics To Improve Performance

Throughput On Applying Heuristics To Improve Performance

Figures

Figure 2.1: File System State
Figure 5.1: Top Level TxnFS Architecture
Figure 5.2: Modular Architecture of TxnFS
Table 5.1: Establishing Mutual Serializability.
