PPT FILE MANAGEMENT - Bharathidasan University


File Organization


• Files are composed of records.

• We need to know the arrangement of records within a file.

Record Format:

• All files are composed of records.

• When a user gives a command to modify the contents of a file, it’s actually a command to access records within the file.

• Format: records can be of fixed length or of variable length, as shown in Figure 8.6. Regardless of their format, records can be blocked or not blocked.


Fixed-Length Records:

◦ The most common format

◦ They’re the easiest to access directly (ideal for data files)

◦ The critical aspect of fixed-length records is the size of the record

◦ If it’s too small: the leftover characters are truncated

◦ If it’s too large: storage space is wasted

(figure) Fixed-length record example: 1 | Muthul | 10.05.19 | Trichy | 20,000 | 6754333
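Direct access to fixed-length records comes down to simple arithmetic: record n starts at byte n times the record size. The following is a minimal sketch in Python, assuming a hypothetical four-field layout. The field widths and the second sample record are made up for illustration and are not taken from the slides.

```python
import struct

# Hypothetical layout: 4-byte id, 6-byte name, 8-byte date, 6-byte city.
# "<" disables alignment padding, so every record is exactly 24 bytes.
RECORD = struct.Struct("<I6s8s6s")

def pack_record(rec_id, name, date, city):
    # The "s" format truncates values that are too long for the field
    # and pads values that are too short with null bytes.
    return RECORD.pack(rec_id, name.encode(), date.encode(), city.encode())

def read_record(data, n):
    # Direct access: record n starts at byte offset n * RECORD.size.
    rec_id, name, date, city = RECORD.unpack_from(data, n * RECORD.size)
    strip = lambda field: field.rstrip(b"\0").decode()
    return rec_id, strip(name), strip(date), strip(city)

# Two records stored back to back (the second one is invented sample data).
data = (pack_record(1, "Muthulakshmi", "10.05.1985", "Trichy")
        + pack_record(2, "Ravi", "01.01.1990", "Madurai"))

print(read_record(data, 0))  # long values come back truncated
print(read_record(data, 1))  # short values were padded, padding is stripped
```

The truncated name and date in the first record show the cost of a record size that is too small; the null padding after the short name in the second shows the wasted space of one that is too large.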

Variable-Length Records:

• They don’t leave empty storage space and don’t truncate any characters, which eliminates the two disadvantages of fixed-length records

• Although they can easily be read one after the other, they’re difficult to access directly because it’s hard to calculate exactly where each record is located

• They’re used most frequently in files that are likely to be accessed sequentially (text files, program files) or in files that use an index to access their records

• The record format, how it’s blocked, and other related information is kept in the file descriptor

(figure) Variable-length record example: 1 | Muthulakshmi | 10.05.1985 | Trichy | 20,000 | 6754333221
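One common way to store variable-length records is to put a length prefix in front of each one. The sketch below assumes a hypothetical 2-byte prefix and "|"-separated fields; neither is specified in the slides. It shows why reading is easy one record after another, while jumping straight to record n is not: the position of a record depends on the lengths of all records before it.

```python
import struct

def write_records(records):
    # Each record is stored as a 2-byte length followed by its bytes,
    # so nothing is truncated and no padding is wasted.
    out = bytearray()
    for rec in records:
        payload = rec.encode("utf-8")
        out += struct.pack("<H", len(payload)) + payload
    return bytes(out)

def read_records(data):
    # Sequential read: the start of the next record is only known
    # after the length of the current one has been read.
    pos = 0
    while pos < len(data):
        (length,) = struct.unpack_from("<H", data, pos)
        pos += 2
        yield data[pos:pos + length].decode("utf-8")
        pos += length

records = [
    "1|Muthulakshmi|10.05.1985|Trichy|20,000|6754333221",
    "2|Ravi",  # invented sample data, much shorter than the first record
]
stored = write_records(records)
print(list(read_records(stored)))
```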

Physical File Organization:

• The physical organization of a file has to do with the way records are arranged and the characteristics of the medium used to store it

• On magnetic disks (hard drives), files can be organized in one of several ways:

– sequential

– direct

– indexed sequential

• To select the best of these organizations, consider these characteristics:

1) Volatility of the data: the frequency with which additions and deletions are made

2) Activity of the file: the percentage of records processed during a given run

3) Size of the file

4) Response time: the amount of time the user is willing to wait before the requested operation is completed

Sequential Record Organization:

• Easy to implement

• Records are stored and retrieved serially, one after the other

• To find a specific record, the file is searched from its beginning until the requested record is found

• Optimization features may be built into the system to speed up the search:

◦ A key field is selected from the record, and the records are sorted by that field before they are stored

◦ The system then searches only the key field of each record in the file

◦ The search ends when either an exact match is found or the record is not found

• This optimization complicates file maintenance:

◦ The original order must be preserved every time records are added or deleted

◦ The file must be completely rewritten or maintained in a sorted fashion every time it’s updated
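The key-field optimization and its maintenance cost can be sketched as follows. This is a minimal illustration, assuming records are held as hypothetical (key, data) pairs already sorted by key; the function names are not from the slides.

```python
import bisect

def sequential_search(records, key):
    # Scan from the beginning, comparing only the key field.
    for position, (record_key, data) in enumerate(records):
        if record_key == key:
            return position, data      # exact match found
        if record_key > key:
            break                      # sorted order: the key cannot appear later
    return None                        # record not found

def insert_record(records, new_record):
    # Maintenance cost: the whole file is rewritten so that the
    # sorted order is preserved after the addition.
    updated = list(records)
    bisect.insort(updated, new_record)
    return updated

sorted_file = [(10, "a"), (20, "b"), (30, "c")]
print(sequential_search(sorted_file, 20))
print(sequential_search(sorted_file, 25))
print(insert_record(sorted_file, (25, "x")))
```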

Direct Record Organization:

• Uses direct access files

• Can be implemented only on direct access storage devices

• Flexible: any record can be accessed

• It’s also known as “random organization,” and its files are called “random access files”

• Records are identified by their relative addresses: their addresses relative to the beginning of the file

• These logical addresses are computed when the records are stored and then again when the records are retrieved

• Relative address record identification:

◦ The user identifies a field in the record format and designates it as the key field; each key must be unique

◦ The program used to store the data follows a set of instructions, called a hashing algorithm, that transforms each key into a number: the record’s logical address

◦ The logical address is given to the File Manager, which translates it into a physical address (cylinder, surface, and record numbers), preserving the file organization

◦ The same procedure is used to retrieve a record

• Advantages:

◦ Fast record access

◦ Records can still be accessed sequentially by starting at the first relative address and incrementing it to reach each next record

◦ Files are updated more quickly than sequential files, because a modified record is rewritten to its original location

◦ No need to preserve the record order

◦ Adding and deleting records is quick (example: telephone mail order)

• Disadvantages:

◦ Hashing algorithm collisions: several records with unique keys may generate the same logical address


(figure 8.6) The hashing algorithm causes a collision. Using a combination of street address and postal code, it generates the same logical address (152132737) for three different records.

© Cengage Learning 2014
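The store-and-retrieve procedure for direct files, and the collision problem shown in the figure, can be sketched as below. The hashing algorithm here (sum of character codes modulo the number of slots) is a deliberately simple stand-in chosen for illustration; it is not the algorithm used in the figure, and the keys are invented.

```python
def logical_address(key, num_slots):
    # Toy hashing algorithm: transforms a key into a relative address.
    return sum(ord(ch) for ch in key) % num_slots

def store(slots, key, record):
    address = logical_address(key, len(slots))
    if slots[address] is not None and slots[address][0] != key:
        return None                    # collision: slot taken by another key
    slots[address] = (key, record)
    return address

def retrieve(slots, key):
    # The same procedure is used to retrieve: hash the key again.
    entry = slots[logical_address(key, len(slots))]
    return entry[1] if entry is not None and entry[0] == key else None

slots = [None] * 11                    # hypothetical file with 11 record slots
print(store(slots, "ab", "record 1"))  # stored at its computed address
print(store(slots, "ba", "record 2"))  # unique key, same address: collision
print(retrieve(slots, "ab"))
```

A real system would resolve the collision, for example by placing the second record in an overflow area, rather than rejecting it as this sketch does.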

Indexed Sequential Record Organization:

• Combines the best of sequential and direct access

• Created and maintained through an Indexed Sequential Access Method (ISAM) application

• ISAM removes the burden of handling overflows and preserves record order

• Advantage: no collisions, because no hashing algorithm is used to generate a record’s address

• Instead, key information is used to generate an index file for record retrieval

• The ordered sequential file is divided into equal-sized blocks (size determined by the File Manager)

• Each entry in the index file contains the highest record key and the physical location of the data block

• To access any record in the file, the system begins by searching the index file

• It then goes to the physical location indicated at that entry

• Overflow areas are kept in two places: spread throughout the file and located apart from the main data area

• For dynamic files, indexed sequential is the organization of choice because it allows both direct access to a few requested records and sequential access to many
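The index lookup described above can be sketched in a few lines. This is a simplified illustration, assuming records are hypothetical (key, data) pairs in sorted order and that a block number stands in for the physical block location; overflow areas are left out.

```python
from bisect import bisect_left

def build_index(sorted_records, block_size):
    # Divide the ordered file into equal-sized blocks; each index entry
    # holds the highest key in a block and the block's location.
    blocks = [sorted_records[i:i + block_size]
              for i in range(0, len(sorted_records), block_size)]
    index = [(block[-1][0], block_no) for block_no, block in enumerate(blocks)]
    return blocks, index

def lookup(blocks, index, key):
    # Step 1: search the index for the first block whose highest key >= key.
    highest_keys = [high for high, _ in index]
    i = bisect_left(highest_keys, key)
    if i == len(index):
        return None                    # key is larger than every key in the file
    # Step 2: go to the indicated block and scan it sequentially.
    _, block_no = index[i]
    for record_key, data in blocks[block_no]:
        if record_key == key:
            return data
    return None

records = [(k, f"rec{k}") for k in (5, 12, 18, 23, 31, 40, 47)]
blocks, index = build_index(records, block_size=3)
print(index)
print(lookup(blocks, index, 31))
```

Because the blocks stay in key order, reading them one after another still gives sequential access to the whole file.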

File Storage: Record Storage

• The File Manager must work with files not just as whole units but also as logical units or records

• Records within a file must have the same format, but they can vary in length (Figure 8.8)

• If records are subdivided into fields, their structure is managed by application programs and not the operating system

• An exception is made for systems oriented toward database applications, where the File Manager handles field structure

Contiguous Storage:

• Records are stored one after the other

• Simple to implement and manage

• Advantages:

◦ Once a file’s starting address and size are known, any part of it can be found easily

◦ Ease of direct access: every part of the file is stored in the same compact area

• Disadvantages:

◦ A file can’t be expanded unless there’s empty space available immediately after it

◦ Fragmentation

• Adjacent empty areas can be combined into one large free space
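Two of the points above reduce to short calculations: locating a record in a contiguous file, and combining adjacent empty areas. The sketch below assumes fixed-size records and a hypothetical free list of (start, length) pairs; the slides do not specify how the free space is tracked.

```python
def record_address(file_start, record_size, n):
    # With contiguous storage, record n is found from the file's
    # starting address and the record size alone.
    return file_start + n * record_size

def coalesce(free_areas):
    # Combine adjacent empty areas into larger free spaces.
    merged = []
    for start, length in sorted(free_areas):
        if merged and merged[-1][0] + merged[-1][1] == start:
            prev_start, prev_length = merged[-1]
            merged[-1] = (prev_start, prev_length + length)
        else:
            merged.append((start, length))
    return merged

print(record_address(file_start=1000, record_size=24, n=3))
print(coalesce([(10, 5), (0, 4), (4, 6), (20, 3)]))
```

Note that coalescing only helps when the empty areas touch; free areas separated by a file, such as the last one in the example, stay fragmented.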

Noncontiguous Storage:

• Allows files to use any storage space available on the disk

• A file’s records are stored in a contiguous manner only if there’s enough empty space

• Any remaining records, and all other additions to the file, are stored in other sections of the disk

• In some systems these sections are called the extents of the file and are linked together with pointers

• The physical size of each extent is determined by the operating system and is usually 256 bytes or another power of two

File Extents:

• Extents can be linked in two ways:

◦ Storage level: each extent points to the next one in the sequence. The directory entry consists of the filename, the storage location of the first extent, the location of the last extent, and the total number of extents not counting the first

◦ Directory level: each extent is listed in the directory with its physical address, its size, and a pointer to the next extent

(figure 8.10) Noncontiguous file storage with linking taking place at the storage level. File 1 starts in address 2 and continues in addresses 8, 20, and 18. The directory lists the file’s starting address, ending address, and the number of extents it uses. Each block of storage includes its address and a pointer to the next block for the file, as well as the data itself.
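The linked layout in the figure can be modeled with a small table. The addresses (2, 8, 20, 18) come from the caption; the dictionary structure, the data strings, and the extent count of 3 (not counting the first) are assumptions made for this sketch.

```python
# Each block of storage holds its data and a pointer to the next block.
storage = {
    2: ("File 1, part 1", 8),
    8: ("File 1, part 2", 20),
    20: ("File 1, part 3", 18),
    18: ("File 1, part 4", None),      # None marks the last extent
}

# The directory lists the starting address, ending address, and extent count.
directory = {"File 1": {"start": 2, "end": 18, "extents": 3}}

def extent_addresses(directory, storage, name):
    # Follow the pointers from the first extent to the last.
    address = directory[name]["start"]
    visited = []
    while address is not None:
        _, next_address = storage[address]
        visited.append(address)
        address = next_address
    return visited

print(extent_addresses(directory, storage, "File 1"))
```

Reaching the fourth extent means following three pointers first, which is why this scheme suits sequential access better than direct access.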

• Both noncontiguous allocation schemes eliminate external storage fragmentation and the need for compaction

• However, they don’t support direct access, because it’s not easy to determine the exact location of a specific record

• The File Manager can select the most efficient method of storage allocation:

◦ contiguous for direct files

◦ noncontiguous for sequential files

◦ the OS then manages both schemes of storage allocation

Indexed Storage:

• Allows direct record access by bringing together the pointers linking every extent of a file into an index block

• Every file has its own index block

• The index block consists of the addresses of each disk sector that make up the file

• The index lists each entry in the same order in which the sectors are linked (Figure 8.12)

• When a file is created, the pointers in the index block are all set to null

• Then, as each sector is filled, the pointer is set to the appropriate sector address: to be precise, the address is removed from the empty space list and copied into its position in the index block

• Supports both sequential and direct access, but doesn’t necessarily improve the use of storage space
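The index block behavior described above can be sketched as a small class. The class name, the dictionary used to stand in for the disk, and the sector numbers are all hypothetical; only the null-initialized pointers and the move from the empty space list into the index block follow the slide.

```python
class IndexedFile:
    def __init__(self, index_size, empty_space_list):
        # When a file is created, every pointer in its index block is null.
        self.index_block = [None] * index_size
        self.empty_space_list = empty_space_list   # shared list of free sectors

    def append_sector(self, disk, data):
        slot = self.index_block.index(None)        # raises ValueError if the index block is full
        sector = self.empty_space_list.pop(0)      # address removed from the empty space list
        disk[sector] = data
        self.index_block[slot] = sector            # ...and copied into the index block
        return sector

    def read(self, disk, n):
        # Direct access: the nth pointer leads straight to the nth sector.
        sector = self.index_block[n]
        return None if sector is None else disk[sector]

disk = {}
free = [2, 8, 20, 18]
f = IndexedFile(index_size=4, empty_space_list=free)
f.append_sector(disk, "first part")
f.append_sector(disk, "second part")
print(f.index_block, free)
print(f.read(disk, 1))
```

Reading the pointers in order gives sequential access, while indexing into the block gives direct access, at the cost of one index block per file.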