File Organization
• Files are composed of records.
• We need to know the arrangement of records within a file.
Record Format:
• All files are composed of records.
• When a user gives a command to modify the contents of a file, it’s actually a command to access records within the file.
• Format: records can be of fixed length or of variable length, as shown in Figure 8.6. Regardless of their format, records can be blocked or unblocked.
Fixed-length records:
◦ The most common format
◦ They’re the easiest to access directly (ideal for data files)
◦ The critical aspect of fixed-length records is the size of the record:
◦ If it’s too small, the leftover characters are truncated
◦ If it’s too large, storage space is wasted
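[Figure: fixed-length record example with the fields 1 | Muthul | 10.05.19 | Trichy | 20,000 | 6754333; values longer than their fields are cut off.]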
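To make the fixed-length trade-offs concrete, here is a minimal Python sketch. The layout in `FIELDS` (names and byte widths) and the helpers `encode_record`, `record_offset`, and `read_record` are hypothetical; the point is that a record’s position is simply `n * RECORD_SIZE`, while values that don’t fit are truncated and short ones waste padding.

```python
# Hypothetical fixed-length layout: field names and widths are illustrative only.
FIELDS = [("id", 4), ("name", 8), ("dob", 10), ("city", 8)]
RECORD_SIZE = sum(width for _, width in FIELDS)  # 30 bytes for every record


def encode_record(values: dict) -> bytes:
    """Truncate values that are too long and pad short ones with spaces."""
    out = b""
    for name, width in FIELDS:
        raw = str(values[name]).encode("ascii")
        out += raw[:width].ljust(width, b" ")
    return out


def record_offset(n: int) -> int:
    """Byte offset of record n (0-based); possible only because sizes are equal."""
    return n * RECORD_SIZE


def read_record(data: bytes, n: int) -> dict:
    """Jump straight to record n and split it back into its fields."""
    start = record_offset(n)
    chunk = data[start:start + RECORD_SIZE]
    result, pos = {}, 0
    for name, width in FIELDS:
        result[name] = chunk[pos:pos + width].decode("ascii").rstrip(" ")
        pos += width
    return result


data = encode_record({"id": 1, "name": "Muthulakshmi", "dob": "10.05.1985", "city": "Trichy"})
data += encode_record({"id": 2, "name": "Ravi", "dob": "01.01.1990", "city": "Chennai"})
print(read_record(data, 0)["name"])  # Muthulak (truncated to 8 bytes)
```

Here “Muthulakshmi” loses its last four characters, and “Ravi” occupies 8 bytes of which 4 are padding: the two problems listed above.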
Variable-length records:
◦ They don’t leave empty storage space and don’t truncate any characters
◦ This eliminates the two disadvantages of fixed-length records
◦ Although they can easily be read sequentially (one after the other), they’re difficult to access directly because it’s hard to calculate exactly where a record is located
◦ They’re used most frequently in files that are likely to be accessed sequentially (text files, program files, or files that use an index to access their records)
The record format, how it’s blocked, and other related information are kept in the file descriptor.
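[Figure: variable-length record example with the fields 1 | Muthulakshmi | 10.05.1985 | Trichy | 20,000 | 6754333221; each field takes only as much space as its value.]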
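A matching sketch for variable-length records, assuming a hypothetical format of a 2-byte length header followed by `|`-separated fields (the source doesn’t specify how record boundaries are marked). Nothing is truncated or padded, but `read_nth` has to step over every earlier record to reach record `n`, which is why direct access is hard.

```python
import struct

# Hypothetical variable-length format: a 2-byte big-endian length header,
# then the fields joined by "|". Neither choice comes from the source.


def encode_record(fields: list[str]) -> bytes:
    body = "|".join(fields).encode("utf-8")
    return struct.pack(">H", len(body)) + body


def read_nth(data: bytes, n: int) -> list[str]:
    """Reach record n by walking past the header and body of every earlier record."""
    pos = 0
    for _ in range(n):
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2 + length
    (length,) = struct.unpack_from(">H", data, pos)
    return data[pos + 2:pos + 2 + length].decode("utf-8").split("|")


data = encode_record(["1", "Muthulakshmi", "10.05.1985", "Trichy", "20,000", "6754333221"])
data += encode_record(["2", "Ravi"])
print(read_nth(data, 1))  # ['2', 'Ravi']
```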
File organization:
• The way records are arranged and the characteristics of the medium used to store them.
• On magnetic disks (hard drives), files can be organized in one of several ways:
– sequential
– direct
– indexed sequential
• To select the best organization, consider these characteristics of the file:
1) Volatility of the data: the frequency with which additions and deletions are made
2) Activity of the file: the percentage of records processed during a given run
3) Size of the file
4) Response time: the amount of time the user is willing to wait before the requested operation is completed
Sequential record organization:
Easy to implement
Records are stored and retrieved serially, one after the other
To find a specific record, the file is searched from its beginning until the requested record is found
Optimization features may be built into the system to speed up the search:
◦ Select a key field from the record and sort the records by that field before storing them
◦ The system then searches only the key field of each record in the file
◦ The search ends when either an exact match is found or a key larger than the requested one is reached (record not found)
◦ Drawback: this complicates file maintenance
◦ The original order must be preserved every time records are added or deleted
◦ The file must be completely rewritten or maintained in a sorted fashion every time it’s updated
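A sketch of the key-field optimization, assuming the file is modeled as a Python list of `(key, data)` pairs sorted by key (`find_sorted` and `add_record` are hypothetical names). The search stops as soon as it passes the requested key, and every insertion has to keep the order, which is the maintenance cost noted above.

```python
import bisect

# A hypothetical sequential file: (key, data) pairs kept sorted by key.
records = [(10, "a"), (20, "b"), (40, "d")]


def find_sorted(records: list[tuple[int, str]], key: int):
    """Compare only the key field, stopping early once the key is passed."""
    for rec_key, payload in records:
        if rec_key == key:
            return payload      # exact match
        if rec_key > key:
            return None         # passed the spot where it would be: not found
    return None                 # reached the end of the file: not found


def add_record(records: list[tuple[int, str]], key: int, payload: str) -> None:
    """Insert in key order; this is the maintenance cost of a sorted file."""
    bisect.insort(records, (key, payload))


add_record(records, 30, "c")
print([k for k, _ in records])   # [10, 20, 30, 40]
print(find_sorted(records, 25))  # None
```

On a real disk, keeping the order means rewriting or shifting records rather than a cheap in-memory insert.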
Direct record organization:
Uses direct access files
Can be implemented only on direct access storage devices
Flexible: any record can be accessed directly
It’s also known as “random organization,” and its files are called “random access files.”
Records are identified by their relative addresses: their addresses relative to the beginning of the file.
These logical addresses are computed when the records are stored and then again when the records are retrieved.
• Relative address record identification:
The user identifies a field in the record format and designates it as the key field (it must be unique)
The program used to store the data follows a set of instructions, called a hashing algorithm, that transforms each key into a number: the record’s logical address.
The File Manager translates the logical address into a physical address (cylinder, surface, and record numbers)
This preserves the file organization
The same procedure is used to retrieve a record
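The store/retrieve procedure can be sketched in two steps: hash the key to a logical address, then translate that address into cylinder, surface, and record numbers. The hash function `hash_key`, the file size, and the disk geometry below are all hypothetical; real systems use their own hashing algorithms and device parameters.

```python
# Hypothetical geometry and file size, chosen only for illustration.
FILE_SLOTS = 1000          # logical addresses 0..999
RECORDS_PER_TRACK = 8
SURFACES_PER_CYLINDER = 4


def hash_key(key: str) -> int:
    """Toy hashing algorithm: fold the key's characters into 0..FILE_SLOTS-1."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % FILE_SLOTS
    return h


def to_physical(logical: int) -> tuple[int, int, int]:
    """Translate a logical address into (cylinder, surface, record)."""
    track, record = divmod(logical, RECORDS_PER_TRACK)
    cylinder, surface = divmod(track, SURFACES_PER_CYLINDER)
    return cylinder, surface, record


# Storing and retrieving run the same two steps, so they agree on the location.
stored_at = to_physical(hash_key("EMP-0001"))
found_at = to_physical(hash_key("EMP-0001"))
print(stored_at == found_at)  # True
```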
◦ Advantages
Fast record access
Can also be read sequentially by starting at the first relative address and incrementing to the next record
Updated more quickly than sequential files (a modified record is rewritten to its original address)
No need to preserve the record order
Adding and deleting records is quick (example: telephone mail)
◦ Disadvantages
Hashing algorithm collisions: several records with unique keys may generate the same logical address
(figure 8.6) The hashing algorithm causes a collision. Using a combination of street address and postal code, it generates the same logical address (152132737) for three different records.
© Cengage Learning 2014
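To show how a collision arises, the sketch below uses a deliberately weak, hypothetical hashing algorithm (the sum of the digits in the key) on three made-up keys that combine a street address and a postal code. All three unique keys land on the same logical address, the situation the figure describes; how the system then places the extra records is not modeled here.

```python
# Deliberately weak, hypothetical hashing algorithm: add up the digits in the key.
def digit_sum_hash(key: str, slots: int = 10) -> int:
    return sum(int(c) for c in key if c.isdigit()) % slots


def group_by_address(keys: list[str], slots: int = 10) -> dict[int, list[str]]:
    """Map each logical address to every key that hashes to it."""
    table: dict[int, list[str]] = {}
    for key in keys:
        table.setdefault(digit_sum_hash(key, slots), []).append(key)
    return table


# Made-up keys built from a street address plus a postal code.
keys = ["12 Main St 600001", "21 Main St 600001", "30 Oak Ave 600001"]
table = group_by_address(keys)
collisions = {addr: ks for addr, ks in table.items() if len(ks) > 1}
print(collisions)  # {0: ['12 Main St 600001', '21 Main St 600001', '30 Oak Ave 600001']}
```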
Indexed sequential record organization
◦ Combines the best of sequential and direct access
◦ Created and maintained through an Indexed Sequential Access Method (ISAM) application
◦ It removes the burden of handling overflows and preserves record order
◦ Advantage: no collisions (no hashing algorithm)
ISAM generates an index file for record retrieval:
It divides the ordered sequential file into equal-sized blocks (size determined by the File Manager)
Each entry in the index file contains the highest record key in a block and the physical location of that data block
To access any record in the file, the system begins by searching the index file, then goes to the physical location indicated at that entry
Overflow areas can be placed in two ways: throughout the file, or apart from the main data area
For dynamic files, indexed sequential is the organization of choice because it allows both direct access to a few requested records and sequential access to many
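A small model of the index lookup, assuming in-memory stand-ins `INDEX` and `BLOCKS` for the index file and the data blocks (both hypothetical; overflow areas are left out). Each index entry holds a block’s highest key, so one binary search over the index picks the only block that could contain the record, and reading the blocks in index order gives sequential access.

```python
import bisect

# Hypothetical index: one entry per data block, holding the highest key in
# that block and the block's location. Overflow areas are not modeled.
INDEX = [(30, "block A"), (60, "block B"), (90, "block C")]
BLOCKS = {
    "block A": [10, 20, 30],
    "block B": [40, 50, 60],
    "block C": [70, 80, 90],
}


def isam_lookup(key: int):
    """Direct access: search the index, then scan only the block it points to."""
    highest_keys = [high for high, _ in INDEX]
    i = bisect.bisect_left(highest_keys, key)  # first block whose highest key >= key
    if i == len(INDEX):
        return None                            # larger than every key in the file
    location = INDEX[i][1]
    return location if key in BLOCKS[location] else None


def scan_all() -> list[int]:
    """Sequential access: read the blocks in index order."""
    return [key for _, location in INDEX for key in BLOCKS[location]]


print(isam_lookup(50))  # block B
```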
The File Manager must work with files not just as whole units but also as logical units or records
Records within a file must have the same format, but they can vary in length (Figure 8.8)
If records are subdivided into fields, their structure is managed by application programs, not by the operating system.
An exception is made for some systems (database applications), in which the File Manager handles the field structure.
File Storage: Record Storage
Contiguous storage:
Records are stored one after the other
Simple to implement and manage
Advantages:
◦ Once a file’s starting address and size are known, any of its records can be found easily
◦ Ease of direct access: every part of the file is stored in the same compact area
Disadvantages:
◦ A file can’t be expanded unless there’s empty space available immediately after it
◦ Fragmentation
Adjacent empty areas can be combined into one large free space.
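A sketch of why contiguous storage makes direct access easy but growth hard, assuming a toy disk modeled as a Python list where each slot is one block and `None` means free (`block_address` and `can_extend` are hypothetical helpers).

```python
from typing import Optional

# Hypothetical disk model: one list slot per block; None marks a free block.
Disk = list[Optional[str]]


def block_address(start: int, length: int, k: int) -> int:
    """Direct access: block k of a contiguous file is simply start + k."""
    if not 0 <= k < length:
        raise IndexError("block outside the file")
    return start + k


def can_extend(disk: Disk, start: int, length: int, extra: int) -> bool:
    """The file can grow only if the `extra` blocks right after it are free."""
    end = start + length
    return end + extra <= len(disk) and all(b is None for b in disk[end:end + extra])


disk: Disk = ["f1", "f1", "f1", None, "f2", None, None, None]
print(block_address(start=0, length=3, k=2))         # 2
print(can_extend(disk, start=0, length=3, extra=1))  # True
print(can_extend(disk, start=0, length=3, extra=2))  # False
```

File `f1` can grow by one block but not two, because block 3 is free while block 4 belongs to `f2`, even though blocks 5 to 7 are empty.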
Noncontiguous storage:
Allows files to use any storage space available on the disk
A file’s records are stored contiguously only if there’s enough empty space
Any remaining records and all other additions to the file are stored in other sections
In some systems these are called the extents of the file and are linked together with pointers
The physical size of each extent is determined by the OS and is usually 256 (or another power of two) bytes.
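A short arithmetic sketch using the 256-byte extent size mentioned above. It assumes, as an illustration, that a partly filled extent still occupies a whole extent; the shift-and-mask in `locate` shows one common benefit of power-of-two sizes, which the source itself does not state.

```python
EXTENT_SIZE = 256                      # typical size from the text; a power of two
SHIFT = EXTENT_SIZE.bit_length() - 1   # 8, because 256 == 2 ** 8


def extents_needed(file_bytes: int) -> int:
    """Ceiling division: a partly filled extent still takes a whole extent."""
    return -(-file_bytes // EXTENT_SIZE)


def locate(byte_offset: int) -> tuple[int, int]:
    """(extent number, offset inside that extent) via a shift and a mask."""
    return byte_offset >> SHIFT, byte_offset & (EXTENT_SIZE - 1)


print(extents_needed(1000))  # 4
print(locate(1000))          # (3, 232)
```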
File extents:
Extents can be linked in two ways:
◦ Storage level: each extent points to the next one in the sequence; the directory entry consists of the filename, the storage location of the first extent, the location of the last extent, and the total number of extents not counting the first
◦ Directory level: the links between extents are kept in the directory instead of in the extents themselves
(figure 8.10) Noncontiguous file storage with linking taking place at the storage level. File 1 starts in address 2 and continues in addresses 8, 20, and 18.
The directory lists the file’s starting address, ending address, and the number of extents it uses. Each block of storage includes its address and a pointer to the next block for the file, as well as the data itself.
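The storage-level linking in figure 8.10 can be modeled as a dictionary from block address to (data, pointer to next block). The addresses 2, 8, 20, and 18 come from the caption; the data strings, the `directory` layout, and `chain` are hypothetical. Following the chain is the only way to reach a later extent, which is why these schemes don’t support direct access.

```python
# Each block: address -> (data, address of the file's next block or None).
# Addresses follow the figure 8.10 caption; the data strings are placeholders.
blocks = {
    2: ("File 1, part 1", 8),
    8: ("File 1, part 2", 20),
    20: ("File 1, part 3", 18),
    18: ("File 1, part 4", None),
}
# Hypothetical directory entry: first and last block, plus the number of
# extents not counting the first.
directory = {"File 1": {"first": 2, "last": 18, "extra_extents": 3}}


def chain(filename: str) -> list[int]:
    """Follow the pointers from the first block; there is no shortcut to block k."""
    addr = directory[filename]["first"]
    path = []
    while addr is not None:
        path.append(addr)
        _, addr = blocks[addr]
    return path


print(chain("File 1"))  # [2, 8, 20, 18]
```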
Both noncontiguous allocation schemes eliminate external storage fragmentation and the need for compaction.
However, they don’t support direct access: it isn’t easy to determine the exact location of a specific record.
The File Manager can select the most efficient method of storage allocation:
◦ contiguous for direct files
◦ noncontiguous for sequential files
◦ The OS manages both schemes of storage allocation
Indexed storage:
Allows direct record access by bringing together the pointers linking every extent of that file into an index block.
Every file has its own index block.
It contains the addresses of the disk sectors that make up the file.
The index lists each entry in the same order in which the sectors are linked (Figure 8.12).
When a file is created, the pointers in the index block are all set to null.
Then, as each sector is filled, its pointer is set to the appropriate sector address. To be precise, the address is removed from the empty space list and copied into its position in the index block.
Indexed storage supports both sequential and direct access, but it doesn’t necessarily improve the use of storage space, because each file needs its own index block.
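A sketch of the index-block life cycle, assuming a hypothetical `IndexedFile` class and a Python list standing in for the empty space list (the sector numbers are made up). Pointers start as `None` (null); filling a sector moves its address off the free list into the next index slot, and `sector_of(k)` reads any sector’s address directly. Walking `index` in order gives sequential access.

```python
from typing import Optional


class IndexedFile:
    """Hypothetical model of a file with its own index block."""

    def __init__(self, index_size: int, free_sectors: list[int]):
        self.index: list[Optional[int]] = [None] * index_size  # null pointers at creation
        self.free = free_sectors  # shared empty space list
        self.used = 0

    def fill_next_sector(self) -> int:
        """Move the next free sector's address into the index block."""
        if self.used == len(self.index):
            raise RuntimeError("index block is full")
        sector = self.free.pop(0)
        self.index[self.used] = sector
        self.used += 1
        return sector

    def sector_of(self, k: int) -> Optional[int]:
        """Direct access: look up the k-th sector without following any chain."""
        if not 0 <= k < self.used:
            raise IndexError("sector not allocated yet")
        return self.index[k]


free_list = [2, 8, 20, 18, 5]
f = IndexedFile(index_size=4, free_sectors=free_list)
for _ in range(3):
    f.fill_next_sector()
print(f.index)         # [2, 8, 20, None]
print(f.sector_of(2))  # 20
```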