
The Physical Structure of Data

6.6.3 Indexed Sequential Access Method

Like the previous organizations, the indexed sequential access method (ISAM) is a family of methods. The essence of all ISAM variations (Fig. 6.20) is that a main file is created whose records are ordered on a key and stored on disk so that the sector containing a given record can be accessed directly if its sector number is known (Korth and Silberschatz, 1986, pp. 266–274). An index, possibly nested, provides the location of the appropriate sector. The specific record must then be found within the sector by a sequential search, but this takes little time if the number of records per sector is small. When the main file is created, empty space may be built into it to allow records to be added later without recopying the file. If all the empty space in a sector is used up, records are placed in overflow areas set aside for the purpose, and a list method links the main file to the overflow area and links individual records to one another within the overflow area.

Thus, an ISAM file has three major parts, actually three or more interrelated files: Main, Index, and Overflow. Because of the nests of indexes and the overflow areas associated with the main file, more space has to be devoted to directories than in other organizations. These directories point to the beginnings of files and of subdivisions of files, and they add to the overhead cost of the method. On average, however, ISAM is reasonably efficient in its use of storage and effective in terms of speed, and for these reasons it is a commonly used method.
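The lookup and insertion steps described above can be sketched in Python. This is a minimal, hypothetical illustration rather than a description of any particular ISAM product: it assumes a single-level index (no nesting), integer keys, sectors modelled as in-memory lists with a fixed capacity, and a separate overflow list per sector, which is one of the overflow arrangements described for Fig. 6.20. The names `Sector`, `ISAMFile`, `find`, and `insert` are illustrative only.

```python
from bisect import bisect_left, insort
from dataclasses import dataclass, field


@dataclass
class Sector:
    records: list[int]  # keys stored in this sector of the main file, in key order
    capacity: int       # total room, including the empty space built in at load time
    overflow: list[int] = field(default_factory=list)  # this sector's overflow list


class ISAMFile:
    """Hypothetical single-level ISAM sketch: main file, index, and overflow."""

    def __init__(self, sorted_keys: list[int], per_sector: int, capacity: int):
        if not sorted_keys or per_sector > capacity:
            raise ValueError("need at least one key and per_sector <= capacity")
        # Load the main file in key order, leaving (capacity - per_sector) free slots per sector.
        self.sectors = [
            Sector(list(sorted_keys[i:i + per_sector]), capacity)
            for i in range(0, len(sorted_keys), per_sector)
        ]
        # Index: the highest key in each sector, recorded when the file is loaded.
        self.index = [s.records[-1] for s in self.sectors]

    def _sector_for(self, key: int) -> Sector:
        # Binary search the index for the first sector whose highest key is >= key;
        # keys above the last index entry are assigned to the last sector.
        i = bisect_left(self.index, key)
        return self.sectors[min(i, len(self.sectors) - 1)]

    def find(self, key: int) -> bool:
        sector = self._sector_for(key)
        # Sequential search within the sector, then along its overflow list.
        return key in sector.records or key in sector.overflow

    def insert(self, key: int) -> None:
        sector = self._sector_for(key)
        if len(sector.records) < sector.capacity:
            insort(sector.records, key)  # use the sector's built-in empty space
        else:
            sector.overflow.append(key)  # sector full: the record goes to overflow
```

For example, loading the keys 10, 20, ..., 60 with two records per sector and room for three leaves one free slot in each sector; inserting 15 fills the first sector, and a later insertion of 12 goes to that sector's overflow list. The plain Python list stands in for the chained overflow records of a real file.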

6.6 Combination Structures



Figure 6.19

Direct file structure with pointers: the figure on the left shows the status of a file to which a new record, whose key hashes to 10, is to be added. That location is in use, but it points to 17, which is also in use but is the end of the chain. Hence, the new record is put in the next free space (18). The pointer in 17 still points to 18, but the status code now shows that 17 is not the end of the chain, as shown in the right-hand diagram. The new record is now in location 18, and its status indicator shows it as the end of a chain of records that originally hashed to 10. Note a second chain going from 3 to 6.
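The insertion rule in the Figure 6.19 caption can be sketched as below. This is a hypothetical Python illustration of chaining inside a direct file (essentially what is usually called coalesced chaining), assuming a modulo hash function, a forward scan (wrapping around) for the next free location, and a boolean status flag in place of the figure's status codes. Unlike the figure, where 17 already holds a pointer to 18, this sketch writes the pointer at the moment the new record is linked. The names `Slot`, `DirectFile`, `insert`, and `search` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    key: Optional[int] = None  # None marks a free location
    more: bool = False         # status flag: True if the chain continues past this slot
    next: int = -1             # pointer to the next slot in the chain (used when more is True)


class DirectFile:
    def __init__(self, size: int):
        self.slots = [Slot() for _ in range(size)]

    def home(self, key: int) -> int:
        return key % len(self.slots)  # hypothetical hash function

    def _next_free(self, start: int) -> int:
        # Scan forward from start, wrapping around, for the next unused location.
        n = len(self.slots)
        for step in range(1, n + 1):
            i = (start + step) % n
            if self.slots[i].key is None:
                return i
        raise RuntimeError("file is full")

    def insert(self, key: int) -> int:
        i = self.home(key)
        if self.slots[i].key is None:  # home location free: store the record there
            self.slots[i].key = key
            return i
        while self.slots[i].more:      # otherwise walk to the end of the chain
            i = self.slots[i].next
        j = self._next_free(i)         # next free location after the chain's end
        self.slots[i].more, self.slots[i].next = True, j  # old end now points onward
        self.slots[j].key = key        # new record becomes the end of the chain
        return j

    def search(self, key: int) -> Optional[int]:
        i = self.home(key)
        while True:
            if self.slots[i].key == key:
                return i
            if not self.slots[i].more:  # end of chain reached without a match
                return None
            i = self.slots[i].next
```

For instance, in an 8-location file, inserting 3, 11, and 19 places them at 3, 4, and 5 as one chain; a later key 4 finds its home location occupied by 11, follows that chain to 5, and lands at 6. Chains can therefore merge, but a search still succeeds because records are only ever appended at the end of the chain that starts at their home location.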

6.7 Summary

For any file, we have a choice between placing records in random (or arrival) order and putting them in order according to a key. Random order means that the location of a record is not determined by the value of any attribute in the record. Such a method is fast at placing a record in memory, but may make searching impractically slow. Think of a library whose books are shelved in random order. This could work for a small, personal library whose owner knows every book and its location by heart, but it would be impossible for a major library. It is, therefore, a technique almost unheard of in practical use for large computer files.

Nested indexes point to segments of the Main File.

The Main File. Each sector (a row in this illustration) contains an index, to the left of the heavy line, and a set of records. The index gives the highest key value of records in its sector and the address of the first of any overflow records.

Overflow area. Records that logically belong in the Main File, but for which there is no room, are placed here in any of several arrangements. They may all be in one structure, chained together, or there may be a separate list for each sector from which they overflow.

Figure 6.20

The indexed sequential file structure: such a structure has, at its heart, a sequential file, an index or nest of indexes, and an area for storing overflow records from the Main File.

It is far more common to order or sequence records on the basis of the value of one or more attributes, collectively known as the sort key. This gives two advantages. First, if we know the key, we can usually find at least the approximate location of the record quickly, saving considerable search time compared with having to examine every record. Second, we can make use of sequential searching when it is to our advantage. Sometimes we might like to start at a given key, such as a fragment of an author’s name, and then search in sequence until we reach a name that does not begin with the fragment. For example, we might want to look for an author whose name begins with TCHAI. To do this, find the first name that begins with this string, then search sequentially until reaching a name that does not begin with it.
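The TCHAI search combines a direct step with a sequential one. A minimal Python sketch, assuming the names are held in memory as a list already sorted on the key (a stand-in for a file in key order): binary search from the standard `bisect` module finds the first name not less than the fragment, and a forward scan collects names until one no longer begins with it. The function name `prefix_search` is illustrative.

```python
from bisect import bisect_left


def prefix_search(sorted_names: list[str], prefix: str) -> list[str]:
    """Return every name that begins with prefix, from a list kept in key order."""
    # Direct step: binary search finds the first position whose name is >= prefix,
    # which is where any run of matching names must begin.
    i = bisect_left(sorted_names, prefix)
    matches = []
    # Sequential step: read forward until a name no longer starts with the prefix.
    while i < len(sorted_names) and sorted_names[i].startswith(prefix):
        matches.append(sorted_names[i])
        i += 1
    return matches
```

This works because all strings that begin with a given prefix sort together, at or immediately after the prefix itself, so the scan can stop at the first non-matching name.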

We have established the concept that the next record need not be the next one in physical position in memory; it is the next one in terms of key value.
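One way to separate logical from physical order is to leave records where they arrived and give each one a pointer to its successor in key order. The hypothetical Python sketch below (the names `link_in_key_order` and `traverse` are illustrative, and the pointers are built in one pass by sorting, purely for brevity) shows that following the pointers yields key order even though the physical order is unchanged.

```python
from typing import Optional


def link_in_key_order(keys: list[str]) -> tuple[Optional[int], list[Optional[int]]]:
    """Keep records in arrival order; next_ptr[i] is the position of the record
    that follows position i in key order (None for the last record)."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    next_ptr: list[Optional[int]] = [None] * len(keys)
    for here, after in zip(order, order[1:]):
        next_ptr[here] = after
    head = order[0] if order else None  # position of the record with the lowest key
    return head, next_ptr


def traverse(keys: list[str], head: Optional[int], next_ptr: list[Optional[int]]) -> list[str]:
    """Read the file in logical (key) order by following the pointers."""
    result = []
    i = head
    while i is not None:
        result.append(keys[i])
        i = next_ptr[i]
    return result
```

With records stored in the arrival order SMITH, ADAMS, JONES, the head is position 1 (ADAMS), and following the pointers gives ADAMS, JONES, SMITH.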

To find records, given a key, we can use either an index or a computational method. The computational method is by far the faster, but an index preserves the ability to search sequentially when we want to search on the basis of more than one key.
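The trade-off can be seen in a small, hypothetical Python sketch. The computational method here is a plain modulo hash over a 10-location file, and the index is simply a sorted list of keys searched with `bisect`; collisions are not handled, so the sample keys are chosen not to collide. The names `hashed_address`, `build_hashed_file`, and `next_key` are illustrative.

```python
from bisect import bisect_right
from typing import Optional

N_SLOTS = 10  # hypothetical file size


def hashed_address(key: int) -> int:
    """Computational method: the address is calculated directly from the key."""
    return key % N_SLOTS


def build_hashed_file(keys: list[int]) -> list[Optional[int]]:
    slots: list[Optional[int]] = [None] * N_SLOTS
    for k in keys:
        addr = hashed_address(k)
        if slots[addr] is not None:
            raise ValueError("collision; this sketch does not resolve collisions")
        slots[addr] = k
    return slots


def next_key(index: list[int], key: int) -> Optional[int]:
    """Index method: a sorted index gives the key that follows `key`,
    so records can be read sequentially in key order."""
    i = bisect_right(index, key)
    return index[i] if i < len(index) else None
```

Storing 12, 7, 33, and 20 puts them at locations 2, 7, 3, and 0, so reading the file in physical order gives 20, 12, 33, 7: each record is found with one calculation, but the file's layout says nothing about which key comes next. The sorted index, by contrast, answers "what comes after 12?" with 20.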

The file structures used with most database software combine the various methods we have surveyed, largely because different files and usage patterns place different demands on file structures. To avoid needing a different structure for each file, we tend to compromise on some aspects in order to achieve good performance on most of them.


In the document Text Information Retrieval Systems (pages 166-170)