Lists - The Physical Structure of Data - Text Information Retrieval Systems

The Physical Structure of Data

6.4.3 Lists

Some file applications may call for the opposite of the ideal indexed structure. Consider a hospital with a relatively rapidly changing population of patients, each of whom is likely to have a different medical history, set of laboratory tests, drugs prescribed, attending physician, and therapies. There are likely to be several changes to each record daily, but relatively little searching. Because of the large number of possible attributes of any patient, a desirable file structure would allow entry of only those laboratory and therapy reports that are needed for each individual patient. This is preferable to reserving a fixed number of storage locations for each patient, held for possible future use. The laboratory-type data would be entered as separate records, linked to the appropriate patient.

One method of linking is to use pointers between records. Figure 6.8 shows a sequential file to which a pointer to the next record has been added as an attribute of each record. "Next," in this case, means the record with the next higher value of the sort key. If this is done, then there is no longer a need to keep records in physical order by sort key, because the physical order of storage no longer indicates the logical order of key values.

In some applications it may be useful to know what record pointed to any given record. For this purpose, a second series of pointers can be established, so that one series points forward to the next higher key value and one points backward to the previous, or next smaller, key value. These are then called forward and backward pointers, respectively.
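As a rough illustration of forward and backward pointers, the following Python sketch stores four records out of key order and then links them in both directions. The key values, the `Record` class, and the helper names are assumptions made for the example; only the use of 0 as an end-of-list code and the idea of two pointer series come from the text.

```python
from dataclasses import dataclass

END = 0  # pointer value 0 marks the end of a chain, as in the text


@dataclass
class Record:
    key: int
    fwd: int = END   # address of the record with the next higher key
    back: int = END  # address of the record with the next lower key


# Hypothetical storage area: address -> record, deliberately not in key order.
store = {
    1: Record(key=345),
    2: Record(key=456),
    3: Record(key=123),
    4: Record(key=567),
}


def link_in_key_order(store):
    """Set both pointer series to follow key order; return the head address.

    Assumes the store holds at least one record.
    """
    order = sorted(store, key=lambda addr: store[addr].key)
    for prev, nxt in zip(order, order[1:]):
        store[prev].fwd = nxt
        store[nxt].back = prev
    store[order[0]].back = END
    store[order[-1]].fwd = END
    return order[0]


def walk(store, start, direction="fwd"):
    """Yield addresses by following one series of pointers from start."""
    addr = start
    while addr != END:
        yield addr
        addr = getattr(store[addr], direction)


head = link_in_key_order(store)
forward = list(walk(store, head, "fwd"))            # [3, 1, 2, 4]
backward = list(walk(store, forward[-1], "back"))   # [4, 2, 1, 3]
```

The records never move; only the pointer attributes determine the logical order, which can be read in either direction.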

This is the general structure of a list. In computer science the term has a special meaning and does not denote simply an array. Typically, two or more lists are maintained simultaneously. One contains the data, and one contains information about empty or unused space in the area of memory set aside for future additions to the file. Here is a brief example of how it might work.

To start with, before any data are put into the file, set aside an area such as that shown in Fig. 6.9. Here it contains 10 record positions, each with an address (1, 2, . . . , 10). Initially, each position holds a pointer to the next sequential record position, except for the last, whose zero value is a code indicating the end of the list. A directory, a small file outside the main file space, tells where the first data record is to be found and where the next free record space is to be found.

6.4 Organizational Methods


Initially, there are no data records, so this condition is designated by a 0 in the data directory and a 1 in the empty space directory to point to the first record as the first available space.

Now, add the first data record (Fig. 6.10). It has a key of 324. We put it (a) in position 1, indicated by the pointer in the free space directory, take the pointer from that record and put it (b) in the free space directory (c). We also record the address of the first record in the data directory (d) and change the pointer in the first data record (e) to 0, indicating that there is no next data record. So far, this has been a great deal of work to place a single record in a file.

Figure 6.8

Pointers as a means of indicating record order: the records of this file are sequentially organized, but are not in order by a key value. New records are placed wherever there is space available. However, one of the attributes is a pointer to the record with the next higher key value. The logical sequence of records in this file would be records at locations 3, 1, 2, 4, and 18.

Now (Fig. 6.11) we add a second record. It has a key of 287, which means it logically precedes record 324. We want to put the new record first logically but, since the first physical record position is now in use, we do not want to have to move the incumbent. So, we look (a) to the data directory, which points to physical record position 1, in which is found a key of 324. The new record key is smaller, so we take the pointer from the next available space pointer in the directory and place it (b) in the directory as first data record. Hence, the current next free space is to be the logically first data record. Then put the new record (c) in position 2, formerly pointed to by the free-space directory. This new record should point to the record with key value 324, so the old first record pointer (1) is placed (d) in the pointer of record 2. The old pointer to the next free space, in record position 2, goes to the next available space pointer (e). And that is all there is. No record positions have changed, yet we have a file that is “in order” by key.

We can keep adding new records, in any order, but the work required is essentially the same as for the second record. Perhaps now the advantage of the method is becoming apparent. To delete a record, as shown in Fig. 6.12, move the pointer value of the deleted record (a) to the pointer of the preceding record, move the pointer value (b) in the free space directory into the deleted record, and place the pointer to the deleted record address (c) from record 2 into the free-space directory. That is all; no records are moved. The deleted data remain in their space, but the space is now available for new data to be written over it. By this procedure deleted record space will be reused when the next record is added to the file.
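The insertion and deletion steps just described can be sketched in Python as follows. This is a minimal illustration, not the book's implementation: the class name `ListFile`, its attribute names, and the key 300 used at the end are assumptions. The ten record positions, the two directory entries, the end-of-list code 0, and the keys 324 and 287 follow the text.

```python
END = 0  # pointer value 0 means "end of list"


class ListFile:
    """Fixed pool of record positions holding a data chain and a free chain."""

    def __init__(self, size=10):
        # Positions are addressed 1..size; index 0 is unused so 0 can mean END.
        self.key = [None] * (size + 1)
        # Each empty position points to the next one; the last points to END.
        self.ptr = [END] + [i + 1 for i in range(1, size)] + [END]
        self.first_data = END  # directory: first data record
        self.next_free = 1     # directory: next available space

    def insert(self, key):
        if self.next_free == END:
            raise MemoryError("no free record positions left")
        # Take the first position off the free chain.
        pos = self.next_free
        self.next_free = self.ptr[pos]
        self.key[pos] = key
        # Find the logical predecessor: the last record with a smaller key.
        prev, cur = END, self.first_data
        while cur != END and self.key[cur] < key:
            prev, cur = cur, self.ptr[cur]
        # Splice the new record between prev and cur.
        self.ptr[pos] = cur
        if prev == END:
            self.first_data = pos
        else:
            self.ptr[prev] = pos
        return pos

    def delete(self, key):
        prev, cur = END, self.first_data
        while cur != END and self.key[cur] != key:
            prev, cur = cur, self.ptr[cur]
        if cur == END:
            raise KeyError(key)
        # The predecessor (or the directory) takes over the deleted pointer.
        if prev == END:
            self.first_data = self.ptr[cur]
        else:
            self.ptr[prev] = self.ptr[cur]
        # The deleted position joins the front of the free chain.
        # Its old key is left in place until the position is reused.
        self.ptr[cur] = self.next_free
        self.next_free = cur
        return cur

    def keys_in_order(self):
        out, cur = [], self.first_data
        while cur != END:
            out.append(self.key[cur])
            cur = self.ptr[cur]
        return out


f = ListFile(10)
f.insert(324)  # stored in position 1
f.insert(287)  # stored in position 2, but logically first
```

After these two insertions the directory holds 2 as the first data record and 3 as the next available space, and record 2 points to record 1. Calling `f.delete(324)` and then `f.insert(300)` would place the new record in position 1, the space just released.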

Figure 6.9

Initial settings of pointers within empty records: record positions 1 through 10 hold the pointers 2, 3, 4, 5, 6, 7, 8, 9, 10, and 0. Directory: first data record, 0; next available space, 1.

Initial set-up of a list structure: each record position, except the last, has a pointer to the next physical record, all being initially empty. The directory shows, by the symbol 0, that there is no first data record and that position 1 is the first available space.

By using multiple pointers in a record, any record can point to several next ones, each using a different sense of “next.” This is the essence of the multilist file structure (Lefkovitz, 1969). It is the physical analog of hypertext. When considering any record, there may be a variety of “next” records to go to. There might be a basic patient record, as in Fig. 6.13, containing primarily identification data, then a series of pointers to laboratory results, drugs prescribed, therapy prescribed and taken, etc. The first laboratory record might point to a second, and so on.

The result is a file with a completely variable number of components of each patient’s logical record, with new component entries being added as needed and older ones dropped when no longer needed.
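A small Python sketch may make the patient example more concrete. It assumes one head pointer per kind of subordinate record and a single storage area for all components; the names `Patient`, `Component`, `add_component`, and `chain`, as well as the sample data, are invented for illustration and do not come from the text or from Lefkovitz.

```python
from dataclasses import dataclass, field

END = 0  # pointer value 0 means "no next record"


@dataclass
class Component:
    """A subordinate record (lab result, drug, therapy) in one chain."""
    data: str
    nxt: int = END


@dataclass
class Patient:
    name: str
    # One head pointer per kind of "next"; the kinds used are illustrative.
    heads: dict = field(default_factory=dict)


components = {}  # address -> Component (hypothetical storage area)


def add_component(patient, kind, data):
    """Store a component and link it at the end of the patient's chain."""
    addr = len(components) + 1  # simplistic allocation: no deletions here
    components[addr] = Component(data)
    cur = patient.heads.get(kind, END)
    if cur == END:
        patient.heads[kind] = addr  # first record of this kind
    else:
        while components[cur].nxt != END:
            cur = components[cur].nxt
        components[cur].nxt = addr  # previous last record points to new one
    return addr


def chain(patient, kind):
    """Follow one series of pointers and return the data found."""
    out, addr = [], patient.heads.get(kind, END)
    while addr != END:
        out.append(components[addr].data)
        addr = components[addr].nxt
    return out


p = Patient("patient A")
add_component(p, "lab", "blood count")
add_component(p, "drug", "aspirin")
add_component(p, "lab", "x-ray")
labs = chain(p, "lab")          # ['blood count', 'x-ray']
drugs = chain(p, "drug")        # ['aspirin']
therapies = chain(p, "therapy") # []
```

A patient with no therapy records simply has no pointer of that kind, so no space is reserved for it.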

The advantages of this method are its flexibility and comparatively modest space requirements. We can not only change the content of such a file with relative ease and speed, but we can even change its logical structure (variable numbers of subordinate records, such as the lab reports referred to above). Pointers individually do not take up much space, but if we use many of them, the space cost can become significant. The disadvantage is that search speed may be quite slow. This is another space-time trade-off.
