From the document Text Information Retrieval Systems (pages 143-146)

The Physical Structure of Data

6.2.1 Basic Structures

The simplest record structure consists of a set of attribute values, stored one after the other, each of a predetermined, fixed size in number of bits or bytes. A record, if stored on disk, will normally be read into RAM as a unit, or as part of a larger unit. Sometimes a logical record will be stored as a number of smaller parts for the purpose of input–output operations. This is done with some word processors whose users see a document as a single record, but the physical reality may be a large number of smaller records.

To find any particular attribute within such a record, we must know, for each attribute, the location in memory of the first bit or byte, and the length of each attribute value or the location of its last bit or byte. There are other, equivalent ways of describing the same structural features.
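The start-and-length bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute sizes in `LAYOUT`, the space padding, and the helper names `build_offsets`, `pack`, and `get` are all hypothetical and do not come from the text.

```python
# Hypothetical layout for a record whose attribute sizes are fixed in advance.
# Each entry is (attribute name, size in bytes); the sizes are assumptions.
LAYOUT = [
    ("name", 20),
    ("address", 40),
    ("telephone", 10),
    ("date_of_last_change", 6),
]


def build_offsets(layout):
    """Map each attribute to (start, length) by accumulating the sizes."""
    offsets = {}
    start = 0
    for attribute, size in layout:
        offsets[attribute] = (start, size)
        start += size
    return offsets, start  # start now equals the total record length


OFFSETS, RECORD_LENGTH = build_offsets(LAYOUT)


def pack(values):
    """Pad every value with spaces to its fixed size and concatenate."""
    parts = []
    for attribute, size in LAYOUT:
        encoded = values[attribute].encode("ascii")
        if len(encoded) > size:
            raise ValueError(f"{attribute} exceeds {size} bytes")
        parts.append(encoded.ljust(size))
    return b"".join(parts)


def get(record, attribute):
    """Read one attribute directly from its known start and length."""
    start, size = OFFSETS[attribute]
    return record[start:start + size].decode("ascii").rstrip()


# Illustrative sample values.
record = pack({
    "name": "SMITH, JOHN",
    "address": "1287 MAPLEST",
    "telephone": "8085551234",
    "date_of_last_change": "031228",
})

print(OFFSETS["telephone"])      # (start, length) of the telephone attribute
print(get(record, "telephone"))  # value read without scanning the record
```

Because every size is known in advance, no attribute has to be scanned to reach the one after it. The price is the padding stored with every short value.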

If a record consists of name, address, telephone number, and date of last change, and if the length of each attribute value is fixed in advance, the program can easily find where the address or telephone number is located. This is called a fixed-length record. Complications set in when attribute values vary widely in number of characters. For example, a person’s last or family name can range from two characters (NG) to many (IPPOLITOV-IVANOV). The abstract of a bibliographic record can range from a few bytes (ABSTRACT MISSING) to several

Attribute            Number of Bytes  Content
name                 11               SMITH, JOHN
address              32               1287 MAPLEST
telephone            10               8085551234
date_of_last_change  6                031228

name                 14               STEINBERG, MAE
address              27               327 MAIN ST
telephone            10               8085554321
date_of_last_change  6                000116

Figure 6.1

Variable-length record: in this form, each attribute value carries with it an explicit tag showing its length. The name and address attributes vary widely in length; date and telephone do not.

Attribute        Content       Length (bytes)
name             SMITH, JOHN   10
account number   1234 567 890  12
no_transactions  003           3
Transactions
  date           990624        6
  type           2             1
  amount         000005.45     8
  date           000113        6
  type           2             1
  amount         000237.00     8
  date           000115        6
  type           3             1
  amount         000100.00     8

Figure 6.2

Variable-length record: each field or attribute is of fixed length, but there can be a variable number of occurrences of the structure Transactions. The number of bytes for each field is explicitly given in the table and the number of transactions is given (no_transactions).

thousand. Although it is convenient to make each attribute value of fixed length, it can waste a great deal of memory to do so. One among many methods to permit variable-length attribute values, hence variable-length records, is to append to each attribute an explicit statement of its length, as in Fig. 6.1. Here, the record is of variable length, but if we know the starting location and have these length tags or a table of lengths, it is easy to find any element within the record.
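As a rough sketch of the length-tag idea, the Python below stores each attribute value behind a one-byte length tag and hops from tag to tag to reach a wanted attribute. The one-byte binary tag, the fixed attribute order in `FIELDS`, and the function names are assumptions made for illustration. The text only says that a length is attached to each attribute, not how it is encoded.

```python
# Assumed attribute order; every record stores its values in this sequence.
FIELDS = ["name", "address", "telephone", "date_of_last_change"]


def encode_record(values):
    """Store each value as a one-byte length tag followed by the value."""
    out = bytearray()
    for field in FIELDS:
        data = values[field].encode("ascii")
        if len(data) > 255:
            raise ValueError(f"{field} is too long for a one-byte tag")
        out.append(len(data))  # the explicit length tag
        out += data
    return bytes(out)


def read_field(buffer, start, wanted):
    """From the record's first byte, skip tagged values until the wanted one."""
    position = start
    for field in FIELDS:
        length = buffer[position]
        value_start = position + 1
        if field == wanted:
            return buffer[value_start:value_start + length].decode("ascii")
        position = value_start + length
    raise KeyError(wanted)


def next_record_start(buffer, start):
    """Add up the tags to find where the following record begins."""
    position = start
    for _ in FIELDS:
        position += 1 + buffer[position]
    return position


# Two records stored back to back, using the sample values of Fig. 6.1.
FILE = encode_record({
    "name": "SMITH, JOHN",
    "address": "1287 MAPLEST",
    "telephone": "8085551234",
    "date_of_last_change": "031228",
}) + encode_record({
    "name": "STEINBERG, MAE",
    "address": "327 MAIN ST",
    "telephone": "8085554321",
    "date_of_last_change": "000116",
})

second = next_record_start(FILE, 0)
print(read_field(FILE, 0, "telephone"))
print(read_field(FILE, second, "name"))
```

No padding is stored, but reaching an attribute now means reading every length tag that precedes it.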

The same basic technique can be used if there may be more than one value of an attribute. Bank depositors may vary widely in the number of transactions during a month, each of which must be represented in a record. A depositor record might be organized as shown in Fig. 6.2. Here, because the Transactions array has subordinate elements, we need another attribute showing the number of occurrences of a transaction, each of which consists of date, type, and amount,

each of which is shown with its length. Alternatively, we might show the total number of bytes used by the array of transactions or show both the number of array elements and their cumulative byte count. While the attributes of a bank transaction do not vary much in terms of length (most of us need distressingly fewer than eight digits to record the amounts of our deposits), in other kinds of records they could vary more. Variable-length arrays might contain variable-length fields.
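A depositor record of this kind, with fixed-width fields and a counted repeating group, might be read as in the following Python sketch. The field widths in `HEADER` and `TRANSACTION` are assumptions chosen to fit the sample content and are close to, but not identical with, the lengths listed in Fig. 6.2. The helper names `take` and `parse_depositor` are hypothetical.

```python
# Assumed widths in characters; chosen to fit the sample values below.
HEADER = [("name", 11), ("account_number", 12), ("no_transactions", 3)]
TRANSACTION = [("date", 6), ("type", 1), ("amount", 9)]


def take(buffer, position, layout):
    """Slice consecutive fixed-width fields; return (values, new position)."""
    values = {}
    for field, width in layout:
        values[field] = buffer[position:position + width]
        position += width
    return values, position


def parse_depositor(buffer, start=0):
    """Read the header, then as many transactions as no_transactions says."""
    record, position = take(buffer, start, HEADER)
    count = int(record["no_transactions"])
    transactions = []
    for _ in range(count):
        transaction, position = take(buffer, position, TRANSACTION)
        transactions.append(transaction)
    record["transactions"] = transactions
    return record, position  # position is where the next record would start


# Sample content following Fig. 6.2.
raw = (
    "SMITH, JOHN" "1234 567 890" "003"
    "990624" "2" "000005.45"
    "000113" "2" "000237.00"
    "000115" "3" "000100.00"
)

record, next_start = parse_depositor(raw)
for transaction in record["transactions"]:
    print(transaction["date"], transaction["type"], transaction["amount"])
```

The count in `no_transactions` is what tells the reader where the record ends. Storing a cumulative byte count instead, as the text suggests, would let a program skip the whole array without visiting each transaction.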

Technically, a variable-length structure within a variable-length structure introduces nothing new, although it can make finding the date of a deposit of $12,450 time consuming. (Why would one want to search a file using amount as a key? One of the largest banking transactions one of us ever took part in involved an international transfer of funds “by wire.” Something went wrong with the transfer, and he was told by the bank staff that the only way to trace it was by amount, and not by account number, name of depositor, or name of recipient. A rather strange retrieval technique, but one geared to a highly specialized application.)

Yet another organizational method, although logically equivalent, is to use pointers instead of counts of length. A pointer tells where the next element begins or the current one ends, as shown in Fig. 6.3.

A weakness of this structure is that we must follow chains of pointers from the beginning. We cannot jump into the middle of such a record structure, because we could not know what attribute we were dealing with. In this example, if a program found itself looking in location 163, it would not know the

Content of a record assumed to be stored beginning at location 41, with comments.

(41) name/SMITH, JOHN/61
     Comment: (41) indicates the storage address of the first byte of the attribute, containing, first, its name, then its value, then a pointer to the next attribute value. A separator is necessary.
(61) address/287 MAPLE AVENUE, AKRON, OH,44444/109
(109) Transactions/1/290
     Comment: Transactions is a structure, consisting of several attributes, and there may be more than one set of them. The 1 following its name shows the number of occurrences, the 290 shows the address of the first one. Note that the next attribute need not be contiguous to this one.
(290) date/040217/163
(163) type/2/312
(312) amount/00000545,116
     Comment: The last attribute points to the start of the next record.

Figure 6.3

Variable-length record with pointers: pointers are used throughout to show where the next attribute or record begins, and separators are used to show where the names of attribute, value, and pointer begin or end.

next two bytes contained a transaction type, but if it started at location 41 and had reference to a table of attributes in this record, it would know, at each pointer, what the next attribute was going to be.
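Following such a chain can be sketched in Python as below. A dictionary keyed by storage address stands in for memory, which is an assumption made for brevity, as are the function names and the constant `TRANSACTION_FIELDS`. The sketch also assumes that the same `/` separator precedes every pointer, including the last one.

```python
SEPARATOR = "/"
TRANSACTION_FIELDS = 3  # date, type, amount (assumed fixed per transaction)

# Address -> stored element "name/value/pointer", following Fig. 6.3.
MEMORY = {
    41: "name/SMITH, JOHN/61",
    61: "address/287 MAPLE AVENUE, AKRON, OH,44444/109",
    109: "Transactions/1/290",
    290: "date/040217/163",
    163: "type/2/312",
    312: "amount/00000545/116",
}


def read_element(memory, address):
    """Split one stored element into attribute name, value, and pointer."""
    name, value, pointer = memory[address].split(SEPARATOR)
    return name, value, int(pointer)


def walk_record(memory, start):
    """Follow pointers from the first attribute to the end of the record.

    Returns the visited (address, name, value) triples and the address
    of the next record.
    """
    visited = []
    address = start
    remaining = None  # unknown until the Transactions element is read
    while remaining is None or remaining > 0:
        name, value, pointer = read_element(memory, address)
        visited.append((address, name, value))
        if remaining is not None:
            remaining -= 1
        elif name == "Transactions":
            # Its value is the number of occurrences, its pointer the first one.
            remaining = int(value) * TRANSACTION_FIELDS
        address = pointer
    return visited, address


visited, next_record = walk_record(MEMORY, 41)
for address, name, value in visited:
    print(address, name, value)
print("next record starts at", next_record)
```

The walk only works because it starts at location 41. Nothing in the sketch could begin at location 163 and still tell how many elements remain before the record ends.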
