

In the document Text Information Retrieval Systems (pages 137-142)

Models of Virtual Data Structure

5.4.2 Spreadsheet Files

Spreadsheets have become a popular tool. Although they are called matrices, they are more strictly an array of structures or tuples, or a structure of arrays. That is, the elements of a row need not all represent the same attribute, and they may have considerably different characteristics from the other elements of the same tuple. A spreadsheet is a set of data organized into rows and columns. It may contain accounting data as in Fig. 5.8, which shows information about items sold in a retail store: product, product class, price, number sold, tax rate, gross receipts, and sales tax due. The last item is the amount of sales tax collected, which must be paid to the government. The tax rate is assumed to be 8% on accessories and 5% on clothing.
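The row-versus-column distinction can be made concrete with a short Python sketch. It stores the directly entered columns of Fig. 5.8 first as an array of structures (a list of heterogeneous tuples, one per product) and then converts them to a structure of arrays (one list per attribute). The names `Row`, `ROWS`, `COLUMNS`, and `to_columns` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

# One row of Fig. 5.8 as a tuple: (product, product class, price, number sold).
# The elements differ in type and meaning, unlike the entries of a true matrix.
Row = Tuple[str, str, float, int]

# Array of structures: a list of heterogeneous tuples, one per product.
ROWS: List[Row] = [
    ("Shirts", "CL", 23.50, 112),
    ("Ties", "CL", 18.75, 23),
    ("Hosiery", "CL", 5.98, 437),
    ("Shoes, Pr.", "CL", 83.00, 26),
    ("Wallets", "AC", 15.00, 14),
]

# Hypothetical attribute names for the four stored columns.
COLUMNS = ("product", "product_class", "price", "number_sold")


def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Structure of arrays: one homogeneous list per attribute."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


columns = to_columns(ROWS)
print(columns["product_class"])  # ['CL', 'CL', 'CL', 'CL', 'AC']
```

Each column list holds values of a single type, while each tuple mixes types; the same data support both views.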

Spreadsheet software makes it quite simple to enter data into the array and to change it once there. Spreadsheet systems differ from almost any other file structure in that a data element (a cell of the matrix) may contain either a value or a mathematical formula for computing the value. Values, in turn, may be data or labels; a label might be, for example, the name of an attribute.
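Because a cell may hold a data value, a label, or a formula, the software must tell the three apart when an entry is typed in. The sketch below assumes a deliberately simple rule, not that of any particular spreadsheet product: an entry beginning with `=` is a formula, as in Fig. 5.9; text that parses as a number is a data value; anything else is a label. The function name `classify` is hypothetical.

```python
def classify(entry: str) -> str:
    """Return 'formula', 'value', or 'label' for a raw cell entry.

    Simplified rule assumed for illustration: a leading '=' marks a formula
    (as in Fig. 5.9), text that parses as a number is a data value, and any
    other text is a label.
    """
    text = entry.strip()
    if text.startswith("="):
        return "formula"
    try:
        float(text)
    except ValueError:
        return "label"
    return "value"


for entry in ["Product", "23.50", '=IF(B7="CL",0.05,0.08)']:
    print(entry, "->", classify(entry))
```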

Figure 5.9 shows some of the content of this spreadsheet, cell by cell. Note that cell A7 is a label (Shirts), whereas cell A4 holds the name of an attribute (Product). Cell E7 has a conditional value: it is 0.05 if the item is clothing, and 0.08 otherwise. The content of F7 is a formula, the definition of a relationship among values. For the values to be shown, as in Fig. 5.8, each formula is replaced with its value. Because the formulas usually take up more space than the values, the formula version of the spreadsheet is typically printed as a sequential file, as shown here.
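Replacing a formula with its value is ordinary arithmetic once the referenced cells are known. This sketch writes the three formulas of row 7 as Python functions, taking gross receipts to be price times number sold times (1 + tax rate), as the caption of Fig. 5.8 defines it. The names `tax_rate` and `evaluate_row` are hypothetical.

```python
def tax_rate(product_class: str) -> float:
    # Mirrors =IF(B7="CL",0.05,0.08): clothing is taxed at 5%, all else at 8%.
    return 0.05 if product_class == "CL" else 0.08


def evaluate_row(product_class: str, price: float, number_sold: int) -> tuple:
    """Replace the formulas in columns E, F, and G of one row with their values."""
    rate = tax_rate(product_class)                # column E
    gross = price * number_sold * (1 + rate)      # column F: gross receipts
    tax_due = price * number_sold * rate          # column G: sales tax due
    return rate, gross, tax_due


rate, gross, tax_due = evaluate_row("CL", 23.50, 112)
print(f"{rate:.2f} {gross:.2f} {tax_due:.2f}")  # 0.05 2763.60 131.60
```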

In effect, a spreadsheet is two files, one representing attribute relationships and one representing attribute values. The first file, which is logically structured the same way as the second, contains instructions for computing values, such as those illustrated in Fig. 5.9, or it could simply provide a value and instructions to copy the given value. The second file contains the values, whether provided directly by a user or computed by formula. A spreadsheet is the only commonly used data structure that contains instructions for changing its own values as an integral component, although word processors also embed commands in the text, for example, to change the type font.

     A            B        C       D        E       F          G
1    ABC Company, Inc.
2    Sales, Receipts, and Sales Tax Due
3
4    Product      Product  Price   Number   Tax     Gross      Sales
5                 Class            Sold     Rate    Receipts   Tax Due
6
7    Shirts       CL       23.50   112.00   0.05    2763.60    131.60
8    Ties         CL       18.75    23.00   0.05     452.81     21.56
9    Hosiery      CL        5.98   437.00   0.05    2743.92    130.66
10   Shoes, Pr.   CL       83.00    26.00   0.05    2265.90    107.90
11   Wallets      AC       15.00    14.00   0.08     226.80     16.80
12
13   TOTALS                                         8453.04    408.53

Figure 5.8

A spreadsheet file: this array shows product, product class, price, number sold, tax rate, gross receipts, and sales tax due. Letters across the top identify columns, and numbers on the left identify rows. These may be used in algebraic formulas. The sales tax rate is computed as a function of product class. Gross receipts are computed as price times number sold times (1 + tax rate). Sales tax owed is the amount collected as tax and due to the government.

     A            B        C       D        E                          F                      G
1    ABC Company, Inc.
2    Sales, Receipts, and Sales Tax Due
3
4    Product      Product  Price   Number   Tax                        Gross                  Sales
5                 Class            Sold     Rate                       Receipts               Tax Due
6
7    Shirts       CL       23.50   112.00   =IF(B7="CL",0.05,0.08)      =C7*D7*(1+E7)          =C7*D7*E7
8    Ties         CL       18.75    23.00   =IF(B8="CL",0.05,0.08)      =C8*D8*(1+E8)          =C8*D8*E8
9    Hosiery      CL        5.98   437.00   =IF(B9="CL",0.05,0.08)      =C9*D9*(1+E9)          =C9*D9*E9
10   Shoes, Pr.   CL       83.00    26.00   =IF(B10="CL",0.05,0.08)     =C10*D10*(1+E10)       =C10*D10*E10
11   Wallets      AC       15.00    14.00   =IF(B11="CL",0.05,0.08)     =C11*D11*(1+E11)       =C11*D11*E11
12
13   TOTALS                                                            =SUM(F7:F11)           =SUM(G7:G11)

Figure 5.9

Actual content of a spreadsheet’s cells: note that cells F7–F11 contain a formula in essentially algebraic language, using cell names as variables. Cells E7–E11 contain a conditional statement, yielding the value 0.05 if the product class is CL, and 0.08 otherwise. Cells F13 and G13 contain a function that computes the sum of a column of numbers. Cells in column A are labels such as SHIRTS.
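The two-file view can be imitated with two dictionaries keyed by cell name: a hypothetical `formulas` map standing for the first file, and a `values` map standing for the second, which holds both directly entered data and computed results. This is a simplified sketch: Python lambdas stand in for the spreadsheet's formula language, and the order of evaluation is fixed by hand rather than derived from the cell references.

```python
from typing import Callable, Dict, Union

Value = Union[str, float]
Formula = Callable[[Dict[str, Value]], Value]

ROWS = range(7, 12)  # product rows 7-11 of Figs. 5.8 and 5.9

# Second file (values), part 1: data entered directly by the user.
values: Dict[str, Value] = {}
entered = [("CL", 23.50, 112), ("CL", 18.75, 23), ("CL", 5.98, 437),
           ("CL", 83.00, 26), ("AC", 15.00, 14)]
for r, (cls, price, sold) in zip(ROWS, entered):
    values[f"B{r}"], values[f"C{r}"], values[f"D{r}"] = cls, price, sold

# First file (formulas): instructions for computing the remaining cells.
# The default argument r=r binds each lambda to its own row number.
formulas: Dict[str, Formula] = {}
for r in ROWS:
    formulas[f"E{r}"] = lambda v, r=r: 0.05 if v[f"B{r}"] == "CL" else 0.08
    formulas[f"F{r}"] = lambda v, r=r: v[f"C{r}"] * v[f"D{r}"] * (1 + v[f"E{r}"])
    formulas[f"G{r}"] = lambda v, r=r: v[f"C{r}"] * v[f"D{r}"] * v[f"E{r}"]
formulas["F13"] = lambda v: sum(v[f"F{r}"] for r in ROWS)
formulas["G13"] = lambda v: sum(v[f"G{r}"] for r in ROWS)


def recalculate(formulas: Dict[str, Formula],
                values: Dict[str, Value]) -> Dict[str, Value]:
    """Fill in the value file by evaluating every formula.

    Simplification: formulas run in insertion order, which here already
    respects dependencies (E before F and G in each row, rows before totals).
    A real spreadsheet would derive the order from the cell references.
    """
    for cell, formula in formulas.items():
        values[cell] = formula(values)
    return values


recalculate(formulas, values)
print(f"{values['F13']:.2f} {values['G13']:.2f}")  # 8453.04 408.53
```

Changing a single entry in `values`, say the number of ties sold in `D8`, and calling `recalculate` again updates every dependent cell, including the totals.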

5.5 Entity-Relationship Model

The term entity-relationship (E-R) model refers to a means of conveying semantic information about the interrelationships among data elements as part of the definition of the containing information structure. Here we depart from the formalisms of hierarchies, which in their simplest form can describe relationships only in terms of ownership or inclusion, and of relations, which might describe only the attributes grouped together in a tuple. The E-R model allows us to describe relationships in semantic terms, i.e., to tell us what these relationships mean (Chen, 1976; Fidel, 1987; Korth and Silberschatz, 1986, pp. 21–44).

The model is portrayed graphically and uses four basic symbols, shown in Fig. 5.10. The rectangle represents an entity; the oval, an attribute of an entity; and the diamond, a relationship among entities. The line or arrow represents a linkage between an attribute and an entity or between an entity and a relationship.

Figure 5.11 shows a record in E-R form that might be used for university purposes. It shows student data, course data, and instructor data. It uses a simplified version of the notation, which could be expanded to show such information as whether a relationship is one-to-many or many-to-one. Many students enroll in a course. Only one instructor teaches a given course (a simplification that omits multiple sections of a course).

A complete course entity (not shown) would have the attributes course title, course number, number of credits, meeting time, and classroom. The instructor entity would have, at least, the attributes name, department, and address. The attributes of every entity have been pared down; otherwise the page would be quite full.

A student is identified by name and number, and lives at an address. A student is enrolled in or has taken a course, which is identified by course number, title, and instructor. Course, like classroom, is an attribute that also indicates a relationship to another attribute. Each record of this database shows an instance of a student enrolled in a course, which is scheduled for a classroom.
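To see what the simplified diagram commits to, its entities can be translated into Python dataclasses. In this sketch the class and field names are hypothetical, drawn from the attribute lists above. A course refers to exactly one instructor, which encodes the rule that only one instructor teaches a given course, while each student holds a list of courses, so a student may take many courses and many students may share one course.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:
    street: str
    city: str
    state: str
    zipcode: str


@dataclass
class Instructor:
    name: str
    department: str
    address: Address


@dataclass
class Course:
    number: str
    title: str
    credits: int
    meeting_time: str
    classroom: str
    # A single reference: only one instructor teaches a given course.
    instructor: Instructor


@dataclass
class Student:
    number: str
    name: str
    address: Address
    # A list: a student may take many courses, and the same Course object
    # may appear in many students' lists.
    courses: List[Course] = field(default_factory=list)
```

Because `Course.instructor` is a single field, the one-instructor rule is enforced by the structure itself; the enrollment relationship has no such limit in either direction.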

A relationship set is formed, for example, by combining all instances of a student’s number being enrolled in a course. Another is all instances of a course being scheduled in a classroom at a meeting time. Note that unless there is some degree of standardization in naming and defining relationships, the model may not convey a great deal of information.
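A relationship set can be represented directly as a set of tuples, one tuple per relationship instance. The student and course identifiers below are invented sample data, and `students_in` and `rooms_of` are hypothetical helper names; the second helper shows how combining two relationship sets yields information that neither holds alone.

```python
from typing import Set, Tuple

# Invented sample instances, for illustration only.
# Relationship set "enrolled in": all (student number, course number) pairs.
enrolled_in: Set[Tuple[str, str]] = {
    ("S001", "IR101"),
    ("S002", "IR101"),
    ("S001", "DB200"),
}

# Relationship set "scheduled in": all (course number, classroom, meeting time) triples.
scheduled_in: Set[Tuple[str, str, str]] = {
    ("IR101", "Room 12", "Mon 10:00"),
    ("DB200", "Room 7", "Tue 14:00"),
}


def students_in(course: str) -> Set[str]:
    """All student numbers paired with the given course in 'enrolled in'."""
    return {student for student, c in enrolled_in if c == course}


def rooms_of(student: str) -> Set[str]:
    """Combine both relationship sets: the classrooms a student must attend."""
    courses = {c for s, c in enrolled_in if s == student}
    return {room for c, room, _time in scheduled_in if c in courses}
```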


Although this is not necessarily an easy diagram to read, it is possible to trace a connection between a student’s name and the name of each of the student’s instructors. It is also possible to tell whether one could find out if any student lives in the same city as an instructor. The diagram does not answer the question of who lives in the same city, but it does indicate that there are data relationships and connections that will yield this information. Whether the software is capable of using them is another question, of course.
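The connections traced in the diagram correspond to joins over relationship sets. The following sketch, with invented sample data and hypothetical names throughout, follows the path from a student through the courses taken to each course's instructor, and then answers the same-city question by matching the city attribute of student and instructor addresses.

```python
from typing import Dict, Set, Tuple

# Invented sample instances, for illustration only.
student_name: Dict[str, str] = {"S001": "Ana", "S002": "Ben"}
student_city: Dict[str, str] = {"S001": "Albany", "S002": "Boston"}
enrolled_in: Set[Tuple[str, str]] = {("S001", "IR101"), ("S002", "IR101"), ("S001", "DB200")}
taught_by: Dict[str, str] = {"IR101": "I01", "DB200": "I02"}  # one instructor per course
instructor_name: Dict[str, str] = {"I01": "Lee", "I02": "Roy"}
instructor_city: Dict[str, str] = {"I01": "Boston", "I02": "Albany"}


def instructors_of(student: str) -> Set[str]:
    """Follow student -> enrolled in -> course -> taught by -> instructor name."""
    return {instructor_name[taught_by[c]] for s, c in enrolled_in if s == student}


def same_city_pairs() -> Set[Tuple[str, str]]:
    """All (student name, instructor name) pairs whose cities match."""
    return {
        (student_name[s], instructor_name[i])
        for s, s_city in student_city.items()
        for i, i_city in instructor_city.items()
        if s_city == i_city
    }


print(sorted(instructors_of("S001")))  # ['Lee', 'Roy']
print(sorted(same_city_pairs()))       # [('Ana', 'Roy'), ('Ben', 'Lee')]
```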

The purposes of constructing an entity-relation model are to help plan a database and to enable users to understand what can be done with it.

ENTITY / ATTRIBUTE of an entity / RELATIONSHIP between entities / LINKAGE between attributes or entities

Figure 5.10

Symbols used in an entity-relation map: the rectangle represents an entity, the oval an attribute of an entity, and the diamond a relationship between entities. A line or arrow represents a linkage between an attribute and an entity or between an entity and a relationship.

5.6 Summary

End users of IR systems often have little patience for dealing with such esoteric concepts as the structure of a record. Some IR systems encourage this attitude by essentially treating a record as simply a set of words, not distinguished by field or attribute. As databases get larger, as is happening on the Web, and as more people use and learn to understand IR systems, we can expect a return to the use of record structure in searching, even among those who have abandoned it or never learned it.


[Diagram: principal entities Student and Courses, with Instructor subordinated to Courses; other labels include Is identified as, number, name, Has taken courses, title, grade, date taken, department, Previously attended, Secondary school, school, city, Lives at, Address, street, state, zipcode, and Identification.]

Figure 5.11

An entity-relation map: this map shows two principal entities, Student and Courses, and the various entities and attributes linked to them. Course list and Instructor are entities in themselves, but are subordinated to Student and Courses, respectively.
