From the document Text Information Retrieval Systems (pages 183-186)

Querying the Information Retrieval System

7.4.3 Search the Inverted File or Thesaurus

We will discuss this function as it applies to the traditional pay-as-you-go IRS, the local CD-ROM-based IRS, the locally subsidized IRS, the remotely subsidized IRS, and Web search engines.

1. Search the inverted file or thesaurus within the traditional IRS—This function initiates a search in which the user browses through values of attributes, not complete records. For example, the user might be uncertain of the spelling of a person’s name and want to scan a list of names beginning with a known string of letters. Then, based on this reconnaissance, the user makes a choice and creates a set using the selected value or values of the name attribute. Searchers may request such a search to find out what terms are there, what similar terms might be found (i.e., similarly spelled terms), and in how many records a specific value of an attribute occurs. Obviously, the entire inverted file cannot be displayed at once. The common practice is to display a window on the file, of anywhere from about six to twenty terms, with the system allowing the user to step through the file one page at a time. Oddly, a command for backward paging is less common than one for forward paging.

A thesaurus is stored as a structure similar to that of an inverted file, as a series of entries for terms, each entry showing other terms to which the entry is related and the nature of the relationship. There is not necessarily a thesaurus for every database. If there is one, it is searched in the same manner as an inverted file, but the display includes references to related terms. Then, if the user asks, the list of related terms with the nature of their relationships is displayed. Either all related terms or a page of thesaurus data is displayed. The user may then go on to another base word, or follow the chain of relationships. The sequence might be to ask for the inverted file display for PHYSICS, see that there are n related terms for PHYSICS, and then ask to see them. This might be followed by asking for the display for the subordinate term BIOPHYSICS, then one of its related terms, and so on. Figures 7.3–7.5 show some typical results from searches of the ERIC educational research online thesaurus. In Fig. 7.3, the searcher, interested in the use of a computer to assist in test administration, first asks simply for a thesaurus search on the single word TESTING. It shows a large number of hits (truncated here at 12 lines) which

7.4 Functions Performed


Command: EXPAND TESTING

Ref   Items  RT  Index-term
E1        1      TESTIMONY CONGRESS 97TH
E2        1      TESTINESS
E3    48983  68  *TESTING (GATHERING AND PROCESSING INFORMATION ABOUT I. . .)
E4       99      TESTING ACCOMMODATIONS
E5      186      TESTING ACCOMMODATIONS (DISABILITIES)
E6        3      TESTING ACCOMMODATIONS (LIMITED ENGL PROFICIENCY. . .
E7        1      TESTING ALTERNATIVES
E8        2      TESTING APPARATUS
E9        7      TESTING CENTERS
E10      27      TESTING CONDITIONS
E11       1      TESTING CONTEXT
E12       1      TESTING CORRELATION

Figure 7.3

Thesaurus searching in ERIC, through Dialog-1. Terms in the inverted file are shown in alphabetical order, starting with two terms preceding the search term TESTING. Column 1 holds simply the line numbers; next comes the number of documents in the database containing the term at the right; then the number of terms related to the term at the right; and finally the term itself. This does not show all the lines in the actual search. The * on line E3 identifies the user’s search term.

would encourage the searcher to ask about terms related to TESTING, the result of which is shown in Fig. 7.4. There, the user sees one item of particular interest, item R7, COMPUTER ASSISTED TESTING, and asks to see terms related to it. In Fig. 7.5 we see the result of this request, showing four possibly useful terms. At this point a user might decide that COMPUTER ASSISTED TESTING is likely to yield a good result as a search term and go on to a SELECT command using this term.
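The windowed browsing of the inverted file described in item 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how any particular IRS is implemented: the Browser class, its method names, and the in-memory dictionary are hypothetical, the term counts are a subset of those displayed in Fig. 7.3, and reference numbers are assigned by position in the sorted term list, which a real system may do differently.

```python
from bisect import bisect_left

# Hypothetical inverted-file summary: term -> number of records containing it.
# Counts are a subset of the sample display in Fig. 7.3.
INDEX = {
    "TESTIMONY CONGRESS 97TH": 1,
    "TESTINESS": 1,
    "TESTING": 48983,
    "TESTING ACCOMMODATIONS": 99,
    "TESTING ALTERNATIVES": 1,
    "TESTING APPARATUS": 2,
    "TESTING CENTERS": 7,
    "TESTING CONDITIONS": 27,
}


class Browser:
    """Shows one window of the alphabetically ordered term list at a time."""

    def __init__(self, index, size=6):
        self.index = index
        self.terms = sorted(index)  # the inverted file is kept in term order
        self.size = size            # number of terms per displayed page
        self.start = 0              # position of the first displayed term

    def expand(self, term, before=2):
        """Open a window beginning a few terms before `term`.

        `term` need not exist in the file; a partial spelling simply
        positions the window where the term would be filed.
        """
        pos = bisect_left(self.terms, term)
        self.start = max(0, pos - before)
        return self._rows()

    def page_forward(self):
        """Step one page ahead, staying put at the end of the file."""
        if self.start + self.size < len(self.terms):
            self.start += self.size
        return self._rows()

    def page_back(self):
        """Step one page back, stopping at the beginning of the file."""
        self.start = max(0, self.start - self.size)
        return self._rows()

    def _rows(self):
        chunk = self.terms[self.start:self.start + self.size]
        return [
            (f"E{self.start + i + 1}", self.index[t], t)
            for i, t in enumerate(chunk)
        ]


if __name__ == "__main__":
    browser = Browser(INDEX)
    for ref, items, term in browser.expand("TESTING"):
        print(f"{ref:<5}{items:>6}  {term}")
```

Because the term list is sorted, locating the starting point is a binary search, and each page is just a slice of the list. That is why a partial or misspelled entry still lands the searcher in the right neighborhood of the file.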

2. Search the inverted file or thesaurus within the local CD-ROM IRS—Most such packages include access to the inverted file and to the thesaurus, if one is present, in the same manner as the traditional IRS.

3. Search the inverted file or thesaurus within the subsidized local access IRS—The suppliers of these systems normally provide access to a thesaurus if one is available and allow access, sometimes by pull-down menu, to certain portions of the inverted file, usually author name and journal title, but sometimes assigned key words, descriptors, and other fields.


Command: EXPAND E3

Ref   Items  Type  RT  Index-term
R1    48983        68  * TESTING (GATHERING AND PROCESSING INFORMATION ABOUT I. . .
R2        0  U      1  TEST ADMINISTRATION
R3        0  U      1  TESTING METHODS
R4        0  U      1  TESTING TECHNIQUES
R5      641  N     15  ADAPTIVE TESTING (TESTING THAT INVOLVES SELECTING . . .
R6     1400  N      9  COMPARATIVE TESTING (TESTING IN WHICH TWO OR MORE INDIV...
R7     1975  N     11  COMPUTER ASSISTED TESTING (USE OF COMPUTERS IN TEST . . .
R8      140  N     11  CONFIDENCE TESTING (TESTING TECHNIQUE THAT DETERMINES. . .
R9     3068  N     12  EDUCATIONAL TESTING (USE OF TESTS TO ASSESS THE EFFECT . . .
R10     392  N      6  GROUP TESTING (PROCESS OF ADMINISTERING TESTS TO GROUPS)
R11     317  N      4  INDIVIDUAL TESTING (PROCESS OF ADMINISTERING TESTS TO INDI
R12    1763  N     12  MINIMUM COMPETENCY TESTING (MEASUREMENT OF THE ATTAIN

Figure 7.4

Thesaurus searching in ERIC-2. The user has asked to see information about terms related to TESTING (by typing EXPAND E3). “Related” means formally shown as related in the thesaurus. The display shows related terms in alphabetical order and the nature of the relationship. The * simply reminds the user what term had been entered.

Command: EXPAND R7

Ref   Items  Type  RT  Index-term
R1     1975        11  * COMPUTER ASSISTED TESTING (USE OF COMPUTERS IN TEST . . .
R2        0  U         COMPUTERIZED ADAPTIVE TESTING #
R3        0  U         COMPUTERIZED TAILORED TESTING #
R4    13557  B     13  COMPUTER USES IN EDUCATION (THE USE OF COMPUTERS . . .
R5    48983  B     68  TESTING (GATHERING AND PROCESSING INFORMATION ABOUT . . .
R6      641  R     15  ADAPTIVE TESTING (TESTING THAT INVOLVES SELECTING TEST . .
R7     2347  R     19  COMPUTER SCIENCE (STUDY OF THE THEORY, DESIGN, ANALYSIS...

Figure 7.5

Thesaurus searching in ERIC-3. The user has selected one related term, COMPUTER ASSISTED TESTING, and sees the terms related to it. The * again reminds the user what term had been entered.
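The chain of EXPAND requests illustrated in Figs. 7.4 and 7.5 can be mimicked with a small data structure in which each thesaurus entry lists its related terms together with a relationship code. The following Python sketch is illustrative only. The Entry class and the expand and follow functions are hypothetical names, the data are a hand-picked subset of the two figures, and the Type codes (U, N, B, R) are carried along as opaque labels because the text does not define them; reading N and B as narrower and broader terms is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    items: int                                    # records indexed under the term
    related: dict = field(default_factory=dict)   # related term -> Type code


# Hypothetical subset of the ERIC thesaurus, copied from Figs. 7.4 and 7.5.
THESAURUS = {
    "TESTING": Entry(48983, {
        "TEST ADMINISTRATION": "U",
        "TESTING METHODS": "U",
        "ADAPTIVE TESTING": "N",
        "COMPUTER ASSISTED TESTING": "N",
    }),
    "COMPUTER ASSISTED TESTING": Entry(1975, {
        "COMPUTER USES IN EDUCATION": "B",
        "TESTING": "B",
        "ADAPTIVE TESTING": "R",
        "COMPUTER SCIENCE": "R",
    }),
    "TEST ADMINISTRATION": Entry(0),
    "TESTING METHODS": Entry(0),
    "ADAPTIVE TESTING": Entry(641),
    "COMPUTER USES IN EDUCATION": Entry(13557),
    "COMPUTER SCIENCE": Entry(2347),
}


def expand(term):
    """Return display rows (ref, items, type, term) for one thesaurus entry.

    Row R1 repeats the entered term, flagged with '*', as in the figures.
    """
    entry = THESAURUS[term]
    rows = [("R1", entry.items, "*", term)]
    for i, (rel, code) in enumerate(entry.related.items(), start=2):
        rows.append((f"R{i}", THESAURUS[rel].items, code, rel))
    return rows


def follow(start, *refs):
    """Expand `start`, then repeatedly expand the row chosen by each ref."""
    rows = expand(start)
    for ref in refs:
        term = next(t for r, _, _, t in rows if r == ref)
        rows = expand(term)
    return rows


if __name__ == "__main__":
    # Expand TESTING, then pick row R5 (COMPUTER ASSISTED TESTING in this subset).
    for ref, items, code, term in follow("TESTING", "R5"):
        print(f"{ref:<5}{items:>6}  {code:<4}  {term}")
```

Entries with zero items, such as TEST ADMINISTRATION here, still appear in the display but would retrieve nothing if selected, which is why the item count shown beside each related term helps the searcher decide which term to carry into a SELECT command.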


4. Search the inverted file or thesaurus within the subsidized remote access IRS

While these systems vary, some provide powerful assistance. Entrez, the life sciences search engine that powers PubMed, also provides full access to MeSH, the National Library of Medicine’s thesaurus, as a separate database. This allows the searcher to collect subject headings before entering a PubMed search. The Preview/Index tab provides an alphabetical display of all search terms in each PubMed search field. The user can browse all fields, or a specific field chosen from a pull-down menu.

5. Search the inverted file or thesaurus with a Web search engine—Most Web search engines do not make a thesaurus available, nor do they offer a view of the inverted file for user searching. However, Yahoo and some other providers maintain directories of broad subject categories, divided into more specific classes, that may be searched in a hierarchical manner from their sites.
