Dissemination of Controlled Vocabularies - Attribute Content and Values


4.7.2 Dissemination of Controlled Vocabularies

Controlled vocabularies are usually created as hierarchic structures, although they may vary considerably in the number of levels of depth of the hierarchy. The typical relationships used are those described in Section 4.3.3.

There can be other relationships, and certainly other names for those illustrated there. Whatever relationships are used, they must be defined, explained, and disseminated as widely as possible among the potential user group, and any changes must be communicated to both users and indexers. This implies that the controlled vocabulary must be available in its most current form to both searchers and indexers.
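As a minimal sketch of what disseminating such a structure involves, the following Python fragment stores a few terms with their relationships and renders each entry in a form that could be shown to searchers and indexers alike. The relationship names used here (BT for broader term, NT for narrower term, RT for related term, USE for a preferred term) are common thesaurus conventions assumed for illustration; they are not taken from Section 4.3.3, and the sample terms are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """One vocabulary entry. All relationship names are illustrative assumptions."""
    label: str
    broader: Optional[str] = None                       # BT: parent term, if any
    narrower: List[str] = field(default_factory=list)   # NT: child terms
    related: List[str] = field(default_factory=list)    # RT: associated terms
    use: Optional[str] = None                           # USE: preferred term for a non-preferred entry


def make_vocabulary() -> Dict[str, Term]:
    """Build a tiny, hypothetical hierarchic vocabulary keyed by label."""
    terms = [
        Term("Transportation", narrower=["Air travel"]),
        Term("Air travel", broader="Transportation",
             narrower=["Airports"], related=["Airlines"]),
        Term("Airports", broader="Air travel"),
        Term("Airlines", related=["Air travel"]),
        Term("Aviation", use="Air travel"),
    ]
    return {t.label: t for t in terms}


def depth(vocab: Dict[str, Term], label: str) -> int:
    """Count how many levels lie between a term and the top of its hierarchy."""
    levels = 0
    while vocab[label].broader is not None:
        label = vocab[label].broader
        levels += 1
    return levels


def entry_lines(vocab: Dict[str, Term], label: str) -> List[str]:
    """Render one entry as the lines a published vocabulary might display."""
    term = vocab[label]
    if term.use is not None:
        return [label, f"  USE {term.use}"]
    lines = [label]
    if term.broader is not None:
        lines.append(f"  BT {term.broader}")
    lines += [f"  NT {n}" for n in term.narrower]
    lines += [f"  RT {r}" for r in term.related]
    return lines


if __name__ == "__main__":
    vocab = make_vocabulary()
    for name in vocab:
        print("\n".join(entry_lines(vocab, name)))
```

The point of the sketch is that the same rendered entries serve both groups: an indexer sees which term is preferred, and a searcher sees which headings actually exist.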

Until relatively recently, it was not common for libraries to make the book Library of Congress Subject Headings available to patrons. Generations of users may have been told in school or in library tours that “you can look it up by subject,” but were never told what subjects they could look “it” up under or how to find these subjects. This book of subject headings is a thesaurus for a controlled language, although the “language” consists not of individual words but largely of short phrases. If users do not know what is in it, they must guess, and the poor retrieval outcome that is likely to follow on guesswork is caused neither by the cataloguers nor by the retrieval system, but by lack of communication among the community of users.

4.8 Importance of Point of View

In Section 4.5 we discussed several aspects of the ambiguity of attribute values based upon the characteristics of natural language, and illustrated by some of the vagaries of English. These quirks, caused by the relationships of signs to one another in a linguistic context, are on the level of meaning analysis referred to by Morris (1946, p. 3) as syntactic meaning, which goes beyond the basic semantic meaning, the relationship of signs to their significates. Morris suggests, however, that there is a third level that deserves mention, pragmatic meaning. This is the relation of signs to situations and behaviors in a sociological or psychological context.

Pragmatic meaning distinctions often occur in natural language. We have previously (Boyce et al., 1994, p. 101) used the example of the phrase “Baton Rouge.” Stated at an airline ticket counter in Toronto, it could mean that a passenger wants a ticket to Baton Rouge, while the same phrase stated to a flight attendant who is announcing that the plane is about to land could mean that the passenger wants to know whether the plane is landing in Baton Rouge. It is a good example because the basic context is airline travel in both cases, there is no syntactical difference in the expression, the sign and the significate are the same in both cases, and yet within that context the viewpoint of both the speaker and the listener changes with the sociological situation. The fact that the plane is about to land, possibly at an intermediate stop, does not change its ultimate destination, nor does the ultimate destination identify which intermediate stop is at hand.

This would indicate that we might well need to consider such ambiguity in any study of the meaning of natural language, and that we might wish to control for it, should we be designing a controlled vocabulary. Since a search on BATON ROUGE would find items where the phrase was used as a potential destination, and where it was used as a current location, those who wanted only to retrieve the current location usage might find a large number of false drops.

It is possible to attempt to control for this problem by the use of the precision device known as the role indicator. This requires that the designers of a vocabulary decide prior to its publication what potential points of view might cause ambiguity in searching, list these points of view, and assign a code to each of them. The indexer is then required to append such a code to each term assigned, and it may then be used by the searcher to limit the pragmatic meaning of the term (e.g., Baton Rouge as a destination for ticketing or as the name of the next destination of a particular flight, as the capital of a state, as a refuge for hurricane victims). The assumption is that there is a set of points of view that can be identified that will apply to all the terms of the vocabulary, and that indexers can apply them consistently. The role is related to the idea of a facet in classification theory, and to some extent to the idea of a scope note, which is used in a controlled vocabulary to distinguish homographs. A scope note is unique to the term to which it is assigned, whereas a role may be applicable to any vocabulary term.

In a sense roles may be considered a restricted set of subheadings that may be applied to any term.
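The role-indicator mechanism can be sketched as a small inverted index in which every posting is keyed by a term together with a role code. This is only an illustration under stated assumptions: the role codes ("D" for destination, "L" for current location), the class name RoleIndex, and the record numbers are all hypothetical and do not come from any particular system.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

# Hypothetical role codes, fixed by the vocabulary designers before publication.
ROLES = {"D": "destination", "L": "current location"}


class RoleIndex:
    """Toy inverted index whose postings are keyed by (term, role code)."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, str], Set[int]] = defaultdict(set)

    def add(self, record_id: int, term: str, role: str) -> None:
        """Indexer's step: every assigned term must carry a known role code."""
        if role not in ROLES:
            raise ValueError(f"unknown role code: {role!r}")
        self._postings[(term.upper(), role)].add(record_id)

    def search(self, term: str, role: Optional[str] = None) -> Set[int]:
        """Searcher's step: match the term, optionally limited to one role."""
        key_term = term.upper()
        if role is not None:
            return set(self._postings.get((key_term, role), set()))
        hits: Set[int] = set()
        for (posted_term, _role), ids in self._postings.items():
            if posted_term == key_term:
                hits |= ids
        return hits


if __name__ == "__main__":
    index = RoleIndex()
    index.add(1, "Baton Rouge", "D")   # ticket request: destination
    index.add(2, "Baton Rouge", "L")   # landing announcement: current location
    index.add(3, "Baton Rouge", "D")
    print(sorted(index.search("BATON ROUGE")))            # term only
    print(sorted(index.search("BATON ROUGE", role="L")))  # limited by role
```

A search on the term alone returns every record, including those that would be false drops for someone interested only in the current-location usage; adding the role code narrows the result to records where the indexer assigned that role.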

The first use of roles appears to have been in the telegraphic abstracts discussed in Section 1.5.2. There seems to be little doubt that roles can improve the percentage of useful records and reduce false drops. Unfortunately, they may also reduce the number of relevant records retrieved, add complexity to both the indexing and searching process, and have an adverse effect on indexer consistency. As Lancaster (1968, p. 233) has said, “A cost-effectiveness analysis may well reveal that it is more economical not to use role indicators, thereby saving indexing and searching time, to allow some incorrect term relations to occur, and to eliminate the irrelevant citations thus retrieved through a post search screening operation . . . .” Certainly, the device is little used today.
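The trade-off can be made concrete with the two standard retrieval measures: precision (the fraction of retrieved records that are relevant) and recall (the fraction of relevant records that are retrieved). The record numbers below are invented purely for illustration. They assume a search in which one relevant record was given a different role by the indexer and is therefore missed when the searcher limits by role.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Return (precision, recall) for one search result."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: records 1-4 are the ones the searcher actually wants.
relevant = {1, 2, 3, 4}

# Term-only search: finds every relevant record plus four false drops.
without_roles = {1, 2, 3, 4, 5, 6, 7, 8}

# Role-limited search: no false drops, but record 4 is lost because the
# indexer assigned it a different role (an indexer-consistency failure).
with_roles = {1, 2, 3}

if __name__ == "__main__":
    print(precision_recall(without_roles, relevant))  # (0.5, 1.0)
    print(precision_recall(with_roles, relevant))     # (1.0, 0.75)
```

In this invented case, roles raise precision from 0.5 to 1.0 while recall falls from 1.0 to 0.75. Lancaster's alternative corresponds to running the term-only search and screening out the four false drops by hand afterward.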

While the difficulties of pragmatic meaning are real, their effect on retrieval may not be terribly great. They are one more element to be considered in text-based retrieval. This is not to say that there are no other sociological or psychological factors that affect how the user of an IR system judges the results of a search. These factors are not topical, however, and are not related to the interpretation of a term’s meaning, but rather to factors affecting relevance judgements (Barry, 1994). We will discuss these in Section 16.2.

4.9 Summary

The essence of this chapter has been that attributes in databases are represented by symbols whose values must be generally understood among the users of the databases. In particular, the authors or composers of records and of codes and of controlled languages must consider the users. In some cases, such as sound and graphic records, the encoding of an attribute may be so complex that simplifying transformations are necessary.


From the document: Text Information Retrieval Systems (pages 119-122)