Chapter 9. Using Facets to Classify and Access Digital Resources: Proposal
9.2. Examining existing classification structures
9.2.3. Results and discussion
Each structure was examined along three essential dimensions: structure, logic, and semantics. Here, we present a summary of results and related observations; a complete description of results and interpretations for each stage of the analysis can be found in [HUD 05] and [HUD 06]. The following abbreviations are used to refer to the six structures that were examined: ERD (Educator’s Reference Desk), EdNA (Australian Education Network), INT (INTUTE Education), GEM (Gateway to Educational Materials), EI (Education Index) and EVL (Education Virtual Library).
9.2.3.1. Structure
The structural analysis generated sets of quantitative data relating to maximum, minimum and average numbers of classes and hierarchical levels, as well as data on the branching factor or average number of classes at each level.
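As an illustration of the kind of measures involved, the following minimal sketch computes them for a toy hierarchy. The hierarchy is invented and does not reproduce any of the six structures examined; the function names are hypothetical, and the branching factor is taken here to be the average number of subclasses per class that has at least one subclass, which is only one possible reading of the term.

```python
from statistics import mean

# Hypothetical toy hierarchy (not one of the six structures examined):
# each key is a class, each value is a dict of its subclasses.
toy = {
    "Educational levels": {
        "Primary education": {},
        "Secondary education": {"Lower secondary": {}, "Upper secondary": {}},
    },
    "Educational management": {"Educational facilities": {}},
    "Teaching methods": {},
}

def classes_per_level(tree):
    """Return a list whose i-th item is the number of classes at level i+1."""
    counts = []
    level = [tree]
    while True:
        n = sum(len(node) for node in level)
        if n == 0:
            break
        counts.append(n)
        # Move one level down: collect the subclass dicts of every class.
        level = [child for node in level for child in node.values()]
    return counts

def branching_factor(tree):
    """Average number of subclasses per class having at least one subclass
    (an assumed definition, chosen for this sketch)."""
    sizes = []
    stack = list(tree.values())
    while stack:
        node = stack.pop()
        if node:
            sizes.append(len(node))
            stack.extend(node.values())
    return mean(sizes) if sizes else 0.0

levels = classes_per_level(toy)
print("main classes:", levels[0])
print("hierarchical levels:", len(levels))
print("classes per level:", levels)
print("total classes:", sum(levels))
print("branching factor:", branching_factor(toy))
```

Averaging the number of main classes or of hierarchical levels over several such structures would then give figures comparable in kind to those reported below.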
The data revealed little more than what we already knew, or at least suspected.
The average of 7.83 main classes in all structures is well below the minimum of 10 top classes judged efficient for organizing resources in a specialized field and, not surprisingly, the higher numbers of distinct main classes are found in the deeper and/or most complex structures (ERD, EdNA). There is a clear distinction in the total number of distinct classes between the four hierarchical structures (ERD, EdNA, INT, GEM) and the two faceted structures (EI, EVL), the former being significantly more developed. The average number of hierarchical levels, at 3.33, corresponds to a de facto standard number of levels recommended and common for general web classification structures; it is generally assumed that the majority of information seekers will navigate only to the third or fourth level of division, each level corresponding to a mouse click, before reorienting their search. Faceted structures (EI and EVL) are less balanced than their hierarchical counterparts, starting with a narrow choice of top classes (two and four respectively), and then quickly expanding their semantic coverage by way of long alphabetical lists of sub-classes at the second level. None of the six structures is overly complex. This obvious lack of specificity will allow for no more than a broad classification of resources in the virtual collection.
9.2.3.2. Logic
Qualitative data relating to the logic dimension of each classification structure was obtained through manual examination and interpretation; three sets of data are available.
The first set of data describes the dividing criteria that are applied at the first three levels of the structure. Six potential values were selected from a list of eight described by [ZIN 02]. They are: Subject, Object, Target audience, Format (or External form), Reference (or Internal form) and Location. A second set of data describes the nature of the relation linking classes at the top three levels of each classification structure. Potential values are: 1) the generic relationship, where the lower level class is a specific type of the object, event, etc. named at the higher level; 2) the partitive relationship, where the lower level class is a component of the object, event, etc. named at the higher level; 3) the instance relationship, where the lower level class is a particular object, event, etc. serving as an example of the object, event, etc. named at the higher level; 4) the contextual relationship, where higher and lower classes are found in the same environment but not in the same natural or logical taxonomy. The third set of data describes the internal arrangement of classes at the top three levels of the structure; three cases are observed: the alphabetical arrangement, the systematic arrangement, and the “mixed” arrangement.
The practice of mixing various principles of division in a developing hierarchy is contrary to theoretical principles of classification because it creates classes of resources that are not mutually exclusive, thus “causing uncertainty for the browser when he has to select a category” [VAN 98, p. 382]. Nevertheless, such a mix is found at the top level of all structures in our sample. In ERD, for example, Subject, Format, and Target audience are at the root of first level classes. However, whether or not classes are mutually exclusive may not be a problem, provided that resources can be assigned to more than one class at the same hierarchical level. This possibility remains to be verified; we do not know at this time whether structures and processes now allow for greater flexibility in classification than was possible in traditional, pre-Internet contexts.
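The loss of mutual exclusivity can be made concrete with a minimal sketch. The three first-level classes, the resource and the function name below are all hypothetical; they only mimic a top level that mixes the Subject, Format and Target audience criteria.

```python
# Hypothetical first-level classes, each derived from a different
# dividing criterion (Subject, Format, Target audience).
top_level = {
    "Mathematics": ("Subject", lambda r: r["subject"] == "Mathematics"),
    "Lesson plans": ("Format", lambda r: r["format"] == "Lesson plan"),
    "Teachers": ("Target audience", lambda r: r["audience"] == "Teachers"),
}

# Hypothetical resource: a mathematics lesson plan intended for teachers.
resource = {
    "subject": "Mathematics",
    "format": "Lesson plan",
    "audience": "Teachers",
}

def candidate_classes(res, classes):
    """Return the names of all classes under which the resource could be filed."""
    return [name for name, (_, belongs) in classes.items() if belongs(res)]

matches = candidate_classes(resource, top_level)
criteria = sorted({top_level[name][0] for name in matches})

print(matches)   # every first-level class is a valid choice for this resource
print(criteria)  # the dividing criteria that produced those classes
```

In this sketch the single resource fits all three first-level classes, so a structure that allows only one class per resource forces an arbitrary choice, whereas one that allows multiple assignment at the same level does not.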
Objects, events, etc. are most frequently linked through hierarchical relations of a contextual nature; this is also the case in bibliographic classifications such as the Dewey Decimal Classification (DDC) or the Universal Decimal Classification (UDC). This choice of a contextual (e.g. Educational management → Educational facilities) rather than a truly generic relationship (e.g. Educational institutions → Secondary schools) contributes to making the classification structure more hospitable and capable of easily integrating new classes and specific topics. The simple and familiar alphabetical display of classes is also beneficial to the hospitality of the structure, and undoubtedly preferable to a more or less obscure systematic arrangement reflecting the designer’s personal view of the world.
The objectives of this project did not include comparing innovative with traditional structures on the basis of logic, but enough is known about the DDC and the UDC to suggest that our sample ad hoc structures are neither more difficult nor easier to navigate than traditional structures, which are generally considered complex and not user-friendly.
9.2.3.3. Semantics
The semantic analysis provided data on conceptual and terminological concordance of our sample classification structures with authoritative sources in the field of education. Results were obtained through a standard methodology for establishing compatibility, involving manual examination of data and the coder’s judgment as to degree of concordance. Possible values were: Full or Partial terminological concordance, Full or Partial conceptual concordance, and No concordance.
We compared ERD’s class denominations at the top three levels with those appearing in the table of contents of an authoritative reference tool, the Encyclopaedia of Educational Research, 6th edition. Terminological and conceptual concordance remains rather low, with the latter being slightly higher, as could be expected.
Top level class denominations in ERD were also compared to captions in the web version of the Dewey Decimal Classification (DDC). Surprisingly, we note that concordance between ERD and DDC is higher than concordance between ERD and the specialized reference work. This may be explained by the fact that the Dewey system is already used for classifying millions of documents and subjects; most likely, this contributes to making it close to being conceptually complete at the first five or six levels of hierarchy, even in specialized areas. The comparison with Dewey also benefits from the encyclopaedic character of its coverage; when a concept is only peripherally related to education, it will not be found in a specialized reference tool, but it is likely to be found somewhere in a general knowledge organization structure.
Our results reveal that partial concordance is always higher than full concordance, at both conceptual and terminological levels. A quick review of top level classes shows that a single facet, Educational levels, is present in all six structures, either in the form of a first-level inclusive class or as a list of constituting categories (Primary education, Secondary education, etc.). This is not surprising, given that there is not a single (or a best) way to segment and organize the world of concepts, even within the same cultural, political, disciplinary, etc. context. This particularity slightly increases the complexity of the structure, without affecting its authority.
9.2.3.4. Conclusion
This first part of our research project confirmed that the hierarchical model remains popular for organizing web resources in specialized virtual collections. In our sample, hierarchies were contextual rather than generic, not overly complex and not very specific. Choice, arrangement and sequence of classes within the structures appeared logical enough to make them easy to apprehend and navigate. However, we observed that the structures were not very flexible and did not appear to benefit much from the technological environment in which they had developed and were now applied.
Among the six structures that were examined, two appeared closer to an alternate model for organizing objects, subjects, and classes: Education Index (www.educationindex.com) and Education Virtual Library (www.csu.edu.au/education/library.html) made use of explicit facets. However, these two structures were the least developed and the least balanced of all, and it was not possible to extrapolate on the usefulness of facets to structure and access virtual collections. It is this alternative faceted model that we have explored in the second part of our research project.
9.3. A faceted structure to organize and access resources in a virtual library in education
The use of faceted structures to organize and access specialized digital resources is not yet widespread, even though it seems obvious that contemporary networks constitute the ideal environment for implementing the analytico-synthetic principles and practices suggested by S.R. Ranganathan in the 1930s. The facet is a characteristic, an indicator, a criterion that may be used to subdivide a class or a set of objects into homogeneous subsets. Age, gender or place of residence, for example, are facets that can be used to create subsets of persons. The same facets can be used to identify fairly precisely each member of the original group: thus, X is a woman, belonging to the 30-39 age group, residing in the Montérégie administrative region of Quebec, etc. A facet may be usable with any group of objects or subjects (for example, agent, process, property, method), or apply only to certain categories of concepts and objects or within a single discipline (for example, educational level or source of financing, in education).
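The two uses of a facet described above, subdividing a set and identifying one of its members, can be sketched as follows. The records, the facet names and the two helper functions are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical records; names and facet values are invented for illustration.
persons = [
    {"name": "X", "gender": "woman", "age_group": "30-39", "region": "Montérégie"},
    {"name": "Y", "gender": "man", "age_group": "30-39", "region": "Montérégie"},
    {"name": "Z", "gender": "woman", "age_group": "40-49", "region": "Estrie"},
]

def subdivide(items, facet):
    """Group items into homogeneous subsets according to a single facet."""
    subsets = defaultdict(list)
    for item in items:
        subsets[item[facet]].append(item["name"])
    return dict(subsets)

def identify(items, **values):
    """Return the names of the items that match every given facet value."""
    return [
        item["name"]
        for item in items
        if all(item[facet] == value for facet, value in values.items())
    ]

by_gender = subdivide(persons, "gender")
by_age = subdivide(persons, "age_group")
found = identify(persons, gender="woman", age_group="30-39", region="Montérégie")

print(by_gender)
print(by_age)
print(found)
```

Each facet alone yields a coarse partition of the group; combining values from several facets narrows the result down to a single member.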
When a choice has been made to work with a faceted structure rather than with a strictly hierarchical structure, classifying an information resource no longer consists of locating its main subject on a pre-drawn thematic map; rather, it requires a complete analysis of the subject using in turn all facets or perspectives from which it can be considered. The subject can then be represented very precisely. Intellectually, the faceted classification offers several benefits:
1) Starting with a much smaller number of distinct classes, it allows a much more refined representation of many subjects than enumerative classifications (such as the DDC or the UDC) do.
2) It is more flexible and adapts easily to conceptual evolution and renewal; it is always possible to modify the isolates or values attached to a facet, or even to add a facet, without greatly affecting the overall structure of the system.
3) When explicit facets are used to organize and access subjects and collections, it becomes possible to optimize automated search strategies, since a subject may then be retrieved using any one of the facets that has been used to describe it.
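The third benefit can be illustrated with a minimal sketch of facet-based retrieval. The resources, facet names, isolates and functions below are hypothetical and are not taken from the structure proposed in this chapter; the index is a simple mapping from each (facet, value) pair to the resources described with it.

```python
from collections import defaultdict

# Hypothetical resources described with hypothetical facets and isolates.
resources = {
    "r1": {"educational_level": "Primary education", "format": "Lesson plan"},
    "r2": {"educational_level": "Secondary education", "format": "Lesson plan"},
    "r3": {"educational_level": "Primary education", "format": "Report"},
}

def build_index(described):
    """Map each (facet, value) pair to the set of resources carrying it."""
    index = defaultdict(set)
    for rid, facets in described.items():
        for facet, value in facets.items():
            index[(facet, value)].add(rid)
    return index

def search(index, **criteria):
    """Return the sorted ids of the resources matching all given facet values."""
    sets = [index.get((facet, value), set()) for facet, value in criteria.items()]
    return sorted(set.intersection(*sets)) if sets else []

index = build_index(resources)

# A resource can be reached through any single facet used to describe it...
by_format = search(index, format="Lesson plan")
by_level = search(index, educational_level="Primary education")
# ...or through a combination of facets.
combined = search(index, educational_level="Primary education", format="Lesson plan")

print(by_format)
print(by_level)
print(combined)
```

In such a design, adding a new facet to the descriptions only adds new keys to the index and leaves the existing entries untouched, which is in line with the flexibility noted in the second point.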
Several knowledge organization specialists have shown a definite interest in facets in their discussion of organization and access to web resources [BRO 02, ELL 00, LAB 06]. Van der Walt [VAN 04] and Zins and Guttmann [ZIN 00] have designed faceted structures to describe and classify specific domains; we used these structures as examples in designing our own proposal for an alternative to a strictly hierarchical structuring of virtual collections in education.
Our structure was developed in several stages, using a deductive approach strongly dependent on literary warrant: 1) creation of a sample virtual collection of web resources in education; 2) classification of each resource using a traditional classification scheme (DDC), as well as the structure used by The Educator’s Reference Desk (ERD); 3) indexing of each resource using a traditional thesaurus, EDUthès: Thésaurus de l’éducation (http://www.cdc.qc.ca/eduthes.html), and design of a bank of candidate descriptors and potential isolates; 4) identification of structural facets needed for content analysis and representation; 5) construction of the faceted structure.