Preservation
6.5 From Ontology to Architecture and Design
• the hierarchy and order of elements, and the number of each kind of child element;
• whether or not an element can be empty or can include text;
• the data types for elements and attributes; and
• default and fixed values for elements and attributes.
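The kinds of constraint listed above can be illustrated with a minimal sketch. It does not use a real schema language or validator. It hand-codes one rule of each kind with Python's standard `xml.etree.ElementTree` module, and every element name, attribute name, and rule in it is an assumption made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; every element and attribute name is illustrative.
DOC = """
<record>
  <title>Annual report</title>
  <year>2005</year>
  <note/>
</record>
"""

# Assumed rules: child names in required order, with (min, max) counts.
CHILD_RULES = [("title", 1, 1), ("year", 1, 1), ("note", 0, 3)]
DEFAULT_LANG = "en"  # assumed default for the optional 'lang' attribute


def check_record(root):
    """Return a list of rule violations for a <record> element."""
    problems = []
    allowed = [name for name, _, _ in CHILD_RULES]
    names = [child.tag for child in root]

    # Hierarchy and order of child elements.
    for name in names:
        if name not in allowed:
            problems.append(f"unexpected element: {name}")
    positions = [allowed.index(n) for n in names if n in allowed]
    if positions != sorted(positions):
        problems.append("children out of order")

    # Number of each kind of child element.
    for name, lo, hi in CHILD_RULES:
        count = names.count(name)
        if not lo <= count <= hi:
            problems.append(f"{name}: expected {lo}..{hi}, found {count}")

    for child in root:
        text = (child.text or "").strip()
        # Empty versus text content.
        if child.tag == "note" and (text or len(child)):
            problems.append("note must be empty")
        if child.tag == "title" and not text:
            problems.append("title must contain text")
        # Data type of element content.
        if child.tag == "year" and not text.isdigit():
            problems.append("year must be an integer")
    return problems


def language(root):
    """Return the 'lang' attribute, applying the default when it is absent."""
    return root.get("lang", DEFAULT_LANG)


root = ET.fromstring(DOC.strip())
print(check_record(root))   # no violations expected for DOC
print(language(root))       # falls back to the default
```

In practice such rules would be written declaratively in a schema and checked by a validating parser; the sketch only shows what each kind of rule constrains.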
Since XML is thoroughly documented in many books193 and WWW reports,194 including standards definitions, not much needs to be said about it in this book. Only a small portion of standard XML will be needed within digital objects to be preserved. There is little doubt that the rules for XML portions needed for preservation are documented in forms that will survive anticipated technological obsolescence.
data management, access, and planning. An OAIS archival holding corresponds conceptually to a Fig. 12 content object, a set of information that is the original target of preservation. It comprises one or more constituent bit-strings and secondary information related to these data objects’ representation.196 Its Fig. 15 information model separates long-term bit-string storage from content structure management. An OAIS Content Information object itself is encapsulated in an Information Package that holds and binds the Content Information object components.
OAIS distinguishes among what is preserved, an Archival Information Package (OAIS AIP); what is submitted to the archive, a Submission Information Package (OAIS SIP); and what is delivered to archive clients, a Dissemination Information Package (OAIS DIP). This distinction is needed because some repository submissions contain insufficient information to meet the objectives of that repository.
Critical discussions suggest reasons for making DIP bit patterns identical to those of corresponding AIPs, e.g., Beedham 2005. A counterexample might be the U.S. NARA collections, in which the number of small objects is so large and the organization into logical file folders is so compelling that each ingestion (SIP) and repository holding (AIP) might be such a folder containing many hundreds of individual memoranda or completed forms. If so, information requests and deliveries (DIPs) would conveniently permit specification of a range of objects from one or more AIPs.
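The relationship among the three package kinds can be sketched in a few lines. This is an illustration under stated assumptions, not an OAIS implementation: the class names, the `ingest` and `disseminate` functions, and the zero-padded identifiers are all hypothetical. The sketch shows a folder-sized SIP becoming an AIP, and a DIP assembled from a range of objects drawn from one or more AIPs, with the delivered bit-strings left identical to those held.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContentObject:
    """One preserved item: an identifier plus its bit-string."""
    object_id: str
    bits: bytes


@dataclass
class InformationPackage:
    """Common shape assumed for SIP, AIP, and DIP."""
    kind: str          # "SIP", "AIP", or "DIP"
    package_id: str
    objects: list = field(default_factory=list)


def ingest(sip, aip_id):
    """Turn a submitted folder (SIP) into a repository holding (AIP)."""
    if sip.kind != "SIP":
        raise ValueError("only a SIP can be ingested")
    return InformationPackage("AIP", aip_id, list(sip.objects))


def disseminate(aips, first_id, last_id, dip_id):
    """Build a DIP from every object whose id lies in [first_id, last_id]."""
    selected = [
        obj
        for aip in aips
        for obj in aip.objects
        if first_id <= obj.object_id <= last_id
    ]
    return InformationPackage("DIP", dip_id, selected)


# A hypothetical folder of five memoranda submitted as one SIP.
folder = InformationPackage(
    "SIP",
    "sip-1",
    [ContentObject(f"memo-{n:03d}", f"text of memo {n}".encode()) for n in range(1, 6)],
)
holding = ingest(folder, "aip-1")
delivery = disseminate([holding], "memo-002", "memo-004", "dip-1")
print([obj.object_id for obj in delivery.objects])
```

The range selection relies on identifiers that sort lexically in the intended order, which is itself an assumption of the sketch; a real repository would select by whatever identifier scheme it defines.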
6.5.1 From the OAIS Reference Model to Architecture
To understand the distinction between a reference model and an architecture,197 consider the Fig. 16 OAIS ingest processes198 and a similarly structured fragment of a reference model for residences, suggested by the Fig. 17 processes of a Kitchen structure. This fragment suggests how we might map much of the OAIS model onto our residence model. Each OAIS process would correspond to a room or other space in a residence. The wording of the following paragraphs closely mimics OAIS paragraphs.
A residence may contain one or more areas called Kitchens. The Receive Groceries process provides storage space and an entrance to receive a grocery shipment. Its execution represents a legal transfer of ownership of the groceries, and may require that special controls be placed on the shipments. It provides the Grocer a receipt, which might accompany a request to send missing items.
196 Bekaert 2005, A Standards-based Solution for the Accurate Transfer of Digital Assets. Giaretta 2005, Supporting e-Research Using Representation Information, http://eprints.erpanet.org/archive/00000100/.
197 Shortcomings of [OAIS], http://www.ieee-tcdl.org/Bulletin/v2n2/egger/egger.html, reminds readers that the 2002 version of OAIS emphasizes, “This reference model does not specify a design or implementation.”
198 This subsection simply rewrites some of Reference Model for an Open Archival Information System (OAIS) 2001, §4.1.1.2.
Fig. 16: OAIS ingest process199
The Quality Assurance process validates correct receipt in the unpacking area. This might include tasting a sample of each received item, and the use of a log to record and identify any shortfalls.
Fig. 17: Kitchen process in a residence
The Prepare Meal process transforms one or more packages into one or more dishes that conform to culinary and health standards. This may involve boiling, frying, baking, or blending of contents of grocery shipments. The Cooking process may issue recipe requests to a cookbook to obtain descriptions needed to produce the menu. This process sends sample dishes for approval to a critic, and receives back an appraisal.
199 Adapted from CCSDS 650.0-R-2, Reference Model for an Open Archival Information System, Fig. 4-2, http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf.
Likewise, the Generate Menu and other processes have their own rules.
This reference model helps in building a residence by providing builders and prospective residents a shared vocabulary. Each builder further needs instructions about what kind of residence to construct: a single-family home, an apartment building, a military barracks, or a college residence. Just as our reference model says what it means to be a place to live (a residence), OAIS articulates what it means to be a place to hold information (a library or archive). Each is in the form of an intension. Like most definitions, each is incomplete.
The kitchen model above does not specify architecture. A builder’s instructions should include dimensions, location, and other factors. Such detail would not appear in our reference model, just as OAIS does not distinguish between a research library, a state government archive, a corporate archive, or a personal collection. Missing in each case is a high-level design differentiating structural alternatives; quantifying spaces, resources, and flows; describing materials and visible appearances; specifying utilities and safety factors; and so on.
How much qualitative and quantitative detail must an architecture express? The customer decides. He will often accept conventional levels and styles of description, but might also want to inject his own notions about what is important. A satisfactory architecture would describe every aspect on which the customer insists, and these would be essential elements of a prudent construction contract.
6.5.2 Languages for Describing Structure
Discussions of structure occur at various levels and with different styles:
• For syntactics: formal structure, language, logic, data, records, deductions, software, and so on.
• About semantics: meanings, propositions, validity, truth, signification, denotations, and so on.
• In pragmatics: intentions, communications, negotiations, and so on.
• Relative to social worlds: beliefs, expectations, commitments, contracts, law, culture, and so on.
In a knowledge theory that includes relations as primitive constructors, graphs can be considered a derivative notion.200 There are many graphical languages for expressing roughly the same information. They differ in which aspects they show or suppress in order to make their depictions comprehensible at a glance. Directed graphs with labeled nodes and arcs constitute graphical languages for depicting ternary relations.
200 Sowa 2000, Knowledge representation.
Schemas are models, and may themselves require further models that explain by reminding readers how words are used. The explications of schemas are called reference models.
Semantic intentions can be conveyed by a knowledge management language that is used for expressions that accompany the information. Ternary relations and information identifiers in fact comprise a sufficient knowledge management language. In particular, they are sufficient for an elegant representation of any information collection.
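A minimal sketch may make the claim about ternary relations and identifiers concrete. It stores each statement as a three-part tuple (subject identifier, relation, object identifier or value), queries the collection by pattern, and reads the same tuples as arcs of a directed, labeled graph. The identifiers, relation names, and function names are hypothetical and are not drawn from any particular standard.

```python
# Each statement is a ternary relation: (subject, relation, object or value).
# All identifiers below are hypothetical.
TRIPLES = {
    ("doc:42", "hasTitle", "Annual report"),
    ("doc:42", "partOf", "folder:7"),
    ("doc:43", "partOf", "folder:7"),
    ("folder:7", "heldBy", "archive:A"),
}


def match(triples, subject=None, relation=None, obj=None):
    """Return sorted triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (relation is None or t[1] == relation)
        and (obj is None or t[2] == obj)
    )


def arcs_from(triples, node):
    """Read the triples as a directed labeled graph: outgoing (label, target) arcs."""
    return sorted((rel, tgt) for src, rel, tgt in triples if src == node)


print(match(TRIPLES, relation="partOf"))
print(arcs_from(TRIPLES, "doc:42"))
```

The same set of tuples serves both as a queryable collection and as a graph, which is the sense in which graphs are a derivative notion once relations are taken as primitive.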
6.5.3 Semantic Interoperability
A model is created for a specific purpose. It is a simplified representation of part of the world. This simplification should help us analyze the underlying reality and understand it. Many groups are working to map ontologies, subject classifications, and thesauri to each other. While significant progress has been achieved in system, syntactic, and structural/schematic interoperability, comprehensive solutions to semantic interoperability remain elusive.201 Yet, trends in software technologies continue to bring focus on semantic issues.202 For instance, in late 2005 a W3C working group was created to define a business rules language for interoperability, a rules interchange language.203
What is being attempted is scientific observation to relate subjective opinions to objective assertions about social behavior (§3.3). However, it would be unrealistic to expect comprehensive convergence to schemas that fully satisfy all the members of any interest group. Recent literature related to the failure of Artificial Intelligence to achieve its early “pie in the sky” objectives is instructive.204
Models with graphic languages such as UML205 and OWL206 have advantages over XML markup. They are more readable and convey semantic intensions. One glance at a model can convey a rough idea of the number of object classes under discussion and the complexity of their relationships.
201 In the Semantic Web literature, “the Holy Grail of semantic interoperability remains elusive.”
202 Ouksel 1999, Semantic Interoperability in Global Information Systems.
203 Rule Interchange Format, http://www.w3.org/2005/rules/.
204 Lemieux 2001, Let the Ghosts Speak: An Empirical Exploration of the “Nature” of the Record, “a case study of record-keeping practices illustrates … many valid conceptualizations arising from particular social contexts.”
205 Unified Modeling Language, http://www.omg.org/technology/documents/formal/uml.htm.
206 OWL Web Ontology Language, http://www.w3.org/TR/owl-features/.