The Association for Computing Machinery's Digital Library

The Association for Computing Machinery (ACM) is a professional society that publishes seventeen research journals in computer science. In addition, its thirty-eight special interest groups run a wide variety of conferences, many of which publish proceedings. ACM's members are practicing computer scientists, including many of the people who built the Internet and the web. These members were some of the first people to become accustomed to communicating online, and they expected their society to be a leader in the movement to online journal publication.

Traditionally, the definitive version of a journal has been a printed volume. In 1993, the ACM decided that its future production process would use a computer system that creates a database of journal articles, conference proceedings, magazines and newsletters, all marked up in SGML. Subsequently, ACM also decided to convert large numbers of its older journals to build a digital library covering its publications from 1985.

One use of the SGML files is as a source for printed publications. However, the plan was much more progressive. The ACM planned for the day when members would retrieve articles directly from the online database, sometimes reading them on the screen of a computer, sometimes downloading them to a local printer. Libraries would be able to license parts of the database or take out a general subscription for their patrons.

The collection came online during 1997. It uses a web interface that offers readers the opportunity to browse through the contents pages of the journals, and to search by author and keyword. When an article has been identified, a subscriber can read the full text of the article. Other readers pay a fee for access to the full text, but can read abstracts without payment.

Business issues

ACM was faced with the dilemma of paying for this digital library. The society needed revenue to cover the substantial costs, but did not want to restrain authors and readers unduly. The business arrangements fall into two parts, the relationships with authors and with readers. Initially, both are experimental.

In 1994, ACM published an interim copyright policy, which describes the relationship between the society as publisher and the authors. It attempts to balance the interests of the authors against the need for the association to generate revenue from its publications. It was a sign of the times that ACM first published the new policy on its web server. One of the key features of this policy is the explicit acknowledgment that many of the journal articles are first distributed via the web.

To generate revenue, ACM charges for access to the full text of articles. Members of the ACM can subscribe to journals, or libraries can subscribe on behalf of their users.

The electronic versions of journals are priced about 20 percent below the prices of printed versions. Alternatively, individuals can pay for single articles. The price structure aims to encourage subscribers to sign up for the full set of publications, not just individual journals.

Electronic journals

The term electronic journal is commonly used to describe a publication that maintains many of the characteristics of printed journals, but is produced and distributed online. Rather confusingly, the same term is used for a journal that is purely digital, in that it exists only online, and for the digital version of a journal that is primarily a print publication, e.g., the ACM journals described in Panel 3.4.

Many established publishers have introduced a small number of purely online periodicals and there have been numerous efforts by other groups. Some of these online periodicals set out to mimic the processes and procedures of traditional journals.

Perhaps the most ambitious publication using this approach was the On-line Journal of Current Clinical Trials, which was developed by the American Association for the Advancement of Science (AAAS) in conjunction with OCLC. Unlike other publications for which the electronic publication was secondary to a printed publication, this new journal was planned as a high-quality, refereed journal for which the definitive version would be the electronic version. Since the publisher had complete control over the journal, it was possible to design it and store it in a form that was tailored to electronic delivery and display, but the journal was never accepted by researchers or physicians. It came out in 1992 and never achieved its goals, because it failed to attract the numbers of good papers that were planned for. Such is the fate of many pioneers.

More recent electronic periodicals retain some characteristics of traditional journals but experiment with formats or services that take advantage of online publishing. In 1995, we created D-Lib Magazine at CNRI as an online magazine with articles and news about digital libraries research and implementation. The design of D-Lib Magazine illustrates a combination of ideas drawn from conventional publishing and the Internet community. Conventional journals appear in issues, each containing several articles. Some electronic journals publish each article as soon as it is ready, but D-Lib Magazine publishes a monthly issue with strong emphasis on punctuality and rapid publication. The graphic design is deliberately flexible. The constraints of print force strict design standards, but online publications can allow authors to be creative in their use of technology.

Research libraries and conversion projects

One of the fundamental tasks of research libraries is to save today's materials to be the long-term memory for tomorrow. The great libraries have wonderful collections that form the raw material of history and of the humanities. These collections consist primarily of printed material or physical artifacts. The development of digital libraries has created great enthusiasm for converting some of these collections to digital forms.

Older materials are often in poor physical condition. Making a digital copy preserves the content and provides the library with a version that it can make available to the whole world. This section looks at two of these major conversion efforts.

Many of these digital library projects convert existing paper documents into bit-mapped images. Printed documents are scanned one page at a time. The scanned images are essentially a picture of the page. The page is covered by an imaginary grid. In early experiments the grid was often at 300 dots per inch, and the page was recorded as an array of black and white dots. More recently, higher resolutions and full color scanning have become common. Bit-mapped images of this kind are crisp enough that they can be displayed on large computer screens or printed on paper with good legibility. Since this process generates a huge number of dots for each page, various methods are used to compress the images, which reduces the number of bits to be stored and the size of files to be transmitted across the network, but even the simplest images are 50,000 bytes per page.
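To put these figures in perspective, the uncompressed size of a black-and-white scan can be worked out directly. The sketch below assumes a US Letter page (8.5 by 11 inches) stored at one bit per dot; the page size, the function name, and the constant names are illustrative assumptions, not details from the projects described here.

```python
def bilevel_page_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size in bytes of a 1-bit-per-dot scan of one page."""
    dots_wide = round(width_in * dpi)
    dots_high = round(height_in * dpi)
    total_bits = dots_wide * dots_high
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed page size in inches (US Letter); not stated in the text.
LETTER = (8.5, 11.0)

raw_300 = bilevel_page_bytes(*LETTER, dpi=300)
raw_600 = bilevel_page_bytes(*LETTER, dpi=600)

# The text quotes 50,000 bytes per page for the simplest stored images.
# Dividing the raw size by that figure gives the implied size reduction.
implied_ratio = raw_300 / 50_000

print(raw_300, raw_600, implied_ratio)
```

Under these assumptions a 300 dots per inch page is a little over one million bytes before compression, so a 50,000 byte file corresponds to a reduction of roughly twenty to one. Doubling the resolution to 600 dots per inch quadruples the raw size.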

Panel 3.5. American Memory and the National Digital Library Program

Background

The Library of Congress, which is the world's biggest library, has magnificent special collections of unique or unpublished materials. Among the Library's treasures are the papers of twenty-three presidents. Rare books, pamphlets, and papers provide valuable material for the study of historical events, periods, and movements. Millions of photographs, prints, maps, musical scores, sound recordings, and moving images in various formats reflect trends and represent people and places. Until recently, anybody who wanted to use these materials had to come to the library buildings on Capitol Hill in Washington, DC.

American Memory was a pilot program that, from 1989 to 1994, reproduced selected collections for national dissemination in computerized form. Collections were selected for their value for the study of American history and culture, and to explore the problems of working with materials of various types, such as prints, negatives, early motion pictures, recorded sound, and textual documents. Initially, American Memory used a combination of digitized representations on CD-ROM and analog forms on videodisk, but, in June 1994, three collections of photographs were made available on the web.

The National Digital Library Program (NDLP) builds on the success of American Memory. Its objective is to convert millions of items to digital form and make them available over the Internet. The focus is on Americana, materials that are important in American history. Typical themes are Walt Whitman's notebooks, or documents from the Continental Congress and the Constitutional Convention.

Some of the collections that are being converted are coherent archives, such as the papers or the photograph collection of a particular individual or organization. Some are collections of items in a special original form, such as daguerreotypes or paper prints of early films. Others are thematic compilations by curators or scholars, either from within an archival collection or selected across the library's resources.

Access to the collections

American Memory discovered the enthusiasm of school teachers for access to these primary source materials, and the new program emphasizes education as its primary, but certainly not its only, audience. The most comprehensive method of access to the collections is by searching an index. However, many of the archival collections being converted do not have catalog records for each individual item. Instead, the collections have finding aids. These are structured documents that describe the collection, and groups of items within it, without providing a description for each individual item.

Hence, access to material in American Memory is a combination of methods, including searching bibliographic records for individual items where such records exist, browsing subject terms, searching full text, and, in the future, searching the finding aids.

Technical considerations

This is an important project technically, because of its scale and visibility. Conscious of the long-term problems of maintaining large collections, the library has placed great emphasis on how it organizes the items within its collections.

The program takes seriously the process of converting these older materials to digital formats, selecting the most appropriate format to represent the content, and putting great emphasis on quality control. Textual material is usually converted twice: once to a scanned page image and once to SGML marked-up text. Several images are made from each photograph, ranging from a small thumbnail to a high-resolution image for archival purposes.

Many of the materials selected for conversion are free from copyright or other restrictions on distribution, but others have restrictions. In addition to copyright, other reasons for restrictions include conditions required by donors of the original materials to the library. Characteristic of older materials, especially unpublished items, is that it is frequently impossible to discover all the restrictions that might conceivably apply, and prohibitively expensive to make an exhaustive search for every single item.

Therefore the library's legal staff has to develop policies and procedures that balance the value to the nation of making materials available, against the risk of inadvertently infringing some right.

Outreach

A final, but important, aspect of American Memory is that people look to the Library of Congress for leadership. The expertise that the library has developed and its partnerships with leading technical groups permit it to help the entire library community move forward. Thus the library is an active member of several collaborations, has overseen an important grant program, and is becoming a sponsor of digital library research.

Since the driving force in electronic libraries and information services has come from the scientific and technical fields, quite basic needs of other disciplines, such as character sets beyond English, have often been ignored. The humanities have been in danger of being left behind, but a new generation of humanities scholars is embracing computing. Fortunately, they have friends. Panel 3.6 describes JSTOR, a project of the Andrew W. Mellon Foundation which is both saving costs for libraries and bringing important journal literature to a wider audience than would ever be possible without digital libraries.

Panel 3.6. JSTOR

JSTOR is a project that was initiated by the Andrew W. Mellon Foundation to provide academic libraries with back runs of important journals. It combines both academic and economic objectives. The academic objective is to build a reliable archive of important scholarly journals and provide widespread access to them. The economic objective is to save costs to libraries by eliminating the need for every library to store and preserve the same materials.

The JSTOR collections are developed by fields, such as economics, history, and philosophy. The first phase is expected to have about one hundred journals from some fifteen fields. For each journal, the collection consists of a complete run, usually from the first issue until about five years before the current date.
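The coverage rule for each journal can be written as a short calculation. This is a minimal sketch of the rule as stated above, assuming whole calendar years and a fixed five-year gap; the function name and parameters are hypothetical, and the text says only that the gap is "about" five years.

```python
from typing import Optional, Tuple


def archived_run(
    first_issue_year: int, current_year: int, gap_years: int = 5
) -> Optional[Tuple[int, int]]:
    """Return (first, last) year held in the archive, or None if nothing qualifies yet."""
    last_year = current_year - gap_years
    if last_year < first_issue_year:
        # The journal is too young for any issue to fall outside the gap.
        return None
    return (first_issue_year, last_year)


# A journal first published in 1911, viewed from 1998.
print(archived_run(1911, 1998))
# A journal first published in 1996, viewed from 1998.
print(archived_run(1996, 1998))
```

Because the end of the run is defined relative to the current date, the archive grows by one year of issues annually without overlapping the most recent volumes.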

The economic and organizational model

In August 1995, JSTOR was established as an independent not-for-profit organization with a goal to become self-sustaining financially. It aims to do so by charging fees for access to the database to libraries around the world. These fees are set to be less than the comparable costs to the libraries of storing paper copies of the journals.

The organization has three offices. Its administrative, legal and financial activities are managed and coordinated from the main office in New York, as are relationships with publishers and libraries. In addition, staff in offices at the University of Michigan and Princeton University maintain two synchronized copies of the database, maintain and develop JSTOR's technical infrastructure, provide support to users, and oversee the conversion process from paper to computer formats. Actual scanning and keying services are provided by outside vendors. JSTOR has recently established a third database mirror site at the University of Manchester, which supplies access to higher education institutions in the United Kingdom.

JSTOR has straightforward licenses with publishers and with subscribing institutions.

By emphasizing back-runs, JSTOR strives not to compete with publishers, whose principal revenues come from current issues. Access to the collections has initially been provided only to academic libraries who subscribe to the entire collection. They pay a fee based on their size. In the best Internet tradition, the fee schedule and the license are available online for anybody to read.

The technical approach

At the heart of the JSTOR collection are scanned images of every page. These are scanned at a high resolution, 600 dots per inch, with particular emphasis on quality control. Unlike some other projects, only one version of each image is stored. Other versions, such as low-resolution thumbnails, are computed when required, but not stored. Optical character recognition with intensive proofreading is used to convert the text. This text is used only for indexing. In addition, a table of contents file is created for each article. This includes bibliographic citation information with keywords and abstracts if available.
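The idea of storing a single master image and computing other versions on request can be illustrated with a small example. The sketch below is not JSTOR's method, which the text does not describe; it assumes a black-and-white master held as rows of 0 and 1 values and derives a smaller grayscale image by averaging square blocks of dots. The names `Bitmap` and `thumbnail` are hypothetical.

```python
from typing import List

Bitmap = List[List[int]]  # rows of dots: 0 is white, 1 is black


def thumbnail(master: Bitmap, factor: int) -> List[List[float]]:
    """Derive a reduced image by averaging factor x factor blocks of dots.

    Each output value is the fraction of black dots in its block (0.0 to 1.0).
    Rows and columns that do not fill a whole block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    rows = len(master) // factor
    cols = len(master[0]) // factor if master else 0
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            black = sum(
                master[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            out_row.append(black / (factor * factor))
        out.append(out_row)
    return out


# A tiny 4 x 4 stand-in for a stored master image.
master = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]

# Computed on request and discarded afterwards; only `master` is kept.
small = thumbnail(master, 2)
print(small)
```

The trade-off is between storage and computation: keeping only the master saves disk space and avoids inconsistent copies, at the price of repeating the reduction each time a smaller version is requested.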

American Memory and JSTOR are further examples of a theme that has run throughout this chapter. Libraries, by their nature, are conservative organizations. Collections and their catalogs are developed over decades or centuries. New services are introduced cautiously, because they are expected to last for a long time. However, in technical fields, libraries have frequently been adventurous. MARC, OCLC, the Linked Systems Project, Mercury, CORE, and the recent conversion projects may not have invented the technology they used, but their deployment as large-scale, practical systems was pioneering.

Chapter 2 discussed the community that has grown up around the Internet and the web. Many members of this community discovered online information very recently, and act as though digital libraries began in 1993 with the release of Mosaic. As discussed in this chapter, libraries and publishers developed many of the concepts that are the basis for digital libraries, years before the web. Combining the two communities of expertise provides a powerful basis for digital libraries in the future.

Chapter 4 Innovation and research
