IGI Global Selected Readings On Database Technologies And Applications Aug 2008 ISBN 1605660981 pdf


Selected Readings on

Database Technologies

and Applications

Terry Halpin

Neumont University, USA

Hershey • New York


Director of Editorial Content: Kristin Klinger
Managing Development Editor: Kristin M. Roth
Senior Managing Editor: Jennifer Neidig
Managing Editor: Jamie Snavely
Assistant Managing Editor: Carole Coulson
Typesetter: Carole Coulson
Cover Design: Lisa Tosheff
Printed at: Yurchak Printing Inc.

Published in the United States of America by
Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue, Suite 200
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: cust@igi-global.com
Web site: http://www.igi-global.com

and in the United Kingdom by
Information Science Reference (an imprint of IGI Global)
3 Henrietta Street
Covent Garden
London WC2E 8LU
Tel: 44 20 7240 0856
Fax: 44 20 7379 0609
Web site: http://www.eurospanbookstore.com

Copyright © 2009 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.

Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data

Selected readings on database technologies and applications / Terry Halpin, editor. p. cm.

Summary: "This book offers research articles focused on key issues concerning the development, design, and analysis of databases"--Provided by publisher.

Includes bibliographical references and index.

ISBN 978-1-60566-098-1 (hbk.) -- ISBN 978-1-60566-099-8 (ebook)
1. Databases. 2. Database design. I. Halpin, T. A.
QA76.9.D32S45 2009
005.74--dc22
2008020494

British Cataloguing in Publication Data

A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book set is original material. The views expressed in this book are those of the authors, but not necessarily of the publisher.


Detailed Table of Contents

Prologue

About the Editor

Section I

Fundamental Concepts and Theories

Chapter I

Conceptual Modeling Solutions for the Data Warehouse

Stefano Rizzi, DEIS - University of Bologna, Italy

This opening chapter provides an overview of the fundamental role that conceptual modeling plays in data warehouse design. Specifically, research focuses on a conceptual model called the DFM (Dimensional Fact Model), which suits the variety of modeling situations that may be encountered in real projects of small to large complexity. The aim of the chapter is to propose a comprehensive set of solutions for conceptual modeling according to the DFM and to give the designer a practical guide for applying them in the context of a design methodology. Other issues discussed include descriptive and cross-dimension attributes; convergences; shared, incomplete, recursive, and dynamic hierarchies; multiple and optional arcs; and additivity.

Chapter II

Databases Modeling of Engineering Information

Z. M. Ma, Northeastern University, China

As information systems have become the nerve center of current computer-based engineering, the need for engineering information modeling has become pressing. Databases are designed to support data storage, processing, and retrieval activities related to data management, and database systems are the key to implementing engineering information modeling. It should be noted, however, that the current mainstream databases are mainly used for business applications. Some new engineering requirements challenge today’s database technologies and promote their evolution. Database modeling can be clas


Chapter III

An Overview of Learning Object Repositories

Argiris Tzikopoulos, Agricultural University of Athens, Greece
Nikos Manouselis, Agricultural University of Athens, Greece
Riina Vuorikari, European Schoolnet, Belgium

Learning objects are systematically organized and classified in online databases, which are termed learning object repositories (LORs). Currently, a rich variety of LORs is operating online, offering access to wide collections of learning objects. These LORs cover various educational levels and topics, store learning objects and/or their associated metadata descriptions, and offer a range of services that may vary from advanced search and retrieval of learning objects to intellectual property rights (IPR) management. Until now, there has not been a comprehensive study of existing LORs that will give an outline of their overall characteristics. For this purpose, this chapter presents the initial results from a survey of 59 well-known repositories with learning resources. The most important characteristics of surveyed LORs are examined and useful conclusions about their current status of development are made.

Chapter IV

Discovering Quality Knowledge from Relational Databases

M. Mehdi Owrang O., American University, USA

Current database technology involves processing a large volume of data in order to discover new knowledge. However, knowledge discovery on just the most detailed and recent data does not reveal the long-term trends. Relational databases create new types of problems for knowledge discovery since they are normalized to avoid redundancies and update anomalies, which makes them unsuitable for knowledge discovery. A key issue in any discovery system is to ensure the consistency, accuracy, and completeness of the discovered knowledge. This selection describes the aforementioned problems associated with the quality of the discovered knowledge and provides solutions to avoid them.

Section II

Development and Design Methodologies

Chapter V

Business Data Warehouse: The Case of Wal-Mart

Indranil Bose, The University of Hong Kong, Hong Kong
Lam Albert Kar Chun, The University of Hong Kong, Hong Kong
Leung Vivien Wai Yue, The University of Hong Kong, Hong Kong
Li Hoi Wan Ines, The University of Hong Kong, Hong Kong
Wong Oi Ling Helen, The University of Hong Kong, Hong Kong


process of the data warehouse. The implications of the recent advances in technologies such as RFID, which is likely to play an important role in the Wal-Mart data warehouse in future, are also detailed in this chapter.

Chapter VI

A Database Project in a Small Company (or How the Real World Doesn’t Always Follow the Book)

Efrem Mallach, University of Massachusetts Dartmouth, USA

The selection presents a small consulting company’s experience in the design and implementation of a database and associated information retrieval system. The company’s choices are explained within the context of the firm’s needs and constraints. Issues associated with development methods are discussed, along with problems that arose from not following proper development disciplines. Ultimately, the author asserts that while the system provided real value to its users, the use of proper development disciplines could have reduced some problems while not reducing that value.

Chapter VII

Conceptual Modeling for XML: A Myth or a Reality

Sriram Mohan, Indiana University, USA
Arijit Sengupta, Wright State University, USA

Conceptual design is independent of the final platform and the medium of implementation, and is usually in a form that is understandable to managers and other personnel who may not be familiar with the low-level implementation details, but have a major influence in the development process. Although a strong design phase is involved in most current application development processes, conceptual design for XML has not been explored significantly in literature or in practice. In this chapter, the reader is introduced to existing methodologies for modeling XML. A discussion is then presented comparing and contrasting their capabilities and deficiencies, and delineating the future trend in conceptual design for XML applications.

Chapter VIII

Designing Secure Data Warehouses

Rodolfo Villarroel, Universidad Católica del Maule, Chile
Eduardo Fernández-Medina, Universidad de Castilla-La Mancha, Spain
Juan Trujillo, Universidad de Alicante, Spain
Mario Piattini, Universidad de Castilla-La Mancha, Spain

As an organization’s reliance on information systems governed by databases and data warehouses (DWs) increases, so does the need for quality and security within these systems. Since organizations generally deal with sensitive information such as patient diagnoses or even personal beliefs, a final DW solution should restrict the users that can have access to certain specific information. This chapter presents a


Chapter IX

Web Data Warehousing Convergence: From Schematic to Systematic

D. Xuan Le, La Trobe University, Australia
J. Wenny Rahayu, La Trobe University, Australia
David Taniar, Monash University, Australia

This chapter proposes a data warehouse integration technique that combines data and documents from different underlying documents and database design approaches. Well-defined and structured data, semi-structured data, and unstructured data are integrated into a Web data warehouse system and user specified requirements and data sources are combined to assist with the definitions of the hierarchical structures. A conceptual integrated data warehouse model is specified based on a combination of user requirements and data source structure, which necessitates the creation of a logical integrated data warehouse model. A case study is then developed into a prototype in a Web-based environment that enables the evaluation. The evaluation of the proposed integration Web data warehouse methodology includes the verification of correctness of the integrated data, and the overall benefits of utilizing this proposed integration technique.

Section III

Tools and Technologies

Chapter X

Visual Query Languages, Representation Techniques, and Data Models

Maria Chiara Caschera, IRPPS-CNR, Italy
Arianna D’Ulizia, IRPPS-CNR, Italy
Leonardo Tininini, IASI-CNR, Italy

An easy, efficient, and effective way to retrieve stored data is obviously one of the key issues of any information system. In the last few years, considerable effort has been devoted to the definition of more intuitive, visual-based querying paradigms, attempting to offer a good trade-off between expressiveness and intuitiveness. In this chapter, the authors analyze the main characteristics of visual languages specifically designed for querying information systems, concentrating on conventional relational databases, but also considering information systems with a less rigid structure such as Web resources storing XML documents. Two fundamental aspects of visual query languages are considered: the adopted visual representation technique and the underlying data model, possibly specialized to specific application contexts.

Chapter XI

Application of Decision Tree as a Data Mining Tool in a Manufacturing System

S. A. Oke, University of Lagos, Nigeria

This selection demonstrates the application of decision tree, a data mining tool, in the manufacturing


for decision making, which could be properly revealed with the application of appropriate data mining techniques. Decision trees are employed for identifying valuable information in manufacturing databases. Practically, industrial managers would be able to make better use of manufacturing data at little or no extra investment in data manipulation cost. The work shows that it is valuable for managers to mine data for better and more effective decision making.

Chapter XII

A Scalable Middleware for Web Databases

Athman Bouguettaya, Virginia Tech, USA
Zaki Malik, Virginia Tech, USA
Abdelmounaam Rezgui, Virginia Tech, USA
Lori Korff, Virginia Tech, USA

The emergence of Web databases has introduced new challenges related to their organization, access, integration, and interoperability. New approaches and techniques are needed to provide across-the-board transparency for accessing and manipulating Web databases irrespective of their data models, platforms, locations, or systems. In meeting these needs, it is necessary to build a middleware infrastructure to support flexible tools for information space organization, communication facilities, information discovery, content description, and assembly of data from heterogeneous sources. This chapter describes a scalable middleware for efficient data and application access built using available technologies. The resulting system, WebFINDIT, is a scalable and uniform infrastructure for locating and accessing heterogeneous and autonomous databases and applications.

Chapter XIII

A Formal Verification and Validation Approach for Real-Time Databases

Pedro Fernandes Ribeiro Neto, Universidade do Estado do Rio Grande do Norte, Brazil
Maria Lígia Barbosa Perkusich, Universidade Católica de Pernambuco, Brazil
Hyggo Oliveira de Almeida, Federal University of Campina Grande, Brazil
Angelo Perkusich, Federal University of Campina Grande, Brazil

Real-time database-management systems provide efficient support for applications with data and transactions that have temporal constraints, such as industrial automation, aviation, and sensor networks, among others. Many issues in real-time databases have brought interest to research in this area, such as concurrency control mechanisms, scheduling policy, and quality of service management. However, considering the complexity of these applications, it is of fundamental importance to conceive formal verification and validation techniques for real-time database systems. This chapter presents a formal verification and validation method for real-time databases. Such a method can be applied to database systems developed for computer integrated manufacturing, stock exchange, network-management, and command-and-control applications and multimedia systems.

Chapter XIV

A Generalized Comparison of Open Source and Commercial Database Management Systems

Theodoros Evdoridis, University of the Aegean, Greece


This chapter attempts to bring to light the field of one of the less popular branches of the open source software family, which is the open source database management systems branch. In view of the objective, the background of these systems is first briefly described followed by presentation of a fair generic database model. Subsequently and in order to present these systems under all their possible features, the main system representatives of both open source and commercial origins will be compared in relation to this model, and evaluated appropriately. By adopting such an approach, the chapter’s initial concern is to ensure that the nature of database management systems in general can be apprehended. The overall orientation leads to an understanding that the gap between open and closed source database management systems has been significantly narrowed, thus demystifying the respective commercial products.

Section IV

Application and Utilization

Chapter XV

An Approach to Mining Crime Patterns

Sikha Bagui, The University of West Florida, USA

This selection presents a knowledge discovery effort to retrieve meaningful information about crime from a U.S. state database. The raw data were preprocessed, and data cubes were created using Structured Query Language (SQL). The data cubes then were used in deriving quantitative generalizations and for further analysis of the data. An entropy-based attribute relevance study was undertaken to determine the relevant attributes. A machine learning software called WEKA was used for mining association rules, developing a decision tree, and clustering. SOM was used to view multidimensional clusters on a regular two-dimensional grid.

Chapter XVI

Bioinformatics Web Portals

Mario Cannataro, Università “Magna Græcia” di Catanzaro, Italy
Pierangelo Veltri, Università “Magna Græcia” di Catanzaro, Italy

Bioinformatics involves the design and development of advanced algorithms and computational platforms to solve problems in biomedicine (Jones & Pevzner, 2004). It also deals with methods for acquiring, storing, retrieving and analysing biological data obtained by querying biological databases or provided by experiments. Bioinformatics applications involve different datasets as well as different software tools and algorithms. Such applications need semantic models for basic software components and need advanced scientific portal services able to aggregate such different components and to hide their details and complexity from the final user. For instance, proteomics applications involve datasets, either produced by experiments or available as public databases, as well as a huge number of different software tools and algorithms. To use such applications, it is required to know both biological issues related to data generation and results interpretation and informatics requirements related to data analysis.

Chapter XVII

An XML-Based Database for Knowledge Discovery: Definition and Implementation

Rosa Meo, Università di Torino, Italy


Inductive databases have been proposed as general purpose databases to support the KDD process. Unfortunately, the heterogeneity of the discovered patterns and of the different conceptual tools used to extract them from source data make integration in a unique framework difficult. In this chapter, using XML as the unifying framework for inductive databases is explored, and a new model, XML for data mining (XDM), is proposed. The basic features of the model are presented, based on the concepts of data item (source data and patterns) and statement (used to manage data and derive patterns). This model uses XML namespaces (to allow the effective coexistence and extensibility of data mining operators) and XML schema, by means of which the schema, state and integrity constraints of an inductive database are defined.

Chapter XVIII

Enhancing UML Models: A Domain Analysis Approach

Iris Reinhartz-Berger, University of Haifa, Israel
Arnon Sturm, Ben-Gurion University of the Negev, Israel

UML has been largely adopted as a standard modeling language. The emergence of UML from different modeling languages has caused a wide variety of completeness and correctness problems in UML models. Several methods have been proposed for dealing with correctness issues, mainly providing internal consistency rules, but ignoring correctness and completeness with respect to the system requirements and the domain constraints. This chapter proposes the adoption of a domain analysis approach called application-based domain modeling (ADOM) to address the completeness and correctness problems of UML models. Experimental results from a study which checks the quality of application models when utilizing ADOM on UML suggest that the proposed domain helps in creating more complete models without compromising comprehension.

Chapter XIX

Seismological Data Warehousing and Mining: A Survey

Gerasimos Marketos, University of Piraeus, Greece
Yannis Theodoridis, University of Piraeus, Greece
Ioannis S. Kalogeras, National Observatory of Athens, Greece

Earthquake data is comprised of an ever increasing collection of earth science information for post-processing analysis. Earth scientists, as well as local and national administration officers, use these data collections for scientific and planning purposes. In this chapter, the authors discuss the architecture of a seismic data management and mining system (SDMMS) for quick and easy data collection, processing, and visualization. The SDMMS architecture includes a seismological database for efficient and effective


Section V

Critical Issues

Chapter XX

Business Information Integration from XML and Relational Databases Sources

Ana María Fermoso Garcia, Pontifical University of Salamanca, Spain
Roberto Berjón Gallinas, Pontifical University of Salamanca, Spain

This chapter introduces different alternatives for jointly storing and managing relational and eXtensible Markup Language (XML) data sources. Nowadays, businesses are being transformed into e-businesses and have to manage large data volumes from heterogeneous sources. To manage large amounts of information, Database Management Systems (DBMS) continue to be one of the most used tools, and the most widespread model is the relational one. On the other hand, XML has become the de facto standard for presenting and exchanging information between businesses on the Web. Therefore, it could be necessary to use tools as mediators to integrate these two different kinds of data into a common format like XML, since it is the main data format on the Web. First, a classification of the main tools and systems that handle this problem is made, with their advantages and disadvantages. The objective will be to propose a new system to solve the business information integration problem.

Chapter XXI

Security Threats in Web-Powered Databases and Web Portals

Theodoros Evdoridis, University of the Aegean, Greece
Theodoros Tzouramanis, University of the Aegean, Greece

It is a strongly held view that the scientific branch of computer security that deals with Web-powered databases (Rahayu & Taniar, 2002) that can be accessed through Web portals (Tatnall, 2005) is both complex and challenging. This is mainly due to the fact that there are numerous avenues available for a potential intruder to follow in order to break into the Web portal and compromise its assets and functionality. This is of vital importance when the assets that might be jeopardized belong to a legally sensitive Web database such as that of an enterprise or government portal, containing sensitive and confidential information. It is obvious that the aim of not only protecting against, but above all preventing, malicious or accidental activity that could endanger a Web portal’s assets requires an attentive examination of all possible threats to the Web-based system.

Chapter XXII

Empowering the OLAP Technology to Support Complex Dimension Hierarchies

Svetlana Mansmann, University of Konstanz, Germany
Marc H. Scholl, University of Konstanz, Germany


and modeling complex multidimensional data, with the major effort at the conceptual level of transforming irregular hierarchies to make them navigable in a uniform manner. The properties of various hierarchy types are formalized and a two-phase normalization approach is proposed: heterogeneous dimensions are reshaped into a set of well-behaved homogeneous subdimensions, followed by the enforcement of summarizability in each dimension’s data hierarchy. The power of the current approach is exemplified using a real-world study from the domain of academic administration.

Chapter XXIII

NetCube: Fast, Approximate Database Queries Using Bayesian Networks

Dimitris Margaritis, Iowa State University, USA
Christos Faloutsos, Carnegie Mellon University, USA
Sebastian Thrun, Stanford University, USA

This chapter presents a novel method for answering count queries from a large database approximately and quickly. This method implements an approximate DataCube of the application domain, which can be used to answer any conjunctive count query that can be formed by the user. The DataCube is a conceptual device that in principle stores the number of matching records for all possible such queries. However, because its size and generation time are inherently exponential, the current approach uses one or more Bayesian networks to implement it approximately. By means of such a network, the proposed method, called NetCube, exploits correlations and independencies among attributes to answer a count query quickly without accessing the database. Experimental results show that NetCubes have fast generation and use, achieve excellent compression and have low reconstruction error while also naturally allowing for visualization and data mining.

Chapter XXIV

Node Partitioned Data Warehouses: Experimental Evidence and Improvements

Pedro Furtado, University of Coimbra, Portugal

Data Warehouses (DWs) with large quantities of data present major performance and scalability challenges, and parallelism can be used for major performance improvement in such context. However, instead of costly specialized parallel hardware and interconnections, the authors of this selection focus on low-cost standard computing nodes, possibly in a non-dedicated local network. In this environment, special care must be taken with partitioning and processing. Experimental evidence is used to analyze the shortcomings of a basic horizontal partitioning strategy designed for that environment, and then improvements to allow efficient placement for the low-cost Node Partitioned Data Warehouse are proposed


Section VI

Emerging Trends

Chapter XXV

Rule Discovery from Textual Data

Shigeaki Sakurai, Toshiba Corporation, Japan

This chapter introduces knowledge discovery methods based on a fuzzy decision tree from textual data. The author argues that the methods extract features of the textual data based on a key concept dictionary, which is a hierarchical thesaurus, and a key phrase pattern dictionary, which stores characteristic rows of both words and parts of speech, and generate knowledge in the format of a fuzzy decision tree. The author also discusses two application tasks. One is an analysis system for daily business reports and the other is an e-mail analysis system. The author hopes that the methods will provide new knowledge for researchers engaged in text mining studies, facilitating their understanding of the importance of the fuzzy decision tree in processing textual data.

Chapter XXVI

Action Research with Internet Database Tools

Bruce L. Mann, Memorial University, Canada

This chapter discusses and presents examples of Internet database tools, typical instructional methods used with these tools, and implications for Internet-supported action research as a progressively deeper examination of teaching and learning. First, the author defines and critically explains the use of artifacts in an educational setting and then differentiates between the different types of artifacts created by both students and teachers. Learning objects and learning resources are also defined and, as the chapter concludes, three different types of instructional devices – equipment, physical conditions, and social mechanisms or arrangements – are analyzed and an exercise is offered for both differentiating between and understanding differences in instruction and learning.

Chapter XXVII

Database High Availability: An Extended Survey

Moh’d A. Radaideh, Abu Dhabi Police – Ministry of Interior, United Arab Emirates
Hayder Al-Ameed, United Arab Emirates University, United Arab Emirates

With the advancement of computer technologies and the World Wide Web, there has been an explosion in the amount of available e-services, most of which represent database processing. Efficient and effective database performance tuning and high availability techniques should be employed to ensure that all e-services remain reliable and available at all times. To avoid the impacts of database downtime, many corporations have taken interest in database availability. The goal for some is to have continuous availability such that a database server never fails. Other companies require their content to be highly available. In such cases, short and planned downtimes would be allowed for maintenance purposes. This chapter is meant to present the definition, the background, and the typical measurement factors of high


Prologue

HISTORICAL OVERVIEW OF DATABASE TECHNOLOGY

This prologue provides a brief historical perspective of developments in database technology, and then reviews and contrasts three current approaches to elevate the initial design of database systems to a conceptual level.

Beginning in the late 1970s, the old network and hierarchic database management systems (DBMSs) began to be replaced by relational DBMSs, and by the late 1980s relational systems performed sufficiently well that the recognized benefits of their simple bag-oriented data structure and query language (SQL) made relational DBMSs the obvious choice for new database applications. In particular, the simplicity of Codd’s relational model of data, where all facts are stored in relations (sets of ordered n-tuples), facilitated data access and optimization for a wide range of application domains (Codd, 1970). Although Codd’s data model was purely set-oriented, industrial relational DBMSs and SQL itself are bag-oriented, since SQL allows keyless tables, and SQL queries may return multisets (Melton & Simon, 2002).
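The difference between set-oriented and bag-oriented behavior can be seen in a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for an industrial relational DBMS; the Sale table and its rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A keyless table (hypothetical): with no primary key or uniqueness
# constraint, the same row may be stored more than once.
conn.execute("CREATE TABLE Sale (item TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO Sale VALUES (?, ?)",
    [("pen", 2), ("pen", 2), ("ink", 1)],
)

# A plain SELECT keeps duplicates, so the result is a bag (multiset).
bag_result = conn.execute(
    "SELECT item, qty FROM Sale ORDER BY item"
).fetchall()

# SELECT DISTINCT removes duplicates, giving set semantics.
set_result = conn.execute(
    "SELECT DISTINCT item, qty FROM Sale ORDER BY item"
).fetchall()

print(bag_result)
print(set_result)
```

The duplicate ('pen', 2) row survives in the first result and collapses to a single row in the second, which is the sense in which SQL is bag-oriented by default.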

Unlike relational databases, network and hierarchic databases store facts in not only record types but also navigation paths between record types. For example, in a hierarchic database the fact that employee 101 works for the Sales department would be stored as a parent-child link from a department record (an instance of the Department record type where the deptName attribute has the value ‘Sales’) to an employee record (an instance of the Employee record type where the empNr attribute has the value 101).

Although relational systems do support foreign key “relationships” between relations, these relationships are not navigation paths; instead they simply encode constraints (e.g. each deptName in an Employee table must also occur in the primary key of the Department table) rather than ground facts. For example, the ground fact that employee 101 works for the Sales department is stored by entering the values 101, ‘Sales’ in the empNr and deptName columns on the same row of the Employee table.
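As a rough illustration of a foreign key acting as a constraint rather than a stored link, the following sketch again assumes Python's sqlite3 module. The table and column names follow the example in the text; the 'Marketing' department is a hypothetical value used only to trigger the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE Department (deptName TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE Employee ("
    " empNr INTEGER PRIMARY KEY,"
    " deptName TEXT NOT NULL REFERENCES Department(deptName))"
)
conn.execute("INSERT INTO Department VALUES ('Sales')")

# The ground fact itself: both values sit on one row of Employee.
conn.execute("INSERT INTO Employee VALUES (101, 'Sales')")

# The foreign key only constrains values: a row naming a department
# that does not exist (hypothetical 'Marketing') is rejected.
try:
    conn.execute("INSERT INTO Employee VALUES (102, 'Marketing')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

fact = conn.execute("SELECT empNr, deptName FROM Employee").fetchall()
print(fact, rejected)
```

No pointer from the Department row to the Employee row is stored; the association is recovered by comparing the deptName values when the tables are queried.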

In 1989, a group of researchers published “The Object-Oriented Database System Manifesto” in which they argued that object-oriented databases should replace relational databases (Atkinson et al., 1989). Influenced by object-oriented programming languages, they felt that databases should support not only core database features such as persistence, concurrency, recovery, and an ad hoc query facility, but also object-oriented features such as complex objects, object identity, encapsulation of behavior with data, types or classes, inheritance (subtyping), overriding and late binding, computational completeness, and extensibility. Databases conforming to this approach are called object-oriented databases (OODBs) or simply object databases (ODBs).

Partly in response to the OODB manifesto, one year later a group of academic and industrial researchers proposed an alternative “3rd generation DBMS manifesto” (Stonebraker et al., 1990). They considered network and hierarchic databases to be first generation, and relational databases to be second


While other kinds of databases (e.g. deductive, temporal, and spatial) were also developed to address specific needs, none of these has gained a wide following in industry. Deductive databases typically provide a declarative query language such as a logic programming language (e.g. Prolog), giving them powerful rule enforcement mechanisms with built-in backtracking and strong support for recursive rules (e.g. computing the transitive closure of an ancestor relation).
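The recursive ancestor rule can be sketched in Python as a naive bottom-up fixpoint computation. This is not how Prolog evaluates rules (Prolog works top-down with backtracking); it is only an illustration of what the transitive closure contains. The names in the parents set are hypothetical.

```python
def transitive_closure(parent_pairs):
    """Compute ancestor pairs from parent pairs by naive fixpoint iteration.

    Rules being applied:
      ancestor(X, Y) if parent(X, Y)
      ancestor(X, Z) if parent(X, Y) and ancestor(Y, Z)
    """
    ancestor = set(parent_pairs)
    while True:
        derived = {
            (x, z)
            for (x, y) in parent_pairs
            for (y2, z) in ancestor
            if y == y2
        }
        new = derived - ancestor
        if not new:
            # No new facts were derived, so the fixpoint is reached.
            return ancestor
        ancestor |= new


# Hypothetical facts: each pair is (parent, child).
parents = {("ann", "bob"), ("bob", "cat"), ("cat", "dan")}
closure = transitive_closure(parents)
print(sorted(closure))
```

A deductive database lets the two rules be stated declaratively and computes the closure itself, whereas here the iteration is written out by hand.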

Spatial databases provide efficient management of spatial data, such as maps (e.g. for geographical applications), 2-D visualizations (e.g. for circuit designs), and 3-D visualizations (e.g. for medical imaging). Built-in support for spatial data types (e.g. points, lines, polygons) and spatial operators (e.g. intersect, overlap, contains) facilitates queries of a spatial nature (e.g. how many residences lie within 3 miles of the proposed shopping center?).
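The residences query can be approximated in a few lines of Python. This sketch assumes planar coordinates measured in miles and uses a linear scan, whereas a spatial DBMS would normally answer the query through a spatial index; the function name and the coordinates are hypothetical.

```python
import math


def within_radius(points, center, radius):
    """Return the points whose straight-line distance from center is at most radius."""
    cx, cy = center
    return [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= radius]


# Hypothetical residence locations, in miles on a flat plane.
residences = [(0.0, 1.0), (2.0, 2.0), (3.0, 3.0), (-3.0, 0.0)]
shopping_center = (0.0, 0.0)

nearby = within_radius(residences, shopping_center, 3.0)
print(len(nearby))
```

With built-in point types and a distance or containment operator, the same question becomes a single declarative query rather than application code.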

Temporal databases provide built-in support for temporal data types (e.g. instant, duration, period) and temporal operators (e.g. before, after, during, contains, overlaps, precedes, starts, minus), facilitating queries of a temporal nature (e.g. which conferences overlap in time?).
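The sample temporal query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: periods are closed intervals of calendar days, "overlap" is taken loosely to mean sharing at least one day (temporal algebras usually define a stricter operator), and the conference names and dates are invented.

```python
from datetime import date
from itertools import combinations

# Hypothetical conferences: name -> (first day, last day), both inclusive.
conferences = {
    "ConfA": (date(2008, 6, 2), date(2008, 6, 6)),
    "ConfB": (date(2008, 6, 5), date(2008, 6, 8)),
    "ConfC": (date(2008, 7, 1), date(2008, 7, 3)),
}


def overlaps(p, q):
    """True if two closed periods share at least one day."""
    return p[0] <= q[1] and q[0] <= p[1]


# Compare every unordered pair of conferences once.
overlapping = [
    (a, b)
    for a, b in combinations(sorted(conferences), 2)
    if overlaps(conferences[a], conferences[b])
]
print(overlapping)
```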

A more recent proposal for database technology employs XML (eXtensible Markup Language). XML databases store data in XML, with their structure conforming either to the old DTD (Document Type Definition) or the newer XSD (XML Schema Definition) format. Like the old hierarchic databases, XML is hierarchic in nature. However, XML is presented as readable text, using tags to provide the structure. For example, the facts that employees 101 and 102 work for the Sales department could be stored (along with their names and birth dates) in XML as follows.

<department name="Sales">
    <employee empNr="101">
        <name>Fred Smith</name>
        <birthdate>1946-02-15</birthdate>
    </employee>
    <employee empNr="102">
        <name>Sue Jones</name>
        <birthdate>1980-06-30</birthdate>
    </employee>
</department>

Just as SQL is used for querying and manipulating relational data, the XQuery language is now the standard language for querying and manipulating XML data (Melton & Buxton, 2006).
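As a rough illustration of querying such a document, the sketch below parses the department example with Python's standard xml.etree.ElementTree module. This is not XQuery: ElementTree supports only a limited XPath subset, which is nevertheless enough to navigate the hierarchy and select an employee by attribute value.

```python
import xml.etree.ElementTree as ET

doc = """
<department name="Sales">
  <employee empNr="101">
    <name>Fred Smith</name>
    <birthdate>1946-02-15</birthdate>
  </employee>
  <employee empNr="102">
    <name>Sue Jones</name>
    <birthdate>1980-06-30</birthdate>
  </employee>
</department>
"""

root = ET.fromstring(doc)

# All employee names in the department, in document order.
names = [e.findtext("name") for e in root.findall("employee")]

# Limited XPath predicate: the employee whose empNr attribute is 102.
emp102 = root.find("employee[@empNr='102']")

print(root.get("name"), names, emp102.findtext("birthdate"))
```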

One very recent proposal for a new kind of database technology is the so-called “ontology database”, which is proposed to help achieve the vision of the semantic web (Berners-Lee et al., 2001). The basic idea is that documents spread over the Internet may include tags to embed enough semantic detail to enable understanding of their content by automated agents. Built on Unicode text, URIrefs (Uniform Resource Identifiers) to identify resources, XML and XSD datatypes, facts are encoded in RDF (Resource Description Framework) triples (subject, predicate, object) representing binary relationships from a node (resource or literal) to another node. RDF Schema (RDFS) builds on RDF by providing inbuilt support for classes and subclassing. The Web Ontology Language (OWL) builds on these underlying layers to provide what is now the most popular language for developing ontologies (schemas and their database instances) for the semantic web.
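The triple structure and the role of subclassing can be sketched with plain Python tuples. This is a simplified illustration, not an RDF library and not a full RDFS reasoner: the `ex:` identifiers and sample facts are hypothetical, and only one inference is shown, namely that an instance of a class is also an instance of its superclasses.

```python
# Hypothetical triples (subject, predicate, object); the ex: names are illustrative only.
triples = {
    ("ex:Novel", "rdfs:subClassOf", "ex:Book"),
    ("ex:Book", "rdfs:subClassOf", "ex:Publication"),
    ("ex:book1", "rdf:type", "ex:Novel"),
    ("ex:book1", "ex:title", "Example Title"),
}


def superclasses(cls, triples):
    """All classes reachable from cls via rdfs:subClassOf (cycle-safe)."""
    found, frontier = set(), {cls}
    while frontier:
        nxt = {o for (s, p, o) in triples if p == "rdfs:subClassOf" and s in frontier}
        frontier = nxt - found
        found |= nxt
    return found


def types_of(resource, triples):
    """Asserted types of a resource plus those inferred through subclassing."""
    asserted = {o for (s, p, o) in triples if s == resource and p == "rdf:type"}
    inferred = set()
    for c in asserted:
        inferred |= superclasses(c, triples)
    return asserted | inferred


print(sorted(types_of("ex:book1", triples)))
```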

OWL includes three versions. OWL Lite provides a decidable, efficient mechanism for simple ontologies composed mainly of classification hierarchies and relationships with simple constraints. OWL DL (the “DL” refers to Description Logic) is based on a stronger SHOIN(D) description logic that is still decidable. OWL Full is more expressive but is undecidable, and goes beyond even first order


All of the above database technologies are still in use, to varying degrees. While some legacy systems still use the old network and hierarchic DBMSs, new database applications are not built on these obsolete technologies. Object databases, deductive databases, and temporal databases provide advantages for niche markets. However, the industrial database world is still dominated by relational and object-relational DBMSs. In practice, ORDBs have become the dominant DBMS, since virtually all the major industrial relational DBMSs (e.g. Oracle, IBM DB2, and Microsoft SQL Server) extended their systems with object-oriented features, and also expanded their support for data types including XML. The SQL standard now includes support for collection types (e.g. arrays, row types and multisets), recursive queries and XML. Some ORDBMSs (e.g. Oracle) include support for RDF. While SQL is still often used for data exchange, XML is being increasingly used for exchanging data between applications.

In practice, most applications use an object model for transient (in-memory) storage, while using an RDB or ORDB for persistent storage. This has led to extensive efforts to facilitate transformation between these differently structured data stores (known as Object-Relational mapping). One interesting initiative in this regard is Microsoft’s Language Integrated Query (LINQ) technology, which allows users to interact with relational data by using an SQL-like syntax in their object-oriented program code.
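The transformation between an in-memory object model and relational storage can be sketched by hand. The Python example below uses the standard sqlite3 module and a dataclass; it is not LINQ and not any particular object-relational mapping library, and the `Person` class, table, and helper functions are assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Person:
    """Transient (in-memory) object."""
    person_nr: int
    name: str


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (personNr INTEGER PRIMARY KEY, personName TEXT NOT NULL)"
)


def save(person):
    """Map an object to a row."""
    conn.execute(
        "INSERT INTO Person (personNr, personName) VALUES (?, ?)",
        (person.person_nr, person.name),
    )


def load(person_nr):
    """Map a row back to an object, or return None if no such row exists."""
    row = conn.execute(
        "SELECT personNr, personName FROM Person WHERE personNr = ?",
        (person_nr,),
    ).fetchone()
    return Person(*row) if row else None


save(Person(1, "Fred Smith"))
loaded = load(1)
print(loaded)
```

Mapping tools automate exactly this kind of repetitive translation code, along with harder cases such as inheritance and associations.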

Recently there has been a growing recognition that the best way to develop database systems is by transformation from a high level, conceptual schema that specifies the structure of the data in a way that can be easily understood and hence validated by the (often nontechnical) subject matter experts, who are the only ones who can reliably determine whether the proposed models accurately reflect their business domains.

While this notion of model driven development was forcefully and clearly proposed over a quarter century ago in an ISO standard (van Griethuysen, 1982), only in the last decade has it begun to be widely accepted by major commercial interests. Though named differently by different bodies (e.g. the Object Management Group calls it “Model Driven Architecture” and Microsoft promotes model driven development based on Domain Specific Languages), the basic idea is to clearly specify the business domain model at a conceptual level, and then transform it as automatically as possible to application code, thereby minimizing the need for human programming. In the next section we review and contrast three of the most popular approaches to specifying high level data models for subsequent transformation into database schemas.

CONCEPTUAL DATABASE MODELING APPROACHES

In industry, most database designers either use a variant of Entity Relationship (ER) modeling or simply design directly at the relational level. The basic ER approach was first proposed by Chen (1976), and structures facts in terms of entities (e.g. Person, Car) that have attributes (e.g. gender, birthdate) and participate in relationships (e.g. Person drives Car). The most popular industrial versions of ER are the Barker ER notation (Barker, 1990), Information Engineering (IE) (Finkelstein, 1998), and IDEF1X (IEEE, 1999). IDEF1X is actually a hybrid of ER and relational, explicitly using relational concepts such as foreign keys. Barker ER is currently the best and most expressive of the industrial ER notations, so we focus our ER discussion on it.

The Unified Modeling Language (UML) was adopted by the Object Management Group (OMG) in 1997 as a language for object-oriented (OO) analysis and design. After several minor revisions, a major overhaul resulted in UML version 2.0 (OMG, 2003), and the language is still being refined. Although


Language (OCL) is too technical for most business people to understand (Warmer & Kleppe, 2003). For such reasons, although UML is widely used for documenting object-oriented programming applications, it is far less popular than ER for database design.

Despite their strengths, both ER and UML are fairly weak at capturing the kinds of business rules found in data-intensive applications, and their graphical language does not lend itself readily to verbalization and multiple instantiation for validating data models with domain experts.

These problems can be remedied by using a fact-oriented approach for information analysis, where communication takes place in simple sentences, each sentence type can easily be populated with multiple instances, attributes are avoided in the base model, and far more business rules can be captured graphically. At design time, a fact-oriented model can be used to derive an ER model, a UML class model, or a logical database model.

Object Role Modeling (ORM), the main exemplar of the fact-oriented approach, originated in Europe in the mid-1970s (Falkenberg, 1976), and has been extensively revised and extended since, along with commercial tool support (e.g. Halpin, Evans, Hallock & MacLean, 2003). Recently, a major upgrade to the methodology resulted in ORM 2, a second generation ORM (Halpin 2005; Halpin & Morgan 2008). Neumont ORM Architect (NORMA), an open source tool accessible online at www.ORMFoundation.org, is under development to provide deep support for ORM 2 (Curland & Halpin, 2007).

ORM pictures the world simply in terms of objects (entities or values) that play roles (parts in relationships). For example, you are now playing the role of reading, and this prologue is playing the role of being read. Wherever ER or UML uses an attribute, ORM uses a relationship. For example, the Person.birthdate attribute is modeled in ORM as the fact type Person was born on Date, where the role played by date in this relationship may be given the rolename “birthdate”.

ORM is less popular than either ER or UML, and its diagrams typically consume more space because of their attribute-free nature. However, ORM arguably offers many advantages for conceptual analysis, as illustrated by the following example, which presents the same data model using the three different notations.

In terms of expressibility for data modeling, ORM supports relationships of any arity (unary, binary, ternary or longer), identification schemes of arbitrary complexity, asserted, derived, and semiderived facts and types, objectified associations, mandatory and uniqueness constraints that go well beyond ER and UML in dealing with n-ary relationships, inclusive-or constraints, set comparison (subset, equality, exclusion) constraints of arbitrary complexity, join path constraints, frequency constraints, object and role cardinality constraints, value and value comparison constraints, subtyping (asserted, derived and semiderived), ring constraints (e.g. asymmetry, acyclicity), and two rule modalities (alethic and deontic (Halpin, 2007a)). For some comparisons between ORM 1 and ER and UML see Halpin (2002, 2004).

As well as its rich notation, ORM includes detailed procedures for constructing ORM models and transforming them to other kinds of models (ER, UML, Relational, XSD etc.) on the way to implementation. For a general discussion of such procedures, see Halpin & Morgan (2008). For a detailed discussion of using ORM to develop the data model example discussed below, see Halpin (2007b).

Figure 1 shows an ORM schema for a fragment of a book publisher application. Entity types appear as named, soft rectangles, with simple identification schemes parenthesized (e.g. Books are identified by


A bar over a sequence of one or more roles depicts a uniqueness constraint (e.g. each book has at most one booktitle, but a book may be authored by many persons and vice versa). The external uniqueness constraint (circled bar) reflects the publisher’s policy of publishing at most one book of any given title in any given year. A dot on a role connector indicates that role is mandatory (e.g. each book has a booktitle).

Subtyping is depicted by an arrow from subtype to supertype. In this case, the PublishedBook subtype is derived (indicated by an asterisk), so a derivation rule for it is supplied. Value constraints are placed in braces (e.g. the possible codes for Gender are ‘M’ and ‘F’).

The ring constraint on the book translation fact type indicates that relationship is acyclic. The exclusion constraint (circled X) ensures that no person may review a book that he or she authors. The frequency constraint (≥ 2) ensures that any book assigned for review has at least two reviewers. The subset constraint (circled ⊆) ensures that if a person has a title that is restricted to a specific gender (e.g. ‘Mrs’ is restricted to females), then that person must be of that gender; this is an example of a constraint on a conceptual join path. The textual declarations provide a subtype definition and two derivation rules, one in attribute style (using role names) and one in relational style. ORM schemas can also be automatically verbalized in natural language sentences, enabling validation by domain experts without requiring them to understand the notation (Curland & Halpin, 2007).
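To show what three of these constraints mean operationally, the following Python sketch checks them against a small sample population. It is only an illustration: the fact populations, identifiers, and function names are hypothetical, and an ORM tool would generate such checks (typically as SQL) rather than have them written by hand.

```python
from collections import Counter

# Hypothetical sample population for some fact types of the book publisher schema.
authored_by = {("b1", 1), ("b2", 2)}              # Book is authored by Person
reviewed_by = {("b1", 2), ("b1", 3), ("b2", 1)}   # Book is assigned for review by Person
translated_from = {("b2", "b1")}                  # Book is translated from Book


def exclusion_violations(authored, reviewed):
    """(book, person) pairs that occur in both fact types."""
    return authored & reviewed


def frequency_violations(reviewed, minimum=2):
    """Books assigned for review to fewer than `minimum` persons."""
    counts = Counter(book for book, _ in reviewed)
    return {book for book, n in counts.items() if n < minimum}


def is_acyclic(pairs):
    """True if no object can be reached from itself by following the relation."""
    successors = {}
    for a, b in pairs:
        successors.setdefault(a, set()).add(b)

    def reaches(start, node, seen):
        for nxt in successors.get(node, ()):
            if nxt == start:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(start, nxt, seen):
                    return True
        return False

    return not any(reaches(a, a, set()) for a in successors)


print(exclusion_violations(authored_by, reviewed_by))
print(frequency_violations(reviewed_by))
print(is_acyclic(translated_from))
```

In this sample, book b2 has only one reviewer, so the frequency check reports it, while the exclusion and acyclicity checks pass.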

Figure 2 depicts the same model in Barker ER notation, supplemented by textual rules (6 numbered constraints, plus 3 derivations) that cannot be captured in this notation.

Barker ER depicts entity types as named, soft rectangles. Mandatory attributes are preceded by an asterisk and optional attributes by “o”. An attribute that is part of the primary identifier is preceded by “#”, and a role that is part of an identifier has a stroke “|” through it.

All relationships must be binary, with each half of a relationship line depicting a role. A crowsfoot indicates a maximum cardinality of many. A line end with no crowsfoot indicates a maximum cardinality of one. A solid line end indicates the role is mandatory, and a dashed line end indicates the role is optional. Subtyping is depicted by Euler diagrams with the subtype inside the supertype. Unlike ORM and UML, Barker ER supports only single inheritance, and requires that the subtyping always forms a partition.

Figure 1. Book publisher schema in ORM

[Diagram not reproduced. Object types shown: Book (ISBN), Person (.nr), PersonName, Gender (.code) {‘M’, ‘F’}, PersonTitle, Grade (.nr) {1..5}, BookTitle, Year (CE), NrCopies, and the derived subtype PublishedBook*. Predicates shown: is authored by; is assigned for review by (objectified as “ReviewAssignment !”); resulted in; has / is of; is of; is restricted to; has; was published in; is translated from; … in … sold …; sold total*; is a best seller*. Role names shown: [copiesSoldInYear], [totalCopiesSold].]

Each PublishedBook is a Book that was published in some Year.

* For each PublishedBook, totalCopiesSold = sum(copiesSoldInYear).

* PublishedBook is a best seller iff PublishedBook sold total NrCopies >= 10000.


Figure 2. Book publisher schema in Barker ER, supplemented by extra rules

[Diagram not reproduced. Entity types shown include BOOK, PERSON, and REVIEW ASSIGNMENT.]

1 (book title, year published) is unique.
2 The translation relationship is acyclic.
3 Review Assignment is disjoint with authorship.
4 Possible values of gender are ‘M’, ‘F’.
5 Each person with a person title restricted to a gender has that gender.
6 Possible values of grade are 1..5.

Derivation Rules:
Published_Book.totalCopiesSold = sum(Book_Sales_Figure.copies_sold_in_year).
Published_Book.is_a_best_seller = totalCopiesSold >= 10000.

Subtype Definition:
Each Published_Book is a Book where year_published is not null.

Figure 3 shows the same model as a class diagram in UML, supplemented by several textual rules captured either as informal notes (e.g. acyclic) or as formal constraints in OCL (e.g. yearPublished -> notEmpty()) or as nonstandard notations in braces (e.g., the {P} for preferred identifier and {Un} for uniqueness are not standard UML). Derived attributes are preceded by a slash. Attribute multiplicities are assumed to be 1 (i.e. exactly one) unless otherwise specified (e.g. restrictedGender has a multiplicity of [0..1], i.e. at most one). A “*” for maximum multiplicity indicates “many”.

Figure 3. Book publisher schema in UML, supplemented by extra rules

[Diagram not reproduced. Textual rules recoverable from the figure:]

yearPublished -> notEmpty().
totalCopiesSold = sum(salesFigure.copiesSoldInYear).
isaBestSeller = (totalCopiesSold >= 10000).
title.restrictedGender = self.gender or title.restrictedGender -> isEmpty()


Part of the problem with the UML and ER models is that in these approaches personTitle and gender would normally be treated as attributes, but for this application we need to talk about them to capture a relevant business rule. The ORM model arguably provides a more natural representation of the business domain, while also formally capturing much more semantics with its built-in constructs, facilitating transformation to executable code. This result is typical for industrial business domains.

Figure 4 shows the relational database schema obtained by mapping these data schemas via ORM’s Rmap algorithm (Halpin & Morgan, 2008), using absorption as the default mapping for subtyping. Here square brackets indicate optional, dotted arrows indicate subset constraints, and a circled “X” depicts an exclusion constraint. Additional constraints are depicted as numbered textual rules in a high level relational notation. For implementation, these rules are transformed further into SQL code (e.g. check clauses, triggers, stored procedures, views).
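As a rough indication of what the final transformation to SQL might look like, the sketch below declares part of the Figure 4 schema using Python's standard sqlite3 module. It is an assumption-laden illustration, not the output of Rmap or of any ORM tool: the column types are guesses, the Authorship and TitleRestriction tables are omitted, the best seller threshold follows the derivation rule stated with Figure 1 (at least 10000 copies), and rules such as acyclicity, the frequency constraint, and the exclusion constraint would need triggers or stored procedures that are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Book (
    isbn              TEXT PRIMARY KEY,
    title             TEXT NOT NULL,
    yearPublished     INTEGER,                       -- optional
    translationSource TEXT REFERENCES Book(isbn),    -- optional
    UNIQUE (title, yearPublished)                    -- rows with NULL year never clash
);
CREATE TABLE SalesFigure (
    isbn       TEXT NOT NULL REFERENCES Book(isbn),
    yearSold   INTEGER NOT NULL,
    copiesSold INTEGER NOT NULL,
    PRIMARY KEY (isbn, yearSold)
);
CREATE TABLE Person (
    personNr    INTEGER PRIMARY KEY,
    personName  TEXT NOT NULL,
    gender      TEXT NOT NULL CHECK (gender IN ('M', 'F')),
    personTitle TEXT NOT NULL
);
CREATE TABLE ReviewAssignment (
    personNr INTEGER NOT NULL REFERENCES Person(personNr),
    isbn     TEXT NOT NULL REFERENCES Book(isbn),
    grade    INTEGER CHECK (grade BETWEEN 1 AND 5),  -- optional
    PRIMARY KEY (personNr, isbn)
);
CREATE VIEW SoldBook AS
    SELECT isbn,
           SUM(copiesSold) AS totalCopiesSold,
           SUM(copiesSold) >= 10000 AS isaBestSeller
    FROM SalesFigure
    GROUP BY isbn;
""")

# Hypothetical sample data.
conn.execute("INSERT INTO Book VALUES ('111', 'Some Title', 2007, NULL)")
conn.executemany(
    "INSERT INTO SalesFigure VALUES (?, ?, ?)",
    [("111", 2007, 6000), ("111", 2008, 5000)],
)
sold = conn.execute("SELECT * FROM SoldBook").fetchall()
print(sold)

# The value constraint on gender is enforced by the check clause.
try:
    conn.execute("INSERT INTO Person VALUES (1, 'Pat', 'X', 'Dr')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Check clauses cover the value constraints directly, and the view covers the derivation rules; the remaining constraints are where most hand-written or generated procedural code ends up.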

CONCLUSION

While many kinds of database technology exist, RDBs and ORDBs currently dominate the market, with XML being increasingly used for data exchange. While ER is still the main conceptual modeling approach for designing databases, UML is gaining a following for this task, and is already widely used for object-oriented code design. Though less popular than ER or UML, the fact-oriented approach exemplified by ORM has many advantages for conceptual data analysis, providing richer coverage of business rules, easier validation by business domain experts, and semantic stability (ORM models and queries are unimpacted by changes that require one to talk about an attribute). Because ORM models may be used to generate ER and UML models, it may also be used in conjunction with these if desired.

Figure 4. Book publisher relational schema

Book ( isbn, title, [yearPublished], [translationSource] )
SalesFigure ( isbn, yearSold, copiesSold )
Authorship ( personNr, isbn )
ReviewAssignment ( personNr, isbn, [grade] )
Person ( personNr, personName, gender, personTitle )
TitleRestriction ( personTitle, gender )
View: SoldBook ( isbn, totalCopiesSold, isaBestSeller )

1 acyclic
2 only where yearPublished exists
3 SalesFigure.isbn
4 sum(copiesSold) from SalesFigure group by isbn
5 totalCopiesSold > 10000
6 not exists (Person join TitleRestriction on personTitle where Person.gender <> TitleRestriction.gender).

[Diagram annotations not reproduced: the subset and exclusion constraint arrows, the frequency constraint (≥ 2), and the value constraints {M, F} and {1..5} attached to individual columns.]


With a view to providing better support at the conceptual level, the OMG recently adopted the Semantics of Business Vocabulary and Business Rules (SBVR) specification (OMG, 2007). Like ORM, the SBVR approach is fact oriented instead of attribute-based, and includes deontic as well as alethic rules. Many companies are now looking to model-driven development as a way to dramatically increase the productivity, reliability, and adaptability of software engineering approaches. It seems likely that both object-oriented and fact-oriented approaches will be increasingly utilized in the future to increase the proportion of application code that can be generated from higher level models.

REFERENCES

Atkinson, M., Bancilhon, F., DeWitt, D., Dittrich, K., Maier, D. & Zdonik, S. (1989). The Object-Oriented Database System Manifesto. In W. Kim, J-M. Nicolas & S. Nishio (Eds), Proc. DOOD-89: First Int. Conf. on Deductive and Object-Oriented Databases (pp. 40–57). Elsevier.

Barker, R. (1990). CASE*Method: Entity Relationship Modelling, Addison-Wesley, Wokingham.

Berners-Lee, T., Hendler, J. & Lassila, O. (2001). ‘The Semantic Web’, Scientific American, May 2001.

Bloesch, A. & Halpin, T. (1997). Conceptual queries using ConQuer-II. In D. Embley & R. Goldstein (Eds.), Proc. 16th Int. Conf. on Conceptual Modeling ER’97 (pp. 113-126). Berlin: Springer.

Booch, G., Rumbaugh, J. & Jacobson, I. (1999). The Unified Modeling Language User Guide. Reading: Addison-Wesley.

Chen, P. (1976). ‘The Entity-Relationship Model—Toward a Unified View of Data’, ACM Transactions on Database Systems, vol. 1, no. 1, pp. 9−36.

Codd, E. (1970). A Relational Model of Data for Large Shared Data Banks. CACM, vol. 13, no. 6, pp. 377−87.

Curland, M. & Halpin, T. (2007). Model Driven Development with NORMA. In: Proc. HICSS-40, CD-ROM, IEEE Computer Society.

Falkenberg, E. (1976). Concepts for modelling information. In G. Nijssen (Ed.), Modelling in Data Base Management Systems (pp. 95-109). Amsterdam: North-Holland.

Finkelstein, C. (1998). ‘Information Engineering Methodology’, Handbook on Architectures of Information Systems, eds. P. Bernus, K. Mertins & G. Schmidt, Springer-Verlag, Berlin, Germany, pp. 405–27.

Halpin, T. (2002). Information Analysis in UML and ORM: a Comparison. Advanced Topics in Database Research, vol. 1, K. Siau (Ed.), Hershey PA: Idea Publishing Group, Ch. XVI (pp. 307-323).

Halpin, T. (2004). Comparing Metamodels for ER, ORM and UML Data Models. In: Siau K (ed) Advanced Topics in Database Research, vol. 3, Idea Pub. Group, Hershey, pp. 23–44.

Halpin, T. (2005). ORM 2. In: Meersman R et al. (eds) On the Move to Meaningful Internet Systems 2005: OTM 2005 Workshops, LNCS vol 3762. Springer, Berlin Heidelberg New York, pp. 676–687.

Halpin, T. (2006). Object-Role Modeling (ORM/NIAM). In: Handbook on Architectures of Information


Halpin, T. (2007a). Modality of Business Rules. In: Research Issues in Systems Analysis and Design, Databases and Software Development, ed. K. Siau, IGI Publishing, Hershey, pp. 206-226.

Halpin, T. (2007b). Fact-Oriented Modeling: Past, Present and Future. In: Krogstie J, Opdahl A, Brinkkemper S (eds) Conceptual Modelling in Information Systems Engineering. Springer, Berlin, pp. 19-38.

Halpin, T. & Bloesch, A. (1999). Data modeling in UML and ORM: a comparison. Journal of Database Management, 10(4), 4-13.

Halpin, T., Evans, K., Hallock, P. & MacLean, W. (2003). Database Modeling with Microsoft® Visio for Enterprise Architects, San Francisco: Morgan Kaufmann.

Halpin, T. & Morgan, T. (2008). Information Modeling and Relational Databases. 2nd Edn. San Francisco: Morgan Kaufmann.

IEEE (1999). IEEE standard for conceptual modeling language syntax and semantics for IDEF1X97 (IDEFobject), IEEE Std 1320.2–1998, IEEE, New York.

ter Hofstede, A., Proper, H. & van der Weide, T. (1993). Formal definition of a conceptual language for the description and manipulation of information models. Information Systems 18(7), 489-523.

Jacobson, I., Booch, G. & Rumbaugh, J. (1999). The Unified Software Development Process. Reading: Addison-Wesley.

Melton, J. & Simon, A. (2002). SQL:1999 Understanding Relational Language Components, Morgan Kaufmann.

Melton, J. & Buxton, S. (2006). Querying XML: XQuery, XPath, and SQL/XML in Context, Morgan Kaufmann.

OMG (2003). OMG Unified Modeling Language Specification, version 2.0 [Online] Available: http://www.uml.org/.

OMG (2007). Semantics of Business Vocabulary and Business Rules (SBVR). URL: http://www.omg.org/cgi-bin/doc?dtc/2006-08-05.

Rumbaugh, J., Jacobson, I. & Booch, G. (1999). The Unified Modeling Language Reference Manual. Reading: Addison-Wesley.

Stonebraker, M., Rowe, L., Lindsay, B., Gray, J., Carey, M., Brodie, M., Bernstein, P. & Beech, D. (1990). ‘Third Generation Database System Manifesto’, ACM SIGMOD Record, vol. 19, no. 3.

van Griethuysen, J. (ed.) (1982). Concepts and Terminology for the Conceptual Schema and the Information Base, ISO TC97/SC5/WG3, Eindhoven.

Warmer, J. & Kleppe, A. (2003). The Object Constraint Language: Getting Your Models Ready for MDA,


About the Editor

Terry Halpin, BSc, DipEd, BA, MLitStud, PhD, is distinguished professor and vice president (Conceptual Modeling) at Neumont University. His industry experience includes several years in data modeling technology at Asymetrix Corporation, InfoModelers Inc., Visio Corporation, and Microsoft Corporation. His doctoral thesis formalized Object-Role Modeling (ORM/NIAM), and his current research focuses on conceptual modeling and


Section I


Chapter I

Conceptual Modeling Solutions for the Data Warehouse

Stefano Rizzi

DEIS - University of Bologna, Italy

ABSTRACT

In the context of data warehouse design, a basic role is played by conceptual modeling, which provides a higher level of abstraction in describing the warehousing process and architecture in all its aspects, aimed at achieving independence of implementation issues. This chapter focuses on a conceptual model called the DFM that suits the variety of modeling situations that may be encountered in real projects of small to large complexity. The aim of the chapter is to propose a comprehensive set of solutions for conceptual modeling according to the DFM and to give the designer a practical guide for applying them in the context of a design methodology. Besides the basic concepts of multidimensional modeling, the other issues discussed are descriptive and cross-dimension attributes; convergences; shared, incomplete, recursive, and dynamic hierarchies; multiple and optional arcs; and additivity.

INTRODUCTION


support; the workload they support has completely different characteristics, and is widely known as OLAP (online analytical processing). Traditionally, OLAP applications are based on multidimensional modeling that intuitively represents data under the metaphor of a cube whose cells correspond to events that occurred in the business domain (Figure 1). Each event is quantified by a set of measures; each edge of the cube corresponds to a relevant dimension for analysis, typically associated to a hierarchy of attributes that further describe it. The multidimensional model has a twofold benefit. On the one hand, it is close to the way of thinking of data analyzers, who are used to the spreadsheet metaphor; therefore it helps users understand data. On the other hand, it supports performance improvement as its simple structure allows designers to predict the user intentions.
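The cube metaphor can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the dimensions (date, product, store), the single measure (quantity sold), the day-to-month hierarchy, and all sample values are hypothetical and are not taken from the chapter's figure.

```python
from collections import defaultdict

# Hypothetical events: each cell of the cube is identified by one value per
# dimension (date, product, store) and carries one measure (quantity sold).
events = [
    ("2008-01-05", "milk", "store1", 10),
    ("2008-01-05", "bread", "store1", 4),
    ("2008-01-17", "milk", "store2", 7),
    ("2008-02-02", "milk", "store1", 3),
]


def month_of(day):
    """Climb the date hierarchy from day to month (YYYY-MM)."""
    return day[:7]


def roll_up(events):
    """Aggregate quantity by (month, product), dropping the store dimension."""
    totals = defaultdict(int)
    for day, product, _store, quantity in events:
        totals[(month_of(day), product)] += quantity
    return dict(totals)


monthly = roll_up(events)
print(monthly)
```

Summing is appropriate here because quantity sold is additive along every dimension; measures that are not additive would need a different aggregation operator.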

Multidimensional modeling and OLAP workloads require specialized design techniques. In the context of design, a basic role is played by conceptual modeling that provides a higher level of abstraction in describing the warehousing process and architecture in all its aspects, aimed at achieving independence of implementation issues.

Conceptual modeling is widely recognized to be the necessary foundation for building a database that is well-documented and fully satisfies the user requirements; usually, it relies on a graphical notation that facilitates writing, understanding, and managing conceptual schemata by both designers and users.

Unfortunately, in the field of data warehousing there still is no consensus about a formalism for conceptual modeling (Sen & Sinha, 2005). The entity/relationship (E/R) model is widespread in the enterprises as a conceptual formalism to provide standard documentation for relational information systems, and a great deal of effort has been made to use E/R schemata as the input for designing nonrelational databases as well (Fahrner & Vossen, 1995); nevertheless, as E/R is oriented to support queries that navigate associations between data rather than synthesize them, it is not well suited for data warehousing (Kimball, 1996). Actually, the E/R model has enough expressivity to represent most concepts necessary for modeling a DW; on the other hand, in its basic form, it is not able to properly emphasize the key aspects of the multidimensional model, so that its usage for DWs is expensive from the point of view of the graphical notation and not intuitive (Golfarelli, Maio, & Rizzi, 1998).

Some designers claim to use star schemata for conceptual modeling. A star schema is the standard implementation of the multidimensional model on relational platforms; it is just a (denormalized) relational schema, so it merely defines a set of relations and integrity constraints. Using the star schema for conceptual modeling is like starting to build complex software by writing the code, without the support of any static, functional, or dynamic model, which typically leads to very poor results from the points of view of adherence to user requirements, of maintenance, and of reuse.

For all these reasons, in the last few years the research literature has proposed several original approaches for modeling a DW, some based on extensions of E/R, some on extensions of UML. This chapter focuses on an ad hoc conceptual model, the dimensional fact model (DFM), that was first proposed in Golfarelli et al. (1998) and continuously enriched and refined during the following years in order to optimally suit the variety of modeling situations that may be encountered in real projects of small to large complexity. The aim of the chapter is to propose a comprehensive set of solutions for conceptual modeling according to the DFM and to give a practical guide for applying them in the context of a design methodology. Besides the basic concepts of multidimensional modeling, namely facts, dimensions, measures, and hierarchies, the other issues discussed are descriptive and cross-dimension attributes; convergences; shared, incomplete, recursive, and dynamic hierarchies; multiple and optional arcs; and additivity.

After reviewing the related literature in the next section, in the third and fourth sections, we introduce the constructs of DFM for basic and advanced modeling, respectively. Then, in the fifth section we briefly discuss the different methodological approaches to conceptual design. Finally, in the sixth section we outline the open issues in conceptual modeling, and in the last section we draw the conclusions.

RELATED LITERATURE

In the context of data warehousing, the literature proposed several approaches to multidimensional modeling. Some of them have no graphical support and are aimed at establishing a formal foundation for representing cubes and hierarchies as well as an algebra for querying them (Agrawal, Gupta, & Sarawagi, 1995; Cabibbo & Torlone, 1998; Datta & Thomas, 1997; Franconi & Kamble, 2004a; Gyssens & Lakshmanan, 1997; Li & Wang, 1996; Pedersen & Jensen, 1999; Vassiliadis, 1998); since we believe that a distinguishing feature of conceptual models is that of providing a graphical support to be easily understood by both designers and users when discussing and validating requirements, we will not discuss them.

Table 1. Approaches to conceptual modeling

no method
    E/R extension: Franconi and Kamble (2004b); Sapia et al. (1998); Tryfona et al. (1999)
    object-oriented: Abelló et al. (2002); Nguyen, Tjoa, and Wagner (2000)
    ad hoc: Tsois et al. (2001)


The approaches to “strict” conceptual modeling for DWs devised so far are summarized in Table 1. For each model, the table shows if it is associated to some method for conceptual design and if it is based on E/R, is object-oriented, or is an ad hoc model.

The discussion about whether E/R-based, object-oriented, or ad hoc models are preferable is controversial. Some claim that E/R extensions should be adopted since (1) E/R has been tested for years; (2) designers are familiar with E/R; (3) E/R has proven flexible and powerful enough to adapt to a variety of application domains; and (4) several important research results were obtained for the E/R (Sapia, Blaschka, Hofling, & Dinter, 1998; Tryfona, Busborg, & Borch Christiansen, 1999). On the other hand, advocates of object-oriented models argue that (1) they are more expressive and better represent static and dynamic properties of information systems; (2) they provide powerful mechanisms for expressing requirements and constraints; (3) object-orientation is currently the dominant trend in data modeling; and (4) UML, in particular, is a standard and is naturally extensible (Abelló, Samos, & Saltor, 2002; Luján-Mora, Trujillo, & Song, 2002). Finally, we believe that ad hoc models compensate for designers' lack of familiarity with them through the fact that (1) they achieve better notational economy; (2) they give proper emphasis to the peculiarities of the