Jatin Pandey & Darshana Pathak
The Energy and Resources Institute
Geographic Information System (GIS) aims to organize complex interrelations between different layers of information through a process of gathering, analysing, processing, storing, and presenting the spatial data and images available through different sources. It integrates hardware, software, and data for capturing, managing, analysing, and displaying all forms of geographically referenced information. This book presents theory, methods, and the latest research findings for problem-solving and decision-making using GIS-based technologies.
Key Features
• Explains raster and vector data and attribute databases.
• Discusses the application of GIS in geotechnical engineering, transport engineering, and water resource engineering.
• Includes model question papers and is well illustrated.
ISBN 978-81-7993-537-8
Geographic Information System
ISBN 978-81-7993-537-8
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publisher.
All export rights for this book vest exclusively with The Energy and Resources Institute (TERI). Unauthorized export is a violation of terms of sale and is subject to legal action.
Suggested citation
Pandey, Jatin and Darshana Pathak. 2014.
Geographic Information System. New Delhi: TERI
Published by
The Energy and Resources Institute (TERI) TERI Press
Darbari Seth Block IHC Complex, Lodhi Road New Delhi – 110 003 India
Printed in India
Tel. 2468 2100 or 4150 4900
Fax 2468 2144 or 2468 2145
India +91 • Delhi (0) 11
Email [email protected]
Website www.teriin.org
and Mrs Hema Pandey, and my brother Mr Kunal Pandey for their love and support.
Jatin Pandey
I dedicate this book to my parents Mr G.D. Pathak and Mrs Ganga Pathak, my husband Mr Bharat Joshi, and all family members.
Darshana Pathak
I am happy to present this book on Geographic Information System to the readers. The work put forth by the authors Mrs Darshana Pathak and Mr Jatin Pandey is commendable. I convey my best wishes to both of them and wish them success in their future endeavours. I hope that the readers will find this book useful and fruitful.
Padma Vibhushan Shri Sundarlal Bahuguna
Environmentalist and Social Worker
Leader of Chipko Movement
Geographic information system (GIS) is a computer-based technology that allows digitization and graphical representation of geographical data to support efficient planning and decision-making. It captures, stores, manipulates, represents, and analyses spatial data. The present book has been organized splendidly to address the basic terms, tools, and techniques of GIS technology.
Geographic information system has manifold applications, spanning from natural resource management and network management to planning and a host of other areas. The book first acquaints readers with GIS and subsequently guides them in using this technology for various applications. The book is structured into six chapters, and their content is extremely useful and relevant for students of science and technology.
I sincerely congratulate the young and dynamic authors, Mrs Darshana Pathak and Mr Jatin Pandey, for coming up with a book on such an efficient technology. They truly deserve compliments. I do believe that the book will light up the hearts of many GIS practitioners, implementers, policymakers, and enthusiasts.
Dr Durgesh Pant
Professor and Head of Department
School of Computer Sciences and Information Technology
Uttarakhand Open University
This book would not have taken shape without the contribution of many people. First and foremost, we thank Mr R.K. Joshi of TERI Press for encouraging us to write this book. We express our sincere gratitude to Prof Kamal Kumar Ghanshala, Chancellor, Graphic Era Hill University, and Prof Jitendra Shah, Senior Research Scientist, IIT Bombay for their guidance.
We also thank Ms Nayna Garg and Mr Kapil Bhadouria for their help with data entry and figures used in this book. Moreover, we express our heartfelt gratitude to TERI for considering this manuscript worth publishing. We especially thank Ms Sushmita Ghosh and Mr Arun Kumar Paul of TERI Press for their patience and support during the entire process.
Last but not least, we are grateful to all our teachers, who molded us into what we are today and were the catalysts for our transformation.
Information technology is an umbrella term that includes sets of tools, processes, and methodologies, with associated digital equipment, to collect, process, and present information. Over the past several decades, information technology has been expanded by various principal technologies to input, process, output, and distribute information around the globe. Information technology and the informatics wave have touched every sphere of human life, making it simpler and more sophisticated at the same time. This information technology revolution has been a catalyst for the metamorphosis of an isolated world into a globally connected village.
Digital technologies are now part of our everyday work. Today, innovations in information technology are having wide-ranging impacts across numerous domains of society. Information technology has passed through pre-mechanical, mechanical, and electromechanical phases, and successive digital revolutions have carried it into its present electronic phase.
Informatics is so intricately interwoven into the fabric of our lives that at times we find it difficult to imagine life without it. In the present times, a life without a mobile phone seems a far-fetched possibility, but this was not the case 20 years ago. Geographic information system, or GIS as it is called, is one such technology; it is still in its early stages, but it is expected soon to become integrated into our day-to-day activities. It is an information system that has the potential to organize complex interrelations between different layers of information through the process of gathering, analysing, processing, storing, and presenting spatial data and images available through different sources.
Information management is an important aspect of informatics, and it benefits from better presentation techniques. The pictorial (graphic) representation of data always provides better opportunities for data interpretation and analysis; this fact triggered the inclusion of geographical data in digital technologies. The more manageable the information is, the easier it is to receive, collect, organize, interpret, and verify it.
This in turn helps in making optimal decisions. Geographic information system takes the traditional study of geography and projects it to the digital level. Dr Roger Tomlinson, a Canadian geographer, is known as the “father of GIS”. He was the first to apply a computer program to geographic data, which then assisted in understanding the management and use of land in Canada. GIS is a dynamic and relatively new field of applied informatics. It is, in essence, an emerging and evolving science. The future holds vast applications of GIS, and hence it has become a very important interdisciplinary subject.
The motivation to write this book on GIS was the non-availability of a quick reference guide, especially for students. The book is comprehensive and covers the major GIS topics taught in many universities across India.
In Chapter 1, we begin with a review of the basic elements, principles, and research on information technology, digital communication, and information systems. Chapters 2 and 3 introduce basic GIS data types and the methods or techniques for processing them. Within these chapters, basic raster and vector data types and data models are presented and illustrated with examples. The mapping of geographical data is an essential part of a geographic information system. Chapter 3 focuses on the underlying concepts of GIS functionality and GIS database management. Data acquisition is the first step of GIS functionality, and GIS has the potential to assimilate data from different sources for further processing. Today, with the advancement of satellite technology, remote sensing is growing at a rapid pace as a means of acquiring data on the earth’s surface. Chapter 4 explains the fundamentals of electromagnetic radiation and the basic working of remote sensing technology. Chapter 5 concentrates on a topic too seldom discussed: the application of GIS technology in various fields, with a wide range of advanced GIS-based solutions for optimal planning and decision-making. Chapter 6 tackles numerical problems in GIS head-on, with solutions provided to assist readers.
Although proper care was taken to avoid errors and ambiguities in the book, to err is human. Readers are requested to point out any faults they come across and to give suggestions that may help in improving later editions of the book. For this, you can contact the authors at [email protected] and [email protected]. We hope the book serves its purpose and becomes a guide in the true sense for students in their journey through GIS study.
“Let the light within us guide our path”.
Foreword by Shri Sundarlal Bahuguna
Foreword by Dr Durgesh Pant
Acknowledgements
Preface
1. Introduction to Geographic Information System
Introduction
Information System
Geographic Information System
Cartography and GIS
GIS Database
GIS Data Type
GIS Data Models
Topology and GIS
Exercises
References
2. Raster and Vector Data
Introduction
Vector Data
Raster Data
Raster Encoding Methods
Shape of the Earth
Transformation
Digitization
Exercises
References
3. Attribute Database and Overlay
Attribute Data
Relations
GIS Functionality
Spatial Query
Vector Data Queries
Classification
Overlay
Buffer
Inter-visibility
Network Theory
Exercises
References
4. Remote Sensing and Digital Image Processing
Remote Sensing
Sources of Energy for Remote Sensing
Interaction of Electromagnetic Radiation with Atmosphere
Interaction of Electromagnetic Radiation with the Earth’s Surface
Use of Electromagnetic Spectrum for Remote Sensing Purposes
Process of Remote Sensing
Sensors and Platforms
Orbits and Swaths
Platforms
Image Processing
Applications of Remote Sensing
Exercises
5. Applications of GIS
GIS in Planning and Management of Utility Lines
Geotechnical Engineering
Water Resource Engineering
Example of GIS Application Development with Open Source
References
6. Numerical Problems
Scale Conversions
Playing with Database
Glossary
Bibliography
Index
About the Authors
1. Introduction to Geographic Information System
INTRODUCTION
Driven by a revolution in digital technologies, the requisites and practice of science are changing. All elements of science—observation, experiment, theory, and modelling—are being transformed by the continuous cycle of generation, access, and use of an ever-increasing range and volume of digital data. Advances in computational capacity and tools, coupled with the accelerating collection and accumulation of data in many disciplines, have given rise to new modes of conducting research. Computers and information technology have penetrated almost all aspects of science and human lives, and data are the essence of this new field. Reuse of digital data has dramatic benefits; it opens up opportunities to utilize information over unlimited time periods and for unlimited purposes, thus aiding science and society.
The term “data” is the plural of the Latin word “datum”, which means “something given”. Data are the lowest level of abstraction: the raw input that yields meaningful output when processed. On their own, data have no significance beyond their existence. When data are processed, they become information. Information is data that have been given meaning by way of relational connection. In computers, a relational database makes information from the data stored within it. Contextual information integrated with a set of skills, experiences, and relevant concepts generates knowledge. Knowledge can be defined in many ways and is constantly being redefined. In the simplest terms, it is a cognitive process, based on a specific context and a dynamic set of information, combined with an individual’s expertise and capability to derive new information and conclusions.
Recent developments in information and knowledge acquisition, along with the proliferation of personal computers and the spread of information systems, have bridged the gap between information and its application in various important fields of management and decision-making. To reach an optimal solution, an individual needs a set of relevant information and criteria to support his or her decision. Information systems have an inherent potential to enhance decision-making processes. Whatever the nature of the domain (business management or enterprise information systems; natural resource management; or medical, defence, or social facilities), an information system is an important tool in it. Table 1.1 gives the differences between data and information.
Table 1.1 Differences between data and information
Data
• Symbols
• Collection of raw facts
• Unprocessed and may not be in order
• Difficult to understand
• Example: spreadsheet
Information
• Processed data
• Easy to understand and always in order
• Provides answers to what, who, when, and where questions
• An appropriate collection of information that is able to define reasoning forms knowledge
INFORMATION SYSTEM
Information system is an integrated set of components for collecting, storing, and processing data and for delivering information, knowledge, and digital products. It is a combination of hardware, software, infrastructure, and trained personnel organized to facilitate planning, control, coordination, and decision-making in an organization.
Technically, there are two different perspectives from which to define an information system:
1. Functional perception: An information system is a technologically implemented medium to record, store, and disseminate information and to support decision-making and inference.
2. Structural perception: An information system consists of a collection of people, processes, data, models, technology, and partly formalized language, forming a cohesive structure, which serves some organizational purpose or function.
An information system captures raw data from within the organization or the external environment as input, processes it, and delivers more meaningful information as output for decision-making. The output is provided to end users or for other activities. In addition to supporting decision-making, coordination, and control, information systems may also help in analysing problems, visualizing complex subjects, and creating new products.
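The input-process-output cycle described above can be sketched in a few lines of Python. This is a minimal illustration rather than a real system: the transaction records and the `summarize_sales` function are hypothetical. Raw (customer, amount) pairs go in, and an ordered per-customer total, the kind of management report listed in Table 1.2, comes out.

```python
from collections import defaultdict

# Input: raw data captured as individual transactions (hypothetical records).
raw_transactions = [
    ("C001", 250.0),
    ("C002", 120.5),
    ("C001", 75.0),
    ("C003", 300.0),
    ("C002", 29.5),
]


def summarize_sales(transactions):
    """Process raw (customer, amount) records into total sales per customer."""
    totals = defaultdict(float)
    for customer, amount in transactions:
        totals[customer] += amount
    # Sorting by customer ID puts the output "in order", unlike the raw input.
    return dict(sorted(totals.items()))


# Output: information that answers "how much did each customer buy?"
report = summarize_sales(raw_transactions)
for customer, total in report.items():
    print(f"{customer}: {total:.2f}")
```

The raw list is unordered and repeats customers; the processed report is ordered and answers a specific question, which mirrors the contrast drawn in Table 1.1.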
Components of an Information System
A computer-based information system uses the computer to perform its intended tasks. The components of an information system are as follows (Figure 1.1).
• Software: The software consists of carefully organized instructions and codes written by programmers in any of the various special computer languages.
• Hardware: This refers to the physical parts of a computer and related devices.
• People: Stakeholders such as end users, specialists, programmers, and database administrators are involved at different stages of the life cycle of an information system.
• Database: A database is an organized collection of the data that the system stores and processes.
• Network: Communication media and network support are vital components of an information system.
Figure 1.1 Computer-based information system (Source: O’Brien 1993)
Types of Information System
Information systems are classified into groups depending on the following four parameters—organizational levels, mode of data processing, type of support provided, and system objectives.
Organizational levels
Enterprise information systems, inter-organizational systems, and intra-organizational systems are classified as organizational-level information systems. These systems are organized in a hierarchy in which the top system consists of many subsystems below it.
Mode of data processing
Information systems are categorized broadly into three different groups based on how they process data.
1. Batch processing systems: In batch processing systems, transactions that have already occurred are collected and processed together periodically.
2. Online batch systems: In online batch systems, data are captured by online devices and processed periodically.
3. Online real-time systems: In online real-time systems, data are captured and processed in real time to update records.
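The difference between batch and online real-time processing can be sketched as follows. This is a hypothetical inventory example; the item names and functions are illustrative only. In batch mode, captured updates wait in a queue until a periodic run applies them (an online batch system would capture them through online devices but still apply them periodically); in real-time mode, each update is applied as soon as it arrives.

```python
# Hypothetical stock-level updates; each tuple is (item, change in quantity).
updates = [("rice", -5), ("wheat", 10), ("rice", 20)]


def apply_update(records, item, change):
    """Update one record in place."""
    records[item] = records.get(item, 0) + change


def batch_process(records, queued_updates):
    """Batch mode: queued updates are applied together during a periodic run."""
    for item, change in queued_updates:
        apply_update(records, item, change)
    queued_updates.clear()  # the batch has been consumed


def real_time_process(records, item, change):
    """Online real-time mode: each update is applied as soon as it is captured."""
    apply_update(records, item, change)


# Batch: records stay stale until the periodic run.
batch_records = {"rice": 100, "wheat": 50}
queue = list(updates)
print("before batch run:", batch_records)
batch_process(batch_records, queue)
print("after batch run:", batch_records)

# Real time: records are current after every event.
live_records = {"rice": 100, "wheat": 50}
for item, change in updates:
    real_time_process(live_records, item, change)
    print("after event:", live_records)
```

Both modes reach the same final state; what differs is how current the records are between the capture of an event and the next processing run.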
Type of support provided
Information systems under this category include different types of office automation systems used at the lower levels of the office hierarchy.
System objectives
Information systems are classified into four subcategories depending on their system objectives.
1. Transaction-processing system: It is a computerized system intended to perform and record the routine daily transactions necessary to conduct a business. A transaction is an event that generates or modifies data, which is eventually stored in an information system.
2. Management information system: It is a planned system of collecting, storing, and disseminating data in the form of information needed to carry out the functions of management.
3. Executive information system: It is a management information system tailored to the strategic information needs of top managers.
4. Decision support system: It is a specific class of computerized information systems that support business and organizational decision-making activities. A properly designed decision support system is an interactive, software-based system intended to help decision-makers compile useful information from raw data, documents, personal knowledge, or business models to identify and solve problems and make decisions.
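A decision support system’s “what-if” analysis (the budget example in Table 1.2) amounts to re-running a model under different assumptions and comparing the outcomes. The sketch below is a hypothetical, deliberately simple budget model; the figures, cost categories, and the `project_surplus` function are assumptions made for illustration.

```python
def project_surplus(revenue, costs, cost_change_pct):
    """Return revenue minus total costs after scaling every cost by cost_change_pct percent."""
    adjusted_total = sum(costs.values()) * (1 + cost_change_pct / 100)
    return revenue - adjusted_total


# Hypothetical budget figures for the model.
revenue = 1_000_000
costs = {"salaries": 600_000, "equipment": 150_000, "maintenance": 100_000}

# Each scenario asks: "what if all costs changed by this percentage?"
for scenario in (-10, 0, 10, 20):
    surplus = project_surplus(revenue, costs, scenario)
    print(f"costs {scenario:+d}% -> surplus {surplus:,.0f}")
```

A real decision support system would draw its model and data from the organization’s databases; the point here is only that the decision-maker varies an assumption and the system recomputes the consequence.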
Table 1.2 briefly lists the major functions and examples of different types of information systems.
Table 1.2 Types of information systems
Type of system | Function | Example
Functional area information system | Supports the activities within a specific functional area | System for processing payroll
Transaction-processing system | Processes transaction data from business events | Walmart checkout point-of-sale terminal
Enterprise resource planning system | Integrates all functional areas of the organization | Oracle, SAP
Management information system | Produces reports summarized from transaction data, usually in one functional area | Report on total sales for each customer
Decision support system | Provides access to data and analysis tools | “What-if” analysis of changes in budget
Expert system | Mimics a human expert in a particular area and makes a decision | Credit card approval analysis
Executive information system | Presents structured, summarized information about aspects of business important to executives | Status of production by product
GEOGRAPHIC INFORMATION SYSTEM
A geographic information system (GIS) is an information system that has the potential to organize complex interrelations between different layers of information through gathering, analysing, processing, storing, and presenting the spatial data and images available through different sources. It is a computer-based information system that integrates hardware, software, and data for capturing, managing, analysing, and displaying all forms of geographically referenced information.
Geographic information system allows us to view, understand, question, interpret, and visualize geographical data in ways that reveal relationships, patterns, and trends in the form of maps, globes, reports, and charts. It is an important platform for visualizing geographical data. Some important terms associated with GIS are discussed in the following subsections.
Geography
Geography is the science concerned with the formulation of the laws governing the spatial distribution of certain features on the surface of the earth (Schaefer 1953). It is the study of both human and environmental landscapes, which comprise real as well as prescriptive spaces. The adjective “geographical” denotes belonging to, or being characteristic of, a geographical (spatial) location.
Location
Location is an indispensable concept of geography that distinguishes it from other fields. In simple terms, it is the position of an object on the earth’s surface with respect to a coordinate system. Geocoding is the process of determining the geographical coordinates of a place, such as a street or a school.
Distance
Distance is another important concept to be considered for any system founded on theories of geography. It is the measure of the degree of separation between any two points on the earth’s surface. A variety of units are used to measure distance. For a two-dimensional plane surface, the distance $d$ between two points $(x_1, y_1)$ and $(x_2, y_2)$ can be calculated as follows.
$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
However, GIS also deals with three-dimensional data obtained with the help of surface analysis and map analysis. Direction and space are other geographical concepts to be explored while designing any geography-based decision-making system.
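The two-dimensional distance formula can be checked with a short Python sketch. It assumes both points are in the same projected coordinate system with consistent units (for example, metres), and it ignores the curvature of the earth and any distortion introduced by the map projection. The function name `planar_distance` is illustrative.

```python
import math


def planar_distance(p, q):
    """Euclidean distance between two points given as (x, y) tuples."""
    (x1, y1), (x2, y2) = p, q
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)


print(planar_distance((2.0, 3.0), (5.0, 7.0)))  # 5.0

# math.dist (Python 3.8+) applies the same formula in any number of dimensions,
# so an elevation (z) value can be included for three-dimensional points.
print(math.dist((2.0, 3.0, 10.0), (5.0, 7.0, 22.0)))  # 13.0
```

The second call extends the formula with a $(z_2 - z_1)^2$ term, which is how a height difference enters a three-dimensional distance.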
Information
Information is processed raw data, which is a valuable asset to any organization and plays a vital role in planning, decision-making, and management. Geographical information describes the location and attributes of objects on the earth’s surface, and the geographic relationships among objects, phenomena, patterns, and residents. It is wide and varied, and answers the question “what is where”. Geographical information can be represented both as tabular data in text format, such as a feature table, and as graphical data, such as maps and graphics (Figure 1.2).
System
System is a well-defined and purposeful structure comprising functional, interdependent, and interrelated elements joined together to achieve a common goal. Every system is said to be composed of subsystems. Every subsystem has some defined objectives. The general components of a system include the following.
• Input and output are the data fed to the system and the resulting information provided by the system.
• Transformation processes convert the input into the desirable output.
• Control is a process that monitors accuracy, safety, performance, and continuity.
• Feedback is a subsystem that feeds results back to the system and controls the system by making changes to the input and/or process.
• Boundaries are defined by system observers.
Figure 1.2 Representation of geographical information using a map (Source: <http://grass.osgeo.org/screenshots/cartography>)
Geographic information system integrates geographical features with tabular data to assess real-world problems. At the simplest level, GIS can be thought of as a high-tech equivalent of a map. It provides the facility to visualize, question, analyse, interpret, and understand geographical data to reveal relationships, patterns, and trends in the form of maps, globes, reports, and charts.
Components of a GIS
A working GIS comprises the following five key elements (Figure 1.3).
1. Hardware: The hardware is the computer on which a GIS operates. Peripheral devices such as digitizers and scanners are used to convert data from maps and documents into digital form and send them to the computer. A digitizer board is a flat board used to vectorize any map object. Plotters or other kinds of display devices are used to present the results of data processing, and a tape device is used to store data or programs on magnetic tape.
2. Software: The GIS software provides the functions and tools needed to store, analyse, and display geographic information. The basic software components are as follows.
• A geo-database management system that supports the storage of spatial data for query and analysis.
• Software tools for input and processing of geographical data.
• Graphical user interface to support geographical and imagery data.
Figure 1.3 Components of GIS
3. Data: Data are the essence of GIS. A GIS integrates spatial data with other existing data resources. The integration of spatial and tabular data stored in a database management system is a key functionality afforded by GIS. GIS data can be acquired through data collection devices and methods, or can be purchased from a commercial data provider.
4. Methods: A successful GIS operates according to a well-designed plan and business rules, which are models and operating practices unique to each organization. These methods are used to perform complex spatial analysis providing both qualitative and quantitative results.
5. People: The real power of GIS comes from the stakeholders involved in the different phases of a GIS life cycle, from the data acquirer to the end user of the developed GIS application. GIS users range from technical specialists who design and maintain the system to those who use it to help them perform their everyday work.
How Does GIS Work?
Geographic information system has the ability to display and analyse spatial data integrated with a database. Maps can be drawn from databases, and data can be referenced from maps. A GIS database holds a wide range of geographical information. GIS overlays interrelated information from different sources, and to do so it works on themes or layers. A GIS theme is a collection of similar geographical objects, such as a road network, waterbodies, or soil types. Figure 1.4 represents superimposed layers of geographical data to create a map.
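Themes and their superimposition can be modelled as an ordered stack of layers, each holding similar features and a visibility flag. This is a minimal sketch with hypothetical layer and feature names and hypothetical functions (`set_visibility`, `compose_map`); real GIS software stores geometry and attributes for every feature, which is omitted here.

```python
# Hypothetical themes; each holds similar geographical objects and a visibility flag.
layers = {
    "soil_types": {"visible": False, "features": ["Alluvial soil", "Loamy soil"]},
    "waterbodies": {"visible": True, "features": ["Lake Site"]},
    "roads": {"visible": True, "features": ["Road A", "Road B"]},
}


def set_visibility(layers, name, visible):
    """Switch a theme on or off without altering its features."""
    layers[name]["visible"] = visible


def compose_map(layers, draw_order):
    """Superimpose the visible themes; the first name in draw_order is drawn at the bottom."""
    drawn = []
    for name in draw_order:
        layer = layers[name]
        if layer["visible"]:
            drawn.extend((name, feature) for feature in layer["features"])
    return drawn


order = ["soil_types", "waterbodies", "roads"]
print(compose_map(layers, order))  # soil types are hidden
set_visibility(layers, "soil_types", True)
print(compose_map(layers, order))  # soil types now appear beneath the other layers
```

Keeping each theme separate is what allows layers to be switched on or off and recombined without touching the underlying data.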
Any GIS operation consists of the following five central phases.
1. Data acquisition: Data acquisition is the process of gathering relevant geographical information (spatial and non-spatial) from different sources. GIS data can be captured in two forms: (1) analogue or physical data (for example, maps) and (2) digital data, or data in computer-readable form (for example, satellite data). The various sources from which GIS data can be captured include maps, satellite images, aerial photographs, tabular data, and field data (see Chapter 2 for data acquisition techniques).
2. Preprocessing: The process of converting gathered data into a suitable format for input into a system is called preprocessing. Data format conversions, digitization of maps, and recording of field or attribute information into a database are the key steps of this phase. Error detection, data reduction and generalization, and map projection and interpolation are other important techniques of this phase intended to ready the data for analysis and product generation.
3. Data management: GIS databases allow storing, querying, and operating on geographical information, thus enabling users to add, delete, update, or define the database contents. Most GIS databases are relational (tabular) in nature, which makes them easy to manage and manipulate.
4. Analysis and manipulation: The graphical user interface of GIS software enhances its power to manipulate data sets according to the analysis requirements. The most common use of GIS is spatial analysis. With the help of a GIS database and preprocessed data, it is possible to obtain an interactive visualization of the spatial patterns of any given phenomenon. A GIS can perform a wide range of mathematical operations, overlay techniques, Boolean operations, and logical operations.
Figure 1.4 Layer representation of GIS
5. Product generation: The last phase of a GIS life cycle is product generation. An end product may be interactive digitized maps, reports, charts, documentations, or a GIS web application with the ability to provide a collaborative platform for decision support and policymaking. A block diagram to illustrate the GIS central operations is given in Figure 1.5.
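The overlay and Boolean operations mentioned under analysis and manipulation can be illustrated on two small raster layers. The sketch assumes both layers share the same grid (extent and cell size); the slope values, the water mask, the 10° threshold, and the `boolean_overlay` function are hypothetical. A cell is marked suitable only where the slope is gentle AND the cell is not water. Plain Python lists keep the example self-contained; array libraries would normally be used for real rasters.

```python
# Two hypothetical raster layers on the same 3 x 3 grid (same extent and cell size).
slope_deg = [
    [2, 5, 12],
    [3, 8, 15],
    [1, 4, 20],
]
is_water = [
    [False, False, False],
    [False, True, False],
    [True, False, False],
]


def boolean_overlay(slope, water, max_slope):
    """Cell-by-cell AND: suitable where slope <= max_slope AND the cell is not water."""
    return [
        [s <= max_slope and not w for s, w in zip(slope_row, water_row)]
        for slope_row, water_row in zip(slope, water)
    ]


suitable = boolean_overlay(slope_deg, is_water, max_slope=10)
for row in suitable:
    print(" ".join("1" if cell else "0" for cell in row))
```

The result is itself a new layer, which is how overlay analysis generates new information from existing themes.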
Objectives of GIS
Geographic information system has the following objectives.
• To facilitate generation of a layered structure of geographical information and help visualization of geographical information from different sources as a set of relevant layers and their interrelationships.
• To provide the facility to conduct complex analysis and queries on different layers and their geographical attributes, thus making it possible to retrieve new information for optimized decision-making and planning.
• To depict existing patterns and trends in three-dimensional form for better understanding and conceptualization.
• To integrate data from different sources.
• To eliminate redundant data, if any.
• To provide efficient data handling and distribution.
• To incorporate remotely sensed data for resource mapping, monitoring, and management.
Figure 1.5 Block diagram of GIS
Why is GIS Important?
Geography is not concerned only with the distribution of various elements on the earth’s surface. It also involves a complex pattern of interrelated phenomena, geological structures, and residents. Hence, geographical information about a single element on its own is not sufficient for analysis.
Such information is also useful for describing the location and features of the other elements to which an element relates; however, it is difficult to depict the relationships among elements or phenomena in this way, which limits the potential for analysis. Thus, there is a need for a tool or system that provides a platform for data acquisition, processing, and output generation to analyse interrelated, relevant geographical information. GIS enables analysis of the complex interrelations between different layers of geographical information through a process of gathering, presenting, analysing, and visualizing the data and images that may be available from different sources. GIS is an extension of the traditional cartographical sciences (the art of map making) integrated with the capabilities of information systems. It has rapidly accelerated the use of spatial data in the disciplines of management, planning, and decision-making. It makes map data more interactive and collaborative, and hence more useful.
GIS databases are often large collections of geographical features and their attributes. An important benefit of GIS is its capacity to combine layers of data into a single map, on which the user can switch layers on or off as required. A GIS user can generate new information from existing information by using different combinations of layers. Querying is another strength of GIS. A query is similar to a “search” for a web page: it retrieves the data in the database that relate to the request and provides them as a new theme. It is not only the visualization of data but also the ability to combine different thematic layers that makes GIS a powerful tool. Some examples of thematic layers include roads, rivers, forest, and buildings. GIS can also combine these layers to produce new themes, which answer questions such as “How many buildings are within 50 m of Lake Site?”
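A question like the Lake Site example can be answered with a query over a feature table. The sketch below uses Python’s built-in `sqlite3` module with an in-memory table; the building names and coordinates are hypothetical, and, as a simplifying assumption, Lake Site is treated as a single point in a projected coordinate system measured in metres. A true buffer would be drawn around the lake’s boundary, which spatial database extensions support, but that is beyond this sketch.

```python
import sqlite3

# In-memory feature table: one row per building, coordinates in metres (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE buildings (id INTEGER PRIMARY KEY, name TEXT, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO buildings (name, x, y) VALUES (?, ?, ?)",
    [
        ("School", 1020.0, 2030.0),
        ("Clinic", 1100.0, 2000.0),
        ("Library", 990.0, 1960.0),
        ("Market", 1000.0, 2049.0),
    ],
)

# Lake Site simplified to a single point (an assumption made for this sketch).
params = {"lx": 1000.0, "ly": 2000.0, "r": 50.0}

# Comparing squared distances avoids needing a square-root function in SQL.
rows = conn.execute(
    """
    SELECT name FROM buildings
    WHERE (x - :lx) * (x - :lx) + (y - :ly) * (y - :ly) <= :r * :r
    ORDER BY name
    """,
    params,
).fetchall()

names = [name for (name,) in rows]
print(f"{len(names)} buildings within {params['r']:.0f} m: {', '.join(names)}")
conn.close()
```

The query returns a subset of features that can be displayed as a new theme, in the same way the paragraph above describes.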
Applications of GIS
Geographic information system is a technology used by a variety of industries and fields for simulation of complex patterns, visualization and analysis of real-world situations, management, and decision support. GIS plays a decisive role in transport, forestry, natural resource management, business, tourism, public safety, health, and education. It helps stakeholders by collecting, processing, manipulating, and displaying data in different formats as required. These computer-based information systems assist experts in different fields to make appropriate decisions by analysing spatial data in the desired format. A detailed discussion on the applications of GIS in different fields is given in Chapter 5. Table 1.3 gives some uses of GIS in various industries.
Table 1.3 Uses of GIS in different industries
| Industry | Use of GIS |
| --- | --- |
| Forestry | Inventory and management of resources |
| Police | Crime mapping to target resources |
| Epidemiology | To link clusters of diseases to sources |
| Transport | Monitoring routes |
| Utilities | Managing pipe networks |
| Oil | Monitoring ships and managing pipelines |
| Central and local government | Evidence for funding and policy |
| Health | Planning services and health impact assessments |
| Environment agencies | Identifying areas of risk (for example, floods) |
| Emergency departments (for example, ambulance) | Planning quicker routes |
| Retail | Store location |
| Marketing | Locating target customers |
| Military | Troop movement |
| Mobile phone companies | Locating masts |
| Land registry | Recording and managing land and property |
| Estate agents | Locating properties that match certain criteria |
| Insurance | Identifying risks |
| Agriculture | Analysing crop yields |
CARTOGRAPHY AND GIS
Cartography is the art and science of map making. A map is a set of points, lines, and areas, each defined both by its position with reference to a coordinate system and by its non-spatial attributes. The term comes from two Greek words: chartis, meaning “map”, and graphos, meaning “to draw” or “write”. Basic cartography covers the following two data components.
1. Location data: These indicate where the area being depicted is located.
2. Attribution data: These show bodies of water, mountains, valleys, hills, and other geographical features of interest.
Cartography relies heavily on mathematics to represent the earth and on science to help describe and understand geographical features (Figure 1.6). A map of the world reflects an immense mathematical and artistic challenge: translating the three-dimensional globe onto a two-dimensional surface. Cartography is an ancient discipline that dates from the prehistoric depiction of hunting and fishing territories. Evidence suggests that map making evolved independently in many different parts of the world. The people of the Marshall Islands made stick charts for navigation. Pre-Columbian maps in Mexico used footprints to represent roads. The oldest known maps are preserved on Babylonian clay tablets dating from about 2300 BC.
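The translation from globe to flat surface is done with a map projection. As one concrete illustration (the text does not name a specific projection), the sketch below implements the spherical Mercator projection, assuming a spherical earth of radius 6 378 137 m; the function names are illustrative. It also computes the local scale factor, which shows how distortion grows away from the equator.

```python
import math

# Assumption: a spherical earth with the 6 378 137 m equatorial radius
# (the value used by the spherical "Web Mercator" variant).
EARTH_RADIUS_M = 6_378_137.0


def mercator_forward(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Project longitude/latitude (degrees) to planar x/y (metres)."""
    if not -90.0 < lat_deg < 90.0:
        # y = R ln tan(pi/4 + phi/2) tends to infinity at the poles.
        raise ValueError("the poles cannot be shown on a Mercator map")
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam
    y = radius * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def mercator_inverse(x, y, radius=EARTH_RADIUS_M):
    """Recover longitude/latitude (degrees) from projected x/y (metres)."""
    lon = math.degrees(x / radius)
    lat = math.degrees(2 * math.atan(math.exp(y / radius)) - math.pi / 2)
    return lon, lat


def scale_factor(lat_deg):
    """Local scale of the spherical Mercator, sec(latitude)."""
    return 1.0 / math.cos(math.radians(lat_deg))
```

At 60° latitude the scale factor is 2, so lengths there are drawn at twice their equatorial scale; this is why areas far from the equator appear greatly enlarged on Mercator maps.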
Traditional cartography was a difficult and tedious task due to some fundamental challenges. It was difficult to accurately represent terrain with different heights and slopes on a two-dimensional flat surface. It was also difficult to determine which spatial information was not relevant to the map’s purpose. It was hard to design the schema of the map and to select which traits of the elements should be referenced on it.
Figure 1.6 Cartography
Source <http://grass.mirror.ac.za/screenshots/images/ecoZones_Ponti.png>
Figure 1.7 Digital cartography
Source <http://grass.mirror.ac.za/grass60/screenshots/images/general1.jpg>
The discovery of the New World led to the need for new techniques in cartography, particularly for the systematic representation on a flat surface of the features of a curved surface. GIS emerged in the 1970s and 1980s and represents a major shift in the cartographic paradigm. In traditional (paper) cartography, the map was both the database and the display of geographic information. In GIS, the database, analysis, and display are physically and conceptually separate aspects of handling geographic data. The advances in mapping offered by modern computers are considerable: a practically unlimited number of colours, very high-resolution graphical displays, software that generates realistic three-dimensional scenes, dynamic views (animation), user interaction with displays, dynamic display transformation, and interlinking of multiple views (Figure 1.7). All this is available even on standard personal computers, while more complex and powerful equipment can further enhance some of these features.
Modern cartography largely uses aerial photographs as the base for any desired map or chart. The procedures for translating these photographic data into maps are governed by the principles of photogrammetry and yield a degree of accuracy previously unattainable. Satellite imagery has made it possible to map the features of the moon and of several planets and their satellites.
GIS DATABASE
A GIS database is a collection of geographic data sets, feature classes (a feature class is a collection of features, or a table of rows in which each row has a geographic column), raster data, and attribute tables. It is used primarily to store, query, and manipulate GIS data. Like a map, GIS data commonly have two data components.
1. Spatial component: The spatial component of the data describes the unique geographical location of objects or phenomena; for example, the location of a lake. The geographical location must be specified in a unique way. A coordinate system that specifies position in an absolute way on the earth’s surface is known as a geo-reference system. Examples of geo-reference systems are the Universal Transverse Mercator system and the latitude–longitude system. For small areas, the simplest coordinate system is the regular square grid.
Internationally, there are many different coordinate systems. Location information on a map is provided with the help of points, lines, and polygons.
2. Attribute component: Attributes refer to the properties of spatial entities. They are often referred to as non-spatial data since they do not, in themselves, represent location information. This type of data describes the characteristics of the spatial features; for example, the quality of water, the quantity of water, area, and depth of a lake.
Characteristics can be quantitative or qualitative in nature or both.
Attribute data are often referred to as tabular data.
Geo-databases store geometry, a spatial reference system, attributes, and behavioural rules for data. Various types of geographic data sets can be collected within a geo-database, including feature classes, attribute tables, raster data sets, network data sets, topologies, and many others. Geo-databases can be stored in relational database management systems such as IBM DB2, IBM Informix, Oracle, Microsoft Access, Microsoft SQL Server, and PostgreSQL, or in a system of files, such as a file geo-database. A GIS database maintains the following basic characteristics.
• It stores a rich collection of spatial data in a centralized location.
• It applies sophisticated rules and relationships to data.
• It defines advanced geospatial relational models (for example, topologies, networks).
• It maintains integrity of spatial data with a consistent, accurate database.
• It works within a multi-user access and editing environment.
• It integrates spatial data with other information technology databases.
• It supports custom features and behaviour.
• It is expensive to create and update, which often results in outdated data sets that may not have been updated for years.
• Internal data stored in a GIS database must possess completeness, logical consistency, temporal consistency, thematic consistency, and positional consistency (Table 1.4).
Table 1.4 Desired properties of internal data stored in GIS
| Property | Description |
| --- | --- |
| Completeness | Presence and absence of features, their attributes, and relationships |
| Logical consistency | Degree of adherence to logical rules of data structure, attribution, and relationships |
| Positional consistency | Accuracy of the position of features |
| Temporal consistency | Accuracy of temporal attributes and temporal relationships of features |
| Thematic consistency | Accuracy of quantitative and non-quantitative attributes |
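Some of these properties can be checked automatically. The sketch below assumes hypothetical feature records that hold a polygon ring and a dictionary of attributes; it tests completeness (required attributes are present) and one aspect of logical consistency (polygon rings are closed and have at least four coordinates). The attribute names and rules are illustrative assumptions, not a standard.

```python
# Hypothetical schema: every polygon feature must carry these attributes.
REQUIRED_ATTRIBUTES = ("name", "area_ha")

features = [
    {"id": 1, "ring": [(0, 0), (4, 0), (4, 3), (0, 0)],
     "attrs": {"name": "Park", "area_ha": 1.2}},
    {"id": 2, "ring": [(0, 0), (2, 0), (2, 2)],
     "attrs": {"name": "Lake"}},
]


def completeness_issues(feature):
    """Report required attributes that are absent or empty (None)."""
    attrs = feature["attrs"]
    return [f"missing attribute '{key}'" for key in REQUIRED_ATTRIBUTES
            if attrs.get(key) is None]


def logical_consistency_issues(feature):
    """Report violations of simple structural rules for polygon rings."""
    ring = feature["ring"]
    issues = []
    if len(ring) < 4:  # even a closed triangle needs 4 coordinates
        issues.append("ring has fewer than 4 coordinates")
    if ring[0] != ring[-1]:
        issues.append("ring is not closed")
    return issues


def quality_report(features):
    """Map feature id -> list of issues, for features with problems only."""
    report = {}
    for feature in features:
        issues = completeness_issues(feature) + logical_consistency_issues(feature)
        if issues:
            report[feature["id"]] = issues
    return report
```

Here feature 1 passes, while feature 2 is reported for a missing `area_ha` value and an unclosed, too-short ring.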
GIS DATA TYPE
Geographic information system stores spatial information about the real world as a collection of thematic layers, which are linked together by their common geography. This simple but extremely powerful and versatile concept has proven invaluable for solving many real-world problems. The ability of GIS to handle and process geographically referenced data distinguishes it from other information systems. A data type is a classification that distinguishes the different kinds of data used by computer systems. Human beings can easily recognize different types of data and use special symbols such as $ and % to indicate them. Similarly, computer systems use internal codes to keep track of the different types of data they process. Geographically referenced data describe both the location and characteristics of spatial features on the earth’s surface. GIS supports two basic spatial data types: raster and vector (Figure 1.8).
Raster Data Type
Raster data type represents spatial (geographical) information by dividing the area into regularly spaced, quantized cells. A cell is one square unit of the grid and is known as a pixel (picture element). Raster cells are organized as a matrix of rows and columns. Each pixel has two associated values: (1) its location, given as a row/column number, and (2) a cell value representing the attribute or property of interest.
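The row/column structure described above maps directly onto a nested list (a matrix). This minimal sketch assumes a tiny hypothetical land-cover raster whose cell values are class codes; the codes and helper names are illustrative only.

```python
# A tiny hypothetical land-cover raster with 3 rows and 4 columns.
# Cell values are class codes: 1 = water, 2 = forest, 3 = built-up.
raster = [
    [1, 1, 2, 2],
    [1, 2, 2, 3],
    [2, 2, 3, 3],
]


def cell_value(grid, row, col):
    """Return the value stored at (row, col); row 0 is the top row."""
    return grid[row][col]


def cells_with_value(grid, value):
    """List the (row, col) locations of every cell holding the given value."""
    return [(r, c) for r, row in enumerate(grid)
            for c, v in enumerate(row) if v == value]


def class_counts(grid):
    """Count how many cells fall into each class."""
    counts = {}
    for row in grid:
        for v in row:
            counts[v] = counts.get(v, 0) + 1
    return counts
```

Multiplying a class count by the area of one cell gives the area of that class; with a hypothetical 30 m cell, the six forest cells would cover 6 × 900 m² = 5400 m².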
Vector Data Type
Vector data type represents spatial features as geometric objects defined by x–y coordinates. The following are examples of vector data types.
• Points represent discrete locations on the earth’s surface.
• Lines represent linear features such as rivers and roads. Each line has several coordinate points, which maintain its shape.
• Polygons represent bounded areas such as waterbodies and political boundaries.
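The three vector data types can be sketched as small Python classes. This is a minimal illustration, assuming planar x–y coordinates in a single projected unit (for example, metres); the class and method names are illustrative, not those of any particular GIS.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    """A discrete location given by one x-y coordinate pair."""
    x: float
    y: float


@dataclass
class Line:
    """A linear feature; its ordered vertices maintain its shape."""
    vertices: list  # list of (x, y) tuples

    def length(self):
        # Sum the straight-line lengths of consecutive segments.
        return sum(math.dist(a, b) for a, b in zip(self.vertices, self.vertices[1:]))


@dataclass
class Polygon:
    """A bounded area; the last vertex is implicitly joined to the first."""
    ring: list  # list of (x, y) tuples

    def area(self):
        # Shoelace formula: half the absolute sum of the cross products
        # of consecutive vertices.
        n = len(self.ring)
        total = 0.0
        for i in range(n):
            x1, y1 = self.ring[i]
            x2, y2 = self.ring[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0
```

For example, `Line([(0, 0), (3, 4), (3, 10)]).length()` is 11.0 and `Polygon([(0, 0), (4, 0), (4, 3), (0, 3)]).area()` is 12.0.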
GIS DATA MODELS
A model is a simplified representation of a phenomenon or a system. GIS modelling involves the symbolic representation of location properties (where), as well as the thematic (what) and temporal (when) attributes that describe the characteristics and conditions of space and time. A GIS model attempts to emulate processes of the real world at some point in time or over a limited time period. It allows a hypothesis to be tested with different data sets related to a geographical scenario. A model can be embedded into a GIS application so that analyses can easily be reproduced. A GIS model can be exported as a flow chart or modelling data structure. There are different types of GIS models with some fundamental characteristics such as scale, extent, purpose, approach, technique, association, and aggregation.
Figure 1.8 (a) Raster data and (b) Vector data
A large number and variety of data models are used in GIS, some of which are as follows (John 1997).1
• Vector data models
  ◦ Spaghetti data model
  ◦ Topological data model
• Raster data models (more specifically, tessellation models)
• Surface models
  ◦ Triangulated irregular network (TIN) model
  ◦ Digital elevation model (DEM)
• Conceptual models
  ◦ Entity-relationship model
  ◦ Enhanced entity-relationship model
• Network models
• Relational models
• Object-oriented models
• Hierarchical models
• Semantic data models
Vector Data Models
Vector data models use points, lines, and polygons to represent geographical features. In vector representation, boundaries are defined as a series of points, and each point is mapped to a unique x–y coordinate pair in a geo-reference coordinate system. The non-spatial attributes of these features are stored in conventional database management systems. Two very common types of vector data models are the spaghetti data model and the topological data model.
Spaghetti data model
A vector-based data model in which each element on the map becomes a logical record in a digital file and is defined as a string of x–y coordinates is called a spaghetti data model. This is the simplest data model, in which every object is stored independently. Objects in the spaghetti model are stored as a set of two elements: the name of the object and the x–y coordinate values of the object’s location. A spaghetti model is illustrated in Figure 1.9.
1 Details available at <www.gisca.adelaide.edu.au/kea/gisrs/courses/postgrad/introgis/chapter6/chi6.html>
Some properties of the spaghetti model are as follows.
• A common boundary between two polygons is recorded twice. Hence, redundant data exist in a spaghetti model.
• Lines are encoded as strings of x–y coordinates, while polygons are encoded as closed loops (strings whose first and last coordinates coincide).
• No spatial relationships are stored in the spaghetti model.
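The redundancy of the spaghetti model is easy to see in code. In this minimal sketch, two hypothetical adjacent parcels are stored as independent named coordinate strings, as the model prescribes; the shared boundary appears in both records and can be found only by comparing coordinates, because no relationship between the parcels is stored. The parcel names and coordinates are assumptions for illustration.

```python
# Two hypothetical adjacent parcels, A and B, sharing the edge (2, 0)-(2, 2).
# Each polygon is stored independently as a name plus a closed string of
# x-y coordinates; nothing records that A and B touch.
spaghetti = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)],
}


def segments(ring):
    """Undirected boundary segments of a closed coordinate string."""
    return {frozenset(pair) for pair in zip(ring, ring[1:])}


def duplicated_segments(layer):
    """Segments stored more than once across all polygons in the layer."""
    seen, duplicated = set(), set()
    for ring in layer.values():
        for seg in segments(ring):
            if seg in seen:
                duplicated.add(seg)
            else:
                seen.add(seg)
    return duplicated


# Every coordinate of every polygon is stored, including the shared ones.
stored_coordinates = sum(len(ring) for ring in spaghetti.values())
```

The shared edge between (2, 0) and (2, 2) is reported as duplicated, and all 10 coordinates are stored even though only 6 distinct points exist.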
Topological data model
A vector-based data model that encodes the spatial relationships of points, lines, and polygons and defines how they share geometry is called a topological data model. A topological model introduces two elements from discrete mathematics: the node and the edge. A node is a uniquely defined point that joins several arcs. An edge is an arc that has a defined starting node and ending node. This model stores geometry as a series of nodes and arcs. Shared geometry, such as a common boundary between two polygons, is stored only once in a topological model; hence, redundancy is eliminated. A topological data model is illustrated in Figure 1.10.
Some properties of a topological model are as follows.
• The node is the basic entity in this kind of model. It is the point where several arcs meet.
• An arc is a sequence of vertices that has a starting node and an ending node.
• A point is a single x–y coordinate and has no length or area.
• A polygon is a closed loop of arcs that represents the boundary of the polygon.
• Every object of the model is composed of simpler structures: polygons are built from arcs, and arcs from nodes and vertices.
• Topological models allow spatial relationships such as adjacency and connectivity to be analysed without recomputing them from the coordinates of each location.
Figure 1.9 A spaghetti data model
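A minimal node–arc sketch of two hypothetical adjacent parcels, A and B, shows the shared boundary stored once and relationships answered from the structure rather than from coordinates. The node IDs, arc IDs, and the use of Python dictionaries are illustrative assumptions, not a particular GIS file format.

```python
# Two hypothetical parcels stored topologically.
nodes = {1: (2, 0), 2: (2, 2)}  # node id -> (x, y)

arcs = {
    # arc id: (from node, to node, intermediate vertices)
    "a1": (1, 2, []),                  # shared boundary, stored only once
    "a2": (2, 1, [(0, 2), (0, 0)]),    # remaining boundary of A
    "a3": (1, 2, [(4, 0), (4, 2)]),    # remaining boundary of B
}

polygons = {"A": ["a1", "a2"], "B": ["a1", "a3"]}  # polygon -> its arcs


def arc_coordinates(arc_id):
    """Expand an arc into its full coordinate string (start node to end node)."""
    start, end, vertices = arcs[arc_id]
    return [nodes[start], *vertices, nodes[end]]


def shared_arcs(p, q):
    """Arcs two polygons have in common, found without any coordinate maths."""
    return set(polygons[p]) & set(polygons[q])


def arcs_at_node(node_id):
    """Arcs that start or end at a node (connectivity)."""
    return sorted(a for a, (start, end, _) in arcs.items()
                  if node_id in (start, end))
```

Here `shared_arcs("A", "B")` is `{"a1"}`, and three arcs meet at node 1, which is what makes it a node rather than an ordinary vertex.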
Raster Data Models
Raster data models represent geographical space as a grid of contiguous cells, each of which represents an equal area of the earth’s surface. Raster data models use the raster data type to encode spatial data of the area of interest. The matrix (row–column structure) of cells is called a grid. In raster data models, the accuracy of data depends on the cell size, since the cell is the smallest unit that contains spatial information about a location.
Each cell of a raster data model contains an associated data value. For a 1-bit raster file, there are only two possible values for each cell, 0 or 1, while for an 8-bit raster file, there are 256 possible values for each pixel.
Figure 1.11 shows a 4-bit raster file, in which each cell can take one of 16 values. A data value can represent a colour or grey value, depth or height, a measurement, or any other thematic value. The ground area covered by each pixel defines the spatial resolution. An important property of a raster model is that all 0-dimensional (point) and 1-dimensional (line) features are located at the centre of the cells they fall in. There are several raster-based formats, and common ones include Esri grid files, digital orthophotos, and satellite imagery.
Some properties of a raster data model are as follows.
• It is often used for continuously varying biological and physical subsystems of the geosphere, such as temperature, elevation, and vegetation cover.
• It focuses on the analysis and modelling of images.
• Lines and points move towards the centres of cells in a raster model.
• The spatial position of each cell can easily be calculated from the origin of the raster and the spatial resolution (cell size).
• TIFF, JPEG, and BMP are common file formats based on the raster data model.
• Landsat TM satellite imagery is raster data with a spatial resolution of approximately 30 m, that is, each cell covers about 30 m × 30 m on the ground.
Figure 1.10 A topological data model
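The cell-position property above can be written down directly. This sketch assumes a hypothetical raster whose origin is its upper-left corner, with square cells, rows counted southwards and columns eastwards (a common convention, but not the only one); the origin coordinates are invented, and the 30 m cell size echoes the Landsat TM example. It also shows how a point is snapped to the centre of its cell, and how bit depth fixes the number of possible cell values.

```python
def cell_centre(origin_x, origin_y, cell_size, row, col):
    """Map coordinates of the centre of cell (row, col).

    Assumes (origin_x, origin_y) is the upper-left corner of the raster,
    square cells, rows counted downwards (south) and columns rightwards (east).
    """
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y


def cell_containing(origin_x, origin_y, cell_size, x, y):
    """Row and column of the cell that contains map point (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    return row, col


def possible_values(bits):
    """Number of distinct values a cell can hold at a given bit depth."""
    return 2 ** bits
```

With a hypothetical origin of (500000, 2000000) and 30 m cells, the point (500110, 1999930) falls in cell (2, 3), whose centre is (500105, 1999925), so rasterizing the point shifts it by a few metres towards that centre. `possible_values` gives 2, 16, and 256 for 1-, 4-, and 8-bit rasters.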
Surface Models
Triangulated irregular network model
The triangulated irregular network model uses contiguous, non-overlapping triangles to represent a three-dimensional surface (length, width, and height). A geographical region can be divided into either regular (raster) or irregular non-overlapping polygons for modelling and analysis. A TIN model allows surface models to be generated efficiently to analyse and display terrain and other types of surfaces. The elevation of a specific point on the earth’s surface is modelled as a vertex of a triangle, whereas the edges represent the estimated elevations between two vertices (two points on the earth’s surface). To maintain accuracy in drawing the triangles, that is, accuracy in elevation modelling, the Delaunay construction rule is applied. According to this rule, “three points form a Delaunay triangle if and only if the circle that passes through all three points contains no other point of the set.” The rule is used to divide areas of similar slope into irregular triangles. For example, a rectangular region can be divided into two triangles by joining its north-east and south-west corners. By placing a point at the centroid of each triangle and joining it to the triangle’s corners, the two triangles are replaced by six smaller non-overlapping triangles (Figure 1.12). This process continues until a predefined threshold is reached.
Figure 1.11 A 4 bit raster file
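The Delaunay rule reduces to an “in-circle” test: a point lies strictly inside the circle through three other points exactly when a 3 × 3 determinant has the appropriate sign. The sketch below implements that standard predicate and applies the empty-circle rule to a triangle; the sample points are hypothetical, and a production TIN builder would add robust arithmetic and an efficient triangulation algorithm.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc: > 0 if a, b, c run counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circle through a, b and c."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    # 3x3 in-circle determinant, expanded along its third column.
    det = ((adx * adx + ady * ady) * (bdx * cdy - cdx * bdy)
           - (bdx * bdx + bdy * bdy) * (adx * cdy - cdx * ady)
           + (cdx * cdx + cdy * cdy) * (adx * bdy - bdx * ady))
    # A positive determinant means "inside" only for counter-clockwise abc.
    return det > 0 if orientation(a, b, c) > 0 else det < 0


def is_delaunay_triangle(triangle, points):
    """Empty-circle rule: no other point may lie inside the circumcircle."""
    a, b, c = triangle
    return not any(in_circumcircle(a, b, c, p)
                   for p in points if p not in triangle)


POINTS = [(0, 0), (1, 0), (0, 1), (2, 2)]
```

For the quadrilateral formed by `POINTS`, splitting along the diagonal from (1, 0) to (0, 1) satisfies the rule, whereas the triangle (0, 0), (1, 0), (2, 2) from the other diagonal fails it because (0, 1) lies inside its circumcircle. A point exactly on the circle is not counted as inside.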
Nodes are the elementary building blocks of TIN data. They are connected to their nearest neighbours by edges according to a set of rules. The user does not connect the nodes manually; all nodes are added according to these rules. The TIN creates triangles from a set of points called mass points, which always become nodes. Mass points can be located anywhere, but the accuracy of the model depends on their proper selection. Every triangle is assigned a unique identifier and is defined by its three nodes and its neighbouring triangles (up to three).
Some properties of a TIN model are as follows.
• The model was developed in the early 1970s as a simple way to build a surface from a set of irregularly spaced points.
• It is a vector-based model (in the form of lines, points, and polygons) that divides a surface into triangles with the attributes of slope, aspect, and area; the three vertices carry elevation attributes, and the three edges carry slope and direction attributes.
• Fewer points are required to model the surface than with a regular grid; hence, file sizes are smaller.
• It is an irregular model because its vertices are irregularly spaced.
• It is simple and economical.
• A TIN can be created using contours [lines through all contiguous points with equal height (or other values)] and breaklines (linear features that define and control the surface behaviour in terms of smoothness and continuity).
Figure 1.12 Triangulated irregular network model
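Because each TIN facet is a plane through three vertices, its slope follows from the plane’s normal vector. This is a minimal sketch, assuming (x, y, z) vertices in one linear unit (for example, metres); the vertex values are hypothetical, and the function name is illustrative.

```python
import math


def facet_slope_and_area(a, b, c):
    """Slope (degrees) and planimetric area of a TIN triangle.

    a, b, c are (x, y, z) vertices in the same linear unit.
    """
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    # Normal vector of the triangle's plane: cross product u x v.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz == 0:
        raise ValueError("vertical or degenerate triangle")
    # Slope is the angle between the normal and the vertical axis.
    slope = math.degrees(math.atan2(math.hypot(nx, ny), abs(nz)))
    # |nz| is twice the area of the triangle's x-y footprint.
    planimetric_area = abs(nz) / 2.0
    return slope, planimetric_area
```

For the facet (0, 0, 10), (10, 0, 10), (0, 10, 20), elevation rises 10 m over 10 m northwards, giving a slope of 45° over a 50 m² footprint.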
Digital elevation model
A digital elevation model is a sampled array of spot heights at regular intervals over a surface. Heights are expressed in feet or metres above sea level, as on topographical charts. In a DEM, digital information about surface elevations is presented in raster format: each pixel value in the grid represents the spot height at that location. Surfaces such as the earth’s surface are continuous phenomena and would need an infinite number of points to be represented exactly; a DEM approximates them with a finite set of samples. Specific computer software interprets DEMs by converting them into a three-dimensional depiction of the surface (Figure 1.13). The DEM is the most common and simplest digital representation of topography. It is called a digital terrain model when it represents the earth’s surface without the objects on it (the bare earth’s surface). It is called a digital surface model when it represents the heights of landscape features such as trees and buildings. Elevation and height are technically different. Elevation is the height above a given level, especially that of the sea, whereas height is the measurement from base to top.
Figure 1.13 A DEM diagram
Source <http://grass.osgeo.org/uploads/images/Gallery/3D/landsat_RGB_nviz_trento.png>
Some properties of DEM are as follows.
• The accuracy of a DEM depends on its resolution and on the accuracy of its height values.
• A DEM contains only the elevation values at specific grid point locations.
• Elevation contours are not stored explicitly in a DEM, but they can be derived from it.
• A DEM is used in many geo-analysis processes such as landslide studies and topographical feature extraction.
• A DEM is widely used for terrain analysis due to its simplicity and extensive software support.
• Resolution (the distance between adjacent grid points) is the most critical parameter to decide in a DEM.
• A DEM is used to find features on the terrain, such as drainage basins and watersheds, drainage networks and channels, peaks and pits, and other landforms.
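As a small example of feature extraction from a DEM, peaks and pits can be found by comparing each cell with its eight neighbours. The sketch below uses a hypothetical 5 × 5 grid of elevations in metres and a strict comparison; it skips edge cells, and real terrain-analysis tools handle flat areas and grid edges more carefully.

```python
# A small hypothetical DEM: elevations in metres on a regular grid.
dem = [
    [10, 11, 12, 13, 14],
    [11, 20, 12, 13, 14],
    [12, 13, 12, 12, 15],
    [13, 12, 5, 11, 16],
    [14, 13, 12, 13, 17],
]


def neighbour_values(grid, r, c):
    """Elevations of the eight cells surrounding an interior cell (r, c)."""
    return [grid[r + dr][c + dc]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]


def peaks_and_pits(grid):
    """Interior cells strictly higher (peaks) or lower (pits) than all 8 neighbours.

    Edge cells are skipped because part of their neighbourhood is off the grid.
    """
    peaks, pits = [], []
    for r in range(1, len(grid) - 1):
        for c in range(1, len(grid[0]) - 1):
            z, nbrs = grid[r][c], neighbour_values(grid, r, c)
            if all(z > n for n in nbrs):
                peaks.append((r, c))
            elif all(z < n for n in nbrs):
                pits.append((r, c))
    return peaks, pits
```

For this grid, the function reports one peak at cell (1, 1) (20 m) and one pit at cell (3, 2) (5 m).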
Network Models
Network models are graphs consisting of arcs, which represent linear flows, and nodes, which represent the interconnections between the arcs. In a network model, nodes can be junctions and edges can be roads (Figure 1.14). A network can also be considered a system of vertices and edges, mathematically defined as a graph G = (N, E), where N is the set of nodes and E is the set of edges in the network. Networks are used to store the connectivity of source features. Because of their node–arc structure, network models preserve topology and are widely used for allocation, path finding, and tracing. The geometry or topology of a network model should be close to the real-world scenario.
Network models find a connected path through a network and then analyse and manage the parts and assets associated with it. Arcs in a network model can be broadly classified into two types:
• Directed links are straight lines connecting two nodes (Figure 1.15a).
• Directed chains are arcs with intermediate shape points between two nodes (Figure 1.15b).
Two important aspects of a network model are network topology and feature connectivity. Network models are widely used to analyse vehicle traffic over transportation systems, load over an electric network, or pollution along a river.
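Path finding over a node–arc network can be illustrated with Dijkstra’s algorithm, one standard shortest-path method (the text does not prescribe a particular algorithm). This sketch assumes a hypothetical road network in which junctions are nodes and road segments are undirected edges weighted by length in kilometres.

```python
import heapq

# Hypothetical road network: (junction, junction, length in km).
EDGES = [
    ("A", "B", 4.0), ("A", "C", 2.0), ("C", "B", 1.0),
    ("B", "D", 5.0), ("C", "D", 8.0), ("D", "E", 3.0),
]


def build_graph(edges):
    """Adjacency list for an undirected network: node -> [(neighbour, weight)]."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (list of nodes, total cost)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale queue entry
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")  # target unreachable
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]
```

Here `shortest_path(build_graph(EDGES), "A", "E")` returns `(['A', 'C', 'B', 'D', 'E'], 11.0)`: the indirect route through C is shorter than the direct A–B road.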
Relational Models
A model that organizes data in tabular format is called a relational data model. Relational data models store data in tables. Each table has a unique name and identity. A table has two aspects: a set of columns representing field names, and rows containing information. Rows are known as tuples, and the order in which they occur in a table is immaterial. No two rows can have the same values in all columns of the table. In a GIS, each row is usually linked to a separate spatial feature. Accordingly, each row consists of several columns, each containing a specific value for that geographic feature. Data are often stored in several tables (Figure 1.16). Tables can be joined or referenced to each other through common columns (relational fields). The possibility of join operations is what makes relational data models so commonly used in GIS.
Figure 1.14 A network model
Source <http://grass.osgeo.org/screenshots/vector/>
Figure 1.15 Arcs in network chains: (a) directed link and (b) directed chain
The relational database model is the most widely accepted model for managing non-spatial attribute data. It has emerged as the dominant commercial data management tool in GIS implementation and application. A relational data model has the following properties.
• It is simple to organize information into tables and model it.
• Data can be manipulated in an ad hoc manner by joining tables.
• It reduces data redundancy through proper organization of data into tables.
• Users need not take the internal (physical) organization of data into account.
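A join through a common column can be demonstrated with Python’s built-in `sqlite3` module and an in-memory database. The table names, columns, and values below are hypothetical; in a real GIS, each `parcel` row would be linked to one polygon feature.

```python
import sqlite3


def parcel_summary():
    """Join a hypothetical parcel table to an owner table and summarize it."""
    con = sqlite3.connect(":memory:")  # temporary in-memory database
    try:
        con.execute("CREATE TABLE parcel (parcel_id INTEGER PRIMARY KEY, "
                    "owner_id INTEGER, area_ha REAL)")
        con.execute("CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, name TEXT)")
        # Each parcel row would be linked to one polygon feature in the GIS.
        con.executemany("INSERT INTO parcel VALUES (?, ?, ?)",
                        [(101, 1, 2.5), (102, 2, 0.8), (103, 1, 1.1)])
        con.executemany("INSERT INTO owner VALUES (?, ?)",
                        [(1, "Owner One"), (2, "Owner Two")])
        # owner_id is the common (relational) column used for the join.
        return con.execute(
            """
            SELECT o.name, COUNT(*) AS parcels, SUM(p.area_ha) AS total_ha
            FROM parcel AS p
            JOIN owner AS o ON p.owner_id = o.owner_id
            GROUP BY o.owner_id, o.name
            ORDER BY o.name
            """
        ).fetchall()
    finally:
        con.close()
```

The owner names are stored once in their own table rather than repeated in every parcel row, which is how proper table design reduces redundancy; the join reassembles the combined view (two parcels totalling about 3.6 ha for “Owner One”, one parcel of 0.8 ha for “Owner Two”).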
Object-oriented Models
Object-oriented models store data as objects. These objects can be accessed only through the methods specified by their class (a group of objects with similar attributes and methods) (Figure 1.17). An object-oriented model incorporates the following fundamental concepts.
• Any real-world entity can be modelled as an object. Every object has a unique identification.
• Every object possesses a state (the values of its variables at an instant in time) and behaviour (a set of methods that operate on the state of the object). The state and methods of an object can be accessed by another object only by passing a message.
• A class is a group of all objects that share the same attributes and methods.
• Each class can have a superclass from which it inherits attributes, methods, or both.
• The essence of an object-oriented model lies in its properties, which are explained as follows.
Encapsulation: Encapsulation is a property of object design by virtue of which all the data related to an object are contained in and hidden by the object. The data can be accessed only through the methods of the object’s class.
Polymorphism: Polymorphism is the occurrence of something in many forms. It allows objects of different classes to respond to the same message in different ways, so that an object can take more than one form.
Inheritance: Inheritance allows a subclass (child) to acquire the attributes and methods of its superclass (parent).
Figure 1.16 Relational database
Source <http://grass.osgeo.org/screenshots/vector/>
In GIS, object-oriented modelling not only allows the data to be held as objects (for example, an element on a map) but also allows these objects to be operated on by their methods, and it establishes relationships between objects through message passing. In this approach, querying is very natural, as features can be bundled together with their attributes if the application requires. Object-oriented modelling thus offers many operational benefits for geographic data processing.
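The three properties can be seen in a short class hierarchy. This is a hypothetical design for illustration, not a GIS product’s object model: `Feature` is the superclass, `Road` and `Parcel` inherit from it, each subclass answers the same `measure()` message differently (polymorphism), and state is kept behind methods (encapsulation, which Python enforces only by naming convention). Method calls play the role of messages.

```python
import math


class Feature:
    """Superclass for map features (hypothetical design, for illustration)."""

    def __init__(self, feature_id, name):
        # Leading underscores mark internal state; Python enforces this
        # only by convention.
        self._feature_id = feature_id
        self._name = name

    @property
    def feature_id(self):
        """Unique identification of the object (read-only)."""
        return self._feature_id

    def measure(self):
        """Behaviour that each subclass must define."""
        raise NotImplementedError

    def describe(self):
        # Sends the measure() message; which version runs depends on the class.
        return f"{type(self).__name__} {self._name}: {self.measure():.1f}"


class Road(Feature):
    """Inherits identity and describe() from Feature; measures length."""

    def __init__(self, feature_id, name, vertices):
        super().__init__(feature_id, name)
        self._vertices = list(vertices)

    def measure(self):
        return sum(math.dist(a, b) for a, b in zip(self._vertices, self._vertices[1:]))


class Parcel(Feature):
    """A rectangular land parcel; measures area."""

    def __init__(self, feature_id, name, width, height):
        super().__init__(feature_id, name)
        self._width = width
        self._height = height

    def measure(self):
        return self._width * self._height
```

Calling `describe()` on `Road(1, "NH-1", [(0, 0), (3, 4)])` and `Parcel(2, "P7", 20, 15)` gives `"Road NH-1: 5.0"` and `"Parcel P7: 300.0"`: the same message produces a length for one object and an area for the other.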
Figure 1.17 Object data model
Hierarchical Models
Hierarchical models present data as a family tree in which each record has only one parent. Figure 1.18 presents a hierarchical data model representing an animal family. In this classical data model, data sets and their subsets are organized in layers with a parent–child structure.
Hierarchical models are similar to the classic file structure of data in computers. They are the oldest type of data model and support only one-to-many relationships among data items. Real geographical phenomena often cannot be limited to a single parent per record; thus this model has very limited scope in GIS applications.
Semantic Data Models
Semantic data models (SDMs) represent data in logical structures. They focus on conveying the meaning of data along with its attributes and interrelationships with other data. In semantic data models, an entity represents an aspect or a phenomenon of the real world. SDMs support dynamic schema evolution to capture new or evolving types of semantic information. Semantic models are widely used in natural language processing to define the semantic context of entities (words) used at any instance. They follow an arc–node structure in which a node represents a basic entity and an arc represents a relationship between entities (Figure 1.19). SDMs incorporate two types of relationship between entities: the “is-a” relationship (membership or generalization, through which properties are inherited) and the “has-a” relationship (aggregation, in which one entity is a part or property of another).
Conceptual Models
Conceptual models are a type of abstraction that uses logical concepts and hides the details of implementation and data storage. They are the most abstract form of data model. Detailed information, such as data types, is omitted from conceptual data models. There are two standard ways in which spatial information is modelled conceptually: object-based and field-based models.
Figure 1.18 A hierarchical model
Object-based models
Object-based models represent information as discrete geo-referenced entities. Each entity has a coordinate pair of x, y associated with it, defining its location in the real world. Because it is focused on objects, the implementation of this conceptual model will yield data models and structures that are focused on objects.
Field-based models
Field-based models represent information as a collection of spatial distributions, each formalized as a mathematical function that maps every location of a spatial framework to an attribute value. The spatial framework divides the area into a finite tessellation of spatial units.
TOPOLOGY AND GIS
Topology is the framework used to model the relationships among vector features (points, lines, and polygons) and determines how these features share geometry with their neighbouring vector objects. It is a branch of mathematics that studies continuity and connectivity (Figure 1.20).
Figure 1.19 Semantic database
Topology is the study of the qualitative properties of objects that are invariant under continuous transformations such as stretching and bending. In GIS, topology is important for preserving spatial properties when data pass through transformations.
Some basic topological relationships that are not affected by the coordinate system are as follows.
• Connectivity: Connectivity represents the arc–node architecture of objects. The arc represents the spatial relation between the starting node and the end node in context of connectivity.
• Contiguity: Contiguity is the identification of adjacent polygons by recording the left and right polygon of each arc. A polygon is a closed area generated by a chain of arcs that starts and ends at the same node. Polygons sharing common arcs are regarded as adjacent or contiguous. Thus the polygons on the left and right sides of each arc can be defined. This left and right polygon information is stored explicitly within the attribute information of the topological data model. The “universe polygon” is an essential component of polygon topology that represents the area outside the study area.
• Area definition: A closed area is defined by a boundary. The concept of area definition is that an arc that surrounds an area defines a polygon. Each arc is stored only once, and the boundaries of adjacent polygons do not overlap; hence data redundancy is eliminated.
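Contiguity and area definition can both be derived from an arc table that records the left and right polygon of each arc. This sketch assumes two hypothetical parcels, A to the west and B to the east of a shared arc a1 that runs from south to north, with “O” as the universe polygon; the IDs and table layout are illustrative, not a specific GIS format.

```python
# Arc table for two hypothetical parcels A and B; "O" is the universe
# polygon. Left and right are taken in the direction of travel from the
# from-node to the to-node.
ARCS = [
    # (arc id, from node, to node, left polygon, right polygon)
    ("a1", 1, 2, "A", "B"),
    ("a2", 2, 1, "A", "O"),
    ("a3", 1, 2, "B", "O"),
]
UNIVERSE = "O"


def adjacency(arcs):
    """Polygon -> set of polygons that share at least one arc with it."""
    adj = {}
    for _, _, _, left, right in arcs:
        if left != right:
            adj.setdefault(left, set()).add(right)
            adj.setdefault(right, set()).add(left)
    return adj


def neighbours(arcs, polygon):
    """Contiguous polygons inside the study area (universe polygon excluded)."""
    return adjacency(arcs).get(polygon, set()) - {UNIVERSE}


def boundary_arcs(arcs, polygon):
    """Area definition: the arcs that enclose a polygon."""
    return [arc_id for arc_id, _, _, left, right in arcs
            if polygon in (left, right)]


def touches_study_area_edge(arcs, polygon):
    """True if the polygon borders the universe polygon."""
    return UNIVERSE in adjacency(arcs).get(polygon, set())
```

Adjacency is read straight from the table: `neighbours(ARCS, "A")` is `{"B"}`, and `boundary_arcs(ARCS, "B")` is `["a1", "a3"]`, with the shared arc a1 listed once in the table yet serving both polygons.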
Figure 1.20 Graphical representation of topology
Topologies are used effectively to model spatial relationships. Because input data often do not contain topological information, GIS software has to build topologies. Topologies are used to detect and correct digitizing errors, and they are essential for network analysis. They are also important because many GIS operations require only topological relationships, not coordinates.
EXERCISES
Question 1 What is GIS? Define the components of GIS.
[Hint: A geographic information system (GIS) is a set of computerized tools for collecting, storing, retrieving, transforming, and displaying spatial data. The potential of GIS is explained by its unique ability to take up data from widely divergent sources, analyse trends over time, and evaluate spatial relationships. GIS is made up of five key components: hardware, software, data, people, and method.]
Question 2 What is an information system? What is the difference between an information system and an expert system?
[Hint: An expert system is a program that uses available information and inferences to suggest solutions to problems in a particular discipline. It has an inference mechanism, that is, the ability to derive new information from existing information.]
Question 3 Differentiate between spatial and non-spatial data. Explain how spatial data play a vital role in resource management and decision- making in various fields/industries.
[Hint: Spatial data are information about the locations and shapes of geographic features and the relationships between them, usually stored as coordinates and topology (ESRI definition). They are geo-referenced, that is, they have a location component. The location of any object may be relative (for example, the height of a tree with respect to another tree) or absolute (for example, the uniquely defined pin code for an area). Spatial data have four important aspects for processing geographical information with the help of an information system such as GIS: location, direction, distance, and space.
Data about attributes of geographical features that are not geo-referenced are called non-spatial data. Non-spatial data are stored as tables in relational databases. Tabular and attribute data are non-spatial but can be linked to the location.]
REFERENCES
O’Brien, J. 1993. Management Information System: a managerial end user perspective, 2nd ed. Homewood, IL: Irwin.
Schaefer, F. K. 1953. Exceptionalism in geography: a methodological examination. Annals of the Association of American Geographers 43: 226–249.
Raster and Vector Data
2
INTRODUCTION
Data are the essence of a geographic information system (GIS). Once geospatial data are captured, the question that arises is how they should be represented digitally. Digital representation of geographical information (see Chapter 1) saves time and is economical, and it allows easy access to geographical information for further geospatial operations and analyses.
Geographic representation is the technique of representing some part of the earth’s surface or near surface. Geographical data that need to be represented are built up either of atomic elements or details of geographical phenomena. Hence, to represent geographical inf