GIS DATABASES
an overview
Contents
– the basics of data storage
– overview of databases
  • the database approach
  • types of databases
  • databases in GIS
– design considerations
– development of an ARC/INFO database
Conceptual, logical and physical ...
A storage hierarchy ...
– files/tables
  • records
  • fields (types …)
– databases
– information systems
– decision support systems (DSS)
– approaches to storage
  • application/file based
  • databases
Application based approach
Figure: each application (Permits, Tax/Rates Assessment, Sewer Maintenance) works with its own data files (Permit Data, Assessment Data, Sewer Data).
Database approach
Figure: the same applications (Permits, Tax/Rates Assessment, Sewer Maintenance) share Permit Data, Assessment Data and Sewer Data through a single Database Management System.
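A minimal sketch of the contrast between the two approaches above, using Python's built-in sqlite3 module as a stand-in DBMS. The parcel, owner and table names are hypothetical. In the file-based version each application keeps its own copy of the assessment data, so an update made by one application never reaches the other; in the database version both applications query one DBMS-managed table.

```python
import sqlite3

# Application/file-based approach: each application keeps its own copy of the data
tax_rates_file = {"P1": {"owner": "Smith", "value": 100000}}
permits_file = {"P1": {"owner": "Smith"}}

# The owner changes, but only the Tax/Rates application's copy is updated
tax_rates_file["P1"]["owner"] = "Jones"
file_based_consistent = tax_rates_file["P1"]["owner"] == permits_file["P1"]["owner"]

# Database approach: one shared table managed by the DBMS
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE assessment (parcel_id TEXT PRIMARY KEY, owner TEXT, value INTEGER)")
db.execute("INSERT INTO assessment VALUES ('P1', 'Smith', 100000)")


def tax_rates_lookup(parcel_id):
    """Hypothetical Tax/Rates application query."""
    return db.execute(
        "SELECT owner, value FROM assessment WHERE parcel_id = ?", (parcel_id,)
    ).fetchone()


def permits_lookup(parcel_id):
    """Hypothetical Permits application query against the same table."""
    return db.execute(
        "SELECT owner FROM assessment WHERE parcel_id = ?", (parcel_id,)
    ).fetchone()[0]


# A single update is immediately visible to every application
db.execute("UPDATE assessment SET owner = 'Jones' WHERE parcel_id = 'P1'")
db_consistent = tax_rates_lookup("P1")[0] == permits_lookup("P1")

print(file_based_consistent, db_consistent)  # False True
```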
Database … a definition
• A collection of interrelated data stored together with controlled redundancy to serve one or more applications in an optimal fashion.
• A common and controlled approach is used in adding new data and modifying and retrieving existing data within the database.
Databases… objectives/advantages
– centralised data storage and management … global view of data … data dictionary
  • standardisation of all aspects of data management
  • reduced duplication
  • multiple access / retrieval flexibility
  • integrity constraints … validation enforced
  • ...
– database management system (DBMS)
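To illustrate "integrity constraints … validation enforced", here is a small sketch using SQLite (via Python's sqlite3) as the DBMS. The table, columns and permitted land-use codes are hypothetical. The point is that the rules are declared once in the schema and the DBMS rejects invalid data for every application that writes to the table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Validation rules live in the schema and are enforced by the DBMS itself
db.execute(
    """CREATE TABLE parcel (
           parcel_id TEXT PRIMARY KEY,
           land_use  TEXT NOT NULL CHECK (land_use IN ('RES', 'COM', 'IND')),
           area_m2   REAL CHECK (area_m2 > 0)
       )"""
)
db.execute("INSERT INTO parcel VALUES ('P1', 'RES', 650.0)")


def try_insert(row):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        db.execute("INSERT INTO parcel VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


for row in [("P2", "COM", 120.0),   # valid
            ("P3", "XYZ", 400.0),   # land_use not in the permitted list
            ("P4", "IND", -5.0),    # area must be positive
            ("P1", "RES", 700.0)]:  # duplicate primary key
    print(row[0], try_insert(row))
```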
Database/s… data dictionary
– the most critical (?) element of a database
– data about data… metadata
– essential for system development
– uses include
  • design - entities and data relationships
  • data capture - entry/validation
  • operations - program documentation
  • maintenance (impact assessment of proposed changes, est. of effort, cost …)
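Part of a data dictionary is kept by the DBMS itself and can be queried like any other data. A minimal sketch, assuming SQLite: `PRAGMA table_info` returns the structural metadata (column names, types, NOT NULL flags), while descriptive metadata such as meaning and source is held in a hypothetical, user-maintained `data_dictionary` table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE parcel (parcel_id TEXT PRIMARY KEY, land_use TEXT NOT NULL, area_m2 REAL)")

# Hypothetical user-maintained dictionary: meaning and source of each attribute
db.execute("CREATE TABLE data_dictionary (table_name TEXT, column_name TEXT, description TEXT, source TEXT)")
db.executemany(
    "INSERT INTO data_dictionary VALUES (?, ?, ?, ?)",
    [("parcel", "parcel_id", "Unique parcel identifier", "Cadastre"),
     ("parcel", "land_use", "Land use code", "Planning survey"),
     ("parcel", "area_m2", "Parcel area in square metres", "Computed from geometry")],
)


def describe(table):
    """Combine the DBMS's own schema metadata with the descriptive dictionary."""
    notes = {
        col: (desc, src)
        for col, desc, src in db.execute(
            "SELECT column_name, description, source FROM data_dictionary WHERE table_name = ?",
            (table,),
        )
    }
    # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
    # The table name is interpolated into the SQL, so pass only trusted names.
    return [
        (name, col_type, bool(notnull), *notes.get(name, ("", "")))
        for _, name, col_type, notnull, _, _ in db.execute(f"PRAGMA table_info({table})")
    ]


for entry in describe("parcel"):
    print(entry)
```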
Data dictionary… types of information (general)
Data dictionary… GIS Metadata
DBMS … key modules
– a data description/definition module
  • defines/creates/restructures
  • enforces rules
– a query module
  • retrieval for queries, ad-hoc queries, simple reports
– a report writing program
– a high level language interface
– ...
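The three main modules listed above can be seen in miniature with SQLite through Python's sqlite3. This is an illustrative sketch with a hypothetical `permit` table: SQL DDL plays the data definition role (including restructuring an existing table), an ad hoc SELECT plays the query role, and a few formatted print statements stand in for a report writer.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Data definition module: define and create a table, then restructure it
db.execute("CREATE TABLE permit (permit_id INTEGER PRIMARY KEY, parcel_id TEXT, kind TEXT)")
db.execute("ALTER TABLE permit ADD COLUMN fee REAL DEFAULT 0")  # add a column to an existing table

db.executemany(
    "INSERT INTO permit (parcel_id, kind, fee) VALUES (?, ?, ?)",
    [("P1", "building", 250.0), ("P2", "building", 300.0), ("P2", "demolition", 120.0)],
)

# Query module: an ad hoc question asked in SQL
rows = db.execute(
    "SELECT kind, COUNT(*), SUM(fee) FROM permit GROUP BY kind ORDER BY kind"
).fetchall()

# Report writer: format the query result for people rather than programs
print(f"{'Permit type':<12}{'Count':>6}{'Fees':>10}")
for kind, count, total in rows:
    print(f"{kind:<12}{count:>6}{total:>10.2f}")
```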
Database… stages of development
– information systems plan for organisation
– system specification … user needs analysis
– conceptual design … data modelling
  • hardware and software independent
– physical design … database design
– database implementation
– monitoring/audit
Organisational strategy and IT
Land Information System (LIS) (i)
– Problems/issues:
  • rationalisation of land-related information in government agencies
  • the removal/reduction of duplication
  • introduction of economies in data capture, maintenance and storage
  • better (and wider) access to data
Organisational strategy and IT
Land Information System (LIS) (ii)
– Solutions:
  • better data distribution mechanism (data format and location transparent to user)
  • knowledge of data distribution built into the data dictionary
  • reduction of data duplication
  • uniform query language (SQL)
  • coding and data interchange standardisation (… SDTS)
Database types - a history
Database types - hierarchical (i)
– lends itself to GIS use, as data are often hierarchical in structure, e.g. municipality within province within country
– records divided into logically related fields … connected in a tree-like arrangement
– master field in each group of records … pointers … updates require pointers to be modified
– fast preset queries … ad hoc queries difficult or impossible
Database types - hierarchical (ii)
Figure: Hierarchical structure for a cadastral database (Country (USA) → States → Counties → Boundaries)
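A minimal sketch of the hierarchical structure in the figure, assuming plain Python objects in place of a real hierarchical DBMS; the state and county names are illustrative only. Each record holds pointers to its children, so a preset query that follows the tree downwards is direct, while a question that runs against the pointers (which state contains a given county?) forces a search of the whole hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A record in a hierarchical database: one parent, any number of children."""
    name: str
    children: list = field(default_factory=list)  # pointers to child records

    def add(self, name):
        child = Record(name)
        self.children.append(child)
        return child


# Build Country -> States -> Counties (boundary records omitted for brevity)
usa = Record("USA")
ohio = usa.add("Ohio")
ohio.add("Franklin")
ohio.add("Hamilton")
texas = usa.add("Texas")
texas.add("Travis")
texas.add("Harris")


def counties_in(country, state_name):
    """Preset query that follows the pointers: country -> state -> counties."""
    for state in country.children:
        if state.name == state_name:
            return [county.name for county in state.children]
    return []


def state_of(country, county_name):
    """'Upward' query: with only downward pointers, the whole tree must be searched."""
    for state in country.children:
        for county in state.children:
            if county.name == county_name:
                return state.name
    return None


print(counties_in(usa, "Texas"))  # ['Travis', 'Harris']
print(state_of(usa, "Hamilton"))  # Ohio
```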
Database types - network (i)
– similar to hierarchical but with multiple connections between files to accommodate many-to-many (M:M) relationships
– access to a particular file without searching the entire hierarchy above that file
– linked records … quick preset searches … large overhead in pointer management
– modification after creation difficult
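A sketch of the network idea, assuming a hypothetical parcel/owner example in plain Python: a parcel can have several owners and an owner several parcels (M:M). Every record carries pointers to its related records, so the structure can be entered at any record, but each link must be maintained on both sides, which is the pointer-management overhead noted above.

```python
from dataclasses import dataclass, field


# eq=False: records are compared by identity, as pointers would be
@dataclass(eq=False)
class Owner:
    name: str
    parcels: list = field(default_factory=list)  # pointers to Parcel records


@dataclass(eq=False)
class Parcel:
    parcel_id: str
    owners: list = field(default_factory=list)   # pointers back to Owner records


def link(owner, parcel):
    """Create an M:M link; both sets of pointers must be maintained."""
    owner.parcels.append(parcel)
    parcel.owners.append(owner)


def unlink(owner, parcel):
    """Removing a link means updating the pointers on both sides."""
    owner.parcels.remove(parcel)
    parcel.owners.remove(owner)


smith, jones = Owner("Smith"), Owner("Jones")
p1, p2 = Parcel("P1"), Parcel("P2")
link(smith, p1)
link(jones, p1)  # P1 has two owners
link(smith, p2)  # Smith owns two parcels

# Enter the structure at any record, not only at the top of a hierarchy
print([o.name for o in p1.owners])           # ['Smith', 'Jones']
print([p.parcel_id for p in smith.parcels])  # ['P1', 'P2']
```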
Database types - relational (i)
– model developed from mathematics
– records and fields in a 2-dimensional table
– no pointers etc … any field can be used to link one table to another
– normalisation … reduced redundancy/stable structure
– ad hoc queries (SQL) … modifications easy
– not very efficient for GIS … SQL3
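A small illustration of the relational points above, using SQLite through Python's sqlite3; the tables and values are hypothetical. Nothing links the two tables except a shared field (`parcel_id`), and an ad hoc question is answered by joining on that field at query time rather than by following stored pointers.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE parcel (parcel_id TEXT PRIMARY KEY, land_use TEXT, area_m2 REAL);
    CREATE TABLE ownership (parcel_id TEXT, owner TEXT);
    INSERT INTO parcel VALUES ('P1', 'RES', 650.0), ('P2', 'COM', 1200.0), ('P3', 'RES', 480.0);
    INSERT INTO ownership VALUES ('P1', 'Smith'), ('P1', 'Jones'), ('P2', 'Smith'), ('P3', 'Brown');
    """
)

# Ad hoc query: residential area per owner, linking the tables through a shared field
rows = db.execute(
    """
    SELECT o.owner, SUM(p.area_m2)
    FROM ownership AS o JOIN parcel AS p ON p.parcel_id = o.parcel_id
    WHERE p.land_use = 'RES'
    GROUP BY o.owner
    ORDER BY o.owner
    """
).fetchall()
print(rows)  # [('Brown', 480.0), ('Jones', 650.0), ('Smith', 650.0)]
```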
Figure: hierarchical structure vs network structure
Centralised vs distributed
– a database does not necessarily mean a centralised arrangement, i.e. all data in one physical place
GIS and distributed databases ...
– trend towards open systems ...
  • special hardware and software can be used widely … specific applications optimised
  • system/network communications are easier
– modular implementation from an overall design … incremental change
Approaches to GIS system design
– develop a proprietary system
– develop a hybrid system: proprietary graphics + commercial DBMS for attribute data (e.g. ARC/INFO)
– use a commercial DBMS and develop the spatial functions and graphics display used in geographic analysis (e.g. siroDBMS, System9)
– develop a spatial DBMS from scratch
Software linkages (1): separate spatial and attribute data
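A minimal sketch of separate spatial and attribute data, in the spirit of the hybrid approach listed above. The structures are hypothetical plain Python and SQLite, not ARC/INFO's own file formats: geometry is keyed by a feature identifier, the attributes sit in a DBMS table keyed by the same identifier, and a query crosses the link in both stores. Those links are exactly what has to be kept in step when either side changes.

```python
import sqlite3

# Spatial data, keyed by a feature identifier (hypothetical graphics component)
geometry = {
    101: [(0, 0), (10, 0), (10, 10), (0, 10)],   # polygon vertices
    102: [(10, 0), (20, 0), (20, 10), (10, 10)],
}

# Attribute data in a separate DBMS, keyed by the same identifier
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE parcel_attr (feature_id INTEGER PRIMARY KEY, land_use TEXT)")
db.executemany("INSERT INTO parcel_attr VALUES (?, ?)", [(101, "RES"), (102, "COM")])


def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


def areas_with(land_use):
    """Attribute query in the DBMS, then fetch the linked geometry by identifier."""
    ids = [fid for (fid,) in db.execute(
        "SELECT feature_id FROM parcel_attr WHERE land_use = ?", (land_use,))]
    return {fid: polygon_area(geometry[fid]) for fid in ids}


print(areas_with("RES"))  # {101: 100.0}
```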
GIS databases … some problems (i)
– centralised risk
  • centralisation demands better quality control; otherwise there is a higher potential for disaster
– cost
  • large DBMSs are expensive to design, implement and operate
  • piecemeal design is difficult
– complexity
  • need to keep track of complex hardware and software
  • need to keep track of graphical as well as attribute data and the links
GIS databases … some problems (ii)
Objectives of design
– a good design results in a database which:
  • contains necessary data but no redundant data
  • organises data so that different users access the same data
  • accommodates different views of the data
  • distinguishes applications which maintain data from those that use it
  • appropriately represents, codes and organises geographic features
Design methodology (for ARC/INFO)
– conceptual model
  • model the users’ view
  • define entities and their relationships
– logical model
  • identify representation of entities
  • match to ARC/INFO data model
  • organise into geographic data sets
– physical model
Design methodology (for ARC/INFO)
– 1. Model the users’ view
– 2. Define entities and their relationships
– 3. Identify representation of entities
– 4. Match to ARC/INFO data model
– 5. Organise into geographic data sets
1. Model the users’ view
– create a model of work performed by users for which ‘location’ is a factor
  • identify organisational functions
  • identify the data which supports the functions
– organise data into sets of geographic features
  • data function matrix
    – high level classification of data
    – interdependence of data and function
    – difference between users and creators of data
Data function matrix … an example
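A data function matrix can be held as a simple table of functions against data sets. This sketch assumes hypothetical functions, data sets and codes ("C" = the function creates/maintains the data, "U" = it only uses the data). Inverting the matrix shows, for each data set, who creates it and who uses it, and data used by many functions are candidates for shared geographic data sets.

```python
# Hypothetical functions (rows) and data sets (columns).
# Assumed codes: "C" = creates/maintains the data, "U" = only uses it.
matrix = {
    "Tax/Rates assessment": {"Parcels": "U", "Assessment data": "C"},
    "Permits":              {"Parcels": "U", "Assessment data": "U", "Permit data": "C"},
    "Sewer maintenance":    {"Parcels": "U", "Sewer data": "C"},
    "Cadastral mapping":    {"Parcels": "C"},
}


def by_data(matrix):
    """Invert the matrix: for each data set, which functions create it and which use it."""
    summary = {}
    for function, uses in matrix.items():
        for data, code in uses.items():
            entry = summary.setdefault(data, {"creators": [], "users": []})
            entry["creators" if code == "C" else "users"].append(function)
    return summary


summary = by_data(matrix)
for data, entry in sorted(summary.items()):
    print(f"{data:<16} created by {entry['creators']}, used by {entry['users']}")
```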
2. Define entities and their relationships
– entities: distinguishable objects which have a common set of properties
  • identify and describe entities
  • identify and describe the relationships among these entities
  • document the process
    – diagrams
    – data dictionary
  • normalise the data
Entity/relationship definition
Diagramming … entities
Normalisation
– First Normal Form (1NF)
– Second Normal Form (2NF)
– Third Normal Form (3NF)
Underlying entities...
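A worked sketch of the three normal forms on a hypothetical parcel/zoning example, using the standard definitions: 1NF has one atomic value per field (no repeating groups); 2NF additionally requires every non-key attribute to depend on the whole key; 3NF additionally removes dependencies of one non-key attribute on another (transitive dependencies). Tables are plain Python lists of tuples; the final assertion checks that joining the 3NF tables rebuilds the 1NF rows, so no information was lost.

```python
# Unnormalised parcel records: 'owners' is a repeating group (hypothetical data)
raw = [
    {"parcel_id": "P1", "zone": "R1", "zone_desc": "Residential", "owners": ["Smith", "Jones"]},
    {"parcel_id": "P2", "zone": "C2", "zone_desc": "Commercial", "owners": ["Brown"]},
    {"parcel_id": "P3", "zone": "R1", "zone_desc": "Residential", "owners": ["Smith"]},
]

# 1NF: one atomic value per field, so one row per (parcel, owner); key = (parcel_id, owner)
first_nf = [
    (r["parcel_id"], owner, r["zone"], r["zone_desc"])
    for r in raw
    for owner in r["owners"]
]

# 2NF: zone and zone_desc depend only on parcel_id (part of the key), so they move out
parcel_owner = sorted({(pid, owner) for pid, owner, _, _ in first_nf})
parcel_2nf = sorted({(pid, z, desc) for pid, _, z, desc in first_nf})

# 3NF: zone_desc depends on zone, a non-key field (transitive dependency), so it moves out
parcel_table = sorted({(pid, z) for pid, z, _ in parcel_2nf})
zone_table = sorted({(z, desc) for _, z, desc in parcel_2nf})

# The decomposition is lossless: joining the 3NF tables rebuilds the 1NF rows
zone_of = dict(parcel_table)
desc_of = dict(zone_table)
rebuilt = sorted((pid, owner, zone_of[pid], desc_of[zone_of[pid]])
                 for pid, owner in parcel_owner)
assert rebuilt == sorted(first_nf)

print(parcel_owner, parcel_table, zone_table, sep="\n")
```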
3. Identify representation of entities
– determine the most effective spatial representation for geographic features
– consider whether:
  • a feature might be represented on a map
  • the shape of a feature might be significant in performing geographic analysis
  • the feature will have different representations at different map scales
  • textual attributes of the feature will be displayed on map products
  • ...
4. Match to ARC/INFO data model
– determine the appropriate ARC/INFO representation for entities
  • points, lines, polygons
– ensure complex feature classes are supported
  • a route is composed of sections, which in turn are based on arcs
  • a region is composed of polygons
  • an event is a point or a line which occurs along a route
– others (e.g. GRID, TIN)
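A simplified sketch of the route/section/arc relationship: locating a point event on a route from its measure. Assumptions, all hypothetical and not ARC/INFO's own structures: arcs are plain Python polylines, each section covers a whole arc, and measures vary linearly along each section. A line event would be located the same way, once for its start measure and once for its end measure.

```python
import math

# Hypothetical arcs: arc_id -> polyline vertices (x, y)
arcs = {
    1: [(0.0, 0.0), (3.0, 4.0)],   # length 5
    2: [(3.0, 4.0), (3.0, 10.0)],  # length 6
}

# A route is an ordered list of sections; each section refers to an arc and carries
# the route measures at its start and end (assumed here: each section spans a whole arc)
route = [
    {"arc": 1, "from_m": 0.0, "to_m": 5.0},
    {"arc": 2, "from_m": 5.0, "to_m": 11.0},
]


def length(polyline):
    return sum(math.dist(a, b) for a, b in zip(polyline, polyline[1:]))


def point_at_distance(polyline, dist):
    """Walk along the polyline and interpolate the point `dist` units from its start."""
    for a, b in zip(polyline, polyline[1:]):
        seg = math.dist(a, b)
        if dist <= seg:
            t = dist / seg if seg else 0.0
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        dist -= seg
    return polyline[-1]


def locate_point_event(route, measure):
    """Find the section containing `measure`, then interpolate along its arc."""
    for section in route:
        if section["from_m"] <= measure <= section["to_m"]:
            line = arcs[section["arc"]]
            fraction = (measure - section["from_m"]) / (section["to_m"] - section["from_m"])
            return point_at_distance(line, fraction * length(line))
    raise ValueError(f"measure {measure} is outside the route")


# A point event at measure 8.0 lies halfway along arc 2
print(locate_point_event(route, 8.0))  # (3.0, 7.0)
```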
Matching to ARC/INFO data model
Table (column headings only): Entity | Spatial type | ARC/INFO | Related to | Coverage | Attribute files
5. Organise into geographic data sets
– to identify and name the geographic data sets that will contain the various entities:
  • define the contents of geographic data sets (coverages, grids etc)
  • name workspaces, geographic data sets, entities and attributes
  • complete entity definitions
  • add cartographic text and lookup tables
5(i) Define the content of geographic data sets
– data sets supported: coverage, grid, TIN, image and drawing
– coverages: several entities can be grouped into a single coverage
– DBMS: data stored in a separate database management system
5(ii) Geographic datasets, entities and attributes
– coverage definitions
  • high level summary of the data physically stored in the database
  • required for defining the coverage structure
– file naming conventions in ARC/INFO
5(iii) Complete entity definitions
– background information: coverage name, data source, agency, number of records etc.
– attribute definition
  • attribute name, type, field width
  • validation rules/permitted values
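An attribute definition (name, type, field width, validation rule) can be written down as data and then used to check records before they are loaded. A minimal sketch, assuming a hypothetical PARCEL entity, hypothetical attribute names and permitted values; the `AttributeDef` class and `validate` function are illustrative, not part of ARC/INFO.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AttributeDef:
    """One row of an entity definition: name, type, field width, validation rule."""
    name: str
    kind: type
    width: int
    rule: Optional[Callable[[object], bool]] = None  # permitted values / range check


# Hypothetical definition for a PARCEL entity
PARCEL = [
    AttributeDef("PARCEL_ID", str, 8),
    AttributeDef("LANDUSE", str, 3, lambda v: v in {"RES", "COM", "IND"}),
    AttributeDef("AREA", float, 12, lambda v: v > 0),
]


def validate(record, definition):
    """Return a list of problems found in one record (empty list = valid)."""
    problems = []
    for attr in definition:
        value = record.get(attr.name)
        if not isinstance(value, attr.kind):
            problems.append(f"{attr.name}: expected {attr.kind.__name__}")
        elif len(str(value)) > attr.width:
            problems.append(f"{attr.name}: wider than {attr.width} characters")
        elif attr.rule is not None and not attr.rule(value):
            problems.append(f"{attr.name}: value {value!r} not permitted")
    return problems


print(validate({"PARCEL_ID": "P1", "LANDUSE": "RES", "AREA": 650.0}, PARCEL))  # []
print(validate({"PARCEL_ID": "P000000012", "LANDUSE": "XYZ", "AREA": 650.0}, PARCEL))
```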
5(iv) Cartographic text & code tables
– annotation (text, placing rules etc)
– look up tables
  • pre-defined set of values
  • description/labels
  • means of creating displays based on attribute values
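A lookup table serves all three purposes listed above at once: its keys are the predefined set of values, each key carries a description/label, and it can drive a display. A sketch with hypothetical land-use codes, labels and colours: each feature is symbolised from its code, and a legend is built from the codes actually in use.

```python
# Hypothetical lookup table: permitted LANDUSE code -> (label, display colour)
LANDUSE_LOOKUP = {
    "RES": ("Residential", "yellow"),
    "COM": ("Commercial", "red"),
    "IND": ("Industrial", "purple"),
}

parcels = [("P1", "RES"), ("P2", "COM"), ("P3", "RES"), ("P4", "IND")]


def legend_and_symbols(features, lookup):
    """Assign each feature a colour from its code and build a legend of the codes in use."""
    symbols = {fid: lookup[code][1] for fid, code in features}
    used = sorted({code for _, code in features})
    legend = [(lookup[code][0], lookup[code][1]) for code in used]
    return symbols, legend


symbols, legend = legend_and_symbols(parcels, LANDUSE_LOOKUP)
print(symbols)  # {'P1': 'yellow', 'P2': 'red', 'P3': 'yellow', 'P4': 'purple'}
print(legend)   # [('Commercial', 'red'), ('Industrial', 'purple'), ('Residential', 'yellow')]
```

The same table can double as a validation list: a code is permitted only if `code in LANDUSE_LOOKUP`.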
Robinson (Ch 14): Scale and GIS databases
– (past) a map’s scale greatly influenced map content and data resolution
– GIS data are ‘scaleless’ … yet scale is still a critical factor with digital databases because of the ways in which we create them
– scale and resolution (Tab 14.1)
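One reason scale persists in digital data is simple arithmetic: ground distance = map distance × scale denominator. This sketch assumes 0.5 mm as an illustrative line width or digitising tolerance on the source map; the figures are worked examples, not the values in Tab 14.1.

```python
def ground_size_m(map_size_mm, scale_denominator):
    """Ground distance (metres) represented by a distance on the map (millimetres)."""
    return map_size_mm * scale_denominator / 1000


# An assumed 0.5 mm line or digitising error on the source map spans this much ground:
for denominator in (1_000, 24_000, 250_000):
    print(f"1:{denominator:,}  ->  {ground_size_m(0.5, denominator):g} m")
```

At 1:24,000 this gives 12 m, so data digitised from such a map cannot be more precise than roughly that, however many decimal places the database stores.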
Robinson (Ch 14): Scale and resolution issues
– symbolisation and display problems
– handling databases of different scales
  • join problems (e.g. urban/rural)
  • merge problems (different themes)
  • scale levels
    – in general
    – large scale data (AM/FM etc.)
Robinson (Ch 15): Managing large GIS
– data organisation
  • partitioning
  • spatial indexes
  • metadata
– data compression
  • run length encoding (RLE)
  • quadtree encoding
  • others ...
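The two compression methods named above can be sketched in a few lines each, on a small hypothetical raster. Run length encoding stores each raster row as (value, run length) pairs. The region quadtree here assumes a square grid whose side is a power of two: a uniform block becomes a single leaf, otherwise it is split into four quadrants (ordered NW, NE, SW, SE) and each is encoded recursively.

```python
def rle_encode(row):
    """Run length encoding: collapse runs of equal values into (value, count) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    return [value for value, count in runs for _ in range(count)]


def quadtree(grid, row=0, col=0, size=None):
    """Region quadtree: a uniform block becomes one leaf, otherwise split into 4 quadrants.

    Assumes a square grid whose side is a power of two. Children are NW, NE, SW, SE.
    """
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first
    half = size // 2
    return [
        quadtree(grid, row, col, half),                # NW
        quadtree(grid, row, col + half, half),         # NE
        quadtree(grid, row + half, col, half),         # SW
        quadtree(grid, row + half, col + half, half),  # SE
    ]


def count_leaves(node):
    return sum(count_leaves(child) for child in node) if isinstance(node, list) else 1


grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
]
print([rle_encode(r) for r in grid])
tree = quadtree(grid)
print(tree)                # [1, 0, 1, [1, 0, 0, 1]]
print(count_leaves(tree))  # 7 leaves instead of 16 cells
```

Both methods pay off when the raster has large uniform areas; a highly fragmented grid can produce more runs or leaves than it saves.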