
www.elsevier.nl/locate/isprsjprs

Location similarity of regions


Stephan Winter

Department of Geoinformation, Technical University Vienna, Gusshausstr. 27-29, A-1040 Vienna, Austria

Received 25 February 1999; accepted 4 March 2000

Abstract

This article gives a systematic investigation of location-based similarity between regular regions. Starting from reasonable conditions for such measures, it is shown that there is only a finite number of location properties to be compared. The complete set of combinations is presented, and their behaviour and interpretability are discussed.

Similarity measures are needed for all kinds of matching problems, including merging spatial data sets, change detection, and generalization. However, the measures are empirical measures. Therefore, measures found in literature seem to be chosen at random. With this synopsis, I show the differences in behaviour for all available choices. © 2000 Elsevier Science

Keywords: similarity; topological relations; quality assessment; distance

1. Introduction

1.1. Motivation

Similarity is a concept widely used, referring to space and geographic information systems (GIS); it provides the basis for handling positional uncertainty and imprecision, for matching spatial entities, for merging spatial data sets, for change detection, or for generalization. Since similarity is the basis, there needs to be a measure to make it quantifiable. Additionally, similarity is the central notion for any abstraction and has been discussed in the categorization controversy as an undecidable problem for 2000 years (Flasch, 1986). The fundamental question of similarity is to find a common reference frame for measuring: there are so many aspects of physical, linguistic or semantic similarity, that a statement ‘A is similar to B’ contains no information as long as the referred aspects are not specified. For the present investigation, the reference frame is location and location-based similarity.

This article is a significantly revised and extended version of: Location-based similarity measures of regions. In: Fritsch, D., Englich, M., Sester, M. (Eds.): GIS Between Visions and Applications. International Archives of Photogrammetry and Remote Sensing, Vol. 32/4 (1998), pp. 669–676.

E-mail address: [email protected] (S. Winter).

Spatial entities in databases, in this article assumed as regions, are models of real world objects. The comparison of the location of two regions from different data sets is based on the hypothesis that both are modeling the same object. The grade of similarity allows an assessment of that hypothesis. It is most likely that two models are never identical, because of different concepts applied to the real world, of context-dependent levels of detail, of changes in a dynamic world, and of errors in data capture. Introducing similarity at this point allows one to characterize the grade of coincidence by common location as well as the grade of distance by distinct location (Tversky, 1977). As both aspects of similarity are not redundant, it is necessary to characterize them in a more precise way. Thus, we talk of similarity (common location) and dissimilarity (distinct location). Obviously, the concept of similarity is broader than a pure distance measure. Furthermore, specific similarity measures could yield selective information about different aspects contributing to similarity, like grade of equality or overlap.

Similarity measures are empirical measures. Therefore, measures found in literature often seem to be chosen at random. The choice is based on properties of the specific measure without comparison to other possible location-based similarity measures. I will show a synopsis of all choices with significant difference of behaviour, and I will characterize the different measures for their specific behaviour.

1.2. Focus

This paper presents a systematic investigation of similarity measures between two discrete regions from different data sets (Fig. 1). The only aspect considered is the location of regions, which is a function of coordinates in a given geometry. I exclude all thematic attributes of regions as well as relations between objects, both relating to their own problems and literature. Finally, I will not treat the matching problem of two regions from different data sources.

It will be shown that the number of location properties to be compared is finite. A complete list of possible combinations will be presented and discussed. Other similarity measures can only be given with higher orders of normalization (e.g., $L_p$-norm with $p > 1$). It can be expected that such measures cannot generate new information because the combinatorial complexity of possible properties is exhausted with the measures given here.

Fig. 1. Given two regions, A and B, from two independent data sets: to what extent are they similar?

Given the preconditions that a measure should be symmetric, normalized, and free of dimension, area ratios will be set up. Only some of all possible ratios fulfill these preconditions. These ratios are useful similarity measures. Hence, their behaviour and semantic interpretations will be discussed. Other conditions will also be investigated, especially reflexivity and the triangle inequality. It will be shown that they may not be postulated for similarity measures.

Different measures characterize different properties or interrelations between position and size of two regions. None of the measures can be a measure of overall similarity. Consequently, at least two of the listed measures are necessary to describe similarity as well as dissimilarity. In literature, either one or the other pair of these measures is used. Typically, it is a practical approach which leads to the choice of measures without reference to alternatives. It will be shown at which point and to what extent alternative measures exist.

1.3. Structure

This article starts by investigating similarity as a concept and introduces location as a reference frame for the similarity of regions (Section 2). Then location-based measures based on intersection sets will be introduced (Section 3). The sizes of the intersection sets are normalized by setting them into ratios. These ratios will be investigated and discussed in Section 4. In simple test situations, the behaviour of these measures will be demonstrated (Section 5). Finally, the conclusion (Section 6) will discuss this approach and its results in a wider context.

2. Similarity and location

Similarity is a concept that varies between disciplines: in mathematics, it describes a type of transformation (Edgar, 1990); in statistics, it means that two similar signals are correlated (Jähne, 1995); in cognition, it means that similar things belong to the [...] similarity is based on the laws of Gestalt theory (Metzger, 1936); and, in computer vision, it is related to topological and geometric properties, like Euler number, area, compactness (Haralick and Shapiro, 1992). Not all of these concepts of similarity have a ratio scale (Stevens, 1946), i.e., not all concepts can be measured.

This variety of aspects of physical, linguistic or semantic similarity requires a specification of the way in which two objects shall be similar. In this paper, the location of two spatially extended objects shall be investigated for similarity. Tversky (1977) postulated that similarity of objects increases with the number of common features, and decreases with the number of distinct features. He proposed a contrast model, which expresses similarity between objects A and B as a weighted difference of the measures of their common and distinct features. Similarity "is expressed as a function h of three arguments: ... the features that are common in both A and B; ... the features that belong to A but not to B; ... the features that belong to B but not to A." (p. 330):

$$
s(A, B) = h(A \cap B,\; A - B,\; B - A) = \alpha\, h(A \cap B) - \beta\, h(A - B) - \gamma\, h(B - A) \tag{1}
$$

Consequently, similarity is more than an inverse of distance or difference. Difference of spatially extended objects causes costs in mapping one onto the other, whereas common features of the objects represent benefits, which also have to be considered in the mapping (Vosselman, 1992). For that reason, Tversky’s model provides the basis of the argumentation in this paper: similarity is considered as a combined measure of similar parts and dissimilar parts. With regard to location: similarity is a combination of the parts sharing a location (similarity in a narrow sense) and the parts that are different (dissimilarity).

In the following, it is generally assumed that all treated areal objects are existing and not empty. Location is represented simply by the location function:

$$
f(x, y) =
\begin{cases}
0 & \text{if } (x, y) \notin A \\
1 & \text{if } (x, y) \in A
\end{cases}
\tag{2}
$$

with $(x, y) \in \mathbb{R}^2$ (vector representation) or $(x, y) \in \mathbb{Z}^2$ (raster representation), respectively. In the following, reference will be made to $\mathbb{R}^2$ only, without loss of generality. All the formulas can also be applied to $\mathbb{Z}^2$, replacing integrals by sums.
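The location function of Eq. (2) can be sketched for the raster case as follows. This is a minimal illustration, assuming a region is stored as a set of integer cell coordinates; the helper names `rectangle` and `location_function` are hypothetical and not part of the original work.

```python
def rectangle(x0, y0, width, height):
    """Hypothetical helper: raster cells of an axis-parallel rectangle in Z^2."""
    return {(x, y) for x in range(x0, x0 + width) for y in range(y0, y0 + height)}


def location_function(region):
    """Return f with f(x, y) = 1 if (x, y) is in the region and 0 otherwise (Eq. 2)."""
    def f(x, y):
        return 1 if (x, y) in region else 0
    return f


A = rectangle(0, 0, 4, 3)   # cells with x in 0..3 and y in 0..2
f = location_function(A)
inside = f(1, 1)            # a cell covered by A
outside = f(4, 0)           # a cell just right of A
```

In this representation the size of a region is simply the number of its cells, which is the sum that replaces the integral in the raster case.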

At this point, it makes sense to distinguish between contexts of similarity of regions. As a binary relation, similarity may concern:

(1) Two different objects in the real world: In this case, similarity concerns shape only (with the basic assumption that two physical objects cannot exist at the same location at the same time).

(2) Two different abstractions of the same object: In this case, similarity concerns the two contexts. Location-based measures can be used to specify one type of context by describing indicators.

(3) Two different representations of the same object: In this case, similarity of location concerns identity, or at least part-of relations. Similarity can be used to match regions, to detect differences in data sets, e.g., changes, and so on.

In special circumstances, the third case can be treated as an estimation problem of a shift between two correlated spatial signals. That is common practice in image matching (Ackermann, 1984). But all differences between the two signals, which cannot be described by the shift, violate the estimation model. Hence, they have to be small. In this article, no restriction shall be put on the shape or correlation between the two regions considered. For that reason, statistical matching techniques will not be considered further.

This concept of location shows that matching of data sets is an ill-posed problem requiring empirical approaches and that similarity measures for location are, thus, empirical measures.

3. Location-based measures

In this section, we will derive location-based similarity measures with special attention to completeness. They will be based on the sizes of intersection sets (with strong interrelation to weighted topological relations).

Location of a region was defined as the space covered by this region (Eq. (2)). Measures based on location count (or integrate) atomic elements of space; these are points in $\mathbb{R}^2$, and raster cells in $\mathbb{Z}^2$.

[...] topology, the size of intersection sets between the interior and the exterior of the considered regions comes into focus. Thus, the strong mathematical formulation of topological relations by emptiness/nonemptiness of intersection sets (Egenhofer and Herring, 1990) is softened to graded, fuzzy or uncertain topological relations (Winter, 1996). An example is the relation equal(A, B): if two regions A and B fulfill the strict relation, they share all covered space without distinct space, which means they are perfectly similar. All other states of location will be treated as more or less equal, corresponding to more or less similar.

The number of possible combinations of such intersection sets is finite. In the following, all location-based measures will be collected. Then the ratios of those measures will be investigated for their use as location-based similarity measures.

3.1. Intersection sets

The intersection sets between the interior and exterior will be investigated to characterize strict topological relations. Then the qualitative relations will be graded by the size of the sets.

The location function (Eq. (2)) distinguishes two sets, the interior ($f(x, y) = 1$) and the exterior ($f(x, y) = 0$) of a region A. The function needs no concept of neighborhood. Therefore, open and closed sets cannot be distinguished in the functional representation. The inverse of function $f$, $f^{-1}$, yields the complement of A, i.e., $\neg A$. Thus, for two regions, A and B, a set of four intersection sets in total can be derived.

Consider Fig. 2. Region A from a data set A and region B from a data set B have an arbitrary position relative to each other; in the figure, the rectangle A is top-left of the rectangle B, and both are overlapping. Their intersection sets form a partition of the planar space with (generally, at most) four sets:

Fig. 2. The intersection sets between two rectangular regions, A and B, form a partition of the plane. The background is assumed to be unlimited.

$$
A \cap B, \quad A \cap \neg B, \quad \neg A \cap B, \quad \neg A \cap \neg B. \tag{3}
$$

All other sets are unions of those intersection sets. For example:

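$$
\begin{aligned}
A &= (A \cap B) \cup (A \cap \neg B) \\
B &= (A \cap B) \cup (\neg A \cap B) \\
A \cup B &= (A \cap B) \cup (\neg A \cap B) \cup (A \cap \neg B)
\end{aligned}
\tag{4}
$$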

First of all, the size $m$ of the sets is interesting. An elementary operation size-of is introduced here, in short, written in mathematical notation by $|\cdot|$. This operation can be defined on the location function for A, $f(x, y)$, and B, $g(x, y)$, in $\mathbb{R}^2$ as:

$$
\begin{aligned}
m_1 &= |A \cap B| = \iint_{x,y} f(x, y)\, g(x, y)\, dx\, dy \\
m_2 &= |A \cap \neg B| = \iint_{x,y} f(x, y)\, g^{-1}(x, y)\, dx\, dy \\
m_3 &= |\neg A \cap B| = \iint_{x,y} f^{-1}(x, y)\, g(x, y)\, dx\, dy \\
m_4 &= |\neg A \cap \neg B| = \iint_{x,y} f^{-1}(x, y)\, g^{-1}(x, y)\, dx\, dy
\end{aligned}
\tag{5}
$$

With unlimited functions, $f$ and $g$, in $\mathbb{R}^2$, the size $m_4$ is always $\infty$. Hence, no information is contributed by $m_4$. Therefore, $m_4$ can be excluded from further consideration.
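For the raster case, the sizes of Eq. (5) reduce to cell counts. The following is a minimal sketch under the assumption that regions are finite sets of raster cells; `rectangle` and `intersection_sizes` are hypothetical names chosen for illustration, and the unbounded background size is left out.

```python
def rectangle(x0, y0, width, height):
    """Hypothetical helper: raster cells of an axis-parallel rectangle in Z^2."""
    return {(x, y) for x in range(x0, x0 + width) for y in range(y0, y0 + height)}


def intersection_sizes(A, B):
    """Return (m1, m2, m3) of Eq. (5), with sums over cells replacing the integrals."""
    m1 = len(A & B)   # |A ∩ B|: cells common to both regions
    m2 = len(A - B)   # |A ∩ ¬B|: cells of A outside B
    m3 = len(B - A)   # |¬A ∩ B|: cells of B outside A
    return m1, m2, m3


A = rectangle(0, 0, 4, 4)
B = rectangle(2, 2, 4, 4)   # overlaps A in a 2 x 2 block
m1, m2, m3 = intersection_sizes(A, B)
```

Because the three sets are pairwise disjoint, their sizes add up to the sizes of A, B and the union, as stated in Eq. (4).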

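Once the sizes $m_i$ are known, they can be mapped to binary measures $\bar{m}_i$ with values 0 ($\emptyset$) and 1 ($\neg\emptyset$), for $i \in \{1, \ldots, 4\}$:

$$
m_i \to \bar{m}_i =
\begin{cases}
1 & \text{if } m_i \neq 0 \\
0 & \text{if } m_i = 0
\end{cases}
\tag{6}
$$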

For the binary measures $\bar{m}_i$ the following dependencies exist:

(1) $\bar{m}_4$ is never 0 for finite A and B. It contributes no qualitative information. In consequence, a situation between any two regions A and B can be described qualitatively by combinations of the triple $\{\bar{m}_1, \bar{m}_2, \bar{m}_3\}$. That yields $2^3 = 8$ theoretically possible combinations.

(2) There is no pair of $(\bar{m}_1, \bar{m}_2)$ and $(\bar{m}_1, \bar{m}_3)$ that is $(0, 0)$, since neither A nor B is empty. With $|A| = m_1 + m_2$ and $|B| = m_1 + m_3$, at least one term in each sum must be $> 0$ (with the property of partitions to be pairwise disjoint, the size of a union of intersection sets can be written as a sum). That dependency excludes three of the eight triples of $\bar{m}_i$: $\{0, 0, 0\}$, $\{0, 1, 0\}$, $\{0, 0, 1\}$ are impossible.

The remaining five triples correspond to the following separable topological relations:

1. $\{0, 1, 1\}$ (disjunct/touching): A and B have no part in common;
2. $\{1, 1, 1\}$ (overlap): A and B have parts in common and parts not in common;
3. $\{1, 0, 0\}$ (equal): all parts of A are parts of B and vice versa;
4. $\{1, 1, 0\}$ (contains/covers): all parts of B are part of A, and A has additional parts;
5. $\{1, 0, 1\}$ (contained by/covered by): all parts of A are part of B, and B has additional parts.
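The mapping from binary triples to the five relations listed above can be sketched as a lookup table. This is an illustrative sketch, assuming regions are non-empty finite sets of raster cells; the names `binary_triple`, `RELATIONS` and `topological_relation` are hypothetical.

```python
def binary_triple(A, B):
    """Binary measures for A ∩ B, A ∩ ¬B and ¬A ∩ B: 1 if non-empty, else 0."""
    return (int(bool(A & B)), int(bool(A - B)), int(bool(B - A)))


# The five triples that can occur for non-empty regions.
RELATIONS = {
    (0, 1, 1): "disjunct/touching",
    (1, 1, 1): "overlap",
    (1, 0, 0): "equal",
    (1, 1, 0): "contains/covers",
    (1, 0, 1): "contained by/covered by",
}


def topological_relation(A, B):
    """Name the relation between two non-empty regions given as sets of cells."""
    if not A or not B:
        raise ValueError("regions are assumed to be non-empty")
    return RELATIONS[binary_triple(A, B)]


A = {(0, 0), (1, 0), (2, 0)}
B = {(1, 0), (2, 0)}
relation = topological_relation(A, B)
```

The three excluded triples never reach the lookup, because they would require an empty A or an empty B, which the guard rejects.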

Using intersection sets seems to be similar to the work of Egenhofer and Franzosa (1991), who determined the topological relation between A and B by intersection sets first. They investigated the intersection sets of interiors and boundaries with the restriction to simple regions. They could separate eight families of topological relations. Our classification works also for complex regions, which are multiply connected regions or regions with many components. Indeed, both approaches require regular closed regions. The proposed topological relations represent a subset of the Egenhofer relations; Fig. 3 is a generalization of his conceptual neighborhood graph (Egenhofer and Al-Taha, 1992).

The topological relation can serve as similarity measure on an ordinal scale (Egenhofer, 1997): equal is the highest degree of similarity, and each of the directly neighboring relations in the graph of Fig. 3 guarantees higher similarity than the relation disjunct that is farthest from equal, with a distance of two graph edges. The next step has to be the quantitative description of similarity.

Fig. 3. The topological relations representable by the two-dimensional intersection sets, related by conceptual neighborhood.

3.2. Combinations of intersection sets

The size measures $m_i$ of Eq. (5) will be investigated numerically for all possible combinations of intersection sets.

With partition into (at most) four intersection sets, in principle 16 combinations of intersection sets are possible. The number follows from the sequence of binomial coefficients $\binom{n}{k}$, with $n = 4$, the number of intersection sets, and $k \in \{0, \ldots, 4\}$, the number of combined elementary sets. Concerning the size of these combinations, all those combinations containing $m_4$ as a term of the sum will be constantly $\infty$. With excluding $m_4$ and the limitation to the triple $\{m_1, m_2, m_3\}$, the number of relevant unions of intersection sets decreases to the sequence of binomial coefficients with $n = 3$ and $k \in \{0, \ldots, 3\}$. Their sizes are:

$$
\begin{aligned}
k = 0&: \; \emptyset && \text{(excluded for similar reasons as } m_4\text{)} \\
k = 1&: \; m_1,\; m_2,\; m_3 && \text{(the elementary sets)} \\
k = 2&: \; m_1 + m_2,\; m_1 + m_3,\; m_2 + m_3 && \text{(all 2-tuples)} \\
k = 3&: \; m_1 + m_2 + m_3
\end{aligned}
\tag{7}
$$
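The counting argument above can be checked by enumerating the unions directly. The sketch below uses only the Python standard library; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

# With all four intersection sets: sum of C(4, k) for k = 0..4 combinations.
total_with_m4 = sum(comb(4, k) for k in range(5))

# After excluding m4, enumerate the unions of the remaining three sets per k.
names = ("m1", "m2", "m3")
unions = {k: [" + ".join(c) for c in combinations(names, k)] for k in range(4)}
counts = [len(unions[k]) for k in range(4)]
```

Here `unions[2]` holds the three 2-tuples and `unions[3]` the single sum of all three sizes, matching the binomial sequence for n = 3.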

The domain of values for the size of an arbitrary set X in $\mathbb{R}^2$ is $\mathrm{dom}(|X|) = [0, \infty]$. The case $k = 0$ is trivial:

$$
\mathrm{dom}(|\emptyset|) = [0]. \tag{8}
$$

Regions A and B shall be limited to finite sets which may not be empty. Then $0 < |A|, |B| < \infty$ holds. It follows for the sizes of the three considered intersection sets ($k = 1$):

$$
\begin{aligned}
\mathrm{dom}\, m_1&: \; 0 \le |A \cap B| \le \min(|A|, |B|) \\
\mathrm{dom}\, m_2&: \; 0 \le |A \cap \neg B| \le |A| \\
\mathrm{dom}\, m_3&: \; 0 \le |\neg A \cap B| \le |B|
\end{aligned}
\tag{9}
$$

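For $k = 2$, it follows (cf. Eq. (4)):

$$
\begin{aligned}
m_1 + m_2 &= |A \cap B| + |A \cap \neg B| = |A| \\
m_1 + m_3 &= |A \cap B| + |\neg A \cap B| = |B| \\
m_2 + m_3 &= |A \cap \neg B| + |\neg A \cap B|
\end{aligned}
\tag{10}
$$

with the domains:

$$
\begin{aligned}
\mathrm{dom}(m_1 + m_2) &= [\,|A|\,] \\
\mathrm{dom}(m_1 + m_3) &= [\,|B|\,] \\
\mathrm{dom}(m_2 + m_3) &= [0,\; |A| + |B|]
\end{aligned}
\tag{11}
$$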

Finally, for $k = 3$, it follows (cf. Eq. (4)):

$$
m_1 + m_2 + m_3 = |A \cap B| + |A \cap \neg B| + |\neg A \cap B| = |A \cup B| \tag{12}
$$

with the domain:

$$
\mathrm{dom}(m_1 + m_2 + m_3) = [\max(|A|, |B|),\; |A| + |B|]. \tag{13}
$$

In the following, the union sets will be used as short forms for the combination of elementary intersection sets: $m_8 = m_1 + m_2 + m_3$.

3.3. Other size measures

Obviously other set size measures exist. Some of them already occur in the domain limitations of the location-based measures (Eqs. (9), (11) and (13)). In the linear form, they are:

$$
m_5 = \min(|A|, |B|), \quad m_6 = \max(|A|, |B|), \quad m_7 = |A| + |B|. \tag{14}
$$

These three measures are dependent by:

$$
\min(|A|, |B|) + \max(|A|, |B|) = |A| + |B|. \tag{15}
$$

All three of these measures are independent from the relative location of two regions, and for that reason, they are not considered as candidates for location-based similarity measures. But they are needed for normalization of the location-based measures, as the consideration of domains has shown. Besides, these measures are symmetric.

Nonlinear measures can also be set up. Partly, they are of higher dimensions (e.g., $|A| \cdot |B|$), which disqualifies them for normalization. The other part consists of norms of higher dimension measures. A prototype is the $L_p$-norm, e.g., $p = 2$ produces $\sqrt{|A| \cdot |B|}$ (Molenaar and Cheng, 1998).

Such norms necessarily use the same combinations of sets as the already given measures; they contribute no new information. Therefore, the given set of size measures is sufficiently complete.

4. Location-based similarity measures

In this section, we will derive location-based similarity measures with special attention to completeness. The size measures of Section 3 are used and coupled with three criteria for similarity measures: symmetry, normalization and freedom of dimension. It will be possible to set up lists of such measures and to describe their properties.

4.1. Criteria for similarity measures

Three criteria will be established to specify similarity measures. With these criteria, it will be possible to derive such measures from the size measures. As already discussed, similarity is an empirical concept which is not identical to distance. Nevertheless, special conditions for distances will be investigated too.

The considered criteria for similarity measures are:

(1) Symmetry: [...] prototype. In such neutral situations, a measure must be independent from the order of the considered regions A and B:

$$
\mathrm{similar}(A, B) = \mathrm{similar}(B, A). \tag{16}
$$

(2) Domain limitation: It is useful to have normalized measures. This property eases interpretation and comparison of measures:

$$
0 \le \mathrm{similar}(A, B) \le 1. \tag{17}
$$

For this reason, suited ratios of size measures are introduced as similarity measures.

(3) Freedom of dimension: Similarity measures shall be free of dimension because similarity is no physical concept or property. That can be reached by building ratios of measures with the same dimension.

4.2. Symmetry

First, we consider symmetry in the size measures. The case $k = 0$ is meaningless in the context of similarity. From all other tuples only a few are symmetric (taking advantage of abbreviations by unions):

$$
\begin{aligned}
k = 0&: \; \emptyset && \text{(excluded above)} \\
k = 1&: \; m_1 && \text{(based on } A \cap B\text{)} \\
k = 2&: \; m_2 + m_3 && \text{(based on } (A \cap \neg B) \cup (\neg A \cap B)\text{)} \\
k = 3&: \; m_8 && \text{(based on } A \cup B\text{)}
\end{aligned}
\tag{18}
$$

In the following, it is sufficient to investigate this reduced set of size measures as the only symmetric ones. They have to be normalized now.

4.3. Normalization to dimension-less ratios

The symmetric size measures of Eq. (18) will be normalized. For that purpose, the domains of size measures are used (Eqs. (9), (11) and (13)). Normalization must not destroy the symmetry property; for that reason, the norm factors must be symmetric too. Further, norm factors may never take the value 0. This argument excludes the measures $m_1 = |A \cap B|$ and $(m_2 + m_3) = |(\neg A \cap B) \cup (A \cap \neg B)|$ from the list of possible norm factors. To keep the third criterion, only the linear size measures are considered as norm factors.

The remaining candidates for norm factors are the measures:

$$
m_8,\; m_5,\; m_6,\; m_7 \tag{19}
$$

Table 1 shows the matrix of all $4 \times 3$ ratios. Not all of the ratios are normalized to $[0, 1]$. The ratios will be discussed individually in the next section.

4.4. Similarity measures

In this section, the ratios of Table 1 will be investigated. Normal ratios are considered as similarity measure, or as dissimilarity measure, if their behaviour meets the following idea of location-based similarity. The meaning of a location-based similarity measure shall be that of a fuzzy membership value (Zadeh, 1965) to a topological relation; a value of 1 refers to total correspondence with the discrete relation and a value of 0 refers to total disagreement with the discrete relation. Such a relation can be equal. There will also be similarity measures registering correspondence to any containment. The meaning of the individual measures will be discussed with regard to the actual topological relation referred to.

Table 1
Combination of all possible ratios of size measures

| Denominator | Numerator $m_1 = \lvert A \cap B \rvert$ | Numerator $m_2 + m_3 = \lvert \neg A \cap B \rvert + \lvert A \cap \neg B \rvert$ | Numerator $m_8 = \lvert A \cup B \rvert$ |
|---|---|---|---|
| $m_8 = \lvert A \cup B \rvert$ | $s_{11}$ | $s_{12}$ | $s_{13}$ |
| $m_5 = \min(\lvert A \rvert, \lvert B \rvert)$ | $s_{21}$ | $s_{22}$ | $s_{23}$ |
| $m_6 = \max(\lvert A \rvert, \lvert B \rvert)$ | $s_{31}$ | $s_{32}$ | $s_{33}$ |
| $m_7 = \lvert A \rvert + \lvert B \rvert$ | $s_{41}$ | $s_{42}$ | $s_{43}$ |

The ratios in detail are as follows.

$s_{11}$: Domain of values is $[0, 1]$. 0 stands for totally disjoint regions ($A \cap B = \emptyset$), and 1 stands for identical regions ($A \cap B = A \cup B$). This ratio is a prototypical example of a location-based similarity measure increasing with the grade of similarity up to equality.

$s_{12}$: Domain of values is $[0, 1]$. 0 occurs only if $A = B$, and 1 occurs if A and B are totally disjoint. With this behaviour, the ratio complements $s_{11}$, which corresponds to the complementing sets in the numerators with regard to the denominator. This ratio is a typical dissimilarity measure decreasing with the grade of similarity.

$s_{13}$: Domain of values is $[1]$, trivially.

$s_{21}$: Domain of values is $[0, 1]$. 0 stands for totally disjoint regions, and 1 stands for complete coverage (containment, or identity). The ratio does not recognize the proportion in size between A and B and, therefore, it is not suited as a similarity measure. Nevertheless, this ratio could be used as a measure for the grade of (symmetric) overlap.

$s_{22}$: Domain of values is $[0, \infty)$. Again, 0 occurs only if $A = B$. But the denominator is not sufficient to normalize the numerator. That property excludes this ratio from the list of similarity measures. Additionally, values different from 0 are difficult to interpret, because numerator and denominator are not correlated.

$s_{23}$: Domain of values is $[1, \infty)$. 1 occurs if $A = B$, and the ratio increases in all other cases. Without being normalized, this ratio is excluded from the list of similarity measures.

$s_{31}$: Domain of values is $[0, 1]$. 0 occurs if both regions are disjoint, and 1 occurs only if $A = B$, in contrast to $s_{21}$. With its sensitivity for proportions between A and B, this ratio is a suited similarity measure.

$s_{32}$: Domain of values is $[0, 2]$. 0 occurs if $A = B$, and 2 occurs if A is disjoint from B and $|A| = |B|$. As long as one region is covered/contained in the other region, the value of the ratio is limited by an upper bound of 1. As long as both regions are disjoint, the value of the ratio is limited by a lower bound of 1. In any case of overlap, no prediction can be made. This ratio could be normalized by division by 2; then it represents a dissimilarity measure (decreasing with growing similarity).

$s_{33}$: Domain of values is $[1, 2]$. The value 1 stands for all cases of coverage/containment or identity. The value 2 occurs for disjoint regions, if $|A| = |B|$. Neither domain nor the behaviour recommends this ratio as similarity measure.

$s_{41}$: Domain of values is $[0, 1/2]$. 0 stands for disjoint regions, and $1/2$ stands for $A = B$. If we would normalize the ratio (by multiplication with 2), the result would be a mean size of A and B as denominator (cf. Eq. (15)). Then the behaviour of the (normalized) ratio $s_{41}$ would be in between that of $s_{31}$ and $s_{21}$. This yields no new information.

$s_{42}$: Domain of values is $[0, 1]$. 0 occurs if $A = B$, and 1 occurs if A and B are disjoint. Again, this is a mean ratio of $s_{22}$ and $s_{32}$, but it fulfills the conditions of a dissimilarity measure.

$s_{43}$: Domain of values is $[1/2, 1]$. The lower bound occurs if $A = B$. 1 occurs in all cases of disjoint regions, but is reached also in all other topologic relations, if $|A|$ and $|B|$ are different in the order of magnitude. This ratio represents an extraordinary dissimilarity measure.

In summary, given all possible ratios of size measures, the following are similarity measures:

$$
\text{similarity measures: } \{ s_{11},\; s_{31},\; s_{41} \cdot 2 \}. \tag{20}
$$

Another list contains dissimilarity measures:

$$
\text{dissimilarity measures: } \left\{ s_{12},\; \frac{s_{32}}{2},\; s_{42},\; \left( s_{43} - \frac{1}{2} \right) \cdot 2 \right\}. \tag{21}
$$

Both lists are complete regarding the given criteria.
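The two lists of Eqs. (20) and (21) can be computed directly from cell counts. The following is a minimal sketch, assuming regions are non-empty finite sets of raster cells and using exact fractions; the function and key names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def rectangle(x0, y0, width, height):
    """Hypothetical helper: raster cells of an axis-parallel rectangle in Z^2."""
    return {(x, y) for x in range(x0, x0 + width) for y in range(y0, y0 + height)}


def size_measures(A, B):
    """Symmetric numerators (m1, m2 + m3, m8) and the norm factors m6 and m7."""
    m1 = len(A & B)                 # common cells
    m23 = len(A ^ B)                # distinct cells, m2 + m3
    m8 = len(A | B)                 # union
    m6 = max(len(A), len(B))
    m7 = len(A) + len(B)
    return m1, m23, m8, m6, m7


def similarity_measures(A, B):
    """Normalized similarity measures of Eq. (20)."""
    m1, _, m8, m6, m7 = size_measures(A, B)
    return {"s11": Fraction(m1, m8), "s31": Fraction(m1, m6), "2*s41": Fraction(2 * m1, m7)}


def dissimilarity_measures(A, B):
    """Normalized dissimilarity measures of Eq. (21)."""
    _, m23, m8, m6, m7 = size_measures(A, B)
    return {
        "s12": Fraction(m23, m8),
        "s32/2": Fraction(m23, 2 * m6),
        "s42": Fraction(m23, m7),
        "2*(s43-1/2)": 2 * (Fraction(m8, m7) - Fraction(1, 2)),
    }


A = rectangle(0, 0, 4, 4)
B = rectangle(2, 2, 4, 4)
sim = similarity_measures(A, B)
dis = dissimilarity_measures(A, B)
```

Exact fractions are used here only so that the values can be compared without rounding; floating-point division would serve equally well in practice.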

4.5. Combination of similarity measures

In this section, different combinations of similarity measures will be discussed. Evidence will be given that both lists from above are needed, which will be supported by some examples of recent applications.

With Tversky’s contrast model in mind (Eq. (1)), our lists of similarity and dissimilarity become more transparent. All similarity measures are based on the numerator $|A \cap B|$, which represents the common features between A and B. All dissimilarity measures, with one exception, are based on the numerator $|\neg A \cap B| + |A \cap \neg B|$ [...]

Consider the following example (Harvey et al., 1998): To evaluate a match of two regions, two measures are introduced: an inclusion function, which is in fact identical to $s_{21}$ and yields the grade of overlap instead of similarity (nevertheless: the common features), and a surface distance, which is identical to $s_{12}$ and which measures dissimilarity (distinctive features). Thus, the hypothesis is supported that two measures are needed. The question remains interesting whether other pairs of measures would have been also useful. The authors do not discuss their choice. Another example is mentioned in Ragia and Winter (1998): The authors match two buildings from two data sets with special requirements regarding the aggregation levels of the data sets. Part-of relations are accepted as a match. Similarity is replaced by weighted topological relations, e.g., by $s_{21}$ and $s_{31}$. With this choice, only common features are considered, but not the distinctive ones.

Similarity of regions has to be handled in a different way to similarity of lower dimensional entities. Recently, Walter (1997) matched lines and points of street networks. He works only with distance measures (costs), neglecting the weight of common features. That is justified for one-dimensional data sets because the probability is very small that two lines coincide by chance (the probability for two points is even zero).

Similarity of spatial relations cannot be treated by sizes of sets (the single exception are topological relations). For example, Bruns and Egenhofer (1996) and Egenhofer (1997) are investigating spatial scenes. Though they involve metric refinements of topological relations (cf. Eq. (5)), they need an additional concept of similarity for other spatial relations. They also work with distance measures, which they derive from conceptual neighborhood graphs.

Metric properties would require additional conditions, especially reflexivity and the triangle inequality.

Now it will be investigated how far the location-based similarity measures follow such rules.

A symmetric, normalized similarity measure allows the introduction of its inverse:

$$
\mathrm{dissimilar}(A, B) = 1 - \mathrm{similar}(A, B) \tag{22}
$$

The inverse topological relation is always disjunct, which will be supported by the interpretation of the similarity and dissimilarity measures in Section 4.4. The found measures will not complement each other; therefore, the formal introduction of an inverse is useful.

Reflexivity:

$$
\mathrm{similar}(A, \neg A) = 0, \quad \mathrm{similar}(A, A) = 1. \tag{23}
$$

If B is assigned to $\neg A$, the first rule is fulfilled by all three similarity measures, with $m_1 = 0$ for disjunct regions. If B is assigned to A, the second rule is fulfilled by all three similarity measures.

Reflexivity put on dissimilarity requires an exchange of the rules, applying the inverse property (Eq. (22)) on Eq. (23):

$$
\mathrm{dissimilar}(A, \neg A) = 1, \quad \mathrm{dissimilar}(A, A) = 0 \tag{24}
$$

A triangle inequality, e.g., in the form:

$$
\mathrm{similar}(A, B) \cdot \mathrm{similar}(B, C) \le \mathrm{similar}(A, C)
$$

does not hold. Multiplication is required to keep the norm, and the relation sign has to be converted for multiplication factors $< 1$. But neither do disjunct regions A and C require that A and B or B and C are disjunct (i.e., their similarity is 0), nor do equal regions A and C require that A and B or B and C are equal too. Location-based similarity is not metric.
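A small counterexample makes the failure of the triangle relation concrete. The sketch below assumes the ratio of common cells to union cells as the similarity measure and uses three hypothetical one-row raster regions; B overlaps both A and C, while A and C are disjunct.

```python
from fractions import Fraction


def s11(A, B):
    """Ratio of common cells to union cells for two non-empty sets of raster cells."""
    return Fraction(len(A & B), len(A | B))


A = {(0, 0), (1, 0)}
B = {(1, 0), (2, 0)}   # shares one cell with A and one with C
C = {(2, 0), (3, 0)}

left = s11(A, B) * s11(B, C)   # product of the two pairwise similarities
right = s11(A, C)              # similarity of the disjunct pair
violated = left > right
```

Since the product on the left is positive while the similarity of the disjunct pair is zero, the relation is violated for this configuration.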

[Fig. 4: the three test scenarios (a), (b) and (c)]

[Fig. 5: behaviour of the similarity measures s₁₁ to s₄₃ in the three scenarios]

5. Testing the behaviour of the found measures

Similarity measures (Eqs. (20) and (21)) can be implemented directly. A small test program allows the behaviour of similarity measures to be investigated systematically.¹ Three tests were performed (Fig. 4), each of them concentrating on a specific aspect of location similarity. Scenario (a) starts from two polygons of the same size and shape, and one polygon moves over the location of the other. Scenario (b) is basically the same scenario, but the size and shape of the two polygons are significantly different. Scenario (c) starts from two polygons, where one polygon is inside the other, and the smaller polygon grows stepwise.
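Scenarios (a) and (c) can be imitated with a few lines of code. The sketch below is not the author's Haskell program; it is a Python illustration under assumptions: polygons are replaced by axis-parallel rectangles of raster cells, a single stand-in measure |A ∩ B| / |A ∪ B| is evaluated, and the names `rectangle`, `similar`, `translation` and `growth` are hypothetical.

```python
def rectangle(x0, y0, width, height):
    """Raster cells (atoms in Z^2) covered by an axis-parallel rectangle."""
    return {(x, y)
            for x in range(x0, x0 + width)
            for y in range(y0, y0 + height)}

def similar(a, b):
    """Assumed stand-in measure: |A ∩ B| / |A ∪ B| (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

fixed = rectangle(0, 0, 4, 4)

# Scenario (a): an equal square moves across the fixed one, one cell per step.
translation = [similar(fixed, rectangle(dx, 0, 4, 4)) for dx in range(-4, 5)]

# Scenario (c): a square inside the fixed one grows stepwise until both coincide.
growth = [similar(fixed, rectangle(0, 0, k, k)) for k in range(1, 5)]

print([round(s, 3) for s in translation])
print([round(s, 3) for s in growth])
```

In this sketch the translation profile rises from 0 to 1 at full coincidence and falls back symmetrically, while the growth profile increases monotonically up to 1.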

The results are shown in Fig. 5, where each line shows the behaviour of one similarity measure (s₁₁ to s₄₃) during the transformation (translation or resizing). It can be observed that in each scenario the two sets of measures are correlated (some measures are even identical). Further, it can be observed that in certain scenarios some measures act more selectively than others. Basically, however, the measures behave in a similar way. It does not seem to be relevant which measure (or pair of measures) is chosen in a specific situation.

6. Summary, discussion and conclusion

This article presents a systematic investigation of location-based similarity measures between discrete regions of different data sets. Considering only measures that are symmetric, normalized and dimensionless, it is shown that only seven such measures exist. The set of similarity measures can be classified into measures counting common features of regions and measures counting distinct features. A complete description of similarity requires one measure from each class.

By measuring the sizes of intersection sets, some similarity measures are related strongly to graded topological relations. s₁₁ represents a grade of equals, s₂₁ represents a grade of overlaps, and s₁₂, as the complement of s₁₁, represents a grade of disjoint. Gradations of containment cannot be found; but a concept of graded containment may coincide with the grade of overlap. Boundary-based topological relations are not treated at this point.

¹ This test program is implemented in Haskell; the code is available from ftp://gi27.geoinfo.tuwien.ac.at/winter/winter00location.hs.

Tversky's contrast model (Eq. (1)) has the advantage of having only one measure for overall similarity. On the other hand, as many measures s can be proposed as there are different weights α, β, γ, without any guidance for choosing such weights. The choice then depends on the context of a comparison, which cannot be treated systematically. Combinations of weights are therefore not discussed here. However, a few statements about the weights are possible. A symmetric measure requires β = γ, which corresponds to the dissimilarity measures that do not distinguish A − B and B − A. The special case of a cost model is included by setting α = 0, and a benefit model can be represented by β = 0 and γ = 0.
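The statements about the weights can be illustrated with a short sketch. It assumes that the contrast model has the usual linear form α·f(A ∩ B) − β·f(A − B) − γ·f(B − A), with the set size as the feature function f; Eq. (1) itself is not repeated here, so this form, the function name `contrast` and the example regions are assumptions made for illustration only.

```python
def contrast(a: set, b: set, alpha: float, beta: float, gamma: float) -> float:
    """Assumed contrast model: alpha*|A ∩ B| - beta*|A - B| - gamma*|B - A|."""
    return alpha * len(a & b) - beta * len(a - b) - gamma * len(b - a)

# Hypothetical regions as sets of raster cells: |A ∩ B| = 2, |A - B| = 4, |B - A| = 2.
A = set(range(0, 6))
B = set(range(4, 8))

# beta == gamma: the measure does not distinguish A - B from B - A.
symmetric = (contrast(A, B, 1.0, 1.0, 1.0), contrast(B, A, 1.0, 1.0, 1.0))

# beta != gamma: the order of the arguments matters.
asymmetric = (contrast(A, B, 1.0, 1.0, 0.5), contrast(B, A, 1.0, 1.0, 0.5))

# Cost model (alpha = 0) and benefit model (beta = gamma = 0).
cost = contrast(A, B, 0.0, 1.0, 1.0)
benefit = contrast(A, B, 1.0, 0.0, 0.0)

print(symmetric, asymmetric, cost, benefit)
```

With equal weights β and γ, both argument orders give the same value; the cost model can only yield non-positive values and the benefit model only non-negative ones.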

It may be criticized that our concept of location based on sets of points (ℝ²) or atoms (ℤ²) is too specific in parametrization. Indeed, other frames of locational reference are possible (Bittner and Stell, 1998). Moreover, with the Hausdorff distance, a distance measure exists which is even more general in parametrization of space (Edgar, 1990). The Hausdorff distance is symmetric and one-dimensional (the set sizes above are two-dimensional in planar space). It is zero if and only if A = B. Any other (nonzero) value does not allow a topological configuration to be concluded. This disadvantage cannot be remedied because an adequate measure of common features is not yet known. For that reason, the Hausdorff distance cannot be completed to a similarity measure.
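The remark that a nonzero Hausdorff distance does not determine a topological configuration can be made concrete. The sketch below assumes regions given as finite sets of cell centres and uses the standard definition of the Hausdorff distance for finite point sets; the function names and the example regions are hypothetical.

```python
from math import dist

def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point of b."""
    return max(min(dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance of two finite, non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Three different topological configurations of cell centres.
disjoint = ({(0, 0)}, {(1, 0)})
overlapping = ({(0, 0), (1, 0), (2, 0)}, {(1, 0), (2, 0), (3, 0)})
contained = ({(0, 0), (1, 0), (2, 0)}, {(1, 0)})

for name, (a, b) in [("disjoint", disjoint),
                     ("overlapping", overlapping),
                     ("contained", contained)]:
    print(name, hausdorff(a, b))

# Equal sets are the only case with distance zero.
print("equal", hausdorff(*[{(0, 0), (1, 0)}] * 2))
```

All three configurations yield the same Hausdorff distance of 1, so the value alone cannot tell them apart.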

With the binary location function (Eq. (2)), only discrete regions are tested for similarity. That fits the data sets in today's spatial databases, where the need for a quality description is recognized but such a description is usually not available. On the other hand, the presented model for similarity measures could be refined for uncertain or

The presented similarity measures increase linearly with common location as a consequence of setting elementary set sizes into ratios. Such a model is purely mathematical, and there is no reason to assume that human cognitive concepts are comparable, with the exception of simplicity.

Similarity is a general concept applied to many spatial decision problems. The systematic investigation succeeds by limiting itself to a strict frame of reference. Concentrating on the location of two spatial objects (regions), an elementary set of similarity measures can be presented. To what extent the model can be expanded remains to be investigated.

Acknowledgements

The idea of this paper goes back to a discussion with Andrew Frank. I also had interesting discussions about philosophical aspects of similarity and location with Katrin Dyballa and Thomas Bittner, both from Vienna.

References

Ackermann, F., 1984. High precision digital image correlation. 39th Photogrammetric Week, Institute for Photogrammetry, Stuttgart University. Schriftenr. Inst. Photogrammetrie vol. 9, pp. 231–244.

Bittner, T., Stell, J., 1998. A boundary-sensitive approach to qualitative location. Ann. Math. Artif. Intell. 24, 93–114.

Bruns, H.T., Egenhofer, M.J., 1996. Similarity of spatial scenes. In: Kraak, M.-J., Molenaar, M. (Eds.), Advances in GIS Research. Taylor & Francis, Delft, pp. 173–184.

Edgar, G.A., 1990. Measure, Topology, and Fractal Geometry. 2nd edn. Undergraduate Texts in Mathematics, Springer, New York.

Egenhofer, M.J., 1997. Query processing in spatial-query-by-sketch. J. Visual Lang. Comput. 8 (4), 403–424.

Egenhofer, M.J., Herring, J.R., 1990. A mathematical framework for the definition of topological relationships. 4th International Symposium on Spatial Data Handling. International Geographical Union, Zurich, pp. 803–813.

Egenhofer, M.J., Franzosa, R.D., 1991. Point-set topological spatial relations. Int. J. Geogr. Inf. Syst. 5 (2), 161–174.

Egenhofer, M.J., Al-Taha, K.K., 1992. Reasoning about gradual changes of topological relationships. Theories and Models of Spatio-Temporal Reasoning in Geographic Space. In: Frank, A.U., Campari, I., Formentini, U. (Eds.), Lect. Notes Comput. Sci. vol. 639, Springer, Berlin, pp. 196–219.

Flasch, K., 1986. Das philosophische Denken im Mittelalter. Philipp Reclam jun., Stuttgart.

Haralick, R.M., Shapiro, L.G., 1992. Computer and Robot Vision vol. I, Addison-Wesley, Reading, MA.

Harvey, F., Vauglin, F., Ali, A.B.H., 1998. Geometric matching of areas - comparison measures and association links. In: Poiker, T.K., Chrisman, N. (Eds.), 8th International Symposium on Spatial Data Handling. International Geographical Union, Vancouver.

Jähne, B., 1995. Digital Image Processing. 3rd edn. Springer, Berlin.

Lakoff, G., 1987. Women, Fire, and Dangerous Things — What Categories Reveal About the Mind. Univ. Chicago Press, Chicago.

Metzger, W., 1936. Gesetze des Sehens. Senckenberg-Buch vol. VI, W. Kramer, Frankfurt am Main.

Molenaar, M., Cheng, T., 1998. Fuzzy spatial objects and their dynamics. ISPRS Commission IV Symposium "GIS Between Visions and Applications". In: Fritsch, D., Englich, M., Sester, M. (Eds.), Int. Arch. Photogramm. Remote Sens. vol. 32/4, pp. 389–394, Stuttgart.

Ragia, L., Winter, S., 1998. Contributions to a quality description of areal objects in spatial data sets. ISPRS Commission IV Symposium "GIS Between Visions and Applications". In: Fritsch, D., Englich, M., Sester, M. (Eds.), Int. Arch. Photogramm. Remote Sens. vol. 32/4, pp. 479–486, Stuttgart.

Stevens, S., 1946. On the theory of scales of measurement. Science 103 (2684), 677–680.

Tversky, A., 1977. Features of similarity. Psychol. Rev. 84 (4), 327–352.

Vosselman, G., 1992. Relational Matching. Lect. Notes Comput. Sci. vol. 628, Springer, Berlin.

Walter, V., 1997. Zuordnung von raumbezogenen Daten am Beispiel der Datenmodelle ATKIS und GDF. PhD thesis, Fakultät für Bauingenieur- und Vermessungswesen der Universität Stuttgart.

Winter, S., 1996. Unsichere topologische Beziehungen zwischen ungenauen Flächen. PhD thesis, Landwirtschaftliche Fakultät der Rheinischen Friedrich-Wilhelms-Universität Bonn.