
Relational Data Mining (RDM)


4.11. Empirical axiomatic theories: empirical contents of data

4.11.2. Representation of data types in empirical axiomatic theories

The first step of the analysis of the empirical content of data consists of the representation of data in empirical axiomatic theories. Below we consider several known data types: comparisons, binary matrices, matrices of orderings, matrices of proximity, and attribute-based matrices. Moreover, empirical axiomatic theories are the most general representation of various data types. They represent well-known data types, mixtures of various data types, and data types that have no numerical representation at all.

Next, we review existing methods for processing these data types, including the assumptions these methods make. Finally, we show that relational methods such as MMDR do not require restrictive assumptions about data types. Any data type can be accommodated and even discovered in training data. It is known that humans answer qualitative and comparative questions more precisely than quantitative questions. Therefore, the representation of these data types in axiomatic empirical theories is convenient.

Attribute-based matrix (table). Data D can be represented as an attribute-based matrix $D = (x_{ij})$, $i = 1,\dots,m$; $j = 1,\dots,n$, where $x_{ij}$ is the numerical value of the j-th attribute on the i-th object. Attributes may be qualitative or quantitative.

The fact that numerical values of attributes exist means that there are n measurement procedures that produce them. Let us denote these procedures as $x_1(a),\dots,x_n(a)$, where a is an empirical entity. Then $x_{ij} = x_j(a_i)$.

Attribute-value methods such as neural networks deal with this type of data. These methods are limited by the assumption that data types are stronger than interval and log-interval data types, which is not always true.

For definitions of these data types, see Section 4.9.2.
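The construction of an attribute-based matrix from measurement procedures can be sketched as follows. This is a minimal illustration, not the book's code: the function name `attribute_matrix`, the sample records, and the two procedures are all hypothetical.

```python
from typing import Any, Callable, List, Sequence


def attribute_matrix(
    objects: Sequence[Any],
    procedures: Sequence[Callable[[Any], float]],
) -> List[List[float]]:
    """Entry (i, j) is the j-th measurement procedure applied to the i-th object."""
    return [[procedure(obj) for procedure in procedures] for obj in objects]


# Hypothetical empirical entities (m = 3 objects).
objects = [
    {"close": 10.0, "volume": 500},
    {"close": 12.5, "volume": 300},
    {"close": 9.0, "volume": 800},
]

# Hypothetical measurement procedures (n = 2 attributes).
procedures = [
    lambda a: a["close"],
    lambda a: float(a["volume"]),
]

matrix = attribute_matrix(objects, procedures)
print(matrix)
```

Keeping the procedures as explicit functions makes it visible that every column of the matrix comes from one measurement procedure, and the data type of that procedure is what matters later.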

Let us determine an empirical axiomatic system for attribute-based matrices. First, a set of empirical predicates needs to be defined for each attribute. There are two cases:

1. The measurement procedure is well known and an empirical system is known from measurement theory. Therefore, the set of all empirical predicates contains the predicates given in that empirical system.

2. An empirical system of the measurement procedure is not completely defined. In this case, we have a measurement procedure, but we do not have an empirical system. The measurement procedure in the second case is called a measurer. Examples of measurers are psychological tests, stock market indicators, questionnaires, and physical measurers used in non-physical areas.

Let us define a set of empirical predicates for a measurer procedure $x(a)$. For any numerical relation $R(y_1,\dots,y_p)$ on Re (Re is the set of all real numbers), we can define the following empirical relation on tuples of empirical objects $a_1,\dots,a_p$:

$P^R(a_1,\dots,a_p) \Leftrightarrow R(x(a_1),\dots,x(a_p))$.

The measurer obviously has an empirical interpretation, but the relation $P^R$ may not. We need to find relations R that have empirical interpretations, i.e., relations $P^R$ that are interpretable in terms of the domain theory.

Suppose that {R1,…,Rk} is a set of the most common numerical relations and that some of these relations have an empirical interpretation. This subset of relations is not empty, because at least the relation = (equivalence) has an empirical interpretation:

$P^{=}(a_1, a_2) \Leftrightarrow x(a_1) = x(a_2)$.
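The step from a numerical relation to an empirical relation can be sketched in a few lines: the empirical relation holds on objects exactly when the numerical relation holds on their measured values. The helper `induce`, the questionnaire scores, and the relation names are hypothetical, chosen only to illustrate the idea.

```python
import operator
from itertools import product
from typing import Any, Callable


def induce(numeric_relation: Callable[..., bool],
           measurer: Callable[[Any], float]) -> Callable[..., bool]:
    """Lift a relation on numbers to a relation on measured objects."""
    def empirical_relation(*objects: Any) -> bool:
        return bool(numeric_relation(*(measurer(obj) for obj in objects)))
    return empirical_relation


# Hypothetical measurer: a questionnaire score for each respondent.
scores = {"ann": 7, "bob": 7, "eve": 9}
measurer = scores.__getitem__

same_score = induce(operator.eq, measurer)   # equivalence: usually interpretable
not_higher = induce(operator.le, measurer)   # ordering: often interpretable
# A sum-based relation can be induced too, but nothing guarantees
# that it means anything in the domain.
adds_up = induce(lambda u, v, w: u + v == w, measurer)

# Observed truth values of the ordering on all pairs of objects.
protocol = {(a, b): not_higher(a, b) for a, b in product(scores, repeat=2)}
print(protocol)
```

The code can induce any relation mechanically. Whether the induced relation is interpretable is a question for the domain expert, not for the program.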

In measurement theory, there are many sets of axioms based on just ordering and equivalence relations. Nevertheless, these sets of axioms establish strong data types. A strong data type is a result of the interaction of quantities with individual weak data types such as ordering and equivalence. For instance, having one weak order relation $\le_y$ (for attribute y) and n equivalence relations $=_{x_1},\dots,=_{x_n}$ for attributes $x_1,\dots,x_n$, we can construct a complex relation $G(y, x_1,\dots,x_n)$ between y and $x_1,\dots,x_n$ given by

$G(y, x_1,\dots,x_n) \Leftrightarrow y = f(x_1,\dots,x_n)$,

where $f(x_1,\dots,x_n)$ is a polynomial [Krantz et al, 1971].

This is a very strong result. To construct a polynomial we need the sum operation, but this operation is not defined for $x_1,\dots,x_n$. However, relation G is equivalent to polynomial f if a certain set of axioms, expressed in terms of the order relation presented above for y and the equality relations (=) for $x_1,\dots,x_n$, is true.

This fundamental result serves as a critical justification for using ordering relations as a base for generating relational hypotheses in financial applications (Chapter 5). Ordering relations usually are empirically interpretable in finance.

Multivariate and pair comparisons [Torgerson, 1952, 1958; Shmerling D.S. 1978]. Consider a set of objects A and the set of all tuples of k objects from A. A group of n experts is asked to order the objects in all these tuples in accordance with some preference relation. An object in a tuple is identified by its number i in the tuple together with t, s, and q, where t is an entity to be evaluated, s is an expert, and q is the preference rank given by expert s to the entity. The experts' answers form the set of all ordered tuples.

The typical goal of pair and multivariate comparison methods is to order all tuples. Known methods are based on some a priori assumptions [Torgerson, 1952, 1958; Shmerling D.S. 1978], which determine their areas of applicability.

Let us define a preference relation for every expert s. Also, define the equivalence relations ~ and =, where a = b holds when a and b are the same.

Therefore, we obtain a set of empirical predicates V and, thus, a data type represented by the set of empirical predicates V.

Matrix representation of binary relations. A binary relation P(a,b) may be defined on the set of objects $A = \{a_1,\dots,a_m\}$ by a matrix $(p_{ij})$, $i, j = 1,\dots,m$, where $p_{ij} = 1$ (0) means that relation $P(a_i, a_j)$ is true (false). Using such matrices, any binary relation on the set A can be defined. This representation of binary relations is widely used [Terehina, 1973; Tyrin et al 1977, 1979, 1981; Mirkin 1976, 1980; Drobishev 1980; Kupershtoh et al, 1976]. The most common binary relations are equivalence, order, quasi-order, partial order (see Section 4.9.1), and lexicographical orders.

Matrices of orderings. Let $(r_{ij})$, $i = 1,\dots,m$; $j = 1,\dots,n$, where $r_{ij}$ is the rank of the i-th object on the j-th attribute, be a matrix of orderings. Any such matrix can be either the matrix of orderings of m objects by n experts or the matrix of n different orderings on m objects. Methods of multidimensional scaling or ranking [Tyrin et al, 1977, 1979, 1981; Kamenskii, 1977] consider these matrices.

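Let us define a relation $P_j$ for an attribute j:

$P_j(a_{i_1}, a_{i_2}) \Leftrightarrow r_{i_1 j} \le r_{i_2 j}$,

where $a_{i_1}, a_{i_2}$ are objects with numbers $i_1$, $i_2$ from A and $r_{ij}$ is the rank of the i-th object on the j-th attribute.

In this way we obtain the set of empirical predicates $V = \{P_1,\dots,P_n\}$. In these terms, the protocol of observations of the predicates from V on the set of objects A is the relational structure $\langle A; P_1,\dots,P_n \rangle$.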
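As a rough sketch of this conversion, the code below turns a matrix of ranks into one 0/1 relation matrix per attribute and checks two properties of a weak order. The rank values, the function names, and the convention that the relation holds when the first object's rank does not exceed the second's are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[int]]


def relation_matrix(ranks: Matrix, j: int) -> Matrix:
    """0/1 matrix: entry [i1][i2] is 1 when rank of i1 <= rank of i2 on attribute j."""
    m = len(ranks)
    return [[int(ranks[i1][j] <= ranks[i2][j]) for i2 in range(m)]
            for i1 in range(m)]


def is_transitive(rel: Matrix) -> bool:
    n = len(rel)
    return all(not (rel[a][b] and rel[b][c]) or rel[a][c]
               for a in range(n) for b in range(n) for c in range(n))


def is_connected(rel: Matrix) -> bool:
    """Every pair of objects is comparable in at least one direction."""
    n = len(rel)
    return all(rel[a][b] or rel[b][a] for a in range(n) for b in range(n))


# Hypothetical matrix of orderings: m = 3 objects (rows), n = 2 attributes (columns).
ranks = [
    [1, 2],
    [2, 1],
    [3, 3],
]

# One relation matrix per attribute; together they play the role of the protocol.
protocol = [relation_matrix(ranks, j) for j in range(len(ranks[0]))]
for j, rel in enumerate(protocol):
    print(j, rel, is_transitive(rel), is_connected(rel))
```

Once the data are in this form, only the ordering carried by the ranks is used. No arithmetic is performed on the rank values themselves.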

Matrices of closeness. A matrix of closeness for a set of objects $A = \{a_1,\dots,a_m\}$ is the matrix $(r_{ij})$, $i, j = 1,\dots,m$, where $r_{ij}$ is a numerical estimate of the closeness (similarity or difference) of objects i and j from A given by an expert using an ordering scale.

These matrices are processed by multidimensional scaling methods [Holman, 1978, Satarov, Kamenskii, 1977, Tyrin et al, 1979, 1981]. They represent objects from set A as points in some metric space (Euclidean or Riemannian). This is a space of minimal dimension that preserves the original distances between objects as closely as possible. These methods have general limitations: there is no criterion for matching a data matrix with a multidimensional scaling method, and some matrices of closeness cannot be embedded in a metric space. When an embedding is found, the embedded data can be treated as an attribute-based matrix. Let us represent the matrix of closeness in relational terms of empirical axiomatic theories.
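The embedding limitation can be probed with a simple check. If the expert's estimates are read directly as distances, the triangle inequality is a necessary (not sufficient) condition for placing the objects in a metric space. The sketch below is an illustration only: the function name and both matrices are hypothetical, and scaling methods that first apply a monotone transformation to the estimates are not covered.

```python
from itertools import permutations
from typing import List, Tuple


def triangle_violations(d: List[List[float]]) -> List[Tuple[int, int, int]]:
    """Return triples (i, j, k) with d[i][k] > d[i][j] + d[j][k]."""
    n = len(d)
    return [(i, j, k) for i, j, k in permutations(range(n), 3)
            if d[i][k] > d[i][j] + d[j][k]]


# Hypothetical difference estimates for three objects.
consistent = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]
inconsistent = [
    [0, 1, 5],
    [1, 0, 1],
    [5, 1, 0],
]

print(triangle_violations(consistent))
print(triangle_violations(inconsistent))
```

A non-empty result means the raw estimates cannot all be distances in any metric space, which is one reason to fall back on the purely relational representation.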

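Let $P(a_i, a_j, a_k, a_l) \Leftrightarrow r_{ij} \le r_{kl}$, where $r_{ij}$ is the expert's closeness estimate for objects $a_i$ and $a_j$. This relation is defined on the entire set A.

Hence, the protocol in the set of predicates $V = \{P\}$ will be the relational system $\langle A; P \rangle$.

In measurement theory such empirical systems are denoted by $\langle A^*, \preceq \rangle$, where $A^* \subseteq A \times A$ and $\preceq$ is a binary ordering relation defined on $A^*$. Let us describe some results from measurement theory related to such empirical systems.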
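The relational view of a closeness matrix can be sketched as an ordering on pairs of objects: one pair precedes another when its estimate is not larger. The function name, the sample matrix, and the choice to list each unordered pair once are assumptions made for this illustration.

```python
from itertools import combinations
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def pair_ordering(closeness: List[List[float]]) -> Set[Tuple[Pair, Pair]]:
    """All (p, q) such that the estimate for pair p does not exceed that for pair q."""
    pairs = list(combinations(range(len(closeness)), 2))
    return {(p, q) for p in pairs for q in pairs
            if closeness[p[0]][p[1]] <= closeness[q[0]][q[1]]}


# Hypothetical symmetric matrix of difference estimates for three objects.
closeness = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]

relation = pair_ordering(closeness)
print(sorted(relation))
```

Only comparisons between estimates survive the conversion, so any monotone rescaling of the expert's numbers yields the same relation.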

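Scale of positive differences [Krantz et al 1971, p.147]. There exists a mapping $\varphi: A^* \to \mathrm{Re}^+$ such that for every pair (a,b), (b,c), (c,d) from $A^*$, where $\preceq$ is the empirical ordering relation on pairs,

a) $(a,b) \preceq (c,d) \Leftrightarrow \varphi(a,b) \le \varphi(c,d)$, and

b) $\varphi(a,c) = \varphi(a,b) + \varphi(b,c)$.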

This scale means that for an empirical ordering relation for pairs of empirical objects, we can find a numerical function with an interpretable sum operation (+) on its values. The difference between a and c can be obtained as the sum of differences between (a, b) and (b,c). See property b).

The sum is interpretable because differences (a,c), (a,b) and (b,c) are interpretable. It is important to mention that the original differences (a,b), (b,c) and (a,c) have no numeric values; we only know the ordering relation between them. For instance, consider two pairs of stocks, (Microsoft, Intel) and (Microsoft, Sun), and suppose that an expert expresses his/her opinion that the first pair precedes the second in this ordering, i.e., from his/her viewpoint the behavior of the Microsoft stock is closer to the behavior of the Intel stock than to the behavior of the Sun stock. Figure 4.2 shows actual data for these three companies for July 1999, normalized by the maximum stock prices for that month.

Figure 4.2. Normalized stock data

There is no numeric measure of closeness between stocks yet, but it is convenient to introduce one. We may wish to introduce a measure such that, for instance, the pair (Microsoft, Intel) receives a smaller value than the pair (Microsoft, Sun).

If we are able to find such a measure of closeness consistent with the ordering relation given by an expert, then we can use the sum operation (+) in numeric data mining methods for discovering regularities.

This sum will be interpretable in contrast with other functions which can be invented. An interpretable mapping is a homomorphism. For the formal definition of homomorphism, see Section 4.9.3. We can also view homomorphism as a way of transforming a numeric presentation to a relational presentation for using relational data mining methods in discovering regularities.
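Whether a candidate numeric measure is consistent with an expert's ordering, and whether its sums are interpretable, can be checked mechanically. The sketch below tests the two properties of the scale of positive differences on a toy example. The object labels X, Y, Z, the candidate values, and the function names are hypothetical and are not the stock data discussed above.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def preserves_order(phi: Dict[Pair, float],
                    expert_order: List[Tuple[Pair, Pair]]) -> bool:
    """Property a): p is placed before q by the expert => phi[p] <= phi[q]."""
    return all(phi[p] <= phi[q] for p, q in expert_order)


def is_additive(phi: Dict[Pair, float], tol: float = 1e-9) -> bool:
    """Property b): phi(a, c) = phi(a, b) + phi(b, c) wherever all three pairs exist."""
    for (a, b), d_ab in phi.items():
        for (b2, c), d_bc in phi.items():
            if b2 == b and (a, c) in phi:
                if abs(phi[(a, c)] - (d_ab + d_bc)) > tol:
                    return False
    return True


# Hypothetical expert judgments: each tuple (p, q) means "pair p is no farther apart than pair q".
expert_order = [(("X", "Y"), ("Y", "Z")), (("Y", "Z"), ("X", "Z"))]

good = {("X", "Y"): 1.0, ("Y", "Z"): 2.0, ("X", "Z"): 3.0}
bad = {("X", "Y"): 1.0, ("Y", "Z"): 2.0, ("X", "Z"): 2.5}

print(preserves_order(good, expert_order), is_additive(good))
print(preserves_order(bad, expert_order), is_additive(bad))
```

The second candidate respects the expert's ordering but fails additivity, so sums of its values would have no empirical meaning.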

For a scale of positive differences, the mapping is unique up to a positive numerical multiplier q, i.e., any other interpretable mapping can be obtained by multiplying it by some q > 0.
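The uniqueness statement can be checked in one direction with a short calculation. The symbols are assumptions introduced here: $\varphi$ is an interpretable mapping with positive values that preserves the ordering of pairs and satisfies $\varphi(a,c) = \varphi(a,b) + \varphi(b,c)$, $\varphi' = q\varphi$ with $q > 0$, and $\psi = \varphi^2$ is a monotone but non-linear alternative.

```latex
\begin{align*}
\varphi'(a,c) &= q\,\varphi(a,c)
               = q\,\bigl(\varphi(a,b) + \varphi(b,c)\bigr)
               = \varphi'(a,b) + \varphi'(b,c), \\
\varphi(a,b) \le \varphi(c,d) &\iff q\,\varphi(a,b) \le q\,\varphi(c,d)
               \qquad (q > 0), \\
\psi(a,c) &= \bigl(\varphi(a,b) + \varphi(b,c)\bigr)^{2}
           = \psi(a,b) + \psi(b,c) + 2\,\varphi(a,b)\,\varphi(b,c)
           > \psi(a,b) + \psi(b,c).
\end{align*}
```

Multiplying by q keeps both the ordering and the interpretable sum. Squaring keeps the ordering but breaks additivity, which suggests why only positive multipliers are admissible for this scale.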

Similar representation results hold for other empirical systems of this kind: under the corresponding axioms there exists a homomorphism with analogous properties and, when A is a finite set, a homomorphism into N (N is the set of natural numbers).

This representation is unique within the linear transformations (interval scale). Similarly, other data types can be converted into the relational form of axiomatic empirical systems.

From the document DATA MINING IN FINANCE (pages 195-200)