Forecasting financial time series is a complex and challenging problem that requires specific data mining methods. Numerical relational data mining methods are especially important for financial analysis, where the data are commonly numerical financial time series.
The scope and methods of the study
- Introduction
- Problem definition
- Data mining methodologies
- Parameters
- Problem ID and profile
- Comparison of intelligent decision support methods
- Modern methodologies in financial knowledge discovery
- Deterministic dynamic system approach
- Efficient market theory
- Fundamental and technical analyses
- Data mining and database management
- Data mining: definitions and practice
- Learning paradigms for data mining
- Specific cases or experiences applied to new situations by matching known cases and experiences with new cases (instance-based learning paradigm)
- Rules in first-order logic form (Horn clauses as in the Prolog language) (analytic learning paradigm)
- A mixture of the previous representations (hybrid paradigm)
- Intellectual challenges in data mining
Relational Data Mining combines recent advances in such areas as Inductive Logic Programming (ILP), Probabilistic Inference, and Representative Measurement Theory (RMT). Many other proprietary financial applications of data mining exist, but they are not reported publicly [Von Altrock, 1997; Groth, 1998].
Numerical Data Mining Models and Financial Applications
ARIMA Models
An autoregressive process is defined as a linear function matching the p preceding values of a time series with V(t), where V(t) is the value of the time series at moment t. The major difference between AR(p) and MA(q) models lies in their components: AR(p) averages the p most recent values of the time series, while MA(q) averages the q most recent random disturbances of the same time series.
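The AR(p)/MA(q) contrast above can be sketched in code. This is a minimal illustration, not the book's implementation; the coefficient values and series are assumed for the example, not fitted estimates.

```python
# Sketch: contrast of AR(p) and MA(q) one-step forecasts.
# AR(p): linear function of the p most recent observed values.
# MA(q): linear function of the q most recent random disturbances.

def ar_forecast(series, phi, c=0.0):
    """AR(p) forecast: c + phi[0]*V(t-1) + ... + phi[p-1]*V(t-p)."""
    p = len(phi)
    recent = series[-p:][::-1]          # V(t-1), V(t-2), ..., V(t-p)
    return c + sum(w * v for w, v in zip(phi, recent))

def ma_forecast(disturbances, theta, mu=0.0):
    """MA(q) forecast: mu + theta[0]*e(t-1) + ... + theta[q-1]*e(t-q)."""
    q = len(theta)
    recent = disturbances[-q:][::-1]
    return mu + sum(w * e for w, e in zip(theta, recent))

series = [100.0, 101.0, 103.0, 102.0]
print(ar_forecast(series, phi=[0.6, 0.3]))  # 0.6*102 + 0.3*103 = 92.1
```

A fitted model would estimate phi and theta from the data; here they are fixed only to show which past quantities each model averages.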
Steps in developing ARIMA model
Then the coefficients of the ARIMA model are estimated along with the error of the model (residual). The process can be repeated for different p, d, and q if the identification of the model is uncertain.
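The role of the differencing order d in ARIMA(p, d, q) can be sketched as follows; this is a minimal illustration with an assumed toy series, not the book's estimation procedure.

```python
# Sketch: the differencing step of ARIMA model development.
# Differencing d times turns a nonstationary series into the series
# of its d-th order changes, to which an ARMA(p, q) model is then fitted.

def difference(series, d=1):
    """Apply the differencing operator x(t) - x(t-1) exactly d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

print(difference([1, 4, 9, 16], d=1))  # [3, 5, 7]
print(difference([1, 4, 9, 16], d=2))  # [2, 2]
```

The quadratic toy series becomes constant after second-order differencing, which is the kind of evidence used when identifying d.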
Seasonal ARIMA
These estimates are accompanied by statistical parameters like the confidence limits, the standard error of the coefficients, and the statistical significance of the coefficients. We have already mentioned that identification of p, d, and q is the most subjective step; therefore, several alternative values of p, d, and q can be used.
Exponential smoothing and trading day regression
Comparison with other methods
Independence from an expert is relatively high in comparison with neuro-fuzzy and some other methods. Other features of ARIMA and other statistical methods are at an average level in comparison with alternative methods (table 1.7 in chapter 1).
Financial applications of autoregression models
On the other hand, even a modest forecasting result combined with an appropriate trading strategy can bring a significant profit. In particular, a correct forecast of the sign of the change alone is sufficient to form a successful buy/sell trading strategy.
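The point that a correct sign forecast alone can be profitable can be sketched as a toy strategy; the price path and forecasts are assumptions for illustration, not a strategy from the book.

```python
# Sketch: a sign-based buy/sell strategy.  We hold one unit long when the
# forecast direction is "up" (+1) and one unit short when it is "down" (-1);
# profit per step is the position times the actual price change.

def sign_strategy_profit(prices, forecasts):
    """forecasts[t] is +1 (up) or -1 (down) for the move from t to t+1."""
    profit = 0.0
    for t in range(len(prices) - 1):
        position = forecasts[t]               # +1 long, -1 short
        profit += position * (prices[t + 1] - prices[t])
    return profit

prices = [100.0, 102.0, 101.0, 104.0]
perfect = [1, -1, 1]                          # correct sign at every step
print(sign_strategy_profit(prices, perfect))  # 2 + 1 + 3 = 6.0
```

Note that the strategy never needs the magnitude of the forecast, only its sign.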
Instance–based learning and financial applications
In Chapter 1 (Table 1.7) we presented comparative capabilities of different data mining methods based on [Dhar, Stein, 1997]. Except for the feature "ease of use of numerical data", the probabilistic ILP methods have advantages over IBL.
Neural Networks
- Introduction
- Steps
- Recurrent networks
- Dynamically modifying network structure
The activation levels of input nodes are taken from external (environment) nodes, which do not belong to the neural network. Similarly, the output nodes deliver values to external nodes, which also do not belong to the neural network.
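The flow of activation from input nodes through the network to output nodes can be sketched as a single forward pass; the weights and activation function below are illustrative assumptions.

```python
import math

# Sketch: a forward pass through a one-hidden-layer network.
# Input values come from external (environment) nodes; the output value
# is delivered back to external nodes.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_out):
    """w_hidden[j] holds the weights into hidden node j."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)))
              for ws in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

y = forward([1.0, 0.5],
            w_hidden=[[0.2, -0.4], [0.7, 0.1]],
            w_out=[0.5, -0.3])
print(0.0 < y < 1.0)  # True: sigmoid output lies in (0, 1)
```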
Neural networks and hybrid systems in finance
For example, in a buy/hold/sell trading strategy, we are much more interested in the correct forecast of stock direction (up/down) rather than the error itself. General properties of neural networks in comparison with requirements of stock price forecast are shown in tables 1.5 and 1.6 in chapter 1.
Recurrent neural networks in finance
This is a set of meaningful symbolic rules extracted from a time series with a significant noise level. The recurrent neural network, combined with the extraction of deterministic finite state automata and a discrete Markov process, forms a hybrid approach in data mining.
Modular networks and genetic algorithms
- Mixture of neural networks
- Genetic algorithms for modular neural networks
The next set of examples (“next generation of genotypes”) is generated using these values and transitional operators. In [Oliker, 1997] each new generation is represented by its set of weights and connectivities within the neural network.
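One generation step of this kind can be sketched as follows. This is an illustrative sketch, not the algorithm of [Oliker, 1997]: the fitness function is a stand-in (negative squared error of a one-weight model), where the real fitness would be forecasting performance of the network encoded by the genotype.

```python
import random

# Sketch: one genetic-algorithm generation over genotypes that encode
# neural-network weights.  Selection keeps the better half; the next
# generation is refilled by mutating randomly chosen survivors.

def fitness(weights):
    target, x = 2.0, 1.0                 # assumed toy objective
    return -(weights[0] * x - target) ** 2

def next_generation(population, rng, mutation_scale=0.2):
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[: len(ranked) // 2]
    children = [[w + rng.gauss(0.0, mutation_scale)
                 for w in rng.choice(survivors)]
                for _ in range(len(population) - len(survivors))]
    return survivors + children

rng = random.Random(0)
population = [[rng.uniform(-1.0, 1.0)] for _ in range(8)]
best_before = max(fitness(g) for g in population)
for _ in range(30):
    population = next_generation(population, rng)
best_after = max(fitness(g) for g in population)
print(best_after >= best_before)  # True: the best genotype never worsens
```

Because the best genotype always survives selection, the best fitness is non-decreasing across generations.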
Testing results and complete round robin method
- Introduction
- Approach and method
- Multithreaded implementation
- Experiments with SP500 and neural networks
Here y(t) is the actual target value at moment t and y_J(t) is the target value forecast delivered by discovered model J, i.e., the trained neural network in our case. Each subprocess can be matched to learning an individual neural network or a group of neural networks.
Expert mining
The idea of the approach is to represent the questioning procedure (interviewing) as a restoration of a monotone Boolean function interactively with an "oracle" (expert). In the experiment below, even for a small number of attributes (5), using the method based on monotone Boolean functions, we were able to restrict the number of questions to 60% of the number of questions needed for a complete search. In particular, in one of the tasks, by using monotonicity and the hierarchy, the maximum number of questions needed to restore the monotone Boolean functions was reduced first to 72 questions and then further reduced to 46 questions using the Hansel lemma.
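The monotonicity shortcut behind this reduction can be sketched as follows. The oracle function here is an assumed example (true iff at least two attributes hold), not one of the book's expert tasks, and the naive scan below does not use Hansel chains; it only shows how each answer eliminates all comparable questions.

```python
from itertools import product

# Sketch: restoring a monotone Boolean function by questioning an oracle.
# Once the oracle answers f(x), monotonicity fixes f on every vector
# comparable with x, so those questions are never asked.

def oracle(x):                      # assumed example expert function
    return 1 if sum(x) >= 2 else 0

def leq(x, y):                      # componentwise x <= y
    return all(a <= b for a, b in zip(x, y))

def restore(n):
    known, questions = {}, 0
    for x in product((0, 1), repeat=n):
        if x in known:
            continue
        questions += 1
        v = oracle(x)
        for y in product((0, 1), repeat=n):
            if v == 1 and leq(x, y):
                known[y] = 1        # everything above a "1" is 1
            if v == 0 and leq(y, x):
                known[y] = 0        # everything below a "0" is 0
    return known, questions

known, questions = restore(5)
print(questions, "questions instead of", 2 ** 5)  # 16 instead of 32
```

Even this naive question order halves the number of questions for five attributes; ordering the questions along Hansel chains reduces it further.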
A minimal dynamic sequence of questions means that we reach the minimum of the Shannon Function, i.e., the minimum number of questions required to restore the most complex monotone Boolean function of n arguments. Columns 2, 3 and 4 in table 2.13 present values of the three monotone Boolean functions of five arguments chosen for this example.
Iterate until finished
- Interactive Learning of Monotone Boolean Functions
- Basic definitions and results
- Algorithm for restoring a monotone Boolean function
- Construction of Hansel Chains
Hansel chains are derived independently of the particular applied problem; they depend only on the number of attributes (five in this case). A binary vector x of length n is said to be an upper zero of a function f if f(x) = 0 and, for any vector y such that y > x, we have f(y) = 1. The term level represents the number of units (i.e., the number of 1-elements) in the vector x and is denoted by U(x). An upper zero x of a function f is said to be a maximal upper zero if U(y) <= U(x) for any upper zero y of the function f [Kovalerchuk, Lavkov, 1984].
In terms of machine learning, the set of all maximal upper zeros represents the border elements of the negative patterns. Restoration algorithms for monotone Boolean functions which use Hansel's lemma are optimal in terms of the Shannon function.
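These border elements can be computed by brute force for small n; the sketch below uses an assumed example function (true iff at least two attributes hold), not one of the book's tasks, and exhaustive search rather than the optimal Hansel-chain algorithm.

```python
from itertools import product

# Sketch: maximal upper zeros of a monotone Boolean function --
# the border elements of the negative patterns.

def f(x):                           # assumed example function
    return 1 if sum(x) >= 2 else 0

def upper_zeros(func, n):
    """Vectors x with func(x) = 0 such that func(y) = 1 for every y > x."""
    vectors = list(product((0, 1), repeat=n))
    def greater(y, x):
        return y != x and all(a >= b for a, b in zip(y, x))
    return [x for x in vectors
            if func(x) == 0
            and all(func(y) == 1 for y in vectors if greater(y, x))]

def maximal_upper_zeros(func, n):
    """Upper zeros of maximal level U(x) (the number of 1-components)."""
    zeros = upper_zeros(func, n)
    top = max(sum(x) for x in zeros)
    return [x for x in zeros if sum(x) == top]

print(maximal_upper_zeros(f, 3))  # [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
```

For this example the negative border consists of the three level-1 vectors: adding any second attribute flips the function to 1.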
Rule-Based and Hybrid Financial Data Mining
Decision tree and DNF learning
- Advantages
- Limitation: size of the tree
- Constructing decision trees
This way of discovery creates an important advantage of decision tree learning: discovered rules are consistent. However, there are many consistent sets of rules that cannot be represented by a relatively small single decision tree [Parsaye, 1997]. Rule R1 and its equivalent decision tree in figure 3.1 belong to the relatively restricted language of propositional logic.
The decision tree in figure 3.2 delivers a complete solution; therefore, the closed world assumption is satisfied for this tree. It is a special property of decision tree methods that all rules have a specific DNF form, i.e., all AND clauses (conjunctions) include a value of the attribute assigned to the root.
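This special DNF form can be sketched by extracting rules from a toy tree; the tree below is an assumed illustration, not a tree from the book's figures.

```python
# Sketch: extracting DNF rules from a decision tree.  A leaf is a string;
# an internal node is (test, yes_subtree, no_subtree).  Every extracted
# conjunction starts with a test on the root attribute.

tree = ("price>60",                       # root attribute test
        ("volume>900000", "down", "up"),  # price>60: test volume next
        "up")                             # price<=60: leaf

def extract_rules(node, path=()):
    if isinstance(node, str):             # leaf: emit one rule
        return [(path, node)]
    test, yes, no = node
    return (extract_rules(yes, path + ((test, True),)) +
            extract_rules(no, path + ((test, False),)))

rules = extract_rules(tree)
for conditions, outcome in rules:
    print(conditions, "->", outcome)
# All three conjunctions begin with a test on the root attribute "price>60".
```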
Two evaluation criteria: 1. Predictive accuracy and 2. Comprehensibility
- Ensembles and Hybrid methods for decision trees
- Decision tree and DNF learning in finance
- Decision-tree methods in finance
- Extracting decision tree and sets of rules for SP500
- Sets of decision trees and DNF learning in finance
- Extracting decision trees from neural networks
- Approach
- Trepan algorithm
- Extracting decision trees from neural networks in finance
- Predicting the Dollar-Mark exchange rate
- Comparison of performance
- Probabilistic rules and knowledge-based stochastic modeling
- Probabilistic Networks and Probabilistic Rules
- The naïve Bayes classifier
- The mixture of experts
A fragment of the extracted decision tree (the first 9 of 25 levels) is presented in Figure 3.6. There are only five completely correct rules (5.8%) out of the 85 rules extracted from the decision tree. We have already discussed one of the generalization methods – the m-of-n form of a decision tree.
The fidelity parameter measures how close the forecast of the decision tree is to the forecast by the underlying neural network. Remember that these authors measure comprehensibility of the decision tree by the number of nodes.
Compute the conditional probability of each expert given the data;
Combine experts according to their probability distribution
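The combination step can be sketched as a probability-weighted average; the forecasts and the unnormalized expert probabilities below are illustrative assumptions.

```python
# Sketch: mixture-of-experts combination.  Each expert's forecast is
# weighted by its conditional probability given the data, and the
# weights are normalized to a probability distribution.

def combine_experts(forecasts, scores):
    """forecasts[i]: expert i's prediction;
    scores[i]: unnormalized conditional probability of expert i."""
    total = sum(scores)
    weights = [s / total for s in scores]        # normalize to sum to 1
    return sum(w * f for w, f in zip(weights, forecasts))

print(combine_experts([1.0, 2.0, 4.0], [1.0, 1.0, 2.0]))
# weights 0.25, 0.25, 0.5 -> 0.25*1 + 0.25*2 + 0.5*4 = 2.75
```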
- The hidden Markov model
- Uncertainty of the structure of stochastic models
- Knowledge-based stochastic modeling in finance
- Markov chains in finance
- Hidden Markov models in finance
This rule is a combination of the bold cells in table 3.17, where the IF-part is taken from the first column. Similarly, the bold cell on the first row is used as the THEN-part of the rule. The probability 0.6 can be found in the intersection of the respective row and column.
The goal of predicting a probability distribution is significantly different from the typical goal in finance -- predicting the next value of the time series. The probability distribution delivers a wider picture of the possible future of the stock market.
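This distribution-forecasting idea can be sketched with a small Markov chain over market states; the transition matrix below is an illustrative assumption, not data from table 3.17.

```python
# Sketch: a Markov chain predicts the probability distribution of the
# next state (up/flat/down), not a single next value of the series.

states = ["up", "flat", "down"]
transition = {                       # assumed illustrative probabilities
    "up":   {"up": 0.6, "flat": 0.3, "down": 0.1},
    "flat": {"up": 0.3, "flat": 0.4, "down": 0.3},
    "down": {"up": 0.2, "flat": 0.3, "down": 0.5},
}

def next_distribution(current):
    """Propagate a distribution over states one step forward."""
    return {s: sum(current[r] * transition[r][s] for r in states)
            for s in states}

dist = next_distribution({"up": 1.0, "flat": 0.0, "down": 0.0})
print(dist)  # {'up': 0.6, 'flat': 0.3, 'down': 0.1}
```

Starting from a known "up" day, the forecast is the whole row of transition probabilities rather than one number.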
Relational Data Mining (RDM)
Examples
In this section, several examples illustrate the difference between relational and attribute-value languages used in Data Mining. In an attribute-value language, objects are described by tuples of attribute-value pairs, where each attribute represents some characteristic of the object, e.g., share price, volume, etc. Neural networks and many other attribute-value learning systems have been used in financial forecasting for years (see chapter 3).
The first one is based on numerical expressions and the second one is based on logical expressions and operations. Neural networks and autoregression methods (chapter 2) exemplify the first type of methods based on numerical representation and DNF methods (chapter 3) exemplify the second type based on logical expressions.
IF stock price today is more than $60 and trade volume today is greater than 900,000 THEN tomorrow stock will go down.
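The attribute-value rule above can be sketched as a predicate over a single tuple of attribute-value pairs; the thresholds come from the rule in the text, while the records are illustrative.

```python
# Sketch: the attribute-value rule as a predicate over one record.
# Each object is a flat tuple of attribute-value pairs -- no relations
# between objects, which is exactly what relational languages add.

def rule_down_tomorrow(record):
    """IF price > $60 AND volume > 900,000 THEN tomorrow stock goes down."""
    return record["price"] > 60 and record["volume"] > 900_000

print(rule_down_tomorrow({"price": 62.5, "volume": 1_000_000}))  # True
print(rule_down_tomorrow({"price": 62.5, "volume": 800_000}))    # False
```

A relational language would instead allow literals relating several days or several stocks, e.g. a predicate over pairs of records.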
- Relational data mining paradigm
- Challenges and obstacles in relational data mining
One of the major obstacles to more effective use of the ILP methods is
- Theory of RDM
- Data types in relational data mining
- Relational representation of examples
- First-order logic and rules
- Background knowledge
- Arguments constraints and skipping useless hypotheses
- Initial rules and improving search of hypotheses
- The literal (or conjunction of literals) with the maximum gain is added to the end of the current clause (start clause can be null)
- Relational data mining and relational databases
- Algorithms: FOIL and FOCL
- Algorithm MMDR
A list of components (attributes) requires the length of the list and the set of types of components. The meaning of the rule for a k-arity predicate is the set of k-tuples that satisfy the predicate. This old variable can be in either the head or the current body of the rule (Horn clause).
This requires computing the information gain for each variablization of each predicate P. The information gain metric used by FOIL is Gain(L) = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), where p0 and n0 are the numbers of positive and negative tuples before adding the literal L to the clause, p1 and n1 are the numbers of positive and negative tuples after adding the literal to the clause, and t is the number of positive tuples before adding the literal that have at least one corresponding extension in the positive tuples after adding the literal [Quinlan, 1990]. The branching factor grows exponentially in the arity of the available predicates and the predicate to be learned.
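The FOIL gain metric from [Quinlan, 1990] can be sketched directly; the tuple counts in the example are assumed for illustration.

```python
import math

# Sketch: FOIL information gain for a candidate literal,
# Gain(L) = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))).

def foil_gain(p0, n0, p1, n1, t):
    """p0, n0: positive/negative tuples before adding the literal;
    p1, n1: positive/negative tuples after adding it;
    t: positive tuples before that still have an extension after."""
    info_before = -math.log2(p0 / (p0 + n0))  # bits to signal a positive
    info_after = -math.log2(p1 / (p1 + n1))
    return t * (info_before - info_after)

# A literal that keeps 6 of 8 positives and removes all 8 negatives:
print(foil_gain(p0=8, n0=8, p1=6, n1=0, t=6))  # 6 * (1 - 0) = 6.0
```

The gain rewards literals that purify the clause (raise the positive fraction) while covering many of the original positive tuples.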
Rule (4) is called a regularity if it satisfies the following conditions:
It is well known that the exhaustive search of all rules (4) for finding the set of BST rules is practically impossible if predicates with more than one variable are used. Therefore, the search should be constrained to a lesser set of rules, but still allowing the discovery of BST rules.
If rule R is a BST rule for data D, then rule R is regularity on data D [Vityaev, 1992]
- Fisher test
- MMDR pseudocode
- Comparison of FOIL and MMDR
- Numerical relational data mining
- Data types
- Problem of data types
- Numerical data type
- Representative measurement theory
- Critical analysis of data types in ABL
In this case, the member functions of a data type should be input information along with the data (MMDR implements this approach). A data type is relational if it is described in terms of a set of relations (predicates). Therefore, we can identify such an attribute as an attribute of a strong relational data type.
Similarly, we can define and identify an attribute of a weak relational data type. Using the set of homomorphisms, we can define the notion of permissible transformations and the data type (scale types).
Physical data types in physical problems. Multidimensional data contain only physical quantities and the learning task itself belongs to
- Empirical axiomatic theories: empirical contents of data
- Definitions
- Representation of data types in empirical axiomatic theories
The first step of the analysis of the empirical content of data consists of the
- Discovering empirical regularities as universal formulas
They also satisfy a stronger condition of meaningfulness of the methods – interpretability in terms of empirical systems and the system of notions of domain knowledge. The concept of an empirical axiomatic theory is a formal representation of the "empirical content of data". We say that an empirical axiomatic theory has an empirical interpretation if all parts of that theory are interpretable in the domain theory (background knowledge): predicates from V and W, and axioms S. The concept of an empirical system can be defined in terms of an empirical axiomatic system as a non-reducible model [Pfanzagl, 1971] of the set of axioms.
Suppose that {R1,…,Rk} is a set of the most common numerical relations and some of these relations have an empirical interpretation. In this way we obtain a set of empirical predicates. In these terms, the protocol of observations of the predicates from V on the set of objects A is the relational structure.