Forecasting financial time series is a complex and challenging problem that requires specific data mining methods. Numerical relational data mining methods are especially important for financial analysis, where the data are commonly numerical financial time series.
The scope and methods of the study
- Introduction
- Problem definition
- Data mining methodologies
- Parameters
- Problem ID and profile
- Comparison of intelligent decision support methods
- Modern methodologies in financial knowledge discovery
- Deterministic dynamic system approach
- Efficient market theory
- Fundamental and technical analyses
- Data mining and database management
- Data mining: definitions and practice
- Learning paradigms for data mining
- Specific cases or experiences applied to new situations by matching known cases and experiences with new cases (instance-based learning paradigm)
- Rules in first-order logic form (Horn clauses as in the Prolog language) (analytic learning paradigm)
- A mixture of the previous representations (hybrid paradigm)
- Intellectual challenges in data mining
Relational Data Mining combines recent advances in such areas as Inductive Logic Programming (ILP), Probabilistic Inference, and Representative Measurement Theory (RMT). Many other proprietary financial applications of data mining exist, but they are not reported publicly [Von Altrock, 1997; Groth, 1998].
Numerical Data Mining Models and Financial Applications
ARIMA Models
An autoregressive process is defined as a linear function matching the p preceding values of a time series with V(t), where V(t) is the value of the time series at moment t. The major difference between AR(p) and MA(q) models lies in their components: AR(p) averages the p most recent values of the time series, while MA(q) averages the q most recent random disturbances of the same time series.
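The AR(p)/MA(q) contrast above can be sketched in code. This is a minimal illustration, not the book's implementation; the coefficient values and series are assumed for the example, not fitted estimates.

```python
# Sketch: contrast of AR(p) and MA(q) one-step forecasts.
# AR(p): linear function of the p most recent observed values.
# MA(q): linear function of the q most recent random disturbances.

def ar_forecast(series, phi, c=0.0):
    """AR(p) forecast: c + phi[0]*V(t-1) + ... + phi[p-1]*V(t-p)."""
    p = len(phi)
    recent = series[-p:][::-1]          # V(t-1), V(t-2), ..., V(t-p)
    return c + sum(w * v for w, v in zip(phi, recent))

def ma_forecast(disturbances, theta, mu=0.0):
    """MA(q) forecast: mu + theta[0]*e(t-1) + ... + theta[q-1]*e(t-q)."""
    q = len(theta)
    recent = disturbances[-q:][::-1]
    return mu + sum(w * e for w, e in zip(theta, recent))

series = [100.0, 101.0, 103.0, 102.0]
print(ar_forecast(series, phi=[0.6, 0.3]))  # 0.6*102 + 0.3*103 = 92.1
```

A fitted model would estimate phi and theta from the data; here they are fixed only to show which past quantities each model averages.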
Steps in developing ARIMA model
Then the coefficients of the ARIMA model are estimated along with the error of the model (residual). The process can be repeated for different p, d, and q if the identification of the model is uncertain.
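The role of the differencing order d in ARIMA(p, d, q) can be sketched as follows; this is a minimal illustration with an assumed toy series, not the book's estimation procedure.

```python
# Sketch: the differencing step of ARIMA model development.
# Differencing d times turns a nonstationary series into the series
# of its d-th order changes, to which an ARMA(p, q) model is then fitted.

def difference(series, d=1):
    """Apply the differencing operator x(t) - x(t-1) exactly d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

print(difference([1, 4, 9, 16], d=1))  # [3, 5, 7]
print(difference([1, 4, 9, 16], d=2))  # [2, 2]
```

The quadratic toy series becomes constant after second-order differencing, which is the kind of evidence used when identifying d.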
Seasonal ARIMA
These estimates are accompanied by statistical parameters like the confidence limits, the standard error of the coefficients, and the statistical significance of the coefficients. We have already mentioned that identification of p, d, and q is the most subjective step; therefore, several alternative values of p, d, and q can be used.
Exponential smoothing and trading day regression
Comparison with other methods
Independence from an expert is relatively high in comparison with neuro-fuzzy and some other methods. Other features of ARIMA and other statistical methods are at an average level in comparison with alternative methods (table 1.7 in chapter 1).
Financial applications of autoregression models
On the other hand, even a modest forecasting result combined with an appropriate trading strategy can bring a significant profit. In particular, a correct forecast of the sign of the change alone is sufficient to form a successful buy/sell trading strategy.
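The point that a correct sign forecast alone can be profitable can be sketched as a toy strategy; the price path and forecasts are assumptions for illustration, not a strategy from the book.

```python
# Sketch: a sign-based buy/sell strategy.  We hold one unit long when the
# forecast direction is "up" (+1) and one unit short when it is "down" (-1);
# profit per step is the position times the actual price change.

def sign_strategy_profit(prices, forecasts):
    """forecasts[t] is +1 (up) or -1 (down) for the move from t to t+1."""
    profit = 0.0
    for t in range(len(prices) - 1):
        position = forecasts[t]               # +1 long, -1 short
        profit += position * (prices[t + 1] - prices[t])
    return profit

prices = [100.0, 102.0, 101.0, 104.0]
perfect = [1, -1, 1]                          # correct sign at every step
print(sign_strategy_profit(prices, perfect))  # 2 + 1 + 3 = 6.0
```

Note that the strategy never needs the magnitude of the forecast, only its sign.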
Instance–based learning and financial applications
In Chapter 1 (Table 1.7) we presented comparative capabilities of different data mining methods based on [Dhar, Stein, 1997]. Except for the feature "ease of use of numerical data", the probabilistic ILP methods have advantages over IBL.
Neural Networks
- Introduction
- Steps
- Recurrent networks
- Dynamically modifying network structure
The activation levels of input nodes are taken from external (environment) nodes, which do not belong to the neural network. Similarly, the output nodes deliver values to external nodes, which also do not belong to the neural network.
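The flow of activation from input nodes through the network to output nodes can be sketched as a single forward pass; the weights and activation function below are illustrative assumptions.

```python
import math

# Sketch: a forward pass through a one-hidden-layer network.
# Input values come from external (environment) nodes; the output value
# is delivered back to external nodes.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_out):
    """w_hidden[j] holds the weights into hidden node j."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)))
              for ws in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

y = forward([1.0, 0.5],
            w_hidden=[[0.2, -0.4], [0.7, 0.1]],
            w_out=[0.5, -0.3])
print(0.0 < y < 1.0)  # True: sigmoid output lies in (0, 1)
```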
Neural networks and hybrid systems in finance
For example, in a buy/hold/sell trading strategy, we are much more interested in the correct forecast of stock direction (up/down) rather than the error itself. General properties of neural networks in comparison with requirements of stock price forecast are shown in tables 1.5 and 1.6 in chapter 1.
Recurrent neural networks in finance
This is a set of meaningful symbolic rules extracted from a time series with a significant noise level. The recurrent neural network, combined with the extraction of deterministic finite state automata and a discrete Markov process, forms a hybrid approach in data mining.
Modular networks and genetic algorithms
- Mixture of neural networks
- Genetic algorithms for modular neural networks
The next set of examples (“next generation of genotypes”) is generated using these values and transitional operators. In [Oliker, 1997] each new generation is represented by its set of weights and connectivities within the neural network.
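One generation step of this kind can be sketched as follows. This is an illustrative sketch, not the algorithm of [Oliker, 1997]: the fitness function is a stand-in (negative squared error of a one-weight model), where the real fitness would be forecasting performance of the network encoded by the genotype.

```python
import random

# Sketch: one genetic-algorithm generation over genotypes that encode
# neural-network weights.  Selection keeps the better half; the next
# generation is refilled by mutating randomly chosen survivors.

def fitness(weights):
    target, x = 2.0, 1.0                 # assumed toy objective
    return -(weights[0] * x - target) ** 2

def next_generation(population, rng, mutation_scale=0.2):
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[: len(ranked) // 2]
    children = [[w + rng.gauss(0.0, mutation_scale)
                 for w in rng.choice(survivors)]
                for _ in range(len(population) - len(survivors))]
    return survivors + children

rng = random.Random(0)
population = [[rng.uniform(-1.0, 1.0)] for _ in range(8)]
best_before = max(fitness(g) for g in population)
for _ in range(30):
    population = next_generation(population, rng)
best_after = max(fitness(g) for g in population)
print(best_after >= best_before)  # True: the best genotype never worsens
```

Because the best genotype always survives selection, the best fitness is non-decreasing across generations.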
Testing results and complete round robin method
- Introduction
- Approach and method
- Multithreaded implementation
- Experiments with SP500 and neural networks
Here y(t) is the actual target value at moment t and y_J(t) is the target value forecast delivered by discovered model J, i.e., the trained neural network in our case. Each subprocess can be matched to learning an individual neural network or a group of neural networks.
Expert mining
The idea of the approach is to represent the questioning procedure (interviewing) as a restoration of a monotone Boolean function interactively with an "oracle" (expert). In the experiment below, even for a small number of attributes (5), using the method based on monotone Boolean functions, we were able to restrict the number of questions to 60% of the number of questions needed for a complete search. In particular, in one of the tasks, by using monotonicity and the hierarchy, the maximum number of questions needed to restore the monotone Boolean functions was reduced first to 72 questions and then further reduced to 46 questions using the Hansel lemma.
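The monotonicity shortcut behind this reduction can be sketched as follows. The oracle function here is an assumed example (true iff at least two attributes hold), not one of the book's expert tasks, and the naive scan below does not use Hansel chains; it only shows how each answer eliminates all comparable questions.

```python
from itertools import product

# Sketch: restoring a monotone Boolean function by questioning an oracle.
# Once the oracle answers f(x), monotonicity fixes f on every vector
# comparable with x, so those questions are never asked.

def oracle(x):                      # assumed example expert function
    return 1 if sum(x) >= 2 else 0

def leq(x, y):                      # componentwise x <= y
    return all(a <= b for a, b in zip(x, y))

def restore(n):
    known, questions = {}, 0
    for x in product((0, 1), repeat=n):
        if x in known:
            continue
        questions += 1
        v = oracle(x)
        for y in product((0, 1), repeat=n):
            if v == 1 and leq(x, y):
                known[y] = 1        # everything above a "1" is 1
            if v == 0 and leq(y, x):
                known[y] = 0        # everything below a "0" is 0
    return known, questions

known, questions = restore(5)
print(questions, "questions instead of", 2 ** 5)  # 16 instead of 32
```

Even this naive question order halves the number of questions for five attributes; ordering the questions along Hansel chains reduces it further.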
A minimal dynamic sequence of questions means that we reach the minimum of the Shannon Function, i.e., the minimum number of questions required to restore the most complex monotone Boolean function of n arguments. Columns 2, 3 and 4 in table 2.13 present values of the three monotone Boolean functions of five arguments chosen for this example.
Iterate until finished
- Interactive Learning of Monotone Boolean Functions
- Basic definitions and results
- Algorithm for restoring a monotone Boolean function
- Construction of Hansel Chains
Hansel chains are derived independently of the particular applied problem; they depend only on the number of attributes (five in this case). A binary vector x of length n is said to be an upper zero of a function f if f(x) = 0 and, for any vector y such that y > x, we have f(y) = 1. The term level represents the number of units (i.e., the number of 1-elements) in the vector x and is denoted by U(x). An upper zero x of a function f is said to be a maximal upper zero if U(y) <= U(x) for any upper zero y of the function f [Kovalerchuk, Lavkov, 1984].
In terms of machine learning, the set of all maximal upper zeros represents the border elements of the negative patterns. Restoration algorithms for monotone Boolean functions which use Hansel's lemma are optimal in terms of the Shannon function.
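These border elements can be computed by brute force for small n; the sketch below uses an assumed example function (true iff at least two attributes hold), not one of the book's tasks, and exhaustive search rather than the optimal Hansel-chain algorithm.

```python
from itertools import product

# Sketch: maximal upper zeros of a monotone Boolean function --
# the border elements of the negative patterns.

def f(x):                           # assumed example function
    return 1 if sum(x) >= 2 else 0

def upper_zeros(func, n):
    """Vectors x with func(x) = 0 such that func(y) = 1 for every y > x."""
    vectors = list(product((0, 1), repeat=n))
    def greater(y, x):
        return y != x and all(a >= b for a, b in zip(y, x))
    return [x for x in vectors
            if func(x) == 0
            and all(func(y) == 1 for y in vectors if greater(y, x))]

def maximal_upper_zeros(func, n):
    """Upper zeros of maximal level U(x) (the number of 1-components)."""
    zeros = upper_zeros(func, n)
    top = max(sum(x) for x in zeros)
    return [x for x in zeros if sum(x) == top]

print(maximal_upper_zeros(f, 3))  # [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
```

For this example the negative border consists of the three level-1 vectors: adding any second attribute flips the function to 1.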
Rule-Based and Hybrid Financial Data Mining
Decision tree and DNF learning
- Advantages
- Limitation: size of the tree
- Constructing decision trees
This way of discovery creates an important advantage of decision tree learning: discovered rules are consistent. However, there are many consistent sets of rules that cannot be represented by a relatively small single decision tree [Parsaye, 1997]. Rule R1 and its equivalent decision tree in figure 3.1 belong to the relatively restricted language of propositional logic.
The decision tree in figure 3.2 delivers a complete solution; therefore, the closed world assumption is satisfied for this tree. It is a special property of decision tree methods that all rules have a specific DNF form, i.e., all AND clauses (conjunctions) include a value of the attribute assigned to the root.
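This special DNF form can be sketched by extracting rules from a toy tree; the tree below is an assumed illustration, not a tree from the book's figures.

```python
# Sketch: extracting DNF rules from a decision tree.  A leaf is a string;
# an internal node is (test, yes_subtree, no_subtree).  Every extracted
# conjunction starts with a test on the root attribute.

tree = ("price>60",                       # root attribute test
        ("volume>900000", "down", "up"),  # price>60: test volume next
        "up")                             # price<=60: leaf

def extract_rules(node, path=()):
    if isinstance(node, str):             # leaf: emit one rule
        return [(path, node)]
    test, yes, no = node
    return (extract_rules(yes, path + ((test, True),)) +
            extract_rules(no, path + ((test, False),)))

rules = extract_rules(tree)
for conditions, outcome in rules:
    print(conditions, "->", outcome)
# All three conjunctions begin with a test on the root attribute "price>60".
```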
Two evaluation criteria: 1. Predictive accuracy and 2. Comprehensibility
- Ensembles and Hybrid methods for decision trees
- Decision tree and DNF learning in finance
- Decision-tree methods in finance
- Extracting decision tree and sets of rules for SP500
- Sets of decision trees and DNF learning in finance
- Extracting decision trees from neural networks
- Approach
- Trepan algorithm
- Extracting decision trees from neural networks in finance
- Predicting the Dollar-Mark exchange rate
- Comparison of performance
- Probabilistic rules and knowledge-based stochastic modeling
- Probabilistic Networks and Probabilistic Rules
- The naïve Bayes classifier
- The mixture of experts
A fragment of the extracted decision tree (the first 9 of 25 levels) is presented in Figure 3.6. There are only five completely correct rules (5.8%) out of the 85 rules extracted from the decision tree. We have already discussed one of the generalization methods – the m-of-n form of a decision tree.
The fidelity parameter measures how close the forecast of the decision tree is to the forecast by the underlying neural network. Remember that these authors measure comprehensibility of the decision tree by the number of nodes.
Compute the conditional probability of each expert given the data;
Combine experts according to their probability distribution
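The combination step can be sketched as a probability-weighted average; the forecasts and the unnormalized expert probabilities below are illustrative assumptions.

```python
# Sketch: mixture-of-experts combination.  Each expert's forecast is
# weighted by its conditional probability given the data, and the
# weights are normalized to a probability distribution.

def combine_experts(forecasts, scores):
    """forecasts[i]: expert i's prediction;
    scores[i]: unnormalized conditional probability of expert i."""
    total = sum(scores)
    weights = [s / total for s in scores]        # normalize to sum to 1
    return sum(w * f for w, f in zip(weights, forecasts))

print(combine_experts([1.0, 2.0, 4.0], [1.0, 1.0, 2.0]))
# weights 0.25, 0.25, 0.5 -> 0.25*1 + 0.25*2 + 0.5*4 = 2.75
```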
- The hidden Markov model
- Uncertainty of the structure of stochastic models
- Knowledge-based stochastic modeling in finance
- Markov chains in finance
- Hidden Markov models in finance
This rule is a combination of the bold cells in table 3.17, where the IF-part is taken from the first column. Similarly, the bold cell on the first row is used as the THEN-part of the rule. The probability 0.6 can be found in the intersection of the respective row and column.
The goal of predicting a probability distribution is significantly different from the typical goal in finance -- predicting the next value of the time series. The probability distribution delivers a wider picture of the possible future of the stock market.
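This distribution-forecasting idea can be sketched with a small Markov chain over market states; the transition matrix below is an illustrative assumption, not data from table 3.17.

```python
# Sketch: a Markov chain predicts the probability distribution of the
# next state (up/flat/down), not a single next value of the series.

states = ["up", "flat", "down"]
transition = {                       # assumed illustrative probabilities
    "up":   {"up": 0.6, "flat": 0.3, "down": 0.1},
    "flat": {"up": 0.3, "flat": 0.4, "down": 0.3},
    "down": {"up": 0.2, "flat": 0.3, "down": 0.5},
}

def next_distribution(current):
    """Propagate a distribution over states one step forward."""
    return {s: sum(current[r] * transition[r][s] for r in states)
            for s in states}

dist = next_distribution({"up": 1.0, "flat": 0.0, "down": 0.0})
print(dist)  # {'up': 0.6, 'flat': 0.3, 'down': 0.1}
```

Starting from a known "up" day, the forecast is the whole row of transition probabilities rather than one number.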
Relational Data Mining (RDM)
Examples
In this section, several examples illustrate the difference between relational and attribute-value languages used in Data Mining. In an attribute-value language, objects are described by tuples of attribute-value pairs, where each attribute represents some characteristic of the object, e.g., share price, volume, etc. Neural networks and many other attribute-value learning systems have been used in financial forecasting for years (see chapter 3).
The first one is based on numerical expressions and the second one is based on logical expressions and operations. Neural networks and autoregression methods (chapter 2) exemplify the first type of methods based on numerical representation and DNF methods (chapter 3) exemplify the second type based on logical expressions.
IF stock price today is more than $60 and trade volume today is greater than 900,000 THEN tomorrow stock will go down.
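The attribute-value rule above can be sketched as a predicate over a single tuple of attribute-value pairs; the thresholds come from the rule in the text, while the records are illustrative.

```python
# Sketch: the attribute-value rule as a predicate over one record.
# Each object is a flat tuple of attribute-value pairs -- no relations
# between objects, which is exactly what relational languages add.

def rule_down_tomorrow(record):
    """IF price > $60 AND volume > 900,000 THEN tomorrow stock goes down."""
    return record["price"] > 60 and record["volume"] > 900_000

print(rule_down_tomorrow({"price": 62.5, "volume": 1_000_000}))  # True
print(rule_down_tomorrow({"price": 62.5, "volume": 800_000}))    # False
```

A relational language would instead allow literals relating several days or several stocks, e.g. a predicate over pairs of records.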
- Relational data mining paradigm
- Challenges and obstacles in relational data mining
One of the major obstacles to more effective use of the ILP methods is
- Theory of RDM
- Data types in relational data mining
- Relational representation of examples
- First-order logic and rules
- Background knowledge
- Arguments constraints and skipping useless hypotheses
- Initial rules and improving search of hypotheses
- The literal (or conjunction of literals) with the maximum gain is added to the end of the current clause (start clause can be null)
- Relational data mining and relational databases
- Algorithms: FOIL and FOCL
- Algorithm MMDR
A list of components (attributes) requires the length of the list and the set of types of components. The meaning of the rule for a k-arity predicate is the set of k-tuples that satisfy the predicate. This old variable can be in either the head or the current body of the rule (Horn clause).
This requires computing the information gain for each variablization of each predicate P. The information gain metric used by FOIL is Gain(L) = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), where p0 and n0 are the numbers of positive and negative tuples before adding the literal L to the clause, p1 and n1 are the numbers of positive and negative tuples after adding the literal to the clause, and t is the number of positive tuples before adding the literal that have at least one corresponding extension in the positive tuples after adding the literal [Quinlan, 1990]. The branching factor grows exponentially in the arity of the available predicates and the predicate to be learned.
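The FOIL gain metric from [Quinlan, 1990] can be sketched directly; the tuple counts in the example are assumed for illustration.

```python
import math

# Sketch: FOIL information gain for a candidate literal,
# Gain(L) = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))).

def foil_gain(p0, n0, p1, n1, t):
    """p0, n0: positive/negative tuples before adding the literal;
    p1, n1: positive/negative tuples after adding it;
    t: positive tuples before that still have an extension after."""
    info_before = -math.log2(p0 / (p0 + n0))  # bits to signal a positive
    info_after = -math.log2(p1 / (p1 + n1))
    return t * (info_before - info_after)

# A literal that keeps 6 of 8 positives and removes all 8 negatives:
print(foil_gain(p0=8, n0=8, p1=6, n1=0, t=6))  # 6 * (1 - 0) = 6.0
```

The gain rewards literals that purify the clause (raise the positive fraction) while covering many of the original positive tuples.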
Rule (4) is called a regularity if it satisfies the following conditions:
It is well known that the exhaustive search of all rules (4) for finding the set of BST rules is practically impossible if predicates with more than one variable are used. Therefore, the search should be constrained to a lesser set of rules, but still allowing the discovery of BST rules.
If rule R is a BST rule for data D, then rule R is regularity on data D [Vityaev, 1992]
- Fisher test
- MMDR pseudocode
- Comparison of FOIL and MMDR
- Numerical relational data mining
- Data types
- Problem of data types
- Numerical data type
- Representative measurement theory
- Critical analysis of data types in ABL
In this case, the member functions of a data type should be input information along with the data (MMDR implements this approach). A data type is relational if it is described in terms of a set of relations (predicates). Therefore, we can identify such an attribute as an attribute of a strong relational data type.
Similarly, we can define and identify an attribute of a weak relational data type. Using the set of homomorphisms, we can define the notion of permissible transformations and the data type (scale types).
Physical data types in physical problems. Multidimensional data contain only physical quantities and the learning task itself belongs to
- Empirical axiomatic theories: empirical contents of data
- Definitions
- Representation of data types in empirical axiomatic theories
The first step of the analysis of the empirical content of data consists of the
- Discovering empirical regularities as universal formulas
They also satisfy a stronger condition of meaningfulness of the methods – interpretability in terms of empirical systems and the system of notions of domain knowledge. The concept of an empirical axiomatic theory is a formal representation of the "empirical content of data". We say that an empirical axiomatic theory has an empirical interpretation if all parts of that theory are interpretable in the domain theory (background knowledge): predicates from V and W, and axioms S. The concept of an empirical system can be defined in terms of an empirical axiomatic system as a non-reducible model [Pfanzagl, 1971] of the set of axioms.
Suppose that {R1,…,Rk} is a set of the most common numerical relations and some of these relations have an empirical interpretation. In this way we obtain a set of empirical predicates. In these terms, the protocol of observations of the predicates from V on the set of objects A is the relational structure.