
Rule-Based and Hybrid Financial Data Mining


3.5.4. The hidden Markov model

Each node in a Markov network can be in different states S, and transitions between states are governed by the Markov property. This property constrains the probabilities of transition from one state to another:

The conditional probability of a transition from a state S in one node of the network (a parent node, R) to another node (a child node, C) depends only on the parent node and does not depend on a “grandparent” node G:

$$P(C \mid R, G) = P(C \mid R).$$

Figure 3.11 below shows a diagram modified from Figure 3.10. This diagram satisfies the Markov property because all links to “grandparent” nodes are deleted and all connections are made time-sequential.

Each block in figure 3.11 has a single input and produces a single output.

For instance, block G is associated with: (1) F -- “stock direction yesterday”, (2) its current state G -- “stock direction today”, and (3) H -- “stock direction tomorrow”, with probability distributions for all these transitions. The stock direction tomorrow (H) can be viewed as the output of node G and also as the next state after G. Such a process can be repeated daily by updating the value of today’s state of the stock. This iterative process is called a Markov process: each state of the stock is associated with a set of outputs (next states) and the transition probabilities to each output state.
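The daily iteration described above can be illustrated with a minimal Python sketch, assuming just two states for the stock direction ("up" and "down") and made-up transition probabilities. It shows the Markov property at work: the distribution for tomorrow is computed from today's distribution alone, and repeating the step rolls the forecast forward one day at a time.

```python
# Hypothetical two-state Markov chain for the daily stock direction.
# The transition probabilities are illustrative assumptions, not estimates.
STATES = ("up", "down")
TRANSITION = {
    "up": {"up": 0.6, "down": 0.4},    # P(tomorrow | today = up)
    "down": {"up": 0.3, "down": 0.7},  # P(tomorrow | today = down)
}


def step(dist):
    """One Markov step: tomorrow's distribution depends only on today's."""
    return {nxt: sum(dist[cur] * TRANSITION[cur][nxt] for cur in STATES) for nxt in STATES}


def forecast(today, days):
    """Start from today's observed direction and iterate the chain day by day."""
    dist = {s: 1.0 if s == today else 0.0 for s in STATES}
    for _ in range(days):
        dist = step(dist)
    return dist


print({s: round(p, 2) for s, p in forecast("up", 2).items()})  # {'up': 0.48, 'down': 0.52}
```

Because each step uses only the current distribution, a forecast far ahead settles to a stationary distribution (here 3/7 for "up") regardless of the starting state; with real data the transition table would have to be estimated.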


Figure 3.11. An illustrative probabilistic network for stock forecast

The problem of identifying (discovering) and explicitly describing the states of such a Markov probabilistic network is challenging in financial applications. If the states cannot be observed explicitly, they are called hidden states.

For instance, the state of the market can be viewed as a hidden state, because we cannot directly measure today such market parameters as investors’ expectations and intentions. However, we are able to observe their consequences, such as changes in prices and trade volume. A Markov model that is able to operate with hidden states is called a Hidden Markov Model (HMM). In this model, each hidden state s(t) is associated with an observable output o(t), which is generated according to the conditional output probability distribution $P(o(t) \mid s(t))$. Similarly to a conventional Markov model, an HMM assumes that each state s(t) moves to the next state according to the conditional transition probability distribution $P(s(t+1) \mid s(t))$.

Beginning from some initial state s(1), the HMM generates a sequence of observable outputs o(1), o(2), ..., o(n). Such observable sequences are considered training examples, and they can be of different lengths n. The set of possible values of the observable outputs is called an alphabet. Each individual value from the alphabet is called a letter, a sequence of letters is called a word, and a sequence of words is called a sentence. HMMs are described in [Rabiner, 1989].
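The generative process of an HMM can be sketched as follows, assuming a hypothetical two-valued hidden market state ("bull", "bear") and a three-letter alphabet of observable daily price moves. None of the names or probabilities come from the text; they only show how the hidden state sequence drives the observable word, of which only the word would be available as a training example.

```python
import random

# Hypothetical HMM: a hidden market state emits observable price moves.
# All names and probabilities below are illustrative assumptions.
STATES = ("bull", "bear")
ALPHABET = ("rise", "flat", "fall")  # observable "letters"
INITIAL = {"bull": 0.5, "bear": 0.5}  # P(s(1))
TRANSITION = {  # P(s(t+1) | s(t))
    "bull": {"bull": 0.8, "bear": 0.2},
    "bear": {"bull": 0.3, "bear": 0.7},
}
EMISSION = {  # P(o(t) | s(t))
    "bull": {"rise": 0.6, "flat": 0.3, "fall": 0.1},
    "bear": {"rise": 0.2, "flat": 0.3, "fall": 0.5},
}


def draw(dist, rng):
    """Sample one key from a {value: probability} dictionary."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]


def generate(n, seed=0):
    """Generate hidden states s(1..n) and the observable word o(1..n)."""
    rng = random.Random(seed)
    state = draw(INITIAL, rng)
    states, word = [], []
    for _ in range(n):
        states.append(state)
        word.append(draw(EMISSION[state], rng))  # emit a letter from the current state
        state = draw(TRANSITION[state], rng)     # move to the next hidden state
    return states, word


hidden, observed = generate(10, seed=42)
print(observed)  # in practice only this sequence is visible
```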

These terms come from speech recognition [Rabiner, 1989; Dietterich, 1997], where the alphabet of letters consists of “frames” of the speech signal. Let W be the set of all words in the language. We are able to observe these words, but the states generating them are hidden. Each word from W is modeled as an HMM.

There are two major steps in using HMMs with given probability distributions to recognize a spoken word w:

– compute the likelihood that each of the word HMMs generated the spoken word w,

– match the spoken word w with the word whose HMM gives the highest likelihood.
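The two recognition steps can be sketched with the forward algorithm, which sums over all hidden state paths to obtain the likelihood of an observed word under each model. The word models ("uptrend", "downtrend"), their hidden states ("calm", "busy"), and all probabilities are hypothetical; they illustrate the procedure on a coded price-move sequence rather than on speech.

```python
# Two hypothetical "word" HMMs over the alphabet {"rise", "fall"}; all numbers are illustrative.
TRANS = {"calm": {"calm": 0.7, "busy": 0.3}, "busy": {"calm": 0.4, "busy": 0.6}}
MODELS = {
    "uptrend": {
        "init": {"calm": 0.5, "busy": 0.5},
        "trans": TRANS,
        "emit": {"calm": {"rise": 0.7, "fall": 0.3}, "busy": {"rise": 0.6, "fall": 0.4}},
    },
    "downtrend": {
        "init": {"calm": 0.5, "busy": 0.5},
        "trans": TRANS,
        "emit": {"calm": {"rise": 0.3, "fall": 0.7}, "busy": {"rise": 0.4, "fall": 0.6}},
    },
}


def likelihood(model, word):
    """Forward algorithm: P(word | model), summed over all hidden state paths."""
    init, trans, emit = model["init"], model["trans"], model["emit"]
    states = list(init)
    alpha = {s: init[s] * emit[s][word[0]] for s in states}
    for letter in word[1:]:
        alpha = {nxt: sum(alpha[cur] * trans[cur][nxt] for cur in states) * emit[nxt][letter]
                 for nxt in states}
    return sum(alpha.values())


def recognize(word, models):
    """Step 1: score the word under every model. Step 2: pick the most likely model."""
    scores = {name: likelihood(model, word) for name, model in models.items()}
    return max(scores, key=scores.get), scores


best, scores = recognize(["rise", "rise", "rise", "fall"], MODELS)
print(best)  # uptrend
```

The forward recursion keeps the cost linear in the word length instead of enumerating every hidden path.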

Financial time series can be interpreted similarly. In fact, this idea has already been applied in finance; see Section 3.6 and [Weigend, Shi, 1997, 1998]. To follow these steps, an HMM must first be learned from a set of training examples. Generating training examples requires that the variables describing hidden states be identified first. Weigend and Shi [1997, 1998] used unsupervised learning (clustering) to identify these variables. The general logic of an HMM is shown in Figure 3.12.
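One simple way to turn a raw series into a discrete state variable is to cluster it. The sketch below runs a plain one-dimensional k-means on absolute daily returns to split days into a low- and a high-volatility regime. It is a hypothetical illustration of the idea, not the clustering procedure used by Weigend and Shi, and the return values are made up.

```python
def kmeans_1d(values, k=2, iters=50):
    """Plain one-dimensional k-means (hypothetical helper); returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data (requires k >= 2).
    centers = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Made-up daily returns; clustering their magnitudes gives a low/high volatility regime label.
returns = [0.001, -0.002, 0.0015, 0.03, -0.025, 0.002, -0.028, 0.001]
centers, regime = kmeans_1d([abs(r) for r in returns], k=2)
print(regime)  # [0, 0, 0, 1, 1, 0, 1, 0]
```

The resulting label sequence could then serve as one discrete variable describing the hidden state when training examples are generated.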

Figure 3.12. Hidden Markov Model diagram

Fitting networks with identified hidden variables can be accomplished by several algorithms under different assumptions about the statistical distributions.

For instance, the Expectation-Maximization (EM) algorithm [Dempster et al., 1976] assumes the exponential family of distributions (binomial, multinomial, exponential, Poisson, normal, and many others) [Dietterich, 1997]. When applied to HMMs, the EM algorithm is also called the Baum-Welch or Forward-Backward algorithm.

The EM algorithm consists of two steps E and M:

1. E-step -- augment each training example with statistics describing the sequences of states that probably generated it.

2. M-step -- re-estimate the probability distributions using the result of the E-step.
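The E- and M-steps can be made concrete with a minimal Baum-Welch sketch for a discrete-output HMM, assuming a single observation sequence and made-up starting parameters. The E-step computes posterior state probabilities (gamma) and transition probabilities (xi) from forward and backward passes; the M-step re-estimates the initial, transition, and output distributions from these expected counts. No scaling is applied, so the sketch only suits short sequences; practical implementations rescale the forward and backward variables or work in log space to avoid underflow.

```python
def forward(pi, A, B, obs):
    """alpha[t][i]: probability of the first t+1 outputs with state i at step t."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)])
    return alpha


def backward(A, B, obs):
    """beta[t][i]: probability of the outputs after step t, given state i at step t."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n)) for i in range(n)])
    return beta


def baum_welch_step(pi, A, B, obs):
    """One EM iteration; returns new parameters and P(obs) under the old ones."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    lik = sum(alpha[-1])
    # E-step: expected state occupancies (gamma) and state transitions (xi).
    gamma = [[alpha[t][i] * beta[t][i] / lik for i in range(n)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / lik
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # M-step: re-estimate the distributions from the expected counts.
    new_pi = list(gamma[0])
    new_A = [[sum(x[i][j] for x in xi) / sum(g[i] for g in gamma[:-1])
              for j in range(n)] for i in range(n)]
    new_B = [[sum(g[i] for g, o in zip(gamma, obs) if o == k) / sum(g[i] for g in gamma)
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, lik


# Hypothetical coded daily moves (0 = rise, 1 = flat, 2 = fall) and starting guesses.
obs = [0, 0, 1, 0, 2, 2, 1, 2, 2, 0]
pi, A, B = [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
history = []
for _ in range(20):
    pi, A, B, lik = baum_welch_step(pi, A, B, obs)
    history.append(lik)
print(history[0], history[-1])
```

Each iteration reports the likelihood under the parameters it started from; EM guarantees this sequence does not decrease, although it may converge to a local maximum.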

For more detail, see [Dempster et al., 1976; Weigend, Shi, 1998; Dietterich, 1997].

Figure 3.13. A simple dynamic probabilistic network for stock direction.

Dynamic probabilistic (stochastic) network. In the hidden Markov model, the number of values of the state variable at each point in time can become very large. Consequently, the number of parameters in the state transition probability distribution can become intractably large. One solution is to represent the causal structure within each state by a model. For example, the hidden state can be represented by two separate, relatively independent state variables: stock trade volume and volatility region (high, low). Figure 3.13 shows the resulting model. Models of this type are reviewed in [Smyth et al., 1997]. They are known as dynamic probabilistic networks [DPN, Kanazawa et al., 1995], dynamic belief networks [DBN, Dean, Kanazawa, 1989], and factorial HMMs [Ghahramani, Jordan, 1996].
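The saving in transition parameters can be checked with a small count, assuming (as in a factorial HMM) that each state variable evolves as its own Markov chain. The cardinalities are hypothetical: trade volume discretized into 10 levels, plus the two volatility regions (high, low).

```python
from math import prod


def flat_transition_params(cardinalities):
    """Free parameters of P(s(t+1) | s(t)) when all variables are merged into one state."""
    k = prod(cardinalities)
    return k * (k - 1)  # k rows, each constrained to sum to 1


def factored_transition_params(cardinalities):
    """Free parameters when each state variable evolves as its own Markov chain (assumption)."""
    return sum(k * (k - 1) for k in cardinalities)


# Hypothetical: volume in 10 levels, volatility region in 2 levels (high, low).
print(flat_transition_params([10, 2]), factored_transition_params([10, 2]))  # 380 92
```

Merging the two variables into one 20-valued state needs 380 free transition parameters, while two separate chains need 92; the gap widens quickly as more variables are added.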

In: DATA MINING IN FINANCE (pages 122-125)