Chapter 4: Research Methodology
4.3 Technical analysis based approach
4.3.3 Long Short-Term Memory (LSTM) model
The ARIMA (0,0,q), or moving average MA(q), model is specified in Eq. (4.12) below.
y_t = μ − θ_1 ε_{t−1} − θ_2 ε_{t−2} − ⋯ − θ_q ε_{t−q} + ε_t        (4.12)
where q is the order of the ARIMA (0,0,q) model, θ_j (j = 1, 2, …, q) are the finite weight parameters, and μ is the mean of the time series. In the MA methodology, the time series depends only on q past random terms and the current random error term ε_t. The ARIMA (p,0,q), or autoregressive moving average [ARMA (p,q)], model is represented in Eq. (4.13) below.
y_t = φ_0 + φ_1 y_{t−1} + φ_2 y_{t−2} + ⋯ + φ_p y_{t−p} + μ − θ_1 ε_{t−1} − θ_2 ε_{t−2} − ⋯ − θ_q ε_{t−q} + ε_t        (4.13)
where the ARMA (p, q) model depends on p past values of itself and q past random error terms ε_t. ARIMA (p, d, q) is finally the general time series model, in which the series has been differenced d times. Three critical steps, namely model identification, model estimation, and model testing, must be carried out in the ARIMA model construction process. The estimation of p, d, q in the ARIMA model construction process is critical and must be repeated several times to capture the most robust model ([21], [117]). To determine the parameters of the optimal model, two diagnostic plots, the ACF and PACF, are generated from the training data and compared with the actual or theoretical values ([255]–[257]). For model identification, the AIC is planned to be used ([134], [135], [255], [256]).
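For concreteness, the sketch below illustrates in Python (using the statsmodels library) how an ARIMA order (p, d, q) could be selected by fitting candidate models and comparing their AIC values; the function name, search ranges, and the `series` input are illustrative assumptions, not the study's actual code.

```python
# A minimal sketch of AIC-based ARIMA order selection, assuming statsmodels is
# available and `series` is a pandas Series of the (training) time series.
import itertools

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf


def select_arima_order(series: pd.Series, max_p: int = 3, d: int = 1, max_q: int = 3):
    """Fit candidate ARIMA(p, d, q) models and return the order with the lowest AIC."""
    best_order, best_aic = None, float("inf")
    for p, q in itertools.product(range(max_p + 1), range(max_q + 1)):
        try:
            fit = ARIMA(series, order=(p, d, q)).fit()
        except Exception:
            continue  # skip candidate orders that fail to converge
        if fit.aic < best_aic:
            best_order, best_aic = (p, d, q), fit.aic
    return best_order, best_aic


# The ACF/PACF diagnostic plots of the differenced training series guide the
# choice of max_p and max_q, e.g.:
# plot_acf(series.diff().dropna()); plot_pacf(series.diff().dropna())
```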
The literature reviewed above highlights the LSTM's advantage over traditional time series forecasters and its predictive precision and superiority in time series forecasting. Hence, my study incorporates the LSTM methodology as a DL technique.
The Long Short-Term Memory (LSTM) architecture proposed by Hochreiter and Schmidhuber [25] is an extension of RNNs. It is considered a capable model that efficiently addresses the RNN's vanishing gradient problem. In this architecture, the flow of information is monitored and controlled through various gated cells, which form the LSTM's memory; thus, the LSTM can efficiently monitor and manage the input flow.
The DL LSTM architecture captures the important features from the inputs and can retain these features over a long period. It learns how long to retain and preserve the information and subsequently deletes it when it becomes trivial. The weights assigned to the information during the training process determine whether the features are preserved or deleted.
Figure 4.1 shows the framework of the LSTM model. The principal characteristic of the LSTM structure is the cell state (c_t), which is regulated by three gates: the forget gate, the input gate, and the output gate. Through the configuration of the gate controls and the memory cells, this architecture can decide what new information to learn and retain, how long to keep it, and when to forget it.
Figure 4.1 The architecture of Long Short-Term Memory (LSTM) cell
where c_t is the cell state, h_t is the hidden state, σ represents the sigmoid activation function, tanh represents the hyperbolic tangent activation function, ⊙ refers to the element-wise product, and [·, ·] refers to the concatenation operation.
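As an illustration only, the following sketch shows how such a gated LSTM network could be assembled as a one-step-ahead forecaster in Python with Keras; the layer sizes, window length, and dummy data are assumptions for demonstration and do not describe the study's final model.

```python
# A minimal sketch of an LSTM forecaster, assuming a TensorFlow/Keras backend and
# inputs already shaped as [samples, timesteps, features]; illustrative only.
import numpy as np
import tensorflow as tf

timesteps, n_features = 60, 1          # e.g. 60 past observations of one series
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(timesteps, n_features)),
    tf.keras.layers.LSTM(50),          # one LSTM layer of 50 memory cells
    tf.keras.layers.Dense(1),          # one-step-ahead forecast
])
model.compile(optimizer="adam", loss="mse")

# Dummy data, used here only to show the expected shapes.
X = np.random.rand(100, timesteps, n_features).astype("float32")
y = np.random.rand(100, 1).astype("float32")
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```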
Forget gate:
A sigmoid function acting as the forget gate layer is established to decide which features are retained and which information is eliminated from the LSTM memory. The values of y_t (the current input) and h_{t−1} (the last hidden state) are the basis for the forget gate decision. Let f_t denote the output vector of the forget gate, with values ranging from 0 to 1 reflecting the "degree of forgetting", as specified in Eq. (4.14).
If the output of the forget gate is 0, the corresponding information is deleted entirely from the cell state (c_t); if the output is 1, the information is preserved completely. Thus, the forget gate governs which features of the cell state vector c_{t−1} will be forgotten.
f_t = σ(w_f · [h_{t−1}, y_t] + b_f)        (4.14)
where f_t refers to the value of the forget gate, b_f is a constant term representing the bias value, and w_f and b_f define the set of trainable parameters of the forget gate.
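A worked numpy sketch of Eq. (4.14) is given below; the dimensions, random weights, and inputs are illustrative assumptions rather than trained parameters.

```python
# Forget gate, Eq. (4.14): f_t = sigmoid(w_f . [h_{t-1}, y_t] + b_f)
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


n_hidden, n_input = 4, 1
rng = np.random.default_rng(0)

h_prev = rng.standard_normal(n_hidden)                      # h_{t-1}: last hidden state
y_t = rng.standard_normal(n_input)                          # y_t: current input
w_f = rng.standard_normal((n_hidden, n_hidden + n_input))   # trainable weights
b_f = np.zeros(n_hidden)                                    # trainable bias

concat = np.concatenate([h_prev, y_t])                      # [h_{t-1}, y_t]
f_t = sigmoid(w_f @ concat + b_f)                           # forget gate, values in (0, 1)
```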
Input gate:
This gate determines which new information should be retained and added to the LSTM memory. The input gate consists of two layers, as specified in Eq. (4.15) and Eq. (4.16).
(a) a sigmoid layer, denoted i_t (the input gate vector) in Eq. (4.15), determines which features should be updated. i_t is an output variable with values ranging from 0 to 1, and w_i and b_i are the trainable parameters of the input gate.
i_t = σ(w_i · [h_{t−1}, y_t] + b_i)        (4.15)
(b) a tanh layer creates a vector of new candidate values, denoted c̃_t, that will be added to the LSTM memory, in Eq. (4.16).
c̃_t = tanh(w_c · [h_{t−1}, y_t] + b_c)        (4.16)
where c̃_t is the vector of new candidate values that will be added to the LSTM memory cell at time t, tanh is the hyperbolic tangent function, and w_c and b_c are the trainable parameters.
A combination of the i_t and c̃_t layers generates an update to the LSTM memory, which is reflected in c_t (the cell state at time t), specified in Eq. (4.17).
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t        (4.17)
In the first term of c_t, part of the previous value is forgotten through an element-wise multiplication of the forget gate output (f_t) and the last cell state (c_{t−1}); in the second term, the new candidate values are added through an element-wise multiplication of i_t and c̃_t. The symbol ⊙ represents the element-wise product. The forget gate output (f_t) varies between 0 and 1, where 1 refers to complete retention of the value and 0 means complete rejection of the value.
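The following numpy sketch works through Eqs. (4.15)–(4.17) in sequence; the shapes, random weights, previous cell state, and the stand-in forget gate values are illustrative assumptions only.

```python
# Input gate, candidate values, and cell-state update, Eqs. (4.15)-(4.17).
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


n_hidden, n_input = 4, 1
rng = np.random.default_rng(1)
h_prev, y_t = rng.standard_normal(n_hidden), rng.standard_normal(n_input)
c_prev = rng.standard_normal(n_hidden)                      # c_{t-1}: last cell state
w_i, w_c = (rng.standard_normal((n_hidden, n_hidden + n_input)) for _ in range(2))
b_i, b_c = np.zeros(n_hidden), np.zeros(n_hidden)
f_t = sigmoid(rng.standard_normal(n_hidden))                # stand-in for Eq. (4.14)

concat = np.concatenate([h_prev, y_t])
i_t = sigmoid(w_i @ concat + b_i)          # Eq. (4.15): input gate
c_tilde = np.tanh(w_c @ concat + b_c)      # Eq. (4.16): candidate values
c_t = f_t * c_prev + i_t * c_tilde         # Eq. (4.17): element-wise cell-state update
```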
Output gate:
Through a sigmoid layer, the output gate (o_t), which is the focus vector, first decides what part of the LSTM memory contributes to the output. o_t is a vector comprising values ranging from 0 to 1, and the coefficients w_o and b_o are the trainable parameters of the output gate. o_t in Eq. (4.18) is determined based on the information of the last hidden state (h_{t−1}) and the input (y_t).
o_t = σ(w_o · [h_{t−1}, y_t] + b_o)        (4.18)
A nonlinear tanh function is then applied to map the values between −1 and 1. Subsequently, to determine the hidden state h_t, an element-wise multiplication of the tanh activation of the cell state c_t and o_t is performed, as shown in Eq. (4.19).
h_t = o_t ⊙ tanh(c_t)        (4.19)
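Finally, a numpy sketch of Eqs. (4.18)–(4.19) is shown below; the cell state and weights are illustrative assumptions, mirroring the sketches above.

```python
# Output gate and hidden state, Eqs. (4.18)-(4.19).
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


n_hidden, n_input = 4, 1
rng = np.random.default_rng(2)
h_prev, y_t = rng.standard_normal(n_hidden), rng.standard_normal(n_input)
c_t = rng.standard_normal(n_hidden)                         # cell state from Eq. (4.17)
w_o = rng.standard_normal((n_hidden, n_hidden + n_input))
b_o = np.zeros(n_hidden)

o_t = sigmoid(w_o @ np.concatenate([h_prev, y_t]) + b_o)    # Eq. (4.18): output gate
h_t = o_t * np.tanh(c_t)                                    # Eq. (4.19): hidden state
```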