Chapter 4: Research Methodology
4.3 Technical analysis based approach
4.3.3 Long Short-Term Memory (LSTM) model
The ARIMA (0,0,q), or moving average MA(q), model is specified in Eq. (4.12) below.
y_t = μ − θ_1 ε_{t−1} − θ_2 ε_{t−2} − ⋯ − θ_q ε_{t−q} + ε_t        (4.12)
where q is the order of the ARIMA (0,0,q) model, θ_j (j = 1, 2, …, q) are the finite weight parameters, and μ is the mean of the time series. In the MA methodology, the time series depends only on q past random terms and the current random error term ε_t. The ARIMA (p,0,q), or autoregressive moving average [ARMA (p,q)], model is represented in Eq. (4.13) below.
y_t = φ_0 + φ_1 y_{t−1} + φ_2 y_{t−2} + ⋯ + φ_p y_{t−p} + μ − θ_1 ε_{t−1} − θ_2 ε_{t−2} − ⋯ − θ_q ε_{t−q} + ε_t        (4.13)
where the ARMA (p, q) model depends on p past values of itself and q past random error terms ε_t. ARIMA (p, d, q) is finally the general time series model, in which the series has been differenced d times. Three critical steps, namely model identification, model estimation, and model testing, must be carried out in the ARIMA model construction process. The estimation of p, d, q in the ARIMA model construction process is critical and must be repeated several times to capture the most robust model ([21], [117]). To determine the parameters of the optimal model, two diagnostic plots, the ACF and PACF, are generated from the training data and compared with the actual or theoretical values ([255]–[257]). For model identification, the AIC is planned to be used ([134], [135], [255], [256]).
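For concreteness, the sketch below illustrates in Python (using the statsmodels library) how an ARIMA order (p, d, q) could be selected by fitting candidate models and comparing their AIC values; the function name, search ranges, and the `series` input are illustrative assumptions, not the study's actual code.

```python
# A minimal sketch of AIC-based ARIMA order selection, assuming statsmodels is
# available and `series` is a pandas Series of the (training) time series.
import itertools

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf


def select_arima_order(series: pd.Series, max_p: int = 3, d: int = 1, max_q: int = 3):
    """Fit candidate ARIMA(p, d, q) models and return the order with the lowest AIC."""
    best_order, best_aic = None, float("inf")
    for p, q in itertools.product(range(max_p + 1), range(max_q + 1)):
        try:
            fit = ARIMA(series, order=(p, d, q)).fit()
        except Exception:
            continue  # skip candidate orders that fail to converge
        if fit.aic < best_aic:
            best_order, best_aic = (p, d, q), fit.aic
    return best_order, best_aic


# The ACF/PACF diagnostic plots of the differenced training series guide the
# choice of max_p and max_q, e.g.:
# plot_acf(series.diff().dropna()); plot_pacf(series.diff().dropna())
```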
The literature reviewed above highlights the LSTM's advantage over traditional time series forecasters and its predictive precision and superiority in time series forecasting. Hence, my study incorporates the LSTM methodology as a DL technique.
The Long Short-Term Memory (LSTM) architecture proposed by Hochreiter and Schmidhuber [25] is an extension of RNNs. It is considered a capable model that efficiently addresses the RNN's vanishing gradient problem. In this architecture, the flow of information is monitored and controlled through various gated cells, which form the LSTM's memory; thus, the LSTM can efficiently monitor and manage the input flow.
The DL LSTM architecture captures the important features from the inputs and can retain these features over a long period. It learns how long to retain and preserve the information and subsequently deletes it when it becomes trivial. The weights assigned to the information during the training process determine whether the features are preserved or deleted.
Figure 4.1 shows the framework of the LSTM model. The principal characteristic of the LSTM structure is the cell state (c_t), which is regulated by three gates: the forget gate, the input gate, and the output gate. Through the configuration of the gate controls and the memory cells, this architecture can decide what new information to learn and retain, how long to keep it, and when to forget it.
Figure 4.1 The architecture of Long Short-Term Memory (LSTM) cell
where c_t is the cell state, h_t is the hidden state, σ represents the sigmoid activation function, tanh represents the hyperbolic tangent activation function, ⊙ refers to the element-wise product, and [·, ·] refers to the concatenation operation.
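As an illustration only, the following sketch shows how such a gated LSTM network could be assembled as a one-step-ahead forecaster in Python with Keras; the layer sizes, window length, and dummy data are assumptions for demonstration and do not describe the study's final model.

```python
# A minimal sketch of an LSTM forecaster, assuming a TensorFlow/Keras backend and
# inputs already shaped as [samples, timesteps, features]; illustrative only.
import numpy as np
import tensorflow as tf

timesteps, n_features = 60, 1          # e.g. 60 past observations of one series
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(timesteps, n_features)),
    tf.keras.layers.LSTM(50),          # one LSTM layer of 50 memory cells
    tf.keras.layers.Dense(1),          # one-step-ahead forecast
])
model.compile(optimizer="adam", loss="mse")

# Dummy data, used here only to show the expected shapes.
X = np.random.rand(100, timesteps, n_features).astype("float32")
y = np.random.rand(100, 1).astype("float32")
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```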
Forget gate:
A sigmoid function acting as the forget gate layer is established to decide which features are retained and which information is eliminated from the LSTM memory. The values of y_t (the current input) and h_{t−1} (the last hidden state) are the basis for the forget gate decision. Let f_t denote the output vector of the forget gate, with values ranging from 0 to 1 reflecting the "degree of forgetting", as specified in Eq. (4.14).
If the output of the forget gate is 0, the corresponding information is deleted entirely from the cell state (c_t); if the output is 1, the information is preserved completely. Thus, the forget gate governs which features of the cell state vector c_{t−1} will be forgotten.
f_t = σ(w_f · [h_{t−1}, y_t] + b_f)        (4.14)
where f_t refers to the value of the forget gate, b_f is a constant term representing the bias value, and w_f and b_f define the set of trainable parameters of the forget gate.
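A worked numpy sketch of Eq. (4.14) is given below; the dimensions, random weights, and inputs are illustrative assumptions rather than trained parameters.

```python
# Forget gate, Eq. (4.14): f_t = sigmoid(w_f . [h_{t-1}, y_t] + b_f)
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


n_hidden, n_input = 4, 1
rng = np.random.default_rng(0)

h_prev = rng.standard_normal(n_hidden)                      # h_{t-1}: last hidden state
y_t = rng.standard_normal(n_input)                          # y_t: current input
w_f = rng.standard_normal((n_hidden, n_hidden + n_input))   # trainable weights
b_f = np.zeros(n_hidden)                                    # trainable bias

concat = np.concatenate([h_prev, y_t])                      # [h_{t-1}, y_t]
f_t = sigmoid(w_f @ concat + b_f)                           # forget gate, values in (0, 1)
```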
Input gate:
This gate determines which new information should be retained and added to the LSTM memory. The input gate consists of two layers, as specified in Eq. (4.15) and Eq. (4.16).
(a) a sigmoid layer, denoted i_t (the input gate vector) in Eq. (4.15), determines which features should be updated. i_t is an output variable with values ranging from 0 to 1, and w_i and b_i are the trainable parameters of the input gate.
i_t = σ(w_i · [h_{t−1}, y_t] + b_i)        (4.15)
(b) a tanh layer creates a vector of new candidate values, denoted c̃_t, that will be added to the LSTM memory, in Eq. (4.16).
c̃_t = tanh(w_c · [h_{t−1}, y_t] + b_c)        (4.16)
where c̃_t is the vector of new candidate values that will be added to the LSTM memory cell at time t, tanh is the hyperbolic tangent function, and w_c and b_c are the trainable parameters.
A combination of the i_t and c̃_t layers generates an update to the LSTM memory, which is reflected in c_t (the cell state at time t), specified in Eq. (4.17).
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t        (4.17)
In the first term of c_t, part of the previous value is forgotten through an element-wise multiplication of the forget gate output (f_t) and the last cell state (c_{t−1}); in the second term, the new candidate values are added through an element-wise multiplication of i_t and c̃_t. The symbol ⊙ represents the element-wise product. The forget gate output (f_t) varies between 0 and 1, where 1 refers to complete retention of the value and 0 means complete rejection of the value.
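The following numpy sketch works through Eqs. (4.15)–(4.17) in sequence; the shapes, random weights, previous cell state, and the stand-in forget gate values are illustrative assumptions only.

```python
# Input gate, candidate values, and cell-state update, Eqs. (4.15)-(4.17).
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


n_hidden, n_input = 4, 1
rng = np.random.default_rng(1)
h_prev, y_t = rng.standard_normal(n_hidden), rng.standard_normal(n_input)
c_prev = rng.standard_normal(n_hidden)                      # c_{t-1}: last cell state
w_i, w_c = (rng.standard_normal((n_hidden, n_hidden + n_input)) for _ in range(2))
b_i, b_c = np.zeros(n_hidden), np.zeros(n_hidden)
f_t = sigmoid(rng.standard_normal(n_hidden))                # stand-in for Eq. (4.14)

concat = np.concatenate([h_prev, y_t])
i_t = sigmoid(w_i @ concat + b_i)          # Eq. (4.15): input gate
c_tilde = np.tanh(w_c @ concat + b_c)      # Eq. (4.16): candidate values
c_t = f_t * c_prev + i_t * c_tilde         # Eq. (4.17): element-wise cell-state update
```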
Output gate:
Through a sigmoid layer, the output gate (o_t), which is the focus vector, first decides what part of the LSTM memory contributes to the output. o_t is a vector comprising values ranging from 0 to 1, and the coefficients w_o and b_o are the trainable parameters of the output gate. o_t in Eq. (4.18) is determined based on the information of the last hidden state (h_{t−1}) and the input (y_t).
o_t = σ(w_o · [h_{t−1}, y_t] + b_o)        (4.18)
A nonlinear tanh function is then applied to map the values between −1 and 1. Subsequently, to determine the hidden state h_t, an element-wise multiplication of the tanh activation of the cell state c_t and o_t is performed, as shown in Eq. (4.19).
h_t = o_t ⊙ tanh(c_t)        (4.19)
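Finally, a numpy sketch of Eqs. (4.18)–(4.19) is shown below; the cell state and weights are illustrative assumptions, mirroring the sketches above.

```python
# Output gate and hidden state, Eqs. (4.18)-(4.19).
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


n_hidden, n_input = 4, 1
rng = np.random.default_rng(2)
h_prev, y_t = rng.standard_normal(n_hidden), rng.standard_normal(n_input)
c_t = rng.standard_normal(n_hidden)                         # cell state from Eq. (4.17)
w_o = rng.standard_normal((n_hidden, n_hidden + n_input))
b_o = np.zeros(n_hidden)

o_t = sigmoid(w_o @ np.concatenate([h_prev, y_t]) + b_o)    # Eq. (4.18): output gate
h_t = o_t * np.tanh(c_t)                                    # Eq. (4.19): hidden state
```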