Chapter 4: Research Methodology

4.3 Technical analysis based approach

4.3.3 Long Short-Term Memory (LSTM) model

The ARIMA(0,0,q), or MA(q), model is specified in Eq. (4.12) below.

𝑦𝑑 = πœ‡ βˆ’ πœƒ1πœ€π‘‘βˆ’1βˆ’ πœƒ2πœ€π‘‘βˆ’2βˆ’ … βˆ’ πœƒπ‘žπœ€π‘‘βˆ’π‘ž + πœ€π‘‘ (4.12)

where π‘ž is the orders of the ARIMA (0,0,q) model, πœƒπ‘— (j = 1, 2, …., q) are the finite weight parameters, πœ‡ is the mean of the time series. In MA methodology, the time series is dependent only on π‘ž number of past random terms and the current random error term πœ€π‘‘. The ARIMA (p,0,q) or autoregressive moving average [ARMA (p,q)] is represented in Eq. (4.13), below.

𝑦𝑑= πœƒ0+ πœ™1π‘¦π‘‘βˆ’1+ πœ™2π‘¦π‘‘βˆ’1+ β‹― + πœ™π‘π‘¦π‘‘βˆ’π‘+ πœ‡ βˆ’ πœƒ1πœ€π‘‘βˆ’1βˆ’ πœƒ2πœ€π‘‘βˆ’2βˆ’ β‹― βˆ’ πœƒπ‘žπœ€π‘‘βˆ’π‘ž+ πœ€π‘‘ (4.13)

where the ARMA(p, q) model depends on $p$ past values of the series itself and $q$ past random error terms $\varepsilon_t$. Finally, ARIMA(p, d, q) is the general time series model, in which the series has been differenced $d$ times. Three critical steps, namely model identification, model estimation, and model testing, must be carried out in the ARIMA model construction process. The estimation of $p$, $d$, and $q$ is critical and must be repeated several times to capture the most robust model ([21], [117]). To determine the parameters of the optimal model, two diagnostic plots, the ACF and PACF, are generated from the training data and compared with the actual or theoretical values ([255]–[257]). For model identification, the AIC is used ([134], [135], [255], [256]).
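As an illustration of this identification procedure, a minimal sketch using statsmodels is given below: it plots the ACF and PACF of the training data and compares candidate ARIMA orders by AIC. The variable name `train`, the candidate grid, and the differencing order are assumptions for illustration only, not the study's actual settings.

```python
# Sketch: ARIMA(p, d, q) identification via ACF/PACF plots and AIC comparison.
# Assumes `train` is a pandas Series holding the training portion of the series.
import itertools

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA

def identify_arima(train: pd.Series, max_p: int = 3, d: int = 1, max_q: int = 3):
    # Diagnostic plots: ACF and PACF of the (differenced) training data.
    diff = train.diff(d).dropna() if d > 0 else train
    fig, axes = plt.subplots(2, 1, figsize=(8, 6))
    plot_acf(diff, lags=24, ax=axes[0])
    plot_pacf(diff, lags=24, ax=axes[1])
    fig.tight_layout()

    # Fit each candidate order and keep the one with the lowest AIC.
    best_order, best_aic, best_fit = None, float("inf"), None
    for p, q in itertools.product(range(max_p + 1), range(max_q + 1)):
        try:
            fit = ARIMA(train, order=(p, d, q)).fit()
        except Exception:
            continue  # skip candidate orders that fail to converge
        if fit.aic < best_aic:
            best_order, best_aic, best_fit = (p, d, q), fit.aic, fit
    return best_order, best_aic, best_fit
```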

The LSTM model has been widely adopted by time series forecasters for its predictive precision and superiority in time series forecasting. Hence, my study incorporates the LSTM methodology as a DL technique.

The Long Short-Term Memory (LSTM) architecture proposed by Hochreiter and Schmidhuber [25] is an extension of RNNs. It is considered a capable model that efficiently addresses the RNN's vanishing gradient problem. In this architecture, the flow of information is monitored and controlled through several gated cells, which constitute the LSTM's memory; thus, the LSTM can efficiently monitor and manage the input flow.

The DL LSTM architecture captures the important features from the inputs and can retain these features over a long period. It quickly learns how long to retain and preserve the information and when to delete it once it becomes trivial. The weights assigned to the information during the training process determine whether the features are preserved or deleted.

Figure 4.1 shows the framework of the LSTM model. The principal characteristic of the LSTM structure is the cell state ($c_t$), which is regulated by three gates: the forget gate, the input gate, and the output gate. Through the configuration of these gate controls and the memory cells, the architecture can decide what new information to learn and retain, how long to keep it, and when to forget it.

Figure 4.1 The architecture of Long Short-Term Memory (LSTM) cell

where $c_t$ is the cell state, $h_t$ is the hidden state, $\sigma$ represents the sigmoid activation function, $\tanh$ represents the hyperbolic tangent activation function, $*$ refers to the element-wise product, and $[\cdot,\cdot]$ refers to the concatenation operation.
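For context, gated LSTM cells of this kind are provided directly by common DL frameworks. The brief Keras sketch below shows how such a cell might be assembled into a one-step-ahead forecasting model; the window length, number of memory units, optimizer, and the toy data are illustrative assumptions only, not the configuration tuned in this study.

```python
# Sketch: one-step-ahead forecasting model built on a gated LSTM cell (Figure 4.1).
# All sizes and training settings below are assumptions for illustration.
import numpy as np
import tensorflow as tf

WINDOW = 20          # number of past observations fed to the LSTM
N_FEATURES = 1       # univariate series

model = tf.keras.Sequential([
    tf.keras.Input(shape=(WINDOW, N_FEATURES)),
    tf.keras.layers.LSTM(50),     # gated memory cells (forget/input/output gates)
    tf.keras.layers.Dense(1),     # one-step-ahead forecast
])
model.compile(optimizer="adam", loss="mse")

# Example usage with a toy series reshaped into (samples, WINDOW, 1) windows.
series = np.sin(np.linspace(0, 50, 600)).astype("float32")
X = np.stack([series[i:i + WINDOW] for i in range(len(series) - WINDOW)])[..., None]
y = series[WINDOW:]
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```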

Forget gate:

A sigmoid function acting as the forget gate layer is established to decide which features are retained and which information is eliminated from the LSTM memory. The values of $y_t$ (the current input) and $h_{t-1}$ (the last hidden state) form the basis for the forget gate decision. Let $f_t$ denote the output vector of the forget gate, with values ranging from 0 to 1 reflecting the degree of forgetting, as specified in Eq. (4.14). If the output of the forget gate is 0, the corresponding information is deleted entirely from the cell state; if it is 1, the information is fully preserved. Thus, the forget gate administers which features of the previous cell state vector $c_{t-1}$ will be forgotten.

$f_t = \sigma(w_f \cdot [h_{t-1}, y_t] + b_f)$ (4.14)

where $f_t$ refers to the value of the forget gate, $b_f$ is a constant term representing the bias value, and $w_f$ and $b_f$ define the set of trainable parameters of the forget gate.
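As a minimal illustration of Eq. (4.14), the NumPy sketch below computes the forget gate activation; the function name, vector dimensions, and weight shape are assumptions for illustration, not part of the study's implementation.

```python
# Sketch of the forget gate, Eq. (4.14): f_t = sigmoid(w_f . [h_{t-1}, y_t] + b_f)
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forget_gate(h_prev, y_t, w_f, b_f):
    """h_prev: hidden state (n_h,), y_t: current input (n_y,),
    w_f: weights (n_h, n_h + n_y), b_f: bias (n_h,)."""
    z = np.concatenate([h_prev, y_t])   # concatenation [h_{t-1}, y_t]
    return sigmoid(w_f @ z + b_f)       # values in (0, 1): degree of forgetting
```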

Input gate:

This gate determines which new information should be retained and added to the LSTM memory. The input gate consists of two layers, as specified in Eq. (4.15) and Eq. (4.16).

(a) a sigmoid layer that determines which features should be updated, denoted as $i_t$ (the input gate vector) in Eq. (4.15). $i_t$ is an output vector with values ranging from 0 to 1, and $w_i$ and $b_i$ are the trainable parameters of the input gate.

$i_t = \sigma(w_i \cdot [h_{t-1}, y_t] + b_i)$ (4.15)

(b) a tanh layer creating a vector of new candidate values that will be added to the LSTM memory, denoted as $\tilde{c}_t$ in Eq. (4.16).

$\tilde{c}_t = \tanh(w_c \cdot [h_{t-1}, y_t] + b_c)$ (4.16)

where $\tilde{c}_t$ is the vector of new candidate values to be added to the LSTM memory cell at time $t$, $\tanh$ is the hyperbolic tangent function, and $w_c$ and $b_c$ are the trainable parameters.

A combination of the $i_t$ and $\tilde{c}_t$ layers generates the update to the LSTM memory, captured in $c_t$ (the cell state at time $t$) and specified in Eq. (4.17).

𝑐𝑑= π‘“π‘‘βˆ— π‘π‘‘βˆ’1 + 𝑖𝑑 βˆ— 𝑐̃𝑑 (4.17)

In the first part of $c_t$, the previous value is forgotten through an element-wise multiplication of the forget gate output ($f_t$) and the last cell state ($c_{t-1}$); in the second part, the new candidate value is added through an element-wise multiplication of $i_t$ and $\tilde{c}_t$. The symbol $*$ represents the element-wise product. The forget gate output ($f_t$) varies between 0 and 1, where 1 refers to complete retention of the value and 0 refers to complete rejection of the value.
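Under the same assumed shapes as the forget-gate sketch above, Eqs. (4.15)–(4.17) can be sketched as follows; the function and variable names are illustrative assumptions.

```python
# Sketch of Eqs. (4.15)-(4.17): input gate, candidate values, and cell state update.
import numpy as np

def cell_state_update(h_prev, y_t, c_prev, w_i, b_i, w_c, b_c, f_t):
    """c_prev: previous cell state (n_h,); f_t: forget gate output from Eq. (4.14)."""
    z = np.concatenate([h_prev, y_t])                # [h_{t-1}, y_t]
    i_t = 1.0 / (1.0 + np.exp(-(w_i @ z + b_i)))     # Eq. (4.15): which entries to update
    c_tilde = np.tanh(w_c @ z + b_c)                 # Eq. (4.16): candidate values
    c_t = f_t * c_prev + i_t * c_tilde               # Eq. (4.17): element-wise (*) update
    return c_t
```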

Output gate:

Through a sigmoid layer, the output gate ($o_t$), which acts as the focus vector, first decides what part of the LSTM memory contributes to the output. $o_t$ is a vector with values ranging from 0 to 1. The coefficients $w_o$ and $b_o$ are the trainable parameters of the output gate. $o_t$ in Eq. (4.18) is determined from the information of the last hidden state ($h_{t-1}$) and the current input ($y_t$).

π‘œπ‘‘= 𝜎(𝑀0[β„Žπ‘‘βˆ’1, 𝑦𝑑] + 𝑏0) (4.18)

A nonlinear tanh function is then applied to map the values of the cell state between -1 and 1. Subsequently, to determine the hidden state $h_t$, an element-wise multiplication of the tanh activation of $c_t$ and $o_t$ is performed, as shown in Eq. (4.19).

β„Žπ‘‘= π‘œπ‘‘ βˆ— tanh(𝑐𝑑) (4.19)