
Chapter 3: Appraisal of Financial Time Series Forecasting

3.3 Technical Analysis

3.3.3 Deep Learning (DL) Models

3.3.3.2 DL Architectures

Several DL architectures have been formulated based on the salient characteristics of the input data. Convolutional Neural Networks (CNN), Deep Neural Networks (DNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), Generative Adversarial Networks (GAN) and Deep Reinforcement Learning (DRL) are examples of DL models [194], [198].

3.3.3.2.1 Convolutional Neural Network

Convolutional Neural Network (CNN) architecture was first proposed by Fukushima [199] and subsequently adopted by LeCun et al. [200], who improved the CNN architecture with a gradient-based learning algorithm. CNN was initially utilised for two-dimensional or three-dimensional images; however, the architecture is also suitable for univariate time series prediction [201]. LeCun et al. [200] successfully applied the state-of-the-art CNN model to handwritten digit classification problems. This work influenced other researchers to develop CNN further; [202]–[204] are a few examples of using the state-of-the-art CNN model. In this vein, studies such as [205]–[207] have applied CNN to financial time series prediction.
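To illustrate how a convolutional architecture maps onto a univariate series, the following minimal sketch frames prediction as learning from a fixed-length window of past observations. It assumes a Keras/TensorFlow environment; the window length, filter counts and layer sizes are illustrative choices, not parameters drawn from the cited studies.

    from tensorflow.keras import layers, models

    WINDOW = 30  # illustrative: forecast the next value from the previous 30

    # A small 1-D CNN for univariate one-step-ahead prediction
    model = models.Sequential([
        layers.Input(shape=(WINDOW, 1)),          # one channel: the raw series
        layers.Conv1D(32, kernel_size=3, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(16, kernel_size=3, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(1),                          # one-step-ahead forecast
    ])
    model.compile(optimizer="adam", loss="mse")

The convolution filters slide along the time axis exactly as they slide across the spatial axes of an image, which is why the image-oriented architecture transfers naturally to time series.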

However, researchers have not yet applied the CNN architecture to analyse the New Zealand financial markets. Thus, this remains a potential avenue for future research.

3.3.3.2.2 Deep Neural Network

Deep Neural Network (DNN) architectures have been applied in many fields, such as natural language processing, speech recognition, vision analysis, social network analysis, classification problems and time series prediction [198], [208]. In light of this, [209]–[213] have used DNN for financial time series forecasting.

However, researchers have not yet applied the DNN architecture to analyse the New Zealand financial markets, which is an empirical research gap.

3.3.3.2.3 Recurrent Neural Network

Recurrent Neural Network (RNN) architecture is a class of ANN models that evolved during the 1980s. Researchers such as Rumelhart, Hinton & Williams [214] in 1986, Hopfield [215] in 1984 and Cohen & Grossberg [216] in 1983 have made significant contributions to RNN.

RNN architectures have been applied to learn sequential or time-varying patterns and have been used in music, text mining, and motion capture [217]. Scholars have also attempted to use RNN models for financial time series prediction; for example, [189] and [218]–[220] have applied the RNN architecture to this task.

However, no attempts have yet been made to test the predictive efficiency of RNN models applied to the New Zealand financial markets, and this is a research gap.

3.3.3.2.4 Gated Recurrent Units (GRU)

Gated Recurrent Units (GRU), a variant of the RNN proposed by Cho, Merrienboer, Bahdanau, & Bengio [221], allow each recurrent unit to adaptively capture dependencies over different timescales. GRU uses gating units similar to those of the LSTM but requires fewer parameters. GRU has been applied in speech recognition, natural language processing, and machine translation [222].
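For reference, the standard GRU update equations of [221] can be written as follows (sign conventions for the update gate vary across presentations), where $\sigma(\cdot)$ is the logistic sigmoid, $\odot$ denotes element-wise multiplication, $x_t$ is the input and $h_{t-1}$ the previous hidden state:

    \begin{aligned}
    z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) && \text{(update gate)}\\
    r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) && \text{(reset gate)}\\
    \tilde{h}_t &= \tanh\left(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\right) && \text{(candidate state)}\\
    h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}
    \end{aligned}

With two gates and no separate cell state, the GRU maintains fewer weight matrices per unit than the LSTM, which is why it requires fewer parameters.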

Recently, GRU has been used in financial time series prediction. For example, [219] evaluated GRU, RNN and LSTM. [220] evaluated GRU against LSTM, Simple Recurrent Neural Network (SRNN) and Multi-Layer Perceptron (MLP) models. [223] examined the effectiveness of GRU and of GRU combined with a Support Vector Machine (GRU-SVM).

However, GRU has not yet been applied to the New Zealand financial markets; this is identified as an existing research gap to fill.

3.3.3.2.5 Generative Adversarial Networks (GAN)

GAN, a comparatively newer model proposed by Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, and Bengio [224] in 2014, consists of a generative model and a discriminative model. The generative model is responsible for producing data, whilst the discriminative model judges the overall quality of the generated data and provides feedback to the generative model [225], [226].
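Formally, the two models are trained as adversaries in the minimax game introduced in [224], where the generator $G$ maps noise $z \sim p_z$ to synthetic samples and the discriminator $D$ estimates the probability that a sample came from the real data distribution $p_{data}$:

    \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]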

In the recent past, attempts have been made to apply GAN to financial time series modelling. For example, [226] examined the GAN model on S&P 500 individual firms from January 1952 to November 2016. The model, named “FIN-GAN” by [226], was developed in a data-driven manner, and data normalisation was carried out. Like other GANs, FIN-GAN has two neural networks, the generator (G) and the discriminator (D). The generative component is constructed to take 100-dimensional Gaussian noise Z as input and to output a time series of N = 8192 steps. The generator (G) is responsible for generating data, whilst the discriminator (D) judges the quality of the generated data and provides feedback to the generator (G). Their findings confirmed the effectiveness of the estimated FIN-GAN architecture for financial time series modelling, reproducing the major stylised facts and demonstrating the overall capacity of GANs to model other complex systems. [227] adopted the GAN-FD framework for stock market forecasting with high-frequency data. With 13 technical indices as input data, the GAN-FD model had better predictability than the other benchmark methods evaluated.

GAN has not yet been applied to the New Zealand financial markets; this is identified as a current research gap to be filled.

3.3.3.2.6 Long Short-Term Memory (LSTM)

RNN structures cannot efficiently handle and retain information about past inputs over extended periods, mainly due to the problem of Vanishing Gradients (Hochreiter, Bengio, Frasconi, & Schmidhuber [228]). Vanishing Gradients (VG) occur when information about the input (the gradient) passes through multiple layers, so that the information disappears (vanishes) by the time it reaches the final or initial layer. Essentially, with the VG problem, the updates the RNN algorithm applies to the weight matrix used for training become vanishingly small, causing the learning process to slow or even stop. Thus, training an RNN to capture the long-term dependencies of longer time series becomes inefficient and ineffective. To avoid VG issues and efficiently train on the long-term dependencies inherent in time series, variations of the RNN have been established.
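The mechanism can be seen in the standard backpropagation-through-time decomposition. For a recurrent state $h_t$, the gradient of a loss $\mathcal{L}_T$ at time $T$ with respect to an earlier state $h_k$ is a product of Jacobians:

    \frac{\partial \mathcal{L}_T}{\partial h_k} = \frac{\partial \mathcal{L}_T}{\partial h_T} \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}}

If every Jacobian norm is bounded by some $\gamma < 1$, the product shrinks at least as fast as $\gamma^{T-k}$, so the contribution of distant inputs to the weight updates decays exponentially with the time lag.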

The LSTM architecture proposed by Hochreiter and Schmidhuber [25] extends the RNN and is capable of efficiently addressing the vanishing gradient problem. In the LSTM architecture, the flow of information is monitored and controlled through various gated cells, which constitute the LSTM's memory; thus, LSTM can efficiently manage the input flow.
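In the commonly used formulation, the forget, input and output gates regulate a separate cell state $c_t$, whose additive update is what preserves the gradient over long lags:

    \begin{aligned}
    f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
    i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
    o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
    \tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)}\\
    c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)}\\
    h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
    \end{aligned}

Because $c_t$ is updated additively rather than through repeated matrix multiplication, the gradient along the cell-state path does not vanish in the same way as in a plain RNN.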

3.3.3.2.6.1 Application of LSTM to Financial Market Prediction

The LSTM architecture has been applied to financial time series prediction; [189], [218]–[220] and [229]–[240] are some contemporary examples.

[189] tested the forecasting effectiveness of DL architectures, namely MLP, Recurrent Neural Networks (RNN), LSTM and Convolutional Neural Networks (CNN), and compared them with the linear ARIMA model. Three stocks were chosen from the National Stock Exchange of India Limited (NSE). Two top stocks on the New York Stock Exchange (NYSE) were also evaluated to verify the results. The sample data were normalised, and MAPE was used to measure the accuracy of all the tested models. The results suggested that all the tested models, including ARIMA, could generate predictions, but the DL models outperformed the univariate linear ARIMA model. Overall, CNN was judged the best of the architectures examined.
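MAPE, used in this and several of the following studies, is the mean absolute percentage error over $n$ observations, where $A_t$ denotes the actual value and $F_t$ the forecast at time $t$:

    \text{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|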

[218] appraised three DL architectures, RNN, LSTM and CNN, and compared them with ARIMA. A sliding-window approach was applied with the window size fixed at 100 minutes (90 minutes of overlapping historical information followed by 10 minutes of prediction). Data normalisation was carried out, and all the models tested were trained for 1000 epochs. RMSE was used to determine the best model. CNN marginally outperformed the other two nonlinear models and the ARIMA model.
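A minimal sketch of such a sliding-window split is given below. It assumes a one-dimensional NumPy array of minute-frequency prices; the function name and the step size are illustrative, while the 90/10 split follows the description above.

    import numpy as np

    def sliding_windows(series, input_len=90, horizon=10, step=1):
        """Cut a 1-D series into overlapping (history, target) windows."""
        X, y = [], []
        last_start = len(series) - input_len - horizon
        for start in range(0, last_start + 1, step):
            X.append(series[start:start + input_len])                        # 90 min of history
            y.append(series[start + input_len:start + input_len + horizon])  # next 10 min
        return np.array(X), np.array(y)

    # Example: 500 minutes of synthetic prices -> 401 overlapping windows
    prices = np.cumsum(np.random.randn(500)) + 100.0
    X, y = sliding_windows(prices)
    print(X.shape, y.shape)  # (401, 90) (401, 10)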

[219] assessed which of RNN, LSTM and GRU is the best predictor. The training and test data were subjected to normalisation. The results confirmed that the LSTM model outclassed the others in terms of the hit ratio of one-month-ahead forecasts. However, when a different time horizon was evaluated, the LSTM-based predictors showed poor predictive accuracy.

[220] examined the forecasting effectiveness of Multi-Layer Perceptron (MLP), feedforward, Simple Recurrent Neural Network (SRNN), and Long Short-Term Memory (LSTM) architectures. The Keras deep learning library was used to build and train the neural networks on the Windows platform. The accuracy of each evaluated model was measured with MAD and MAPE. Their results revealed that the MLP model produced the best results compared to the other models tested. Their analysis noted that only data from the past two days were selected as input; the RNN and LSTM models would produce the best results if the number of past days were increased.

[229] examined the predictive efficiency of the LSTM architecture applied to stocks in the Chinese stock market (Shanghai and Shenzhen). Data normalisation was used, and CentOS 7, Theano and Keras were utilised as the DL platform. They confirmed the power of the LSTM model in stock market prediction. Further, they discovered that the predictive efficiency of LSTM increased significantly when the Shanghai Stock Exchange (SSE) Index was used rather than individual stocks.

Using the LSTM architecture, [230] analysed the returns of the ‘BRD’ index listed on the Romanian capital market. The chosen sample enabled them to examine the forecasting efficiency of the LSTM model, especially during the 2007 financial crisis. Each day's opening, highest, lowest, and closing prices were given to the network in the form of logarithmic returns, and the network predicted the next day's return. The assessment of predictive power was made based on RMSE and MAE. Their results confirmed the predictive efficiency of the LSTM model.
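For a price series $P_t$, the logarithmic return fed to such a network is simply:

    r_t = \ln\left(\frac{P_t}{P_{t-1}}\right) = \ln P_t - \ln P_{t-1}

Log returns are preferred over raw prices because they are approximately scale-free and closer to stationary, which makes the network's inputs comparable across time.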

[231] examined the predictive efficacy of the hybrid Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) neural network model. The results revealed that the CNN-LSTM neural network model is efficient in prediction.

[232] appraised a sequential learning model to predict a single stock price from corporate action event information and macro-economic indices using the Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) method. Keras was implemented on the front end, and TensorFlow was executed on the back end as the learning framework. Adaptive Moment Estimation (ADAM) was used as the optimiser for the stochastic gradient descent method (Brownlee [241] and Arratia [242]). The findings of Minami's study confirmed the predictive accuracy of the proposed LSTM model.

[233] investigated the predictive power of an emotion-analysis-based LSTM model. Naïve Bayes is first used to capture prior emotional data, which is then combined with actual behaviour data as training data for the LSTM prediction model. To test the efficacy of LSTM, the researchers used the Shanghai Composite Index and emotional data as input variables to predict the stock opening price. Stock posts from Eastmoney were used to capture the emotional information in the network public-opinion data space. MSE was used to evaluate the efficiency of the devised model, and the model was compared with RNN and MLP. The researchers discovered that the LSTM model is more effective in learning long-term dependencies.

[234] evaluated the predictive power of the LSTM model tested on the S&P 500 Index. The devised LSTM algorithm was developed with Keras on top of the Google TensorFlow library. Training was done with rolling windows over the first 750 days, and trading was performed with the trained parameters on the last 250 days fully out-of-sample. To compare the predictive efficiency of LSTM, DNN, Logistic Regression Classifier (LRC), and Random Forest (RAF) models were examined. The study discovered that the LSTM methodology was inherently suitable for this domain and superior to RAF, LRC and DNN.
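The rolling train/trade protocol described above can be sketched as follows. The 750/250-day figures follow the study's description, but everything else (the naive last-value benchmark, the synthetic data) is an illustrative assumption, not the study's method.

    import numpy as np

    def walk_forward_split(series, train_len=750, test_len=250):
        """Return (train, test); the test block is fully out-of-sample."""
        assert len(series) >= train_len + test_len
        return series[:train_len], series[train_len:train_len + test_len]

    # Illustration with synthetic prices and a naive last-value forecast
    prices = np.cumsum(np.random.randn(1000)) + 100.0
    train, test = walk_forward_split(prices)
    forecast = np.concatenate(([train[-1]], test[:-1]))  # predict "no change"
    rmse = np.sqrt(np.mean((test - forecast) ** 2))
    print(f"naive out-of-sample RMSE: {rmse:.3f}")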

[235] appraised the Bidirectional LSTM (BLSTM) and Stacked LSTM (SLSTM) architectures for short-term and long-term stock market prediction. The model was developed in Python using Keras with a TensorFlow backend. BLSTM, SLSTM, LSTM and MLP models were tested. The predictive accuracy of these models was empirically tested using MSE, RMSE and the Coefficient of Determination (R²). Both the BLSTM and stacked LSTM networks performed better at predicting short-term prices than long-term prices. Overall, the BLSTM networks demonstrated better performance and convergence for both short-term and long-term forecasts.

[236] used single-layer LSTM and deep LSTM models to forecast three indices. The conventional Autoregressive Moving Average with Glosten-Jagannathan-Runkle Generalised Autoregressive Conditional Heteroskedasticity (ARMA-GJRGARCH) model was compared with the LSTM models. The Sum of Squared Residuals (SSR) was used to test the predictive power of each model, and the SSR values of the evaluated models were not significantly different for the indices tested. The investigation confirmed that the LSTM results are similar to those of the tested conventional time series model (ARMA-GJRGARCH) when a regression approach is considered. Although it was possible to predict the direction of the Swedish stock market, forecasting the directions of the US and Brazilian stock markets was almost impossible. The findings confirmed that the American and Brazilian markets exhibited a weak form of the efficient market hypothesis, whereas it did not hold for the Swedish market. These findings implied that the US and Brazilian stock markets were more data-driven than the Swedish market.

[237] empirically tested the Ordinary Linear Model (OLM), Generalised Linear Model (GLM), and LSTM-RNN against a Martingale baseline, applying them to the S&P 500 Index. RMSE and MSE were used to assess forecasting performance. When the models were empirically tested, they found that LSTM-RNN was superior to OLM and GLM but could not consistently beat the Martingale. Although GLM needs fewer assumptions, it was also unable to beat the Martingale.

[238], [239] evaluated the forecasting capabilities of ARIMA and LSTM models. For the algorithm analysis, the Keras library along with Theano was used. The adjusted close price series of the Nikkei 225 Index, NASDAQ Composite Index, Hang Seng Index, S&P 500 Commodity Price Index, and DJIA Index were analysed using the ARIMA and LSTM models. RMSE was used to measure predictive effectiveness. Their study discovered that LSTM was superior to ARIMA, as the LSTM-based algorithm improved the prediction by 85% on average compared to ARIMA. Additionally, they found no model improvement when the number of epochs was changed.

[240] evaluated the predictive efficacies of two LSTM models, namely a deep LSTM with an Embedded Layer (ELSTM) and an Automatic Encoder based LSTM (AELSTM), and compared them with Deep Belief Networks (DBN), Multi-Layer Perceptron (MLP) and a Deep Belief Network-Multi-Layer Perceptron (DBN-MLP). They used the embedded layer and the automatic encoder for data vectorisation. Shanghai A-share composite index data (opening price, highest price, lowest price, closing price, and volume) were used to predict the index. Also, additional indicators (daily amplitude, five-day amplitude, ten-day amplitude, and amplitude of fluctuation of the volumes), amounting to nine inputs, were used to predict the price and trend of a single stock. Data normalisation was applied; MATLAB was used for the automatic encoder and Java to process the experimental dataset. Theano in Python was used for the analysis. MAE was used to measure the effectiveness of the predictions of the reviewed models.

Their results showed that ELSTM was the best prediction model, followed by AELSTM. Two randomly chosen companies were used to verify these conclusions. The findings for the individual stocks were approximately similar to those for the index, confirming the superiority of ELSTM and AELSTM.

3.3.3.2.6.2 LSTM models applied to the New Zealand stock market

The above review confirms that the LSTM methodology has been widely used in financial time series prediction and is considered one of the most efficient forecasting models. However, researchers have not yet applied the LSTM architecture to analyse the New Zealand financial markets, an empirical research gap I discovered.

Thus, my research intends to develop two prediction models using the LSTM architecture to ascertain its forecasting effectiveness. The constructed LSTM algorithms will be applied to the NZX 50 Index and tested over three sample periods with varied time series characteristics. Further, the models I redesigned and implemented to forecast the NZX 50 Index are subsequently applied to the Australian stock market index to ascertain whether they are effective prediction models for different time series.