Long Short-Term vs Gated Recurrent Unit Recurrent Neural Network for Google Stock Price Prediction

Aida Nabilah Sadon
Department of Mathematics and Statistics, Faculty of Applied Science and Technology,
Universiti Tun Hussein Onn Malaysia, 84600 Pagoh, Johor, Malaysia
[email protected]

Shuhaida Ismail
Department of Mathematics and Statistics, Faculty of Applied Science and Technology,
Universiti Tun Hussein Onn Malaysia, 84600 Pagoh, Johor, Malaysia
[email protected]

Nur Syahira Jafri
Department of Mathematics and Statistics, Faculty of Applied Science and Technology,
Universiti Tun Hussein Onn Malaysia, 84600 Pagoh, Johor, Malaysia
[email protected]

Shazlyn Milleana Shaharudin
Department of Mathematics, Faculty of Science and Mathematics,
Universiti Pendidikan Sultan Idris, 35900 Tanjong Malim, Perak, Malaysia
[email protected]

Abstract— Deep Learning, a subfield of Artificial Intelligence, has shown strong performance in many fields. Traditional statistical methods for forecasting time series are less practical and give less valuable predictions. The aim of this study is to propose a Recurrent Neural Network (RNN) model that is suitable for forecasting the Google Stock Price time series. RNNs with Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures are proposed as predictive models, referred to as RNN-LSTM (2), RNN-LSTM (3), RNN-GRU (2), and RNN-GRU (3). The experimental results show that RNN-GRU (3) was the best model, with the lowest Root Mean Square Error (RMSE), Median Absolute Percentage Error (MdAPE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE), and a Mean Directional Accuracy (MDA) close to 0.5. The proposed model predicts future values of the Google stock price with good accuracy and can be used to predict multi-step ahead values. This analysis indicates that the proposed RNN-GRU (3) is a promising alternative technique for forecasting time-series data.

Keywords— RNN, LSTM, GRU, Prediction, time series forecasting, Python, hidden layers, Google, Stock Price, MDA.

I. INTRODUCTION

The stock market is a global network that allows anyone to purchase an ownership stake, known as shares, in any public company listed on a stock exchange. The volatility of stock prices and of the stock market depends on various factors, such as political events, the general economic situation, and trading activity driven by traders' expectations. Researchers and investors openly acknowledge the difficulty of predicting the movement and direction of stock markets, because the factors that influence the market are complex and each period of stock prices contains non-linear relationships [1].

Google stock, owned by Alphabet Inc., is publicly traded on the stock exchange. Alphabet Inc. is an American multinational conglomerate headquartered in California that was created when Google was restructured on 2nd October 2015. To keep control in the hands of Alphabet's founders, the shares are divided into three classes: A, B, and C. Class A and C shares are open to the public and can be traded, while class B shares are held by insiders of Alphabet Inc.

Prediction of stock prices plays an important role in finance and economics [2]. Accurate prediction of stock price movement and direction is essential for traders and investors who want to trade profitably [3]. Previous research shows increasing demand for, and attention to, the prediction of stock price changes among investors and traders [4].

To date, various methods have been applied to model stock prices. However, traditional forecasting methods are unable to capture daily changes in stock price data. With the rapid development of soft computing, Machine Learning (ML) and Artificial Intelligence (AI) have been widely applied in various research areas, especially forecasting.

Deep Learning (DL) was first sparked by the idea of understanding the human mind and the concept of association [5]. It is defined as a technique that deals with neural networks of more than two layers, and as a supervised machine learning model divided into three phases: characterization, training, and prediction [6]. As a subfield of AI, DL deals with algorithms inspired by the biological structure and functioning of the brain to provide machines with intelligence [7].

The Recurrent Neural Network (RNN) is one type of Deep Learning architecture. It connects the output of a hidden-layer neuron back as an input to the same hidden-layer neuron, which allows previous time steps to be taken into account [8]. RNNs help solve problems in industries such as automotive and transportation, healthcare and medicine, retail, and more [5]. Because an RNN can bring information from previous steps into the current step, it can handle forecasting using only historical information [3].

2021 2nd International Conference on Artificial Intelligence and Data Sciences (AiDAS)

LSTM networks, a widely used RNN variant, have been applied to forecasting across time series databases, achieving competitive results on the benchmark datasets evaluated. RNN with LSTM has shown its strength in forecasting: studies found that an LSTM model was able to outperform state-of-the-art univariate time series forecasting methods [9]. An RNN with LSTM using upward/downward reversal point feature sets achieved average prediction accuracies of 68.6% and 55.2% on the Chinese and American stock markets respectively [10]. Deep learning models have also been applied to the National Stock Exchange (NSE) and the New York Stock Exchange (NYSE) [11].

For intent identification, a DL-based framework using bidirectional LSTM networks was proposed, and the model was able to handle non-consecutive dependencies between query terms [12]. The LSTM model also consistently outperformed the other methods in terms of SMAPE in the CIF2016 forecasting competition [9].

GRU, a modified RNN, has been applied in many fields and has shown good performance in predicting the state of charge (SOC) of lithium batteries [13], water quality [14], and indoor air quality [15]. RNN with GRU has also shown its capability in financial time series prediction [3], weather forecasting [16], sentiment analysis [17], and other tasks.

The aim of this study is two-fold: to analyze the capability of LSTM and GRU to capture dependencies that may influence stock price movements, and to evaluate the performance of the proposed models in forecasting the Google stock price by comparing forecast and actual values.

II. METHOD AND MATERIAL

A. Preliminary Study

This study uses a daily dataset of GOOGL class C, Google's stock price, taken from a global financial portal. The data cover 1st January 2015 to 31st January 2020 and are used to evaluate the forecasting models. The variables available in the dataset are Date, Price, Open, High, Low, Volume, and Change%. The Open variable is used to test the forecasting models.

First, the data are collected and undergo preliminary analysis. In this step, two statistical tests are performed on the dataset: the Anderson Darling test and the Augmented Dickey-Fuller test. This step is essential to confirm that the data are non-normal and non-stationary, as assumed.

B. Data Description

Once the characteristics of the dataset have been established, the next step is data pre-processing. The data are pre-processed before being used to train the RNN models, namely LSTM and GRU.

The data were taken from 1st January 2015 until 31st January 2020 and contain 1277 observations.
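The pre-processing steps are not detailed in this paper. As a minimal sketch only, the code below assumes two steps that are common before training recurrent models: min-max scaling and slicing the series into fixed-length input windows. The function names, the look-back length, and the sample prices are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def min_max_scale(values: List[float]) -> Tuple[List[float], float, float]:
    """Scale values linearly into [0, 1]; also return the original min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / span for v in values], lo, hi


def make_windows(series: List[float], lookback: int) -> Tuple[List[List[float]], List[float]]:
    """Build (inputs, targets): each input holds `lookback` consecutive values
    and its target is the value that immediately follows them."""
    inputs, targets = [], []
    for start in range(len(series) - lookback):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback])
    return inputs, targets


# Hypothetical opening prices, for illustration only (not the study's data).
open_prices = [520.0, 525.0, 530.0, 528.0, 540.0, 560.0]
scaled, lo, hi = min_max_scale(open_prices)
X, y = make_windows(scaled, lookback=3)
print(len(X), len(y))  # 3 3
```

Keeping `lo` and `hi` allows predictions made on the scaled series to be mapped back to price units afterwards.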

Fig. 1. Time Series Plot of Google Stock Price

Fig. 1 shows the time series plot of the Google Stock Price from 1st January 2015 until 31st January 2020. The plotted observations show an increasing trend with fluctuations within each year. From 2018 onwards, the series exhibits cyclical movement and random variation within each year.

C. Experimental Setup

The experiments test the capability of two architectures, LSTM and GRU, each with two configurations of hidden layers: two hidden layers and three hidden layers. The models are evaluated using the Google Stock Price dataset.

D. LSTM Architecture

The critical components of an LSTM network are the memory cell, the block input, the output activation function, the gates (including the forget gate and the input gate), and the peephole connections. When both gates are closed, the contents of the memory cell remain unchanged from one time step to the next. This gating structure allows information to be retained across many time steps, which lets LSTM overcome the vanishing gradient problem [2].
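As a rough illustration of how these components interact in one time step, the sketch below assumes the standard peephole LSTM formulation with tanh block input and output activations. The parameter names, the sizes, and the random initialisation are hypothetical. "Closed" is interpreted here as an input gate near 0 and a forget gate that retains everything (near 1), which is an assumption about the intended meaning.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def lstm_step(x, y_prev, c_prev, p):
    """One peephole-LSTM time step. `p` holds input weights W_*, recurrent
    weights R_*, peephole weights p_* and biases b_* (hypothetical names)."""
    z = np.tanh(p["W_z"] @ x + p["R_z"] @ y_prev + p["b_z"])                      # block input
    i = sigmoid(p["W_i"] @ x + p["R_i"] @ y_prev + p["p_i"] * c_prev + p["b_i"])  # input gate
    f = sigmoid(p["W_f"] @ x + p["R_f"] @ y_prev + p["p_f"] * c_prev + p["b_f"])  # forget gate
    c = z * i + c_prev * f                                                        # cell state
    o = sigmoid(p["W_o"] @ x + p["R_o"] @ y_prev + p["p_o"] * c + p["b_o"])       # output gate
    y = np.tanh(c) * o                                                            # block output
    return y, c


def init_params(n_in, n_hidden, rng):
    """Small random weights and zero biases; illustrative values only."""
    p = {}
    for g in "zifo":
        p[f"W_{g}"] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        p[f"R_{g}"] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        p[f"b_{g}"] = np.zeros(n_hidden)
    for g in "ifo":
        p[f"p_{g}"] = rng.normal(0.0, 0.1, n_hidden)
    return p


rng = np.random.default_rng(0)
params = init_params(n_in=1, n_hidden=4, rng=rng)
# Assumption: large biases force the input gate to ~0 and the forget gate to ~1.
params["b_i"] = np.full(4, -50.0)
params["b_f"] = np.full(4, 50.0)

c_prev = np.array([0.3, -0.2, 0.5, 0.1])
y_new, c_new = lstm_step(np.array([0.7]), np.zeros(4), c_prev, params)
print(np.allclose(c_new, c_prev))  # True: the memory cell is carried over unchanged
```

With the gates forced this way the new input cannot overwrite the cell state, which is the mechanism that lets information persist across many time steps.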

The gates, built from sigmoid and tanh functions together with pointwise multiplication, pointwise addition, and vector concatenation, learn which information in a sequence is important to keep and which to throw away. Equations (1) and (2), the block input and the input gate, receive the input. Equation (3) decides which information to discard, while Equations (4), (5), and (6) process the output.

$$z^t = g\left(W_z x^t + R_z y^{t-1} + b_z\right) \qquad \text{block input} \tag{1}$$

$$i^t = \sigma\left(W_i x^t + R_i y^{t-1} + p_i \odot c^{t-1} + b_i\right) \qquad \text{input gate} \tag{2}$$

$$f^t = \sigma\left(W_f x^t + R_f y^{t-1} + p_f \odot c^{t-1} + b_f\right) \qquad \text{forget gate} \tag{3}$$

$$c^t = z^t \odot i^t + c^{t-1} \odot f^t \qquad \text{cell state} \tag{4}$$

$$o^t = \sigma\left(W_o x^t + R_o y^{t-1} + p_o \odot c^t + b_o\right) \qquad \text{output gate} \tag{5}$$

$$y^t = h\left(c^t\right) \odot o^t \qquad \text{block output} \tag{6}$$

Here $x^t$ is the input at time step $t$, $y^{t-1}$ is the previous block output, $W$, $R$, $p$, and $b$ denote the input weights, recurrent weights, peephole weights, and biases, $\sigma$ is the sigmoid function, $g$ and $h$ are tanh activations, and $\odot$ is pointwise multiplication.

E. GRU Architecture

GRU is another recurrent unit similar to LSTM, since it was inspired by the LSTM architecture, and it is considered simpler in terms of computation and implementation.

Unlike LSTM, GRU has only two gates, a reset gate and an update gate, which function like the forget and input gates of LSTM. The major difference between GRU and LSTM is that GRU exposes its memory cell using only leaky integration, with the update gate controlling the adaptive time constant [2].

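$$z_t = \sigma\left(W_z x_t + U_z h_{t-1}\right) \qquad \text{update gate} \tag{7}$$

$$r_t = \sigma\left(W_r x_t + U_r h_{t-1}\right) \qquad \text{reset gate} \tag{8}$$

$$h'_t = \tanh\left(W_h x_t + r_t \odot U_h h_{t-1}\right) \qquad \text{candidate hidden state} \tag{9}$$

$$h_t = z_t \odot h'_t + \left(1 - z_t\right) \odot h_{t-1} \qquad \text{hidden state} \tag{10}$$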

Compared to LSTM, GRU is much simpler, containing only update and reset gates. Equation (7), the update gate, decides what information to discard and what to add; it functions like the LSTM forget gate in Equation (3). Equation (8), the reset gate, decides how much previous information to forget. Equations (9) and (10) compute the hidden state, which is fed back into the cell together with the next input in the sequence.
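The update gate, reset gate, candidate hidden state, and hidden state of Equations (7) to (10) can be sketched as a single NumPy function. This is a sketch under assumptions: the gates are taken to be bias-free, the reset gate is applied to the recurrent term of the candidate, and the parameter names `W_*` and `U_*` are hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x, h_prev, p):
    """One GRU time step; `p` maps hypothetical names to weight matrices."""
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev)             # update gate, Eq. (7)
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev)             # reset gate, Eq. (8)
    h_cand = np.tanh(p["W_h"] @ x + r * (p["U_h"] @ h_prev))  # candidate state, Eq. (9)
    return z * h_cand + (1.0 - z) * h_prev                    # hidden state, Eq. (10)


n_in, n_hidden = 1, 3
# All-zero weights make the arithmetic easy to follow by hand:
# z = sigmoid(0) = 0.5 and the candidate is tanh(0) = 0.
zero_params = {
    name: np.zeros((n_hidden, n_in if name.startswith("W") else n_hidden))
    for name in ("W_z", "U_z", "W_r", "U_r", "W_h", "U_h")
}
h_prev = np.array([0.4, -0.2, 0.8])
h_new = gru_step(np.array([1.0]), h_prev, zero_params)
print(np.allclose(h_new, 0.5 * h_prev))  # True
```

The zero-weight case shows the leaky integration directly: the new hidden state is a weighted mix of the candidate and the previous state, with the update gate setting the weight.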

An advantage of GRU is that it has fewer tensor operations, so GRU is slightly faster than LSTM. However, speed matters less than accuracy, which is more important for obtaining accurate predictions.
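One way to make the difference in size concrete is to count the trainable parameters of a single layer. The sketch below assumes a peephole LSTM with four weighted blocks plus biases and a bias-free GRU with three weighted blocks; these are assumptions about the formulation, and other variants give different totals. The function names are hypothetical.

```python
def lstm_param_count(n_in: int, n_hidden: int, peepholes: bool = True) -> int:
    """Parameters of one LSTM layer: four blocks (block input, input gate,
    forget gate, output gate), each with input weights, recurrent weights
    and a bias, plus three peephole vectors if enabled."""
    per_block = n_hidden * n_in + n_hidden * n_hidden + n_hidden
    total = 4 * per_block
    if peepholes:
        total += 3 * n_hidden
    return total


def gru_param_count(n_in: int, n_hidden: int, biases: bool = False) -> int:
    """Parameters of one GRU layer: three blocks (update gate, reset gate,
    candidate state), each with input and recurrent weights."""
    per_block = n_hidden * n_in + n_hidden * n_hidden
    if biases:
        per_block += n_hidden
    return 3 * per_block


# One input feature (the Open price) and 100 neurons, as in Table I.
print(lstm_param_count(1, 100))  # 41100
print(gru_param_count(1, 100))   # 30300
```

Under these assumptions the GRU layer has roughly three quarters of the parameters of the LSTM layer, which is consistent with it needing fewer tensor operations per step.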

F. Proposed Models

The LSTM and GRU architectures are used as the proposed predictive models, each tested with two and three hidden layers. The proposed models are shown in Fig. 2:

Fig. 2. Proposed RNN Models

Fig. 2 shows the block diagram of the proposed models.

Training, testing, and performance evaluation of the models are carried out in Python. The parameter and hyperparameter settings used for tuning are identical for all models. The tuning settings are as follows:

TABLE I. HYPERPARAMETER SETTING

Hyperparameter   LSTM             GRU
Optimizer        Adam             Adam
Classifier       Sigmoid + tanh   Sigmoid + tanh
Neurons          100              100
Dropout          200 epochs       200 epochs

Table I shows the tuning settings for the models: optimizer, classifier, number of neurons, and dropout.
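To show what "two" versus "three" hidden layers means for a recurrent model, the sketch below stacks bias-free GRU layers of 100 neurons each in NumPy and reads a one-step forecast from the last hidden state of the top layer. It is an untrained forward pass only: the class and function names, the weight initialisation, the dense output layer, and the input window are all hypothetical, and the training loop with the Adam optimizer is not shown.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


class GRULayer:
    """A bias-free GRU layer that returns its hidden state at every time step."""

    def __init__(self, n_in, n_hidden, rng):
        scale = 1.0 / np.sqrt(n_hidden)
        # Index 0: update gate, 1: reset gate, 2: candidate hidden state.
        self.W = rng.uniform(-scale, scale, (3, n_hidden, n_in))
        self.U = rng.uniform(-scale, scale, (3, n_hidden, n_hidden))
        self.n_hidden = n_hidden

    def forward(self, inputs):
        """inputs: array of shape (T, n_in) -> hidden states of shape (T, n_hidden)."""
        h = np.zeros(self.n_hidden)
        states = []
        for x in inputs:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h)
            r = sigmoid(self.W[1] @ x + self.U[1] @ h)
            h_cand = np.tanh(self.W[2] @ x + r * (self.U[2] @ h))
            h = z * h_cand + (1.0 - z) * h
            states.append(h)
        return np.stack(states)


def build_stack(n_layers, n_neurons=100, n_features=1, seed=0):
    """Create `n_layers` GRU layers plus a dense output weight vector."""
    rng = np.random.default_rng(seed)
    sizes = [n_features] + [n_neurons] * n_layers
    layers = [GRULayer(sizes[k], sizes[k + 1], rng) for k in range(n_layers)]
    w_out = rng.uniform(-0.1, 0.1, n_neurons)
    return layers, w_out


def predict(layers, w_out, window):
    seq = np.asarray(window, dtype=float).reshape(-1, 1)  # one feature per step
    for layer in layers:
        seq = layer.forward(seq)   # each layer consumes the sequence from the layer below
    return float(w_out @ seq[-1])  # forecast from the top layer's last hidden state


window = [0.10, 0.12, 0.11, 0.15, 0.14]  # hypothetical scaled prices
for depth in (2, 3):
    layers, w_out = build_stack(n_layers=depth)
    print(depth, predict(layers, w_out, window))
```

Only the first layer sees the price series directly; every additional layer works on the hidden-state sequence of the layer below it.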

III. RESULTS AND DISCUSSION

In the preliminary study, the Anderson Darling normality test and the Augmented Dickey-Fuller stationarity test showed that the dataset is neither normal nor stationary. These two tests serve only to characterize the dataset; the non-normal and non-stationary characteristics are not changed or treated.

A. Preliminary Testing

A preliminary study was performed on the dataset to test the normality and stationarity of the Google Stock Price data. Two statistical tests were used for this purpose: the Anderson Darling (AD) test and the Augmented Dickey-Fuller (ADF) test. The AD test was carried out to test the normality of the dataset, while the ADF test was used to test its stationarity. These preliminary tests were crucial because their outcomes were used to determine the suitability of the chosen model.
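For readers who want to see what the AD test computes, the following is a minimal pure-Python sketch of the Anderson Darling statistic for normality, with the mean and standard deviation estimated from the sample. It is not the routine used in the study. The two samples are synthetic: one is built from normal quantiles and one from exponential quantiles. The thresholds are the critical values quoted in this paper, used here for illustration only.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(sample):
    """Anderson Darling A^2 statistic for normality, using the sample mean
    and sample standard deviation as the parameters of the normal law."""
    n = len(sample)
    mu, sigma = mean(sample), stdev(sample)
    std_normal = NormalDist()
    # CDF of the sorted, standardised observations, clipped away from 0 and 1
    # so that the logarithms below stay finite.
    cdf = [min(max(std_normal.cdf((v - mu) / sigma), 1e-12), 1.0 - 1e-12)
           for v in sorted(sample)]
    total = sum((2 * k + 1) * (math.log(cdf[k]) + math.log(1.0 - cdf[n - 1 - k]))
                for k in range(n))
    return -n - total / n


n = 200
# Synthetic samples: evenly spaced quantiles of a normal and an exponential law.
normal_like = [NormalDist().inv_cdf((k + 0.5) / n) for k in range(n)]
skewed = [-math.log(1.0 - (k + 0.5) / n) for k in range(n)]

a2_normal = anderson_darling_normal(normal_like)
a2_skewed = anderson_darling_normal(skewed)

CRITICAL_15PCT = 0.574  # critical value quoted in this paper for the 15% level
CRITICAL_1PCT = 1.089   # critical value quoted in this paper for the 1% level
print(a2_normal < CRITICAL_15PCT, a2_skewed > CRITICAL_1PCT)
```

The null hypothesis of normality is rejected at a given significance level when the statistic exceeds the critical value for that level.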

TABLE II. HYPOTHESES OF THE PRELIMINARY TESTS

Preliminary Test           Hypothesis Testing
Anderson Darling Test      $H_0$: The data follow a normal distribution.
                           $H_1$: The data do not follow a normal distribution.
Augmented Dickey-Fuller    $H_0$: p-value > 0.05; the data are not stationary.
                           $H_1$: p-value <= 0.05; the data are stationary.

Table II shows the hypotheses of the Anderson Darling test and the Augmented Dickey-Fuller test.

Fig. 3. Statistical result of AD Test

Fig. 3 shows the statistical results of the AD test obtained using Python. $H_0$ was rejected at every significance level, indicating that the data are not normal. The significance levels are 15, 10, 5, 2, and 1, with critical values of 0.574, 0.654, 0.758, 0.915, and 1.089 respectively, so the test found that the data do not follow a normal distribution.

Fig. 4. Statistical Result of ADF Test

Fig. 4 shows the statistical results of the ADF test obtained using Python. The dataset is non-stationary: the absolute value of the test statistic is smaller than the absolute critical values, and the p-value of 0.935416 is greater than 0.05, so the test failed to reject $H_0$. Overall, the dataset was confirmed to be non-normal and non-stationary. These findings are crucial in determining a suitable model in the next stage.
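The decision rule of comparing the test statistic with a critical value can be illustrated with a simplified Dickey-Fuller regression. The sketch below is not the full ADF test used in the study: it omits the lagged difference terms, uses only a constant, and compares against the asymptotic 5% critical value of about -2.86 for that case. The two series are synthetic and the function name is hypothetical.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of gamma in  dy_t = alpha + gamma * y_{t-1} + e_t,
    estimated by ordinary least squares (no lagged differences)."""
    y_lag = series[:-1]
    dy = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(dy)
    mean_x = sum(y_lag) / n
    mean_d = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in y_lag)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(y_lag, dy))
    gamma = sxy / sxx
    alpha = mean_d - gamma * mean_x
    rss = sum((d - alpha - gamma * x) ** 2 for x, d in zip(y_lag, dy))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)
    return gamma / se_gamma


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]  # stationary white noise

random_walk = []  # non-stationary: cumulative sum of the same noise
level = 0.0
for e in noise:
    level += e
    random_walk.append(level)

CRITICAL_5PCT = -2.86  # asymptotic Dickey-Fuller value, constant and no trend
stat_noise = dickey_fuller_stat(noise)
stat_walk = dickey_fuller_stat(random_walk)
print("white noise rejects unit root:", stat_noise < CRITICAL_5PCT)
print("random walk statistic:", round(stat_walk, 3))
```

A statistic more negative than the critical value rejects the null hypothesis of a unit root. A series such as a random walk typically yields a statistic closer to zero, which is the situation reported for the stock price data.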

B. Models Performance Measurements

The models were constructed, trained, and tested in Python using the Google Stock Price dataset. Their performance in predicting the Google stock price was measured using five error measurements, as shown in the following table:

TABLE III. COMPARISON OF PERFORMANCE MEASUREMENTS

Type of Error   RNN-LSTM (2)   RNN-LSTM (3)   RNN-GRU (2)   RNN-GRU (3)
RMSE            96.9787        109.5249       54.6618       32.4199*
MdAPE           4.68           5.27           2.80          1.54*
MAE             73.9713        83.9073        41.9455       24.3596*
MAPE            5.71           6.48           3.25          1.92*
MDA             0.4844         0.5200*        0.5111        0.4978

*Indicates the best result among all the models.

Table III shows the performance measurements for all proposed models. RNN-GRU (3) is the best model on four of the five measures (RMSE, MdAPE, MAE, and MAPE), while RNN-LSTM (3) has the highest MDA. Overall, the GRU models give predictions that are close to the actual values. Based on visual inspection and the accuracy measurements, RNN-GRU (3) predicts the stock price better than the other models, since its predictions are closer to the actual values and it is able to capture the stock price movements.
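The five measures in Table III are not defined in this paper. The sketch below uses their common textbook definitions, which is an assumption about how they were computed here. In particular, MDA is taken as the share of steps in which the predicted change from the previous actual value has the same sign as the actual change. The sample values are hypothetical.

```python
import math
from statistics import mean, median


def rmse(actual, predicted):
    return math.sqrt(mean([(a - p) ** 2 for a, p in zip(actual, predicted)]))


def mae(actual, predicted):
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * mean([abs((a - p) / a) for a, p in zip(actual, predicted)])


def mdape(actual, predicted):
    """Median absolute percentage error, in percent."""
    return 100.0 * median([abs((a - p) / a) for a, p in zip(actual, predicted)])


def sign(v):
    return (v > 0) - (v < 0)


def mda(actual, predicted):
    """Share of steps where the predicted direction matches the actual one."""
    hits = [sign(actual[t] - actual[t - 1]) == sign(predicted[t] - actual[t - 1])
            for t in range(1, len(actual))]
    return sum(hits) / len(hits)


# Hypothetical values, for illustration only.
actual = [100.0, 110.0, 105.0, 120.0]
predicted = [98.0, 112.0, 111.0, 115.0]

print(mae(actual, predicted))            # 3.75
print(round(rmse(actual, predicted), 4))
print(round(mda(actual, predicted), 4))  # 0.6667
```

Lower values are better for RMSE, MAE, MAPE, and MdAPE, whereas a higher MDA is better, and a value near 0.5 means the direction is right about half of the time.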

Fig. 5. Actual Vs Prediction by RNN-GRU (3)

Fig. 5 shows the RNN-GRU (3) predictions of the Google Stock Price. The predicted values are close to the actual values, which means that the model is able to capture the movements and direction of the stock price.

C. Multi-step Ahead Prediction

The best model from the experiments, RNN-GRU (3), was used to test multi-step ahead prediction of the Google Stock Price. The model predicted the stock price from 3rd February 2020 to 16th March 2020, and the results are shown as follows:

Fig. 6. Google Stock Price 30 days Prediction

Despite the good statistical performance of RNN-GRU (3) in Table III, Fig. 6 shows that the model under-forecasts for the first 15 days and is unable to detect the sudden decrease in the Google stock price after 15 days. The sudden decrease in the actual stock value may have many causes, among them the pandemic outbreak in early January 2020. As daily COVID-19 cases rose, most countries limited daily activities, which affected all industry sectors worldwide and may have affected the Google stock price. Because the world had never experienced such a situation, the DL algorithm was unable to capture it, and the dataset used in this study lacks the relevant information, since it covers 1st January 2015 until 31st January 2020 and contains no history of the COVID-19 pandemic. Therefore, the model could not be supplied with enough history to predict accurately for this period.
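The paper does not state how the multi-step ahead values were produced. One common strategy, assumed here purely for illustration, is recursive forecasting: each one-step prediction is appended to the input window and used to predict the next step. The `naive_drift` function is a hypothetical stand-in for a trained model and is not the RNN-GRU (3) network.

```python
from typing import Callable, List, Sequence


def recursive_forecast(one_step: Callable[[Sequence[float]], float],
                       history: Sequence[float],
                       lookback: int,
                       horizon: int) -> List[float]:
    """Predict `horizon` steps ahead by feeding each prediction back in."""
    window = list(history[-lookback:])
    forecasts = []
    for _ in range(horizon):
        nxt = one_step(window)
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # slide the window over the new prediction
    return forecasts


def naive_drift(window: Sequence[float]) -> float:
    """Hypothetical stand-in model: last value plus the average step in the window."""
    return window[-1] + (window[-1] - window[0]) / (len(window) - 1)


history = [1.0, 2.0, 3.0, 4.0]  # hypothetical series
print(recursive_forecast(naive_drift, history, lookback=3, horizon=3))  # [5.0, 6.0, 7.0]
```

Under this strategy every forecast depends only on values observed before the forecast start and on earlier forecasts, so an event that begins after the last observation cannot influence the predicted path.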

IV. CONCLUSION

LSTM and GRU are variants of the Recurrent Neural Network (RNN) that can bring information from previous computations or steps into the current step. Four models were used in this study: RNN-LSTM (2), RNN-LSTM (3), RNN-GRU (2), and RNN-GRU (3). A model with two hidden layers can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy, while a model with more than two hidden layers can learn complex representations and may increase prediction accuracy.

All proposed objectives were fulfilled in this study. The first objective was to propose deep learning with an RNN algorithm for Google stock price prediction. This objective was fulfilled, and the models used are illustrated in Fig. 2. LSTM and GRU neural networks were each used with two and three hidden layers.

The second objective was to analyse the capability of LSTM and GRU to capture dependencies that may influence stock movement. Four models, RNN-LSTM (2), RNN-LSTM (3), RNN-GRU (2), and RNN-GRU (3), were proposed with the same parameter and hyperparameter settings. The comparison between predicted and actual values shows that the LSTM models seriously under-forecast and have a low capability to capture the dependencies and movements in the stock price. This may be because LSTM is designed to work with large datasets and requires more history to be stored in the model. The GRU models, by contrast, capture the dependencies in the stock price and produce predictions close to the actual values. Although LSTM and GRU have similar characteristics, one factor that may explain why GRU outperformed LSTM is that GRU fully exposes its memory (observations), controlled by the update gate, which LSTM does not have.

The third objective was to evaluate the performance of the proposed models in forecasting the Google stock price by comparing predicted and actual values. Five accuracy measurements were used: RMSE, MdAPE, MAE, MAPE, and MDA. Based on these measurements, RNN-GRU (3) outperformed RNN-GRU (2), RNN-LSTM (2), and RNN-LSTM (3). RNN-GRU (3) has the lowest RMSE of 32.4199, MdAPE of 1.54, MAE of 24.3596, and MAPE of 1.92, with an MDA of 0.4978, which is close to 0.5 and indicates that about half of the directional predictions were correct. In conclusion, RNN-GRU (3) is the best of the four models. For the GRU models, moving from two to three hidden layers increased accuracy, indicating an improved ability to produce accurate predictions. Even though it is the best model, RNN-GRU (3) under-forecasts in multi-step ahead prediction compared to the actual values. This is attributed to the situation from early February to March 2020, when COVID-19 cases worsened and affected many sectors, the stock market included.

For future studies on Deep Learning neural network modelling, it is recommended to tune the parameters and hyperparameters of the model itself. Experimenting with the number of layers can help to find the best predictive model for the dataset being used.

ACKNOWLEDGMENT

The authors would like to thank the Universiti Tun Hussein Onn Malaysia for supporting this research under MDR Grant Scheme Vot No H508.

REFERENCES

[1] Cheng, C., Chen, T., & Wei, L. (2010). A hybrid model based on rough sets theory and genetic algorithms for stock price forecasting. Information Sciences, 180(9), 1610–1629.

[2] Ayo, C. K. (2014). Stock Price Prediction Using the ARIMA Model. International Conference on Computer Modelling and Simulation, pp. 106–112.

[3] Jayanth Balaji, A., Harish Ram, D. S., & Nair, B. B. (2018). Applicability of deep learning models for stock price forecasting an empirical study on bankex data. Procedia Computer Science, 143, 947–953.

[4] Zahedi, J., & Rounaghi, M. M. (2015). Application of artificial neural network models and principal component analysis method in predicting stock prices on Tehran Stock Exchange. Physica A: Statistical Mechanics and Its Applications, 438, 178–187.

[5] Tensorflow, R. W., & Manaswi, N. K. (2018). Deep Learning with Applications Using Python.

[6] Arango-argoty, G., Garner, E., Pruden, A., Heath, L. S., Vikesland, P., & Zhang, L. (2018). DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data. 1–15. https://doi.org/10.1186/s40168-018-0401-z

[7] Moolayil, J. (2019). Learn Keras for Deep Neural Networks.

[8] Patterson, J., & Gibson, A. (2017). Deep Learning. 1005 Gravenstein Highway North, Sebastopol: O'Reilly Media.

[6] Dua, M., Yadav, R., Mamgai, D., & Brodiya, S. (2020). An Improved RNN-LSTM based Novel Approach for Sheet Music Generation. Procedia Computer Science, 171, 465–474.

[7] Duncan, W. W., Glew, M. J., Wang, X. J., Flaherty, S. P., & Matthews, C. D. (1993). Prediction of in vitro fertilization rates from semen variables. Fertility and Sterility, 59(6), 1233–1238.

[8] De Mulder, W., Bethard, S., & Moens, M. F. (2015). A survey on the application of recurrent neural networks to statistical language modeling. In Computer Speech and Language (Vol. 30, Issue 1, pp. 61–98). Academic Press. https://doi.org/10.1016/j.csl.2014.09.005.

[9] Bandara, K., Bergmeir, C., & Smyl, S. (2020). Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach. Expert Systems with Applications, 140.

[10] U, J. H., Lu, P. Y., Kim, C. S., Ryu, U. S., & Pak, K. S. (2020). A new LSTM based reversal point prediction method using upward/downward reversal point feature sets. Chaos, Solitons and Fractals, 132. https://doi.org/10.1016/j.chaos.2019.109559.

[11] Hiransha, M., Gopalakrishnan, E. A., Menon, V. K., & Soman, K. P. (2018). NSE Stock Market Prediction Using Deep-Learning Models. Procedia Computer Science, 132, 1351–1362.

[12] Sreelakshmi, K., Rafeeque, P. C., Sreetha, S., & Gayathri, E. S. (2018). Deep bi-directional LSTM network for query intent detection. Procedia Computer Science, 143, 939–946.

[13] Jiao, M., Wang, D., & Qiu, J. (2020). A GRU-RNN based momentum optimized algorithm for SOC estimation. Journal of Power Sources, 459, 228051.

[14] Li, W., Wu, H., Zhu, N., Jiang, Y., Tan, J., & Guo, Y. (2020). Prediction of dissolved oxygen in a fishery pond based on gated recurrent unit (GRU). Information Processing in Agriculture.

[15] Loy-Benitez, J., Heo, S. K., & Yoo, C. K. (2020). Imputing missing indoor air quality data via variational convolutional autoencoders: Implications for ventilation management of subway metro systems. Building and Environment, 182, 107135.

[16] Ding, M., Zhou, H., Xie, H., Wu, M., Nakanishi, Y., & Yokoyama, R. (2019). A gated recurrent unit neural networks based wind speed error correction model for short-term wind power forecasting. Neurocomputing, 365, 54–61.

[17] Singh, K. (2020). First U.S coronavirus death occurred in early February in California. Retrieved on January 8, 2021 from https://www.thestar.com.my/news/world/2020/04/22/first-us-coronavirus-death-occurred-in-early-february-in-california