2022 IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE)

A Comparative Analysis of ARIMA, GRU, LSTM and BiLSTM on Financial Time Series Forecasting

Muskaan Pirani
Devang Patel Institute of Advance Technology and Research (DEPSTAR), Charotar University of Science and Technology (CHARUSAT), Anand, Gujarat, India
18dce097@charusat.edu.in

Mohammed Husain Bohara
Devang Patel Institute of Advance Technology and Research (DEPSTAR), Charotar University of Science and Technology (CHARUSAT), Anand, Gujarat, India
mohammedbohara.ce@charusat.ac.in

Paurav Thakkar
Chandubhai S Patel Institute of Technology (CSPIT), Charotar University of Science and Technology (CHARUSAT), Anand, Gujarat, India
paurav.t20@gmail.com

Dweepna Garg
Devang Patel Institute of Advance Technology and Research (DEPSTAR), Charotar University of Science and Technology (CHARUSAT), Anand, Gujarat, India
dweepnagarg.ce@charusat.ac.in

Pranay Jivrani
Chandubhai S Patel Institute of Technology (CSPIT), Charotar University of Science and Technology (CHARUSAT), Anand, Gujarat, India
pranayjivrani@gmail.com

Abstract—Machine learning and deep learning algorithms are among the most effective techniques for statistical prediction. When it comes to time series prediction, these algorithms outperform classic regression-based solutions in terms of accuracy. Long short-term memory (LSTM), one of the recurrent neural networks (RNN), has been demonstrated to outperform typical prediction methods. LSTM-based models incorporate additional "gates" so that they can take longer input sequences into account. Owing to these additional capabilities, LSTM-based models outperform Autoregressive Integrated Moving Average (ARIMA) models. The Gated Recurrent Unit (GRU) and bidirectional long short-term memory (BiLSTM) are extended versions of LSTM. The central question is which algorithm outperforms the other two by giving good predictions with minimum error.

Bidirectional LSTMs provide extra training because they are a two-way formulation and therefore traverse the training data twice (1. left-to-right, 2. right-to-left). GRU has one gate fewer than the LSTM architecture. Hence, our analysis is mainly centred on which algorithm outperforms the other two; it also covers the behavioural analysis of the algorithms, their comparison and the tuning of hyperparameters.

Keywords—Time Series Forecasting, ARIMA, GRU, LSTM, BiLSTM.

I. INTRODUCTION

With the increasing availability of time-series data, accurate prediction has become a necessary part of many areas [1]. This study provides a stepwise walkthrough for predicting certain stocks. Predicting a time series (here, stock prices) is challenging in itself because many factors beyond the closing price are involved, for instance sudden bearishness or bullishness in the market, economic shocks, news, or internal changes within organizations.

All these factors can affect the prediction. Unpredictable changes and incomplete data regarding economic trends and conditions are the major challenges in forecasting financial time series. Moreover, the stock market is an area where a stock may boom one day and collapse the next, which shows its volatile and unstable nature and demands accurate and efficient prediction.

The conventional approach to predicting time series first fits a linear regression model and then uses a moving average for the prediction. This method is referred to as the Autoregressive Integrated Moving Average (ARIMA) [2][3][4], which has been used widely over the years and has evolved since. ARIMA has been developed for many years and has several variations; Seasonal ARIMA (SARIMA) and ARIMA with explanatory variables (ARIMAX) are two examples. The performance of both models is praiseworthy for short-term or seasonal prediction such as weather forecasting, but it is poor for long-term forecasting [5]. To overcome the drawbacks of the ARIMA model, we use alternative deep learning algorithms such as data-driven LSTM models instead of model-driven algorithms [6]. Moreover, the adoption of deep learning approaches is gaining momentum due to the attention of large companies such as Google, Facebook and Microsoft.
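As a rough illustration of this classical baseline (the paper does not list its ARIMA code, and the order (5, 1, 0) below is an assumed placeholder rather than the configuration used in the experiments), such a model can be fitted in Python with statsmodels:

# Illustrative sketch only: fit ARIMA(p, d, q) on a closing-price series and forecast ahead.
# The order (5, 1, 0) is an assumption, not the setting used in this paper.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def arima_forecast(close: pd.Series, steps: int = 30) -> pd.Series:
    model = ARIMA(close, order=(5, 1, 0))   # (p, d, q): AR lags, differencing, MA lags
    fitted = model.fit()
    return fitted.forecast(steps=steps)     # out-of-sample forecast of length `steps`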

With respect to the underlying context, we train the most suitable learning model. For instance, if we have image data and the problem is image recognition, then a Convolutional Neural Network is preferable, whereas a Recurrent Neural Network proves best for sequences because it can use the previous inputs of the sequence and therefore performs superior analysis.

RNN-based models come in a variety of flavors, depending on their ability to remember past input data. A vanilla network is incapable of memorizing previous data; such models are technically known as feed-forward models.

LSTM is a variety of RNN that falls under the category of feedback-based models, which can remember past data. Several additional gates are put into the architecture so that it can recall previous data while forgetting irrelevant data. Because LSTM is unidirectional, it only traverses the training data once, from left to right (input to output). As discussed earlier, LSTM outperforms the ARIMA model [7][8]. The major question here is whether we can improve the prediction by incorporating additional layers in the LSTM architecture. To find out, this paper further explores the BiLSTM and GRU algorithms. BiLSTM traverses the training data twice, whereas GRU uses one gate fewer than the LSTM architecture. BiLSTM can be used for long-term predictions with large data sets, while GRU suits long-term predictions with comparatively less data. In particular, we will compare these three algorithms and address the following questions:

1. Can prediction be improved when data is traversed in both ways?

2. Can prediction be improved when one gate is less in the architecture?

3. How different are these three architectures?

The structure of this paper is as follows: the next section reviews related work. Section III covers background and terminology. Section IV describes the experimental setup and the data. Section V presents the algorithm and the results of the experiments. Section VI discusses algorithm performance and hyperparameters, while Section VII covers conclusions and future work.

II. RELATED WORKS

Autoregressive Integrated Moving Average (ARIMA) models [9][10] and their numerous variants, such as seasonal ARIMA (SARIMA) and ARIMA with explanatory factors (ARIMAX), are at the core of traditional statistical analysis and prediction methodologies. These methods have been used to represent time series problems for a long time. These moving-average-based methods work remarkably well; they do, however, have certain restrictions [11]:

● Given the nature of the problem, these approaches fall under the category of regression-based models and are therefore unable to capture non-linear associations among the features.

● When performing statistical experiments, certain assumptions about the data must hold in order to produce a sensible model, such as a constant standard deviation.

● Long-term forecasting is more difficult with them.

Methods based on machine learning and deep learning have opened new avenues for evaluating statistical data. In [12], the authors used a variety of forecasting algorithms to model the S&P 500, including deep learning, random forests and gradient-boosted trees. According to Krauss et al. [12], tuning neural networks and deep networks was also difficult. In [13], the authors proposed an RNN-based forecasting method for stock market returns. The goal was to generate portfolios by modifying the inner layers of the RNN to change the return threshold levels. Similar work has been done on financial data prediction by the authors of [14].

The publications most closely related to this work are [15] and [16], which compare the output of LSTM and BiLSTM. According to Kim and Moon [15], a bidirectional long short-term memory model based on multivariate time series data outperforms a unidirectional LSTM model. In [16], the authors proposed stacked bidirectional and conventional LSTM architectures in order to forecast traffic speed over an entire network.

III. METHODS

A. Machine Learning Approaches for time series data forecasting

Time series forecasting is an area on which many researchers have been working for a long time, since brokers, analysts and others look for daily forecasts of currencies, stocks or other financial assets to reduce risk and make a profit on their investments. Machine learning offers various models that are suitable for time series forecasting. Given the nature of the data, ML models find hidden patterns and temporal correlations by automatically interpreting the data and building the logic for the analysis, so the pre-processing and initial decomposition of the data can be avoided.

Efficient ML models for short-term time series forecasting include [17]:

● Artificial Neural Network (ANN)

● Support Vector Machine (SVM)

● Random Forest (RF)

● Gradient Boosting Machine (GBM)

B. Deep Learning approaches for time series data forecasting

Deep Learning (DL) uses Artificial Neural Network (ANN) models to capture nonlinear relationships through multilayer processing. The accuracy and performance of DL models in predicting financial time series data such as stock prices and cryptocurrencies has attracted the attention of many industries, encouraging the adoption of DL models. The DL approaches suitable for sequential data forecasting are [18]:

● Deep Multilayer Perceptron (DMLP)

● Recurrent Neural Networks (RNN)

● Long Short-Term Memory Model (LSTM)

● Convolutional neural networks (CNNs)

● Restricted Boltzmann Machines (RBMs)

● Deep Belief Networks (DBNs)

● Deep reinforcement learning (DRL)

In this paper, LSTM, BiLSTM and GRU are applied to the stock data of various companies and predictions are carried out.

C. Recurrent Neural Networks (RNNs)

RNNs are a variation of classic feed-forward neural networks that can manage sequence inputs of varied lengths and are used for sequential data such as language and speech. In contrast to standard feed-forward networks, which cannot handle series inputs and require their inputs and outputs to be independent of one another, RNN models feature gates that store and use past inputs. Recurrent hidden states are a form of RNN memory that enables the RNN to predict what data will come next in a series of inputs. For arbitrarily long sequences, RNNs can in theory exploit earlier sequential knowledge. However, because of RNN memory constraints, in practice the usable history of the sequence data is restricted to only a few steps back.
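In the standard formulation (not written out in the paper), this recurrent memory is simply a hidden state that is reused at every step:

h_t = \tanh(W_h h_{t-1} + W_x x_t + b), \qquad \hat{y}_t = W_y h_t + b_y

where x_t is the input at step t, h_{t-1} is the previous hidden state, and W_h, W_x, W_y, b, b_y are learned parameters.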

Vanishing gradients are one of the most common issues with RNNs. They arise when network information is lost: as an input or gradient passes through a number of layers, it shrinks and washes out by the time it reaches the first layer. This problem makes it hard for RNNs to capture long-run dependencies, which makes them remarkably difficult to train.

(3)

A further difficulty with RNNs is known as "exploding gradients," where the gradient of the input flows across numerous layers, accumulates, and results in a very large gradient before it reaches the end or first layer. Because of this issue, RNNs are tedious to train.

The gradient, defined precisely as the first derivative of a function's output with respect to its inputs, essentially measures how much a function's output varies in response to changes in its input values.

In the vanishing gradients problem, the RNN training algorithm assigns ever smaller values to the weight matrix and the model stops learning. In the exploding gradients problem, the training mechanism assigns disproportionately large values to the weight matrix. The gradients can be truncated or clipped to resolve this problem [19].
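As a minimal sketch of this clipping idea (not code from the paper; the threshold of 1.0 is an assumed value), Keras optimizers can clip the gradient norm before each weight update:

# Illustrative sketch only: clip gradients to mitigate exploding gradients.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)   # assumed threshold
# model.compile(optimizer=optimizer, loss="mse")   # used when compiling any RNN-based model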

D. Long Short-Term Memory Model (LSTM)

RNNs, as previously stated, struggle to learn long-term dependencies. LSTM-based models are an RNN extension that addresses the vanishing gradient problem very cleanly. LSTM models increase the memory of RNNs, allowing them to maintain and learn long-term dependencies in the input. This memory extension allows them to recall knowledge for a longer period of time, as well as to read, write and erase information in their memory. Hence, LSTM memory cells are known as "gated" cells, where a "gate" determines whether memory data is to be stored or discarded. An LSTM model collects important properties from the inputs and stores them for a lengthy period of time. The weight values learned during training determine whether information is removed or saved; as a result, an LSTM model learns which information, facts or figures are essential to keep or discard. LSTM is useful in areas such as sentiment analysis, speech and language modelling, and time series prediction.

In broad terms, the LSTM model comprises three gates: forget, input and output. The forget gate decides whether existing information is to be maintained or erased, the input gate calculates how much new information is to be added to the memory, and the final output gate determines how much the current value in the cell contributes to the output.

Forget Gate - To determine what information should be removed from the LSTM memory, a classic sigmoid function is used. This function depends on the values of h_{t-1} and x_t. The gate produces f_t, a number between zero and one, with zero indicating complete removal of the learned value and one indicating preservation of the entire value.

Input Gate - It determines whether new information is saved in the LSTM memory cell. The gate is formed of two layers: a sigmoid layer, which defines what values should be updated, and a tanh layer, which produces a vector of new candidate values for the LSTM memory.

Output Gate - Initially, it uses a sigmoid layer to decide which part of the LSTM memory is involved in producing the output. After that, the tanh function is used to scale the values from -1 to +1. Finally, the result is multiplied by the output of the sigmoid layer.
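In the standard notation (the paper describes the gates only in words), the three gates and the cell update can be written as

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i), \quad \tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o), \quad h_t = o_t \odot \tanh(C_t)

where \sigma is the sigmoid function and \odot denotes element-wise multiplication.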

E. Bi-directional Long Short-Term Memory

One extension of the LSTM models mentioned above, deep bidirectional LSTM [20], processes the input using two LSTM models. The series is fed into the initial forward layer, and then the reversed form of the input sequence is fed into the backward layer of the LSTM model. Applying LSTM twice improves the learning of long-term dependencies and, as a result, the model's accuracy [3].

F. Gated Recurrent Unit

The architecture of GRU is somewhat different in that it combines the forget and input gates into a single gate called the update gate. It also merges the cell state and hidden state, with some further modifications. The key benefit of GRU over LSTM is the reduced number of parameters without sacrificing accuracy, resulting in faster convergence and a more streamlined model. This is the key reason why GRUs have been chosen over LSTM in many current practical scenarios.
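The paper does not include its model code; as a minimal sketch under assumed hyperparameters (one recurrent layer of 64 units, Adam optimizer, MSE loss), the three architectures can be built in Keras so that they differ only in the recurrent layer used:

# Illustrative sketch only: LSTM, BiLSTM and GRU regressors with assumed hyperparameters.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(kind: str, window: int = 60, n_features: int = 1) -> tf.keras.Model:
    if kind == "lstm":
        recurrent = layers.LSTM(64)
    elif kind == "bilstm":
        # Bidirectional wraps the LSTM so the window is read left-to-right and right-to-left.
        recurrent = layers.Bidirectional(layers.LSTM(64))
    elif kind == "gru":
        # GRU merges the forget and input gates into one update gate (fewer parameters).
        recurrent = layers.GRU(64)
    else:
        raise ValueError(f"unknown model kind: {kind}")
    model = models.Sequential([
        layers.Input(shape=(window, n_features)),
        recurrent,
        layers.Dense(1),          # one-step-ahead closing price
    ])
    model.compile(optimizer="adam", loss="mse")
    return model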

IV. EXPERIMENTAL SETUP

Here, the experimental evaluation of ARIMA, GRU, LSTM, and BiLSTM in predicting financial time series is carried out in terms of the functioning, performance and efficiency.

A. Data Set

The stock data were obtained from the Yahoo Finance website, covering January 1985 to August 2018, and comprise:

1. IBM Stock data.

2. SNOWMAN Stock data.

3. HDFC Stock data.

4. BHEL Stock data.

For the analysis, IBM stock data for a duration of 10 years (July 2009-2019) were fetched.

B. Training and Test Data

The "Close" parameter was the lone element of the financial time series that was input into the ARIMA, LSTMs and its variant models, BiLSTM, and GRU. The ¾ part of each data set can be kept for training and remaining can be kept for assessing model correctness. The statistics for the explanatory variables in a time series are as shown in Table 1.

TABLE I. THE TIME SERIES DATA STUDIED

Stocks      Train (80%)   Test (20%)   Total
IBM         2,571         643          3,214
SNOWMAN     1,188         298          1,486
HDFC        1,507         646          2,153
BHEL        1,477         633          2,110
Total       6,743         2,220        8,963
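As a rough sketch of this preprocessing step (the paper gives no code; the 80/20 ratio follows Table I and the 60-day window length is an assumption), the "Close" series can be split chronologically and turned into fixed-length windows:

# Illustrative sketch only: chronological train/test split and sliding windows over "Close".
import numpy as np
import pandas as pd

def make_windows(close: pd.Series, window: int = 60, train_ratio: float = 0.8):
    values = close.to_numpy(dtype="float32").reshape(-1, 1)
    split = int(len(values) * train_ratio)          # keep time order, no shuffling
    train, test = values[:split], values[split:]

    def to_xy(series: np.ndarray):
        X, y = [], []
        for i in range(window, len(series)):
            X.append(series[i - window:i])          # the last `window` closing prices
            y.append(series[i, 0])                  # the next closing price
        return np.array(X), np.array(y)

    return to_xy(train), to_xy(test)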

C. Assessment Metrics

The "loss" values are usually reportable by major DL models. A loss may be a technical phrase denoting a penalty for creating an incorrect prediction. To be specific, if the

(4)

forecast of any model is perfect, the failure value will be zero.

As a result, the target is to truncate the loss values by accumulating a group of biases or weights or both to do so.

In addition to the loss, researchers use the Root Mean Square Error (RMSE) to measure the prediction accuracy of neural networks. The RMSE evaluates the difference between actual and expected values. The following equation is used for the error computation [21]:

RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}    (1)

where y_i denotes the true value, \hat{y}_i the predicted value, and n the number of observations. The major benefit of employing RMSE is that severe errors are penalized. In addition, the scores are scaled in the same units as the predicted values.

Additionally, the % drop or reduction in RMSE is used as a change indicator, which can be computed as [8]:

\%\ \text{Reduction} = \frac{RMSE_{\text{new model}} - RMSE_{\text{old model}}}{RMSE_{\text{old model}}} \times 100    (2)
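Both metrics are straightforward to compute; as a small sketch (not taken from the paper):

# Illustrative sketch only: RMSE (Eq. 1) and percentage reduction in RMSE (Eq. 2).
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pct_reduction(rmse_new: float, rmse_old: float) -> float:
    # Negative values mean the new model's RMSE is lower than the old model's.
    return (rmse_new - rmse_old) / rmse_old * 100.0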

Fig. 1. The algorithm (pseudo code) of LSTM, BiLSTM and GRU [8][21]

The typical "feed-forward" Artificial Neural Networks (ANN) shown in Fig. 2 train the model by moving in one direction only, without taking any feedback from previous input data into consideration. In particular, ANN models go from input (left) to output (right) with no feedback from previously learned results.

V. RESULTS

As a result, the performance of any layer has no bearing on the training of that layer (i.e., there is no memory). Such networks with hidden units, i.e. neurons, are helpful for modelling the relationship between input and output features (linear and nonlinear) and thus behave similarly to regression models. To put it another way, these networks perform a functional mapping that converts input data to output data. This kind of neural network is commonly used in pattern recognition. ANN models include standard and simple auto-encoder networks, as well as Convolutional Neural Networks (CNN).

Recurrent Neural Networks (RNNs), on the other hand, recall chunks of previous information using a feedback mechanism in which training happens not only from input to output (as in feed-forward networks) but also through a network loop that conserves specific knowledge and thus operates as a memory cell. In contrast to feed-forward ANN networks, feedback-based neural networks are dynamic, and their states change over time before reaching equilibrium and being tuned.

The states stay in equilibrium until new inputs arrive, at which point the equilibrium changes. A vanilla RNN's basic defect is that it is incapable of maintaining, and selectively forgetting, long inputs. Long short-term memory (LSTM) is implemented as an RNN extension to recall extended input data, so that the link between long input sequences and the output is characterized along a further dimension (e.g., time or spatial location). To recall an extended series of data, an LSTM network employs several gates, namely an input gate, a forget gate, and an output gate.

BiLSTM [20] is a version of the conventional LSTM in which the model is trained both from input to output and from output to input. A BiLSTM model first feeds the input to one LSTM layer (the forward layer) and then repeats the training through a second LSTM layer in the opposite direction of the input data series (i.e., its Watson-Crick complement [22]).

GRU outperforms the competition, and BiLSTM models have been shown to outperform traditional LSTMs [1]. Fig. 1 shows the algorithm utilized in the studies described in this paper. Note that the three algorithms (LSTM, BiLSTM, and GRU) have been integrated into one procedure, with lines 9-23 alternating between them, as shown in Fig. 1. After making a forecast and comparing it with the actual value, the MSE (line 26) and the RMSE (line 27) are calculated in Fig. 1.
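Since Fig. 1 is only available as an image here, the following is a hedged reconstruction of that procedure, reusing the helper functions sketched earlier (build_model, make_windows, rmse); it is not the authors' exact pseudo code, and the epoch and batch-size values are assumptions:

# Illustrative reconstruction of the experiment loop: train each architecture on the
# same windows, forecast the test set and report the RMSE.
results = {}
(X_train, y_train), (X_test, y_test) = make_windows(close)   # `close` is a pd.Series of prices

for kind in ("lstm", "bilstm", "gru"):
    model = build_model(kind, window=X_train.shape[1])
    model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)
    y_pred = model.predict(X_test, verbose=0).ravel()
    results[kind] = rmse(y_test, y_pred)

print(results)   # e.g. {"lstm": ..., "bilstm": ..., "gru": ...}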

The RMSE produced by each algorithm for forecasting the stock data is reported in Table II. In the vast majority of cases, the RMSE values are reduced.

Four out of four stocks show that GRU outperforms BiLSTM and LSTM algorithms.

The percentage reductions between the LSTM and BiLSTM models range from -7.38% for IBM to -17.87% for SNOWMAN. The percentage reductions between the LSTM and GRU models range from -8.86% for IBM to -38.00% for BHEL. The percentage reductions between the GRU and BiLSTM models range from -1.62% for IBM to -24.70% for BHEL. The average RMSE values for the LSTM, BiLSTM and GRU-based models are 4.45, 3.82 and 3.19 respectively, resulting in the reductions shown in Table II. From the data it is clear that LSTM outperforms the ARIMA model by a wide margin of 86.30%. The BiLSTM and GRU models outperform standard unidirectional LSTM models by 11.86% and 20.32% respectively, as reported in Table III. Lastly, in a comparison of the BiLSTM and GRU models, GRU performs 9.8% better than BiLSTM, as shown in Table III. In Figs. 3, 4, 5 and 6 below, blue represents the actual data, orange the training data and green the predicted data. Because the RMSE is quite low, indicating accurate predictions, the predicted curve overlaps the blue chart.

Fig. 2. Various forms of ANN [8][21]

TABLE II. THE RMSE OF THE TIME SERIES DATA STUDIED

Stocks      ARIMA    LSTM     BiLSTM   GRU
IBM         6.47     2.03     1.88     1.85
SNOWMAN     61.82    2.07     1.70     1.66
HDFC        52.98    3.28     3.13     2.80
BHEL        75.32    10.42    8.58     6.46
Average     49.14    4.45     3.82     3.19

TABLE III. THE % REDUCTION OF THE ALGORITHMS

Stocks      LSTM over ARIMA   BiLSTM over LSTM   GRU over LSTM   GRU over BiLSTM
IBM         -68.62            -7.38              -8.86           -1.62
SNOWMAN     -96.65            -17.87             -19.80          -2.35
HDFC        -93.80            -4.57              -14.63          -10.54
BHEL        -86.16            -17.65             -38.00          -24.70
Average     -86.30            -11.86             -20.32          -9.80

Fig. 3. IBM (ARIMA): train, test and predicted data

Fig. 4. IBM (LSTM): train, test and predicted data

Fig. 5. IBM (BiLSTM): train, test and predicted data

Fig. 6. IBM (GRU): train, test and predicted data


VI. DISCUSSION

The results demonstrate that BiLSTM algorithms surpass one-way LSTMs. By traversing the input data in both directions (from left to right and then from right to left), BiLSTMs are better able to comprehend the underlying context. The superior performance of BiLSTM over traditional unidirectional LSTM is well suited to certain kinds of data, such as text parsing and predicting the next word in an input phrase. Yet, even given such contexts, it was not obvious whether training twice on statistical data, learning both from the past and from the future, would facilitate better prediction of time series. Apart from this, GRU appears to be a far superior model, since it needs less time because of one fewer gate and performs well when the data are not too long. For predicting financial statistical data, GRU beats standard LSTMs and BiLSTMs, according to our findings. There are plenty of fascinating questions which can be posed and investigated by trial and error in order to better understand the distinctions between the four algorithms and thereby discover how different types of recurrent neural networks work and function.

VII. CONCLUSION

The results of this study were examined and analysed for the performance and accuracy of the ARIMA, unidirectional LSTM, bidirectional LSTM (BiLSTM), and gated recurrent unit (GRU) models, together with their training behaviour. The main goal of the experiment was to see whether training the data in the opposite direction (i.e., right to left), in addition to traditional data training (i.e., left to right), had any positive and significant effect on the accuracy of time series forecasting, and also to see how a model with one gate fewer than LSTM-based models performs. The results showed that the additional layer of training improves forecast accuracy by 11.86% on average, which is useful for modelling. Using one gate less (GRU) improves accuracy by 20.32% on average over the standard LSTM, and by 9.80% over traversing the data twice (BiLSTM). When analysing the behaviour of unidirectional LSTM and BiLSTM models, we noticed an unexpected occurrence: training with BiLSTM is slower. In addition, GRU was quicker than both the LSTM and BiLSTM algorithms. This study implies that BiLSTM may be able to acquire extra data characteristics that unidirectional LSTM models cannot reveal, since their learning proceeds in only one direction (i.e., from left to right).

Consequently, for prediction problems in time series analysis, this study recommends using GRU over BiLSTM and selecting BiLSTM over LSTM. This study may be expanded to address multivariate as well as seasonal time series forecasting problems.

REFERENCES

[1] P. Baldi, S. Brunak, P. Frasconi, G. Soda and G. Pollastri, "Exploiting the past and the future in protein secondary structure prediction," Bioinformatics, vol. 15, no. 11, pp. 937-946, 1999.

[2] Adebiyi, A. O. Adewumi and C. K. Ayo, "Stock Price Prediction Using the ARIMA Model," in UKSim-AMSS 16th International Conference on Computer Modeling and Simulation, 2014.

[3] M. Alonso and C. Garcia-Martos, "Time Series Analysis - Forecasting with ARIMA models," Universidad Carlos III de Madrid, Universidad Politecnica de Madrid, 2012.

[4] G. E. P. Box and G. Jenkins, Time Series Analysis, Forecasting and Control, USA: Holden-Day, Inc., 1990.

[5] J. Brownlee, "How to Create an ARIMA Model for Time Series Forecasting in Python," Time Series, 7 January 2017.

[6] J. Brownlee, "Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras," Deep Learning for Time Series, 21 July 2016.

[7] F. A. Gers, J. Schmidhuber and F. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Computation, vol. 12, no. 10, pp. 2451-2471, 2000.

[8] S. Siami-Namini, N. Tavakoli and A. S. Namin, "A Comparison of ARIMA and LSTM in Forecasting Time Series," in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), 2018.

[9] R. J. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, OTexts, 2018.

[10] R. J. Hyndman, "Variations on rolling forecasts," July 2014. [Online]. Available: https://robjhyndman.com/hyndsight/rolling-forecasts/

[11] Earnest, M. I. Chen, D. Ng and L. Y. Sin, "Using autoregressive integrated moving average (ARIMA) models to predict and monitor the number of beds occupied during a SARS outbreak in a tertiary hospital in Singapore," BMC Health Services Research, vol. 5, no. 1, pp. 1-8, 2005.

[12] Krauss, X. A. Do and N. Huck, "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," European Journal of Operational Research, vol. 2, pp. 689-702, 2017.

[13] S. I. Lee and S. J. Yoo, "A deep efficient frontier method for optimal investments," arXiv preprint arXiv:1709.09822, 2017.

[14] T. Fischer and C. Krauss, "Deep learning with long short-term memory networks for financial market predictions," European Journal of Operational Research, vol. 270, no. 2, pp. 654-669, 2018.

[15] J. Kim and N. Moon, "BiLSTM model based on multivariate time series data in multiple field for forecasting trading area," Journal of Ambient Intelligence and Humanized Computing, pp. 1-10, 2019.

[16] Z. Cui, R. Ke, Z. Pu and Y. Wang, "Deep bidirectional and unidirectional LSTM recurrent neural network for network-wide traffic speed prediction," arXiv preprint arXiv:1801.02143, 2018.

[17] D. Vasily, A. Matviychuk, N. Datsenko, V. Bezkorovainyi and A. Azaryan, "Machine learning approaches for financial time series forecasting," in CEUR Workshop Proceedings, 2020.

[18] O. B. Sezer, M. U. Gudelek and A. M. Ozbayoglu, "Financial time series forecasting with deep learning: A systematic literature review: 2005-2019," Applied Soft Computing, vol. 90, p. 106181, 2020.

[19] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.

[20] M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673-2681, 1997.

[21] S. Siami-Namini, N. Tavakoli and A. S. Namin, "A comparative analysis of forecasting financial time series using ARIMA, LSTM, and BiLSTM," arXiv preprint arXiv:1911.09512, 2019.

[22] J. Gao, H. Liu and E. T. Kool, "Expanded-Size Bases in Naturally Sized DNA: Evaluation of Steric Effects in Watson-Crick Pairing," Journal of the American Chemical Society, vol. 126, no. 38, pp. 11826-11831, 2004.
