

7.5 Data Preparation and LSTM model implementation process

DL models can be categorised into supervised, semi-supervised, and unsupervised [194]. Since the NZX 50 Index and S&P 500 Index data are fully labelled, we adopt a supervised learning approach in implementing the LSTM model. This choice is consistent with the approach of researchers as well as ML practitioners ([194], [241], [314]–[316]).

Once the dataset is loaded into Python, data preparation is performed before the LSTM model is executed. Time-sequence data preparation for the LSTM model is framed as a supervised learning problem, and the input variables must be converted to floats, normalised, and scaled. The functions astype(), MinMaxScaler() and fit_transform() are used for this initial data preparation, which is essential before the LSTM model is applied.
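A minimal sketch of this initial preparation step is given below; the sample values are hypothetical stand-ins for the loaded index data, not figures from this study.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative stand-in for the loaded index series (hypothetical values).
raw = np.array([11850.2, 11901.7, 11874.3, 11933.9, 11990.1])

values = raw.astype('float32').reshape(-1, 1)  # cast to float; the scaler expects a 2D array
scaler = MinMaxScaler(feature_range=(0, 1))    # normalise to the [0, 1] range
scaled = scaler.fit_transform(values)          # fit the scaler and transform the series
```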

[241], [314]–[316] suggest that this process improves the training and validation phases. A time series is a sequence of numbers ordered by a time index, held in either a list or a column. Supervised learning, by contrast, is a procedure in which an input pattern is mapped to an output pattern, and this mapping is then applied to predict the outputs for unseen data. Since time series prediction is centred on historical observations, those observations need to appear alongside the original series.

An essential function for transforming time series data into a supervised learning problem is the shift() function provided by the Python Pandas library, which creates new framings of time series problems given the desired lengths of the input and output sequences. The shift() function generates copies of columns that are either pushed forward, adding rows of NaN (not a number) values at the front, or pulled back, adding rows of NaN values at the end.
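A brief illustration of this behaviour on a toy series (rather than the index data):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
print(s.shift(1))    # pushed forward: a NaN value appears at the front
print(s.shift(-1))   # pulled back: a NaN value appears at the end
```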

The series_to_supervised() function, which specifies the number of lag observations to use as the input (and output) of each sample, is used next. This approach is suggested and implemented by ML practitioners such as [314]; it removes the rows with NaN values that cannot be used to train or test the model. The function series_to_supervised() takes four arguments, data, n_in, n_out, and dropnan, where data is the sequence of observations as a list or 2D NumPy array; n_in is the number of lag observations used as input (X); n_out is the number of observations used as output (y); and dropnan is a Boolean indicating whether or not to drop rows with NaN (not a number) values.
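A sketch of such a function is shown below; it follows the widely circulated implementation pattern attributed to practitioners such as [314], and the exact code used in this study may differ in detail.

```python
import pandas as pd

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """Frame a time series (list or 2D NumPy array) as a supervised learning DataFrame."""
    df = pd.DataFrame(data)
    n_vars = df.shape[1]
    cols, names = [], []
    # input sequence: (t-n_in, ..., t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [f'var{j + 1}(t-{i})' for j in range(n_vars)]
    # forecast sequence: (t, t+1, ..., t+n_out-1)
    for i in range(n_out):
        cols.append(df.shift(-i))
        names += [f'var{j + 1}(t)' if i == 0 else f'var{j + 1}(t+{i})' for j in range(n_vars)]
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    if dropnan:
        agg.dropna(inplace=True)  # drop the rows of NaN values created by the shifting
    return agg
```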

The Boolean argument is set to True so that any rows containing NaN values are dropped from the dataset. The function returns a Pandas DataFrame of the series framed for supervised learning. The use of previous time steps to predict the next time step, widely known as the sliding window method, is applied in the LSTM architecture. The multivariate dataset contains the NZX 50 Index and the S&P 500 Index.

The data frame returned by the series_to_supervised() function includes two output columns, corresponding to the 2D input array with two input variables. The model focuses on predicting the NZX 50 Index; thus, the output column related to the S&P 500 Index value is dropped from the returned data frame using the drop() function available in the Python Pandas library. The drop step is used only in the multivariate analysis (NZX 50 Index and S&P 500 Index).
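For illustration, assuming the NZX 50 Index is the first variable and the S&P 500 Index the second, the multivariate framing and the removal of the unwanted output column could look as follows; the array name and the column positions are assumptions, and the sketch builds on the series_to_supervised() function shown above.

```python
# `scaled_multivariate` is assumed to be a 2D array with two scaled columns:
# column 0 = NZX 50 Index, column 1 = S&P 500 Index.
framed = series_to_supervised(scaled_multivariate, n_in=1, n_out=1)

# Columns: var1(t-1), var2(t-1), var1(t), var2(t); drop var2(t), the S&P 500 output.
framed.drop(framed.columns[[3]], axis=1, inplace=True)
```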

The supervised learning data frame is split into two segments, a train set and a test set, and each set is then split into input and output variables. Time series data preparation for the LSTM architecture requires one additional step: the two-dimensional structure of the supervised learning dataset must be transformed into a three-dimensional structure before the LSTM model is implemented. The three-dimensional array supplied to the LSTM input layer must contain samples, timesteps, and features. The reshape() function on NumPy arrays is used to reshape one-dimensional or two-dimensional data into three dimensions [241].
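Continuing the sketch, the split and the 2D-to-3D reshape might look like this; the 80/20 split ratio and the single timestep are illustrative assumptions rather than the settings reported in this study.

```python
values = framed.values                     # supervised-learning DataFrame as a 2D array
n_train = int(len(values) * 0.8)           # illustrative 80/20 train/test split
train, test = values[:n_train], values[n_train:]

# The last column is the output (y); the remaining columns are the inputs (X).
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]

# Reshape the inputs into the three-dimensional form [samples, timesteps, features].
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
```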

Defining the model and its network is a critical step in developing the LSTM architecture. In specifying the LSTM model, we use the Keras model class Application Programming Interface (API), which connects the specified inputs to the outputs. As neural networks in Keras are defined as a sequence of layers, the Sequential() class is used so that the layers can be created and added in the order in which they are connected.

The model.add() function is used to attach the LSTM recurrent layer, LSTM(), and the Dense() layer. In this way, a fully connected Dense() layer, which often follows the LSTM layers, is used to produce the prediction. Once the network architecture is defined, the next critical step is model compilation, which is an efficiency step.
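A minimal sketch of such a network definition in Keras is given below; the number of LSTM units (50) is an illustrative choice, not the configuration reported in this chapter.

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()  # layers are added in the order in which they are connected
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))  # recurrent LSTM layer
model.add(Dense(1))   # fully connected layer producing the prediction
```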

The Model.compile() function is used for this step, and it requires several parameters to be specified. In particular, it specifies the optimisation algorithm used to train the network and the loss function used to evaluate the network, which is minimised by the optimisation algorithm.

Once the predictions are made for each step in the test dataset, they are compared with the test set to compute the residuals. The LSTM model minimises the MAE loss function with the Adaptive Moment Estimation (Adam) implementation of gradient descent. Adam is an iterative, adaptive learning-rate optimisation algorithm used to locate the minimum of the loss function; it has been empirically tested and works well in practice.
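A minimal sketch of this compilation step, using the Keras string identifiers for the Adam optimiser and the MAE loss:

```python
model.compile(optimizer='adam', loss='mae')  # minimise MAE with the Adam optimiser
```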

[317] proposed Adam as a method for efficient stochastic optimisation that requires only first-order gradients and little memory. They highlighted the attractive advantages of the technique, such as its computational efficiency even in non-stationary settings, its small memory requirement, its simplicity, its invariance to rescaling of the gradient, and its suitability for noisy data. Researchers such as [317]–[320] embrace Adam as an efficient implementation of an adaptive learning-rate gradient descent optimisation algorithm explicitly designed to train ML and DL models. ML practitioners have adopted Adam as superior to other stochastic optimisation methods ([241], [314]–[316]). Thus, Adam is employed in the devised LSTM in conjunction with the MAE loss function. In addition, the MAPE and RMSE loss functions are also used for model comparison.

Once the compilation of the LSTM architecture is completed, the next step is to fit the devised model. This fitting is conducted through the Model.fit() function, which trains the model and indicates how well the designed LSTM model generalises to time series data with characteristics similar to those on which it was trained. The model fitting requires a training configuration: the number of epochs and the batch size. An epoch refers to one pass through all the samples in the entire training dataset, after which the network weights are updated accordingly, and one epoch consists of one or more batches. A batch, in turn, is a pass through a subset of samples in the training dataset, after which the network weights are updated.
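An illustrative fit call is sketched below; the epoch and batch-size values shown are placeholders, not the tuned values reported later in this chapter.

```python
history = model.fit(
    train_X, train_y,
    epochs=100, batch_size=32,          # placeholder values; tuned separately (see below)
    validation_data=(test_X, test_y),   # track validation loss for the learning curves
    verbose=0, shuffle=False            # preserve the temporal order of the samples
)
```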

A loop is implemented, with the RMSE as the underlying criterion, to establish the ideal epoch and batch size at which the formulated LSTM models are optimised. In this iterative process, values across the entire training data range (more precisely, 1,521 observations for sample period one, 1,870 for sample period two, and 2,381 for sample period three) are assigned to the epochs and batch sizes to ascertain the ideal epoch and batch at which the RMSE is minimised.

Further, this loop is run several times to confirm the suitability of the derived optimal epoch and batch sizes. Once the ideal epoch and batch size are established, each model is fitted at the optimised values specific to the univariate or multivariate model and the sample period in this study. Once again, the model fitting is repeated at least a few times to obtain more refined results. This repetitive strategy of assigning varying values to the epochs and batch sizes is used to tune the updated models and capture the global optimum. Such a strategy is employed by ML practitioners [241].
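A simplified sketch of this tuning loop is given below; the candidate grids and the build_model() helper are hypothetical and stand in for the full search over the training data range described above.

```python
from math import sqrt
from sklearn.metrics import mean_squared_error

best_epochs, best_batch, best_rmse = None, None, float('inf')
for n_epochs in [50, 100, 200]:            # illustrative candidate values only
    for n_batch in [16, 32, 64]:
        model = build_model()              # hypothetical helper returning a freshly compiled LSTM
        model.fit(train_X, train_y, epochs=n_epochs, batch_size=n_batch,
                  verbose=0, shuffle=False)
        yhat = model.predict(test_X, verbose=0)
        rmse = sqrt(mean_squared_error(test_y, yhat))
        if rmse < best_rmse:               # retain the configuration with the lowest RMSE
            best_epochs, best_batch, best_rmse = n_epochs, n_batch, rmse
```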

Once the model fitting is finalised and we are satisfied with the performance of the fitted model, each model is subsequently used for prediction purposes. The final model selection and performance assessment are established via Adam in conjunction with the MAE loss function. In addition, the MAPE and RMSE loss functions are also used for model performance comparisons.

The Model.predict() function, which generates output predictions for input samples, is used for forecasting. Pyplot in Python is used to create learning curves that show the learning performance of the devised LSTM models. These graphs are efficient tools for examining the training history; the model's performance over time is evaluated, enabling the user to establish that the devised models are neither over- nor under-fitted.
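A brief sketch of this forecasting and learning-curve step, assuming the history object returned by the fit call sketched earlier:

```python
import matplotlib.pyplot as plt

yhat = model.predict(test_X)                           # forecasts for the test inputs

plt.plot(history.history['loss'], label='train')       # training loss per epoch
plt.plot(history.history['val_loss'], label='test')    # validation loss per epoch
plt.xlabel('Epoch')
plt.ylabel('MAE loss')
plt.legend()
plt.show()
```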

Figure 7.1 shows a flow chart summarising the estimation procedure used to develop the univariate and multivariate LSTM models. This process is employed by ML practitioners [241].

Figure 7.1: Steps of LSTM Model Development