attribute. Finally, the independent test dataset, which is the 20% held out by the train_test split, is used to estimate the performance of the best-selected model.
We tried to improve XGBoost performance by finding the optimum values for its chosen hyper-parameters. The search identified the best values as 3 and 1 for max_depth and min_child_weight, respectively. In this research, tuning XGBoost in this way improved accuracy on the test dataset to 93%.
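A minimal sketch of this tuning step, assuming a scikit-learn-style grid search; the candidate grids, and the X_train/y_train/X_test/y_test names from the 80/20 split, are illustrative assumptions rather than the thesis's exact code:

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Candidate values to search over (illustrative grids)
param_grid = {
    "max_depth": [3, 5, 7, 9],
    "min_child_weight": [1, 3, 5],
}

search = GridSearchCV(
    estimator=XGBClassifier(n_estimators=100),
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,
)
search.fit(X_train, y_train)            # X_train/y_train: the 80% training split

print(search.best_params_)              # expected: {'max_depth': 3, 'min_child_weight': 1}
print(search.score(X_test, y_test))     # accuracy on the held-out 20% test set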
The complete code is available in the Appendix.
Fig. 28: An increasing trend for spending and ordering. The increasing trend shows that there are behavioural changes over time and the series is not stationary. As part of data transformation, the data need to be made stationary before being converted to a supervised learning problem with feature sets for the LSTM model.
The plot clearly shows an increasing trend. The data is then divided into training and test sets: the experimental setup builds a model on the training data and predicts for the test data. A good baseline forecast for a time series with a linearly increasing trend is a persistence forecast, where the observation at the prior time step is used to predict the current time step. A rolling forecast scenario is made by shifting the training spend data by one step. An error score based on Root Mean Square Error (RMSE) is calculated to summarise the accuracy of the model; in this case, the RMSE is 9221.876 over the test dataset. Finally, a plot is made to show the training dataset and how the predictions diverge from the expected values in the test dataset.
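A minimal sketch of such a persistence baseline, assuming `series` is a pandas Series of monthly spend values and the same last-six-months hold-out used later for the LSTM (names are illustrative):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

values = series.values
train, test = values[:-6], values[-6:]   # hold out the last six months

# Rolling persistence forecast: the prediction for time t is the value at t-1.
history = list(train)
predictions = []
for actual in test:
    predictions.append(history[-1])      # prior observation as the forecast
    history.append(actual)               # roll the window forward with the real value

rmse = np.sqrt(mean_squared_error(test, predictions))
print(f"Test RMSE: {rmse:.3f}")          # reported above as 9221.876
```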
From the persistence model predictions plot shown in Figure 29, it is clear that the model is one step behind reality. There is a rising trend and month-to-month noise in the spend figures, which highlights the limitations of the persistence technique.
Fig. 29: Baseline graph for spend and quantity forecast. Notice that the predicted values (orange curve) are one step behind the actual values (green curve). This shows the limitation of the persistence technique: it simply repeats the prior value and cannot smooth out the noise in the series.
Step 2: Data Preparation for LSTM
The Long Short-Term Memory recurrent neural network is capable of learning over long sequences of data. The data preparation is done as follows: first, the data is framed as a supervised learning problem for ML; then it is made stationary; and finally, the dataset is rescaled to a particular range.
Supervised ML needs input and output variables and uses a model to learn the mapping function from the input to the output. The objective is to approximate the true underlying mapping so that predictions can be made even on new data. For time-series observations this is implemented by using prior time steps as input variables and the current or next time step as the output variable. This method is called the sliding window method or lag method. In this case, multistep sliding window forecasting is done: by applying 12 lags, each input sequence is 12 lags long and is used to predict the output.
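A minimal sketch of this lag framing with pandas, assuming `df` holds the series to be forecast in a column named 'value' (in this study, the differenced spend; names are illustrative):

```python
import pandas as pd

def add_lags(frame: pd.DataFrame, col: str, n_lags: int = 12) -> pd.DataFrame:
    """Frame a series as supervised learning: lag_1..lag_n inputs, `col` output."""
    out = frame.copy()
    for lag in range(1, n_lags + 1):
        out[f"lag_{lag}"] = out[col].shift(lag)   # value from `lag` steps back
    return out.dropna()                           # first n_lags rows have no inputs

supervised = add_lags(df, "value", n_lags=12)
```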
The data is transformed to stationary by differencing: to find the difference (the diff variable), the prior value is subtracted from the current value, which removes the trend and leaves only the changes in the dataset. A time series is stationary if it has no trend or seasonal effects. The stationary time-series graph is plotted as shown in Figure 30.
Fig. 30: Stationary time-series graph. Notice that there is no increasing or decreasing trend, showing that there are no behavioural changes over time. A stationary series looks flat, with no trend, constant variance over time, and no periodic fluctuations.
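A minimal sketch of the differencing step described above, assuming `df` has a monthly 'spend' column (names are illustrative):

```python
import matplotlib.pyplot as plt

df["diff"] = df["spend"].diff()   # current value minus the prior value
df = df.dropna()                  # the first row has no prior value to difference

df["diff"].plot(title="Differenced (stationary) spend")   # cf. Figure 30
plt.show()
```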
Before using the data for modelling it is important to check whether it is useful for prediction; for that, in this case, the adjusted R2 is found. An adjusted R2 greater than 0.5 indicates a moderately good fit, and one above 0.7 can be considered very good.
The adjusted R2, which measures how much of the variation in the diff variable is explained by the lag features, is a notable 79%, a better basis for prediction than the persistence model, which achieved an RMSE of 9221.876 over the test dataset. After scaling, the feature variables are ready to build a model. Before scaling, the data should be split into train and test sets; the last six months' spend is selected as the test set.
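A minimal sketch of the adjusted R2 check described above, assuming the 12 lag features built earlier and a statsmodels OLS fit (a common way to read off adjusted R2; the thesis's exact method may differ):

```python
import statsmodels.api as sm

X = supervised[[f"lag_{i}" for i in range(1, 13)]]   # the 12 lag inputs
y = supervised["diff"]                                # the differenced target

model = sm.OLS(y, sm.add_constant(X)).fit()
print(f"Adjusted R2: {model.rsquared_adj:.2f}")       # reported above as 0.79
```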
LSTMs expect the data to be within the range of the activation function used by the network. The preferred range for time-series data is -1 to 1. In this case, MinMaxScaler is used for this transformation.
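A minimal scaling sketch; the scaler is fitted on the training portion only so that the test months' range does not leak into the model (train/test are assumed to be 2-D arrays):

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(-1, 1))
scaler.fit(train)                        # learn min/max from the training data only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)
```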
The Long Short-Term Memory network (LSTM) is a type of Recurrent Neural Network (RNN). To compile the network, a loss function and an optimisation algorithm must be given. Mean Squared Error (MSE) is used as the loss function, as it is closely related to RMSE. The code block used [given in Appendix I] also prints how the model improves itself and reduces the error in each epoch. Now the model is ready for prediction.
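A minimal sketch of such a network in Keras, assuming inputs shaped (samples, 12 time steps, 1 feature); the layer size, epochs, and array names are illustrative assumptions, and the full code is in Appendix I:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(50, input_shape=(12, 1)),   # 12 lag steps, one feature per step
    Dense(1),                        # one output: the next (scaled) difference
])
model.compile(loss="mean_squared_error", optimizer="adam")

# X_train_scaled is assumed reshaped to (samples, 12, 1);
# verbose=1 prints the loss per epoch, showing the error shrink as training runs.
model.fit(X_train_scaled, y_train_scaled, epochs=100, batch_size=1, verbose=1)
```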
Step 3: Spend Forecasting
The raw prediction results look plausible, but they do not tell much on their own because they are scaled values of the differences. To see the actual spend prediction, first the inverse scaling transformation is applied; second, a data frame is built to show the dates and predictions.
The inverse-transformed predictions are still differences, so the predicted spend is calculated by adding each predicted difference to the prior month's actual spend, and is shown in the same data frame, as given in Figure 31. Table 8 shows the real-value comparison of actual versus predicted for spend and quantity forecasting.
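A minimal sketch of the two inversions, assuming the scaler was fitted on the single difference column, `yhat_scaled` holds the model's scaled predictions, and `spend` is a NumPy array of the original monthly values (all names are illustrative):

```python
import numpy as np
import pandas as pd

# 1) Undo the MinMax scaling: back from [-1, 1] to raw differences.
yhat_diff = scaler.inverse_transform(yhat_scaled.reshape(-1, 1)).ravel()

# 2) Undo the differencing: each predicted difference is added to the
#    previous month's actual spend to recover the predicted spend level.
prior_actuals = spend[-7:-1]                     # month preceding each test month
predicted_spend = prior_actuals + yhat_diff

result = pd.DataFrame({"date": test_dates,      # the six test-month dates
                       "predicted_spend": predicted_spend})
```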
Table 8: Actual vs Predicted values for spend and quantity
Date        Actual Spend   Predicted Spend   Actual Quantity   Predicted Quantity
01/07/2018  22626740       18705284          317               306
01/08/2018  22781770       23740806          413               368
01/09/2018  21789480       25379430          339               427
01/10/2018  27425110       22904167          425               408
01/11/2018  27493800       32998242          454               450
01/12/2018  33823440       33063405          444               415
Fig. 31: Spend forecast for the last six months. From the plot, it can be observed that the actual spend went up and the model also predicted that the spend would go up. This clearly shows how powerful LSTMs are for analysing time series and sequential data.
This could be considered a good prediction, as the increase is indicated before it happened, which helps top management prepare to manage cash flow more easily.
Step 4: Quantity Forecasting
A similar study is done using the quantity ordered for a particular product from The X Hotels.
The initial steps in data transformation included making the data stationary, converting the time series to a supervised learning problem to build a feature set for the LSTM model, and scaling the data. As in the earlier study of spend forecasting, lag 1 to lag 12 are assigned values by using the shift command. The calculated predicted quantity is shown in the same data frame as the actual quantity ordered, as given in Figure 32.
Fig. 32: Quantity forecast for the last six months. From the plot, it can be observed that the actual quantity bought went up and the model also predicted that the quantity would go up. This clearly shows how powerful LSTMs are for analysing any time-series or sequential variable.