
Abbsumarmanali Firyabi Sakhtiyani, Copyright © 2022, MIB, Page 1882

Analysis of Multi-Layer Perceptron and Long Short-Term Memory on Predicting Cocoa Futures Price

Abbsumarmanali Firyabi Sakhtiyani, Siti Saadah, Gia Wulandari
School of Computing, Informatics, Telkom University, Bandung, Indonesia

Email: abbsumarfs@student.telkomuniversity.ac.id, sitisaadah@telkomuniversity.ac.id, giaseptiana@telkomuniversity.ac.id
Corresponding Author Email: abbsumarfs@gmail.com

Abstract−Predicting the price of cocoa futures is needed by farmers and also by the government in determining policies. The uncertainty of price movements can affect farmers' income and foreign exchange reserves, because Indonesia is one of the largest cocoa-producing countries in the world. In this study, we use a cocoa futures dataset to train a Multi-Layer Perceptron (MLP) and a Long Short-Term Memory (LSTM) network to predict the cocoa futures price. In that way, this study addresses the uncertainty using both the MLP and LSTM methods: each method produces a model from the training and test data, and the two models are then compared to see which one fits the cocoa dataset best. The dataset used is quoted from the Investing.com page and covers 2003 to 2021. The result of this study is the best model between the MLP and LSTM models, where the LSTM produces the best model using a 50-50 train-to-test data ratio, a batch size of 64, and 64 neurons in the hidden layer, with an RMSE of 2.27, an MAE of 32.11, and a MAPE of 1.29, i.e., 98.71% accuracy. This is because the LSTM model has gates in its layers that give it an advantage on time-series data through memory: the LSTM model can memorize its output and use that output again as an input to achieve the best result.

Keywords: Prediction; Cocoa Futures Price; Comparison; Multi-Layer Perceptron; Long Short-Term Memory

1. INTRODUCTION

Cocoa is one commodity that has an important role in the national economy. It is also one of the commodities traded on the futures exchange. A cocoa futures contract is a form of trading commodity futures contracts on the Futures Exchange in which the price movements will affect the price of physical commodities. This contract is a means for risk management through hedging and price discovery [1]. The price development of cocoa can also affect the national economy such as providing employment as well as state income and foreign exchange [2].

Cocoa price fluctuations directly affect the income of cocoa farmers through the selling price of cocoa. Uncertainty about these price fluctuations thus becomes a problem for farmers' income and also has an impact on state income and foreign exchange [3]. Indonesia, as the sixth-largest producer of cocoa beans with 5.1% of global production [4], therefore requires a forecast of cocoa futures prices to resolve this uncertainty.

Studies on predicting the Indonesian cocoa futures price are very limited due to the limited availability of data. However, a recent study stated that local price movements are highly related to global price movements, so global data can be used to analyze Indonesian cocoa futures price movements [5]. The prediction of cocoa prices in general has previously been carried out using the ARIMA method, with an accuracy of 98.5% and a Mean Absolute Percentage Error (MAPE) of 1.5 on a daily dataset [6]. Another study on agricultural commodities, including cocoa, used the Long Short-Term Time-series Network (LSTNet) model, which can outperform models such as RNN, CNN, ARIMA, and VAR [7]. On the other hand, the Multi-Layer Perceptron (MLP) and Long Short-Term Memory (LSTM) are known to give high accuracy for prediction. For example, predictions using MLP on the IR64 Quality III rice price dataset resulted in an accuracy of 99.9% with a MAPE of 0.044 [8], and predictions using LSTM on rice futures prices achieved an accuracy of 98.8% with a MAPE of 0.12 [9]. However, to the best of our knowledge, these methods have never been used to predict cocoa futures prices before. Therefore, here we apply MLP and LSTM to predict cocoa futures prices, aiming for high accuracy based on previous studies stating that deep learning methods can give accuracy close to 100% on agricultural commodity prices.

Based on the explanation above, this research uses the MLP and LSTM methods to overcome the uncertainty of cocoa futures price movements by building a system to predict them. The data used is time-series data, with MLP and LSTM as the prediction system methods. In this study, we compare MLP and LSTM to determine their accuracy in predicting the movements of the cocoa futures price.

2. RESEARCH METHODOLOGY

2.1 Research Stages

Figure 1 shows the research stages of prediction using MLP and LSTM, in the form of a flowchart of the system we built to predict cocoa futures prices. From the cocoa futures price dataset, we first perform preprocessing to remove any outliers that could reduce the accuracy of the prediction models. Then, we split the data for each model into a training set and a test set. After that, we train

DOI: 10.30865/mib.v6i4.4498

the model using the training and test data to produce a prediction result. The prediction result is then evaluated to choose the best model.

Figure 1. Prediction using MLP and LSTM

2.1.1 Data Collection

In this study, the data was collected from public data at https://www.investing.com/commodities/us-cocoa. The data consists of cocoa futures daily prices from January 2003 to December 2021. After the data is collected, preprocessing is performed to make the data usable for prediction with MLP and LSTM. In this study, the data used is the daily closing price. The cocoa futures dataset can be seen in Table 1.

Table 1. Cocoa Futures Dataset

Date Open High Low Close Volume

2003-01-02 2035.0 2135.0 2033.0 2099.0 8498

2003-01-03 2130.0 2150.0 2120.0 2145.0 4489

2003-01-06 2130.0 2200.0 2130.0 2191.0 18452

2003-01-07 2186.0 2198.0 2175.0 2189.0 6007

2003-01-08 2160.0 2172.0 2078.0 2089.0 8932

The data shown in Table 1 is the cocoa futures price as time-series data, i.e., a set of data collected over time within a certain period [10]. Time-series data can be used in the economic field to predict the future and to help in decision making. To make a prediction using time series, the model learns the pattern from the past and projects it into the future using mathematics and statistics [11]. Figure 2 shows the visualization of the cocoa futures contract price on the daily time frame, where the number of days is represented by the x-axis and the prices by the y-axis. Figure 2 shows that the cocoa price dataset tends to fluctuate and has gone through different trends, which are expected to be captured by MLP and LSTM.
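Learning patterns from the past to project them into the future can be made concrete by framing the closing-price series as a supervised problem, where each window of past prices predicts the next day's price. This is a minimal sketch; the window size of 64 is an assumption (chosen to match the 64-neuron input layer described later), as the paper does not state its exact input framing.

```python
import numpy as np

def make_windows(prices, window=64):
    """Frame a 1-D price series as (samples, window) inputs and next-day targets."""
    X, y = [], []
    for t in range(len(prices) - window):
        X.append(prices[t:t + window])   # the past `window` closing prices
        y.append(prices[t + window])     # the next day's closing price
    return np.array(X), np.array(y)

# Example with a short synthetic series of 10 fake daily closes
prices = np.arange(100, 110, dtype=float)
X, y = make_windows(prices, window=3)
# X has shape (7, 3); y[0] is the price that follows the first window
```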

Figure 2. Time-series data of cocoa futures price

2.1.2 Data Preprocessing

Preprocessing is crucial for time-series data because such data carries a lot of information. Common problems in time-series datasets are missing values, outliers, and noise. Data preprocessing fixes these problems to achieve a good-quality dataset for prediction with our MLP and LSTM models [12]. In the dataset we use, some columns are not used by the system, namely Open, High, Low, and Volume. Hence, in this stage we remove the unnecessary columns and keep only the Date and Close columns, as shown in Table 2. Here, we also divide the original dataset into two parts, the training data and the test data. The split ratio between these two sets is varied to explore the best distribution based on the prediction results.
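The column selection and chronological split described above can be sketched as below. The in-memory frame is a hypothetical stand-in for the investing.com export (column names follow Table 1); the split helper and its name are illustrative, not the paper's code.

```python
import pandas as pd

# Hypothetical stand-in for the investing.com export described in Table 1.
df = pd.DataFrame({
    "Date":   ["2003-01-02", "2003-01-03", "2003-01-06", "2003-01-07"],
    "Open":   [2035.0, 2130.0, 2130.0, 2186.0],
    "High":   [2135.0, 2150.0, 2200.0, 2198.0],
    "Low":    [2033.0, 2120.0, 2130.0, 2175.0],
    "Close":  [2099.0, 2145.0, 2191.0, 2189.0],
    "Volume": [8498, 4489, 18452, 6007],
})

# Keep only Date and Close, as in Table 2.
data = df[["Date", "Close"]].copy()

# Chronological train/test split; the ratio varies per scenario (50-50 ... 90-10).
def split(series, train_ratio):
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]

train, test = split(data["Close"].to_numpy(), 0.5)  # the 50-50 scenario
```

Note that the split is chronological rather than shuffled, so the model is always tested on prices that come after the ones it was trained on.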

Table 2. Preprocessed cocoa futures dataset

Date Close

2003-01-02 2099.0

2003-01-03 2145.0

2003-01-06 2191.0

2003-01-07 2189.0

2003-01-08 2089.0

… …

2.1.3 Training and Testing

The formation of the MLP and LSTM models is defined in this training process. The data preprocessed in the previous stage is used as input for training the models, combining the different ratios of training and test data mentioned before.

There are three main steps in training the MLP model. The model first initializes the weights and biases, and then calculates its output. The output layer produces the predicted output, which is compared against the actual or expected output. Using these two outputs, we calculate the loss that must be backpropagated using the backpropagation algorithm. During backpropagation, the algorithm updates the model's weights and biases and feeds them back into each layer. The visualization of the training process can be seen in Figure 3.

Figure 3. Training MLP

The three gates that make up the LSTM training process are the forget gate, the input gate, and the output gate. The forget gate accepts three inputs: the cell state (the network's long-term memory), the hidden state (the previous output), and the input data from the dataset. The input data and hidden state are fed into the forget gate, which uses a sigmoid activation to produce a value near 0 when the input is irrelevant and near 1 when it is relevant. The irrelevant data is then forgotten, as the sigmoid result is multiplied with the cell state.

Figure 4. Training LSTM

The next step is to store information from the new input in the cell state, using the previous hidden state and the current input data. A sigmoid layer decides which new information to update and which to ignore, while a tanh layer generates a vector containing the new input's candidate values. These two values are multiplied, and to obtain the new cell state, the information from the input gate is combined with the previous cell state.

The new hidden state is then produced by the output gate. To generate it, the previous hidden state and the current input data are filtered with a sigmoid activation, while the new cell state is filtered with a tanh activation. The tanh and sigmoid results are multiplied to generate the new hidden state. The training of the LSTM can be seen in Figure 4.

In the training process, to make predictions using MLP and LSTM, we use the Keras library [13] with the hyperparameter architecture shown in Table 3: an input layer of 64 neurons; a hidden layer with 16, 32, or 64 neurons, tested respectively; and an output layer consisting of one neuron. We also use 500 epochs and batch sizes of 64 and 128.
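A minimal Keras sketch of the Table 3 architecture might look like the following. The activation functions, the interpretation of the 64-unit input layer as a 64-step input window, and the `mse` loss are assumptions, since the paper states only the layer sizes, epochs, batch sizes, and the Adam optimizer.

```python
from tensorflow import keras
from tensorflow.keras import layers

WINDOW = 64  # assumed input width matching the 64-neuron input layer

def build_mlp(hidden=64):
    # Input of 64 values, one hidden layer (16/32/64 tested), one output neuron.
    model = keras.Sequential([
        keras.Input(shape=(WINDOW,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(hidden, activation="relu"),
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def build_lstm(hidden=64):
    # LSTM expects (timesteps, features); here one feature, the closing price.
    model = keras.Sequential([
        keras.Input(shape=(WINDOW, 1)),
        layers.LSTM(hidden),
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Training would then follow Table 3, e.g.:
# model.fit(X_train, y_train, epochs=500, batch_size=128)
```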

Table 3. MLP and LSTM model parameters

Model Parameter Value

Input Layer 64

Hidden Layer 16, 32, 64

Output Layer 1

Epoch 500

Batch Size 64, 128

In the testing stage, we use the test data that was preprocessed in the preprocessing stage. After testing is done and the prediction has been made, the validation process compares the predicted values obtained from the trained model against the actual values. As in Figure 5, the model testing process uses two inputs, the test data and the training data. After the testing process is complete, the result is a prediction that shows both the actual data and the predicted data.

Figure 5. Flowchart of the Testing Model

2.1.4 Evaluation

In this stage, we verify whether the models obtained in this research are applicable. Several evaluation metrics are used for this verification: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). Formulas (1)–(3) define these metrics mathematically.

RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(x_t - \hat{x}_t)^2} \quad (1)

MAE = \frac{1}{n}\sum_{t=1}^{n}\left|x_t - \hat{x}_t\right| \quad (2)

MAPE = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{x_t - \hat{x}_t}{x_t}\right| \times 100\% \quad (3)

In the above formulas, n refers to the number of days over which forecasting is carried out, x_t represents the actual cocoa futures price on the t-th day, and \hat{x}_t represents the predicted value. RMSE, MAE, and MAPE calculate the error from the difference between these two values; hence, the closer the result is to 0, the better the prediction. The difference between RMSE and MAE is that RMSE computes the error by taking the square root of the mean squared error [14], while MAE takes the mean of the absolute errors [15]. MAPE then expresses the error as a percentage, which can be used to derive the accuracy of the models [16].
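The three metrics above can be implemented directly, with accuracy taken as 100 − MAPE as used throughout the paper. This is a straightforward sketch of formulas (1)–(3):

```python
import numpy as np

def rmse(actual, pred):
    # Square root of the mean squared error, formula (1)
    return float(np.sqrt(np.mean((actual - pred) ** 2)))

def mae(actual, pred):
    # Mean of the absolute errors, formula (2)
    return float(np.mean(np.abs(actual - pred)))

def mape(actual, pred):
    # Mean absolute percentage error, formula (3); accuracy = 100 - MAPE
    return float(np.mean(np.abs((actual - pred) / actual)) * 100.0)

actual = np.array([100.0, 200.0, 400.0])
pred   = np.array([110.0, 190.0, 400.0])
# Absolute errors are 10, 10, 0, so MAE = 20/3 and MAPE = (10% + 5% + 0%)/3 = 5.0
```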

2.2 Multi-Layer Perceptron (MLP)

The Multi-Layer Perceptron (MLP) is an artificial neural network composed of several interconnected neurons. An MLP has multiple layers, including an input layer, a hidden layer, and an output layer [17]. As shown in Figure 6, the first layer is the input layer, which is connected to the second layer, the hidden layer, through weighted connections, and then to the final layer, the output layer [18].

Figure 6. Basic Architecture MLP [19]

Each MLP layer contains neurons, and a neuron's input in the next layer is the output of the previous layer multiplied by a weight plus a bias. The MLP output can then be described as in Formula (4). The activation functions in the output and hidden layers are represented by \phi_o and \phi, respectively. The indices i, j, and p refer to neurons in the input, hidden, and output layers, while I and J represent the number of neurons in the input and hidden layers, respectively. W_{pj} is the weight from the j-th hidden neuron to the p-th output neuron, and W_{ji} is the weight from the i-th input neuron to the j-th hidden neuron. X_i is the input, \partial_j is the bias of the j-th hidden neuron, and \partial_p^o is the bias of the p-th output neuron [20].

O_{mlp} = \phi_o\left[\sum_{j=1}^{J} W_{pj} \times \phi\left(\sum_{i=1}^{I} W_{ji} X_i + \partial_j\right) + \partial_p^o\right] \quad (4)
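Formula (4) can be checked with a small numpy sketch. The sigmoid hidden activation and identity output activation are assumptions, since the paper does not state which activation functions were used; all names and shapes here are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W_ji, d_j, W_pj, d_p):
    """One forward pass of formula (4): input -> hidden -> output.

    x    : (I,)   input vector
    W_ji : (J, I) input-to-hidden weights, d_j: (J,) hidden biases
    W_pj : (P, J) hidden-to-output weights, d_p: (P,) output biases
    """
    hidden = sigmoid(W_ji @ x + d_j)   # inner sum of formula (4)
    return W_pj @ hidden + d_p         # outer sum, identity output activation

rng = np.random.default_rng(0)
x = rng.normal(size=4)                                            # I = 4 inputs
out = mlp_forward(x,
                  rng.normal(size=(3, 4)), rng.normal(size=3),    # J = 3 hidden
                  rng.normal(size=(1, 3)), rng.normal(size=1))    # P = 1 output
# `out` has shape (1,): a single predicted price
```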

2.3 Long Short-Term Memory (LSTM)

LSTM is another type of artificial neural network, a variant of the Recurrent Neural Network (RNN), that works better than a plain RNN because each LSTM cell contains gated layers [21]. The architecture of LSTM is shown in Figure 7. As with MLP, LSTM also has an input layer, a hidden layer, and an output layer [22].

Figure 7. The basic architecture of LSTM [23]

In the LSTM computing steps, the input value is stored in a cell only if the input gate gives access. The input gate value i_t and the candidate value of the memory cell \hat{C}_t can be written as the following mathematical equations:

i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \quad (5)

\hat{C}_t = \tanh(W_C x_t + U_C h_{t-1} + b_C) \quad (6)

f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \quad (7)

C_t = i_t \times \hat{C}_t + f_t \times C_{t-1} \quad (8)

o_t = \sigma(W_o x_t + U_o h_{t-1} + V_o C_t + b_o) \quad (9)

h_t = o_t \times \tanh(C_t) \quad (10)

where W, U, and b represent the weight matrices and bias vectors, respectively, and h_{t-1} is the previous hidden state. The forget gate value is given by formula (7). Using the forget and input gates, the new state of the memory cell is updated with formula (8). With the new memory cell state, the output gate value is calculated in formula (9). After that, the final output is defined by formula (10) [24].
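A single LSTM step following formulas (5)–(10) can be sketched in numpy as below. The parameter shapes and the dictionary layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM step per formulas (5)-(10); p holds the W_*, U_*, V_o, b_* arrays."""
    i_t = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])              # (5) input gate
    C_hat = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])            # (6) candidate
    f_t = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])              # (7) forget gate
    C_t = i_t * C_hat + f_t * C_prev                                       # (8) new cell state
    o_t = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev
                  + p["Vo"] @ C_t + p["bo"])                               # (9) output gate
    h_t = o_t * np.tanh(C_t)                                               # (10) new hidden state
    return h_t, C_t

# Tiny example: 1 input feature, hidden size 2, random illustrative parameters
rng = np.random.default_rng(1)
p = {k: rng.normal(size=(2, 1)) for k in ("Wi", "Wc", "Wf", "Wo")}
p.update({k: rng.normal(size=(2, 2)) for k in ("Ui", "Uc", "Uf", "Uo")})
p.update({k: rng.normal(size=2) for k in ("bi", "bc", "bf", "bo")})
p["Vo"] = rng.normal(size=(2, 2))
h, C = lstm_step(np.array([0.5]), np.zeros(2), np.zeros(2), p)
```

Because o_t lies in (0, 1) and tanh(C_t) lies in (−1, 1), every component of the new hidden state h_t is bounded in (−1, 1).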

3. RESULT AND DISCUSSION

This study's main objective is predicting the daily cocoa futures price. Cocoa futures prices are predicted daily using the dataset collected from 2003 to 2021 and already preprocessed. Both MLP and LSTM are trained and tested to obtain prediction results, which allow us to see which model produces the highest accuracy on the cocoa futures dataset.

3.1 MLP Model Scenario

From the testing stage using the daily dataset of cocoa futures prices, the prediction model using MLP shows a great result. The hyperparameters used are 64 neurons in the input layer; 16, 32, or 64 neurons in the hidden layer, tested respectively; an output layer of 1 neuron; 500 epochs; and batch sizes of 64 and 128. The optimizer used is the Adam optimizer.

Table 4. MLP Prediction Result

MLP Model            | Neuron 16           | Neuron 32           | Neuron 64
                     | RMSE   MAE    MAPE  | RMSE   MAE    MAPE  | RMSE   MAE    MAPE
MLP 50-50 Batch 64   | 20.61  37.37  1.47  | 19.44  36.69  1.45  | 4.32   33.10  1.33
MLP 60-40 Batch 64   | 32.13  44.63  1.78  | 12.54  36.71  1.45  | 8.85   35.01  1.39
MLP 70-30 Batch 64   | 5.51   36.23  1.52  | 5.40   35.86  1.50  | 4.08   35.79  1.50
MLP 80-20 Batch 64   | 29.55  45.41  1.82  | 11.07  39.12  1.56  | 3.07   36.67  1.47
MLP 90-10 Batch 64   | 29.37  44.99  1.79  | 15.30  39.23  1.56  | 6.02   37.60  1.51
MLP 50-50 Batch 128  | 32.46  43.60  1.72  | 22.27  38.38  1.50  | 1.85   32.75  1.31
MLP 60-40 Batch 128  | 34.36  46.03  1.79  | 20.34  38.67  1.54  | 13.54  36.15  1.43
MLP 70-30 Batch 128  | 12.53  37.16  1.55  | 6.74   36.24  1.52  | 9.18   36.37  1.52
MLP 80-20 Batch 128  | 27.47  44.54  1.77  | 16.85  40.20  1.62  | 4.91   36.99  1.48
MLP 90-10 Batch 128  | 7.84   38.05  1.52  | 7.66   37.59  1.50  | 7.01   36.87  1.48

Based on Table 4, the performance of each model is good enough, hovering above 98% accuracy based on the MAPE. The best model we obtained is MLP 50-50 Batch 128, meaning the MLP model used 50% of the dataset for training and 50% for testing with a batch size of 128. This model produces its best result with 64 neurons in the hidden layer. Looking at Table 4, each number of neurons in the hidden layer produces a different result, and the results follow a pattern: 16 neurons give the lowest accuracy, 32 neurons give mid-range accuracy, and 64 neurons give the highest accuracy.

Based on this explanation, it is clear that the RMSE, MAE, and MAPE evaluation metrics can be used as a reference to determine whether the predictions are good. We can visualize the worst and best models as in Figure 8: the best model, MLP 50-50 Batch 128, produces an RMSE of 1.85, an MAE of 32.75, and a MAPE of 1.31, indicating an accuracy of 98.69 percent and a strong correlation between the prediction line and the actual values, whereas for MLP 80-20 Batch 64 in Figure 8 there is a gap between the prediction line and the actual values. Even though both are MLP models, different parameters affect the accuracy of the model.

Figure 8. Best (a) and Worst (b) Model of MLP

3.2 LSTM Model Scenario

After obtaining the results for MLP, the LSTM is also trained and tested to predict the price of cocoa futures. LSTM has an advantage over MLP because this model can memorize its output and feed it back as the next input, whereas MLP cannot. Tuning the batch size in this model is very important because the memory feature of LSTM makes it slow to train; increasing the batch size to 64 or even 128 can shorten the training time.

In building the LSTM, we use the same hyperparameters as the MLP model: various ratios of training and test data; 64 neurons in the input layer; 16, 32, or 64 neurons in the hidden layer, tested respectively; an output layer of 1 neuron; 500 epochs; batch sizes of 64 and 128, tested respectively; and the Adam optimizer.

Table 5. LSTM Prediction Result

Model                 | Neuron 16           | Neuron 32           | Neuron 64
                      | RMSE   MAE    MAPE  | RMSE   MAE    MAPE  | RMSE   MAE    MAPE
LSTM 50-50 Batch 64   | 14.01  34.45  1.36  | 8.16   32.90  1.31  | 2.27   32.11  1.29
LSTM 60-40 Batch 64   | 8.15   34.15  1.37  | 7.28   33.89  1.36  | 3.10   33.64  1.35
LSTM 70-30 Batch 64   | 9.59   36.03  1.51  | 0.14   35.05  1.47  | 2.17   35.08  1.47
LSTM 80-20 Batch 64   | 16.92  39.80  1.64  | 5.33   37.25  1.53  | 5.79   37.07  1.52
LSTM 90-10 Batch 64   | 18.22  40.15  1.60  | 7.44   36.91  1.48  | 0.89   36.52  1.46
LSTM 50-50 Batch 128  | 16.71  34.54  1.39  | 12.66  33.33  1.34  | 7.29   32.59  1.30
LSTM 60-40 Batch 128  | 8.39   34.97  1.39  | 5.92   34.10  1.37  | 7.88   34.36  1.37
LSTM 70-30 Batch 128  | 4.58   35.33  1.48  | 3.39   35.22  1.48  | 4.36   35.30  1.48
LSTM 80-20 Batch 128  | 10.18  37.56  1.54  | 4.39   36.81  1.51  | 0.00   36.68  1.50
LSTM 90-10 Batch 128  | 8.61   37.08  1.48  | 11.75  36.60  1.47  | 10.09  36.45  1.47

We trained the LSTM and collected the results of each model using the previously mentioned hyperparameters. The results are grouped by batch size, the number of neurons in the hidden layer, and the train/test data ratio, and are evaluated using the metrics defined earlier: RMSE, MAE, and MAPE.

Looking at Table 5, LSTM gives a good prediction result on the cocoa futures daily-timeframe dataset, where every model hovers above 98.40% accuracy based on the MAPE. The best LSTM model, shown in bold, is LSTM 50-50 Batch 64 with 64 neurons: the model trained on 50% of the dataset and tested on the other 50%, with a batch size of 64 and a hidden layer of 64 neurons. The evaluation metrics show that this best model produces an RMSE of 2.27, an MAE of 32.11, and a MAPE of 1.29, meaning the model accuracy is 98.71%. As with MLP, the LSTM results show a pattern across different numbers of hidden neurons: fewer neurons give lower accuracy, while the most neurons give the highest accuracy.

Figure 9. Best (a) and Worst (b) Model of LSTM

The results in Table 5 vary but still point to the best and worst models to use. These results can be visualized as line charts showing the difference between the actual and predicted data. As shown in Figure 9, in the best model the prediction line lies on top of the actual line, making the prediction very close to the actual data and the model highly accurate. Meanwhile, the worst model's prediction line does not lie on the actual line but is still very close, because that model also gives high accuracy.

3.3 MLP and LSTM Comparison

Both the MLP and LSTM models have been trained and tested; the next step is to determine which of the two is best. To do so, we take the best model from each: for MLP, the MLP 50-50 Batch 128 with 64 neurons, and for LSTM, the LSTM 50-50 Batch 64 with 64 neurons. Both models achieve the highest accuracy compared to the other ratios.

Table 6. Comparison of MLP and LSTM

Model                             | RMSE  | MAE    | MAPE
MLP 50-50 Batch 128, 64 Neurons   | 1.85  | 32.75  | 1.31
LSTM 50-50 Batch 64, 64 Neurons   | 2.27  | 32.11  | 1.29

As seen in Table 6, the MLP model with a 50-50 train/test data ratio, a batch size of 128, and 64 neurons in the hidden layer gains an RMSE of 1.85, an MAE of 32.75, and a MAPE of 1.31, meaning this model achieves an accuracy of 98.69%. Meanwhile, the LSTM with a 50-50 train/test data ratio, a batch size of 64, and 64 neurons in the hidden layer gains an RMSE of 2.27, an MAE of 32.11, and a MAPE of 1.29, meaning its accuracy is 98.71%. LSTM achieves the highest accuracy compared to

MLP because it has gates inside its architecture, as shown in Figure 7. These gates can memorize the output and consider that output for use again as input, making the LSTM able to perform better than MLP on time-series data.

3.4 Result of Cocoa Future Price

After obtaining the best model for predicting the fluctuation of the cocoa futures price, we can compare the predicted price to the actual price. Here, we use the LSTM 50-50 Batch 64 with 64 neurons, the model with the highest accuracy. Table 7 shows the model's daily predictions: the actual closing price, the predicted closing price, and the difference between the two.

Table 7. Result of Cocoa Price Prediction

Date Actual Price Prediction Price Differences in Price

2012-07-11 2286.0 2294.2 8.2

2012-07-12 2184.0 2283.1 99.1

2012-07-13 2210.0 2174.7 35.2

2012-07-16 2197.0 2199.5 2.5

2012-07-17 2195.0 2184.4 10.5

The table above shows that the cocoa futures dataset fluctuates daily based on the closing price. The differences between the actual cocoa futures price and the LSTM model's predicted price are small. Even though the prediction accuracy is not 100%, it is very close. The dataset shows excellent quality: the model is trained across various trends and produces a high-accuracy prediction result that could help in making investment or business decisions.
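As a quick arithmetic check on Table 7, the "Differences in Price" column is simply the absolute gap between the actual and predicted closing prices, shown here for two of the table's rows:

```python
# Reproduce the "Differences in Price" column of Table 7 as |actual - predicted|,
# using the (date, actual, predicted) values from two Table 7 rows.
rows = [
    ("2012-07-11", 2286.0, 2294.2),
    ("2012-07-16", 2197.0, 2199.5),
]
diffs = {date: round(abs(actual - pred), 1) for date, actual, pred in rows}
# diffs["2012-07-11"] -> 8.2 and diffs["2012-07-16"] -> 2.5, matching Table 7
```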

4. CONCLUSION

Cocoa, as one of the world's most important commodities, undoubtedly has an impact on a country's economy.

The price of cocoa futures fluctuates, which adds an element of uncertainty for cocoa stakeholders. Cocoa futures are a tool for price discovery as well as for hedging the cocoa market. Predicting cocoa futures prices could assist stakeholders such as cocoa farmers and investors in making decisions. Therefore, in this study we built models that predict the price of cocoa futures using the MLP and LSTM algorithms and data from https://www.investing.com/commodities/us-cocoa from 2003 to 2021. Based on the findings of the research, price prediction of cocoa futures using MLP and LSTM is feasible and produces excellent results with a high percentage of accuracy; the obtained results were very close to the actual values. The models produce the best results when the input layer has 64 neurons, the hidden layer has 64 neurons, the output layer has 1 neuron, the number of epochs is 500, and the batch sizes are 64 and 128. Both batch sizes give great results and are also able to optimize the LSTM in terms of training time. The best models developed from MLP and LSTM were evaluated using error metrics, namely RMSE, MAE, and MAPE. Using these evaluation metrics, the LSTM model with a 50-50 train/test data ratio, a batch size of 64, and 64 neurons produced the best model with the highest accuracy: an RMSE of 2.27, an MAE of 32.11, and a MAPE of 1.29, for a total accuracy of 98.71 percent. The LSTM architecture outperforms the MLP because it can memorize its output and use it again as an input to achieve the best result. The findings of this study are expected to assist future researchers in making better predictions by using a much larger dataset, helping the models learn broader patterns and forecast the market across different trends.

REFERENCES

[1] G. Arburn and L. Harper, “Derivatives Markets and Managed Money: Implications for Price Discovery,” Int. J. Bus. Financ. Res., vol. 13, no. 1, pp. 53–61, 2019.

[2] M. Mustafa and D. Andriyani, “Pengaruh Ekspor Impor Kakao dan Karet terhadap Cadangan Devisa di Indonesia,” J. Ekon. Pertan. Unimal, vol. 3, no. 2, p. 34, Dec. 2020, doi: 10.29103/jepu.v3i2.3189.

[3] Akhsan, M. Arsyad, A. Amiruddin, M. Salam, Nurlaela, and M. Ridwan, “In-Depth Study of Multiple Cropping Farming Systems: The Impact on Cocoa Farmers’ Income,” AGRIVITA J. Agric. Sci., vol. 44, no. 2, pp. 355–365, Jun. 2022, doi: 10.17503/agrivita.v44i2.3761.

[4] M. Trisanti Saragih, H. Harianto, and H. Kuswanti, “Pengaruh Penerapan Bea Keluar Biji Kakao Terhadap Daya Saing Serta Ekspor Produk Kakao Indonesia,” Forum Agribisnis, vol. 11, no. 2, pp. 133–152, Sep. 2021, doi: 10.29244/fagb.11.2.133-152.

[5] P. Nareswari and S. S. Wibowo, “Global and Local Commodity Prices: A Further Look at the Indonesian Agricultural Commodities,” Cap. Mark. Rev., vol. 28, no. 1, pp. 65–76, 2020.

[6] K. Sukiyono et al., “Selecting an Accurate Cacao Price Forecasting Model,” in Journal of Physics: Conference Series, Dec. 2018, vol. 1114, no. 1, p. 012116, doi: 10.1088/1742-6596/1114/1/012116.

[7] H. Ouyang, X. Wei, and Q. Wu, “Agricultural commodity futures prices prediction via long- and short-term time series network,” J. Appl. Econ., vol. 22, no. 1, pp. 468–483, Jan. 2019, doi: 10.1080/15140326.2019.1668664.

[8] A. B. Aribowo, D. Sugiarto, I. A. Marie, and J. F. A. Siahaan, “Peramalan harga beras IR64 kualitas III menggunakan metode Multi Layer Perceptron, Holt-Winters dan Auto Regressive Integrated Moving Average,” Ultim. J. Tek. Inform., vol. 11, no. 2, pp. 60–64, Jan. 2020, doi: 10.31937/ti.v11i2.1246.

[9] R. Murugesan, E. Mishra, and A. H. Krishnan, “Deep Learning Based Models: Basic LSTM, Bi LSTM, Stacked LSTM, CNN LSTM and Conv LSTM to Forecast Agricultural Commodities Prices,” Research Square, 2021, doi: 10.21203/rs.3.rs-740568/v1.

[10] R. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, 2nd ed. Australia: OTexts, 2018.

[11] R. Casado-Vara, A. Martin del Rey, D. Pérez-Palau, L. De-la-Fuente-Valentín, and J. M. Corchado, “Web Traffic Time Series Forecasting Using LSTM Neural Networks with Distributed Asynchronous Training,” Mathematics, vol. 9, no. 4, p. 421, Feb. 2021, doi: 10.3390/math9040421.

[12] H. Park, “MLP modeling for search advertising price prediction,” J. Ambient Intell. Humaniz. Comput., vol. 11, no. 1, pp. 411–417, 2020, doi: 10.1007/s12652-019-01298-y.

[13] M. Moocarme, M. Abdolahnejad, and R. Bhagwat, The Deep Learning with Keras Workshop. Birmingham: Packt, 2020. [Online]. Available: https://www.packtpub.com/product/the-deep-learning-with-keras-workshop/9781800562967

[14] G. Zhou, H. Moayedi, M. Bahiraei, and Z. Lyu, “Employing artificial bee colony and particle swarm techniques for optimizing a neural network in prediction of heating and cooling loads of residential buildings,” J. Clean. Prod., vol. 254, p. 120082, May 2020, doi: 10.1016/j.jclepro.2020.120082.

[15] N. Golenvaux, P. G. Alvarez, H. S. Kiossou, and P. Schaus, “An LSTM approach to Forecast Migration using Google Trends,” May 2020, doi: 10.48550/arxiv.2005.09902.

[16] X. Song et al., “Time-series well performance prediction based on Long Short-Term Memory (LSTM) neural network model,” J. Pet. Sci. Eng., vol. 186, p. 106682, Mar. 2020, doi: 10.1016/j.petrol.2019.106682.

[17] Q. Chen, W. Zhang, and Y. Lou, “Forecasting Stock Prices Using a Hybrid Deep Learning Model Integrating Attention Mechanism, Multi-Layer Perceptron, and Bidirectional Long-Short Term Memory Neural Network,” IEEE Access, vol. 8, pp. 117365–117376, 2020, doi: 10.1109/ACCESS.2020.3004284.

[18] A. Botalb, M. Moinuddin, U. M. Al-Saggaf, and S. S. A. Ali, “Contrasting Convolutional Neural Network (CNN) with Multi-Layer Perceptron (MLP) for Big Data Analysis,” Nov. 2018, doi: 10.1109/ICIAS.2018.8540626.

[19] R. N. Ihsan, S. Saadah, and G. S. Wulandari, “Prediction of Basic Material Prices on Major Holidays Using Multi-Layer Perceptron,” J. MEDIA Inform. BUDIDARMA, vol. 6, no. 1, p. 443, Jan. 2022, doi: 10.30865/mib.v6i1.3508.

[20] M. A. Ghorbani, R. C. Deo, V. Karimi, M. H. Kashani, and S. Ghorbani, “Design and implementation of a hybrid MLP-GSA model with multi-layer perceptron-gravitational search algorithm for monthly lake water level forecasting,” Stoch. Environ. Res. Risk Assess., vol. 33, no. 1, pp. 125–147, Jan. 2019, doi: 10.1007/s00477-018-1630-1.

[21] A. Sherstinsky, “Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network,” Phys. D Nonlinear Phenom., vol. 404, p. 132306, Mar. 2020, doi: 10.1016/j.physd.2019.132306.

[22] D. Wei, “Prediction of Stock Price Based on LSTM Neural Network,” in Proceedings - 2019 International Conference on Artificial Intelligence and Advanced Manufacturing, AIAM 2019, Oct. 2019, pp. 544–547, doi: 10.1109/AIAM48774.2019.00113.

[23] H. Chung and K. Shin, “Genetic Algorithm-Optimized Long Short-Term Memory Network for Stock Market Prediction,” Sustainability, vol. 10, no. 10, p. 3765, Oct. 2018, doi: 10.3390/su10103765.

[24] S. Siami-Namini, N. Tavakoli, and A. S. Namin, “The Performance of LSTM and BiLSTM in Forecasting Time Series,” in Proceedings - 2019 IEEE International Conference on Big Data, Dec. 2019, pp. 3285–3292, doi: 10.1109/BigData47090.2019.9005997.
