Prediction of Rainfall Classification of Java Island with ANN-Feature Expansion and Ordinary Kriging
Irfani Adri Maulana, Sri Suryani Prasetiyowati, Yuliant Sibaroni School of Computing, Informatics, Telkom University, Bandung, Indonesia
Email: 1,*[email protected], 2[email protected], 3[email protected] Correspondence Author Email: [email protected]
Abstract−Precipitation is one of the most important climatic variables in many aspects of daily life. High rainfall intensity can cause floods, landslides, and other natural disasters. Rainfall prediction is therefore important for anticipating natural disasters and for assisting farmers in production and harvesting decisions. In this research, a system is built to create rainfall prediction maps for Java, Indonesia, using a machine learning approach and a spatial interpolation algorithm. The artificial neural network (ANN) is a popular machine learning method in weather prediction; its advantage is that it learns previously unknown relationships between input and output data in its hidden layers through training. Using the ANN method, historical weather and climate data are used to build a classification model and predict rainfall classes. The classification is based on attributes of the historical weather and climate data, namely temperature, humidity, air pressure, evaporation, sunlight, and rainfall level, at daily and monthly time scales. The ANN modeling shows that the 5C month model, with an accuracy of 89%, is the best monthly ANN model, and that the 6C day model, with an accuracy of 81%, is the best daily ANN model. The ANN predictions are then interpolated with Ordinary Kriging, a spatial interpolation algorithm often used to estimate rainfall, which reduces the estimation variance when estimating rainfall values in the case study area. The results of this research are rainfall prediction maps of Java Island for the next six months and the next seven days.
Keywords: Prediction Map; Classification; Rainfall; Artificial Neural Networks; Ordinary Kriging
1. INTRODUCTION
Rainfall is one of the most important climatic variables in many aspects of daily life [1]. It has a major impact on agriculture, development, water resources, and industry. Excessive rain commonly causes floods, landslides, and other natural disasters, and rainfall has the highest correlation with adverse natural disasters [2, 3]. Nonetheless, historical rainfall data has been shown to help farmers improve crop management, which has benefited the nation's economy [4]. It is therefore important to predict rainfall, both to prevent natural disasters that can endanger people's lives and property and to assist farmers in making crop production and harvest decisions. Historical weather and climate data are collected and analyzed to make such predictions. Data such as temperature, humidity, air pressure, evaporation, sunlight, and rainfall levels are used to predict rainfall [5, 6].
Much research on rainfall prediction has been carried out. In 2018, a study predicted rainfall using a combined Artificial Neural Network (ANN) and Support Vector Machine (SVM) model based on Ensemble Empirical Mode Decomposition (EEMD) [6]. The ANN and SVM methods were compared by making predictions for each Intrinsic Mode Function (IMF) in three places, using the month attribute and the percentage of variance as parameters. The experimental results indicate that the prediction performance of ANN is better than that of SVM, with a probability of 79.2%. In [7], monthly rainfall was predicted using the ANN method in a case study in Mashhad, Iran, and the ANN model was found to produce good predictive performance.
The results showed that the ANN structure, training algorithm, activation function, and number of epochs all affect performance, which makes finding the right model difficult. The best structure obtained was a three-layer feed-forward perceptron trained with the backpropagation algorithm, denoted M741 (an ANN structure with seven input neurons, four hidden neurons, and one output neuron). Prediction performance was assessed with different statistical criteria, namely R, RMSE, and MAE; for the M741 model, the values obtained were 0.93, 0.99, and 6.02 mm, respectively. In similar studies [8, 9, 10], ANN models were used to predict rainfall and achieved good results; one of these studies reported an accuracy of 98%.
In [11], the ANN approach was used to create one-month-ahead and two-month-ahead forecasting models for rainfall prediction using Northern India's monthly rainfall data. A Feed Forward Neural Network (FFNN) with the backpropagation algorithm and the Levenberg-Marquardt training function was used. The performance of both models was evaluated using regression analysis, Mean Square Error (MSE), and Magnitude of Relative Error (MRE). The proposed ANN model produced promising results for both forecasting models. The ANN worked better for the M1 model (3-25-1) than for the M2 model (3-50-1), indicating that the ANN approach performed better one month ahead than two months ahead. Besides ANN, a similar study [5] used the LSTM method to predict monthly rainfall over the Karnataka subdivision. The results show that the LSTM-optimized deep learning technique gives better predictive outcomes, with a minimum Mean Absolute Percentage Error of 0.79 and a Root Mean Squared Error of 1.35.
Several other studies have used interpolation algorithms such as Ordinary Kriging. In 2022 [12], a study compared the efficacy of well-known interpolation algorithms for predicting monthly rainfall data in Thailand. The methods chosen were Inverse Distance Weighting (IDW), Inverse Exponential Weighting (IEW), Multiple Linear Regression (MLR), Artificial Neural Networks (ANN), and Ordinary Kriging (OK). For several of these methods, a search for nearby stations was also applied. K-fold cross-validation was used to evaluate each method, and the RMSE and MAE scores were compared. According to these metrics, OK produced the most accurate estimates. In [13], Deep Neural Network (DNN) and OK methods were used to simulate semivariograms, with increased data used as case studies. The OK interpolation results were also compared with those of the Exponential and Gaussian models to verify the effectiveness and rationality of the proposed method. The results show that the DNN can be used to fit all of the functions; the advantage of this method is that it can optimize most other semivariogram functions and the process of analyzing various semivariogram functions. In [14], OK and genetic programming were studied for the spatial estimation of rainfall. The study compared the performance of traditional kriging, Kriging Genetic Programming (KGP), and IDW in producing a high-quality gridded rainfall data set for the study area in the form of rainfall maps. The cross-validation results show that the kriging-based stochastic interpolation methods (OK and KGP) outperform the deterministic method (IDW).
According to previous research, the ANN methodology is suitable for rainfall prediction: studies [7, 8, 9, 10] show that ANN can produce good predictive performance. In addition, studies [12, 13, 14] show that Ordinary Kriging gives good results as a spatial interpolation algorithm. This research differs from the related research in that it is conducted on a case study of Java Island using data from the Meteorology, Climatology, and Geophysics Agency.
In this research, historical weather data such as temperature, humidity, air pressure, sunlight, and rainfall are used as input. A predictive map based on rainfall classification is created by combining the ANN machine learning method with the Ordinary Kriging spatial interpolation method.
2. RESEARCH METHODOLOGY
2.1 System Design
The system in this study is built in several stages. The system design is shown as a flowchart in Figure 1.
Figure 1. Flowchart of The System Built
The system uses historical weather and climate data from 27 location points on the island of Java, covering January 1, 2010, to March 31, 2022. The historical data is processed into several classification models using the Artificial Neural Network (ANN) method, and the accuracy of each model is then evaluated to find the best model. Once the best classification model is obtained, the Ordinary Kriging approach is used as an interpolation algorithm to produce a rainfall prediction map.
2.2 Dataset
The dataset used in this research is historical weather and climate data obtained from the Meteorology, Climatology, and Geophysics Agency at 27 location points on Java Island from January 1, 2010, to March 31, 2022. The dataset consists of several attributes, namely: temperature (°C), humidity (%), rainfall (mm), duration of sunshine (hours), wind speed (m/s), and wind direction (°).
For modeling, the data is divided into two types according to time scale: a daily dataset and a monthly dataset.
2.3 Data Preprocessing
The data preprocessing step is crucial for narrowing the list of labels to a manageable number. Preprocessing is done by modifying and scaling the entire dataset, with the aim of providing processed training and test data. It is required before training the machine learning model. During preprocessing, the data is cleaned and transformed to remove outliers and empty values [14, 15].
2.4 Artificial Neural Network
The Artificial Neural Network (ANN) is a computational and mathematical method that simulates the processes of the human brain. The architecture of the ANN model is inspired by the biological nervous system: like the brain, it consists of a complex, non-linear network of neurons interconnected by weighted connections. Learning and training methods are used to carry out all processes in the ANN model, including data collection and analysis, network structure design, hidden layers, and network simulation [16].
Figure 2. Artificial Neural Network Architecture [7]
Figure 2 shows an ANN architecture containing three layers. In this ANN model, the synapse weights, the bias, the activation function in the hidden layer, and the output function in the output layer determine the relationship between the input neurons and the output [7]. This relationship is given in formula (1):
$$ y = g\left( \sum_{j=1}^{n} w_{kj}\, f\left( \sum_{i=1}^{n} w_{ji} x_i + b \right) + b \right) \tag{1} $$
where $w_{ji}$ is the weight of the synapses in the hidden layer; $w_{kj}$ is the synapse weight on the output layer; $b$ is the bias; $f(x)$ is the activation function in the hidden layer; $g(x)$ is the output function on the output layer; $x_i$ is the input neuron; and $y$ is the output neuron.
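Formula (1) can be illustrated with a minimal forward pass for a single output neuron. This is only a sketch: the choice of ReLU for $f$ and a sigmoid for $g$, the function names, and all weight values are assumptions made for illustration and are not taken from the trained models in this study.

```python
import math

def relu(z):
    """Hidden-layer activation f (assumed here to be ReLU)."""
    return max(0.0, z)

def sigmoid(z):
    """Output function g (assumed here to be a sigmoid)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out, f=relu, g=sigmoid):
    """Evaluate y = g(sum_j w_kj * f(sum_i w_ji * x_i + b) + b)."""
    # Inner sum: one weighted sum per hidden neuron, passed through f.
    hidden = [f(sum(w * xi for w, xi in zip(row, x)) + b_hidden)
              for row in w_hidden]
    # Outer sum: weighted sum of the hidden outputs, passed through g.
    return g(sum(wk * hj for wk, hj in zip(w_out, hidden)) + b_out)

# Hypothetical example: three inputs, two hidden neurons, one output.
x = [0.5, -1.0, 2.0]
w_hidden = [[0.2, 0.4, 0.1], [-0.3, 0.1, 0.5]]
w_out = [1.0, -2.0]
y = forward(x, w_hidden, 0.1, w_out, 0.0)
```

In a trained network the weights and biases are the values learned during training rather than hand-picked numbers.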
2.5 Model Evaluation
After going through the modeling stage using the ANN method, the rainfall classification data was obtained. From the classification data obtained, an evaluation of the model is carried out to determine the performance of the model used. Model evaluation is done using statistical measurement metrics, namely Accuracy, Precision, Recall, and F1-Score. The following is an explanation and the measurement metric formula used [17]:
a. Accuracy is one of the most commonly used metrics in classification. It is computed using formula (2).
$$ \mathrm{Accuracy} = \frac{TN + TP}{TN + FP + FN + TP} \tag{2} $$
b. Precision shows how accurately the model predicts positive values, that is, the share of predicted positives that are correct. Precision is also known as positive predictive value. It is computed using formula (3).
$$ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{3} $$
c. Recall measures the ability of a model to find the positive results. Recall is also known as model sensitivity. It is computed using formula (4).
$$ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{4} $$
d. The F1-Score is a commonly used metric in classification settings. It is the harmonic mean of the precision and recall of the classifier. It is computed using formula (5).
$$ F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{5} $$
Here $TP$, $TN$, $FP$, and $FN$ are the numbers of true positives, true negatives, false positives, and false negatives, respectively.
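Formulas (2) to (5) can be checked with a short worked example. The sketch below assumes a single positive class with hypothetical confusion counts; for the six rainfall classes in this study the same formulas would be applied per class and then averaged, and the averaging scheme is not stated in the text. The function name is also hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts, not taken from the study.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts, accuracy is 85/100, precision is 40/45, recall is 40/50, and the F1-Score equals 80/95.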
2.6 Semivariogram
The semivariogram is a tool that describes, models, and quantifies the spatial autocorrelation of measured sample points. The semivariogram value is equal to half of the variogram value [18]. The semivariogram is defined as follows (6):
$$ \gamma(h) = \frac{1}{2} E\left[ \left( Z(s) - Z(s + h) \right)^2 \right] \tag{6} $$
An experimental variogram is an estimate obtained from field samples. It is computed from pairs of values separated by a certain distance [18]. The experimental semivariogram is equal to half of the experimental variogram and is computed as follows (7):
$$ \gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ Z(s_i) - Z(s_i + h) \right]^2 \tag{7} $$
Where $s$ is the location of the sample point; $Z(s)$ is the observation value at the location; $h$ is the distance between the two sample points; and $N(h)$ is the number of data pairs separated by distance $h$. To find the semivariogram value, the data pairs are divided into classes using the Sturges equation as follows (8):
$$ k = 1 + 3.3 \log_{10} n \tag{8} $$
Where 𝑘 is the number of class intervals; and 𝑛 is the sample size. After obtaining experimental semivariogram values, parameters for theoretical semivariogram calculations can be calculated [18]. Several parameters are used to find the value in the theoretical semivariogram:
a. Nugget Effect ($C_0$): The approximate semivariogram value at a distance close to zero.
b. Range ($a$): The distance at which the semivariogram reaches the sill value.
c. Sill ($C_0 + C$): The value at which the semivariogram becomes stable. The sill is equal to the variance of the spatial data.
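The experimental semivariogram (7) and the Sturges class count (8) described above can be sketched as follows. This is an illustration under stated assumptions: distances are plain Euclidean distances between coordinate pairs, the classes are equal-width distance bins, the class count is rounded to the nearest integer, and all function names and sample values are hypothetical.

```python
import math
from itertools import combinations

def sturges_classes(n):
    """Number of distance classes k = 1 + 3.3 log10(n), rounded (rounding is an assumption)."""
    return round(1 + 3.3 * math.log10(n))

def experimental_semivariogram(points, values, n_classes):
    """Return (class midpoint, semivariance, pair count) for each non-empty distance class."""
    # Distance and squared value difference for every pair of sample points.
    pairs = [(math.dist(points[i], points[j]), (values[i] - values[j]) ** 2)
             for i, j in combinations(range(len(points)), 2)]
    width = max(d for d, _ in pairs) / n_classes
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for d, sq in pairs:
        k = min(int(d / width), n_classes - 1)  # the largest distance goes in the last class
        sums[k] += sq
        counts[k] += 1
    # gamma(h) = sum of squared differences / (2 N(h)) for each class.
    return [((k + 0.5) * width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_classes) if counts[k]]

# Hypothetical stations on a line with hypothetical rainfall values.
stations = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
rainfall = [1.0, 2.0, 4.0, 7.0]
gamma_hat = experimental_semivariogram(stations, rainfall, n_classes=3)
k_for_27_stations = sturges_classes(27)
```

The nugget, range, and sill are then read from, or fitted to, the resulting points.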
After obtaining the three parameters above, the theoretical semivariogram values are calculated, and the model with the smallest error is chosen to estimate the data [18]. The theoretical semivariogram models compared are as follows (9, 10, 11):
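a. Spherical Model:
$$ \gamma(h) = \begin{cases} C_0 + C\left[ \dfrac{3h}{2a} - 0.5\left( \dfrac{h}{a} \right)^3 \right], & h \le a \\ C_0 + C, & h > a \end{cases} \tag{9} $$
b. Exponential Model:
$$ \gamma(h) = C_0 + C\left[ 1 - \exp\left( -\frac{3h}{a} \right) \right] \tag{10} $$
c. Gaussian Model:
$$ \gamma(h) = C_0 + C\left[ 1 - \exp\left( -\frac{3h^2}{a^2} \right) \right] \tag{11} $$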
Where $\gamma(h)$ is the theoretical semivariogram; $C_0$ is the nugget effect; $C_0 + C$ is the sill, the value at which the semivariogram becomes constant; $h$ is the distance between sample locations; and $a$ is the range, the distance at which the semivariogram reaches the sill.
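The three theoretical models (9), (10), and (11) translate directly into code. The sketch below is a plain transcription of the formulas; the function and parameter names (`nugget` for $C_0$, `partial_sill` for $C$, `a` for the range) and the parameter values in the example are assumptions chosen for illustration.

```python
import math

def spherical(h, nugget, partial_sill, a):
    """Spherical model (9): reaches the sill exactly at h = a."""
    if h > a:
        return nugget + partial_sill
    return nugget + partial_sill * (1.5 * h / a - 0.5 * (h / a) ** 3)

def exponential(h, nugget, partial_sill, a):
    """Exponential model (10): approaches the sill asymptotically."""
    return nugget + partial_sill * (1 - math.exp(-3 * h / a))

def gaussian(h, nugget, partial_sill, a):
    """Gaussian model (11): flat near the origin, then approaches the sill."""
    return nugget + partial_sill * (1 - math.exp(-3 * h ** 2 / a ** 2))

# Hypothetical parameters: nugget 0.1, partial sill 0.9, range 2.0.
curve = [(h, spherical(h, 0.1, 0.9, 2.0),
          exponential(h, 0.1, 0.9, 2.0),
          gaussian(h, 0.1, 0.9, 2.0)) for h in (0.5, 1.0, 2.0, 4.0)]
```

At $h = a$ the spherical model equals the sill $C_0 + C$, while the exponential and Gaussian models reach about 95% of the partial sill $C$ because of the factor 3 in the exponent.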
To determine the best theoretical semivariogram model, an error statistic, namely the Root Mean Square Error (RMSE), was applied [19]. It is computed as follows (12):
$$ RMSE = \sqrt{ \frac{1}{n} \sum_{t=1}^{n} r_t^2 } \tag{12} $$
where $r_t$ is the residual, the difference between the observed and the estimated value for sample $t$, and $n$ is the number of samples.
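A small worked example of the RMSE formula (12), with the residual taken as the observed value minus the estimated value. The function name and the numbers are hypothetical.

```python
import math

def rmse(observed, estimated):
    """Root Mean Square Error: sqrt of the mean squared residual."""
    residuals = [o - e for o, e in zip(observed, estimated)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Hypothetical rainfall classes at three locations.
error = rmse([1, 2, 3], [1, 2, 5])  # residuals 0, 0, -2
```

Here the squared residuals sum to 4, so the RMSE is the square root of 4/3.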
2.7 Ordinary Kriging
Ordinary Kriging is a popular interpolation approach in geostatistics. Ordinary Kriging is one of the best linear unbiased estimators because it can minimize estimation errors. A geographic correlation range is determined, and interpolation between data points is performed based on sampling points within the range. Semivariogram is used to provide an unbiased and optimal estimate of attribute values within a limited range and is used to spatially correlate variables within a certain range when combined with correlation analysis. As a result, kriging produces an unbiased and optimal linear estimate for the unknown data point based on real data for variables within a given range and structural properties of the semivariogram. The biggest advantage of kriging over other interpolation methods is the reduction in computational variance [12, 13]. In this study, Ordinary Kriging is used for spatial interpolation of point rainfall, which can be seen in formula (13) [14]:
$$ \theta^{*}_{OK}(x_0) = \sum_{i=1}^{n} \omega_i^{OK}\, \theta(x_i) \quad \text{with} \quad \sum_{i=1}^{n} \omega_i^{OK} = 1 \tag{13} $$
Where $\theta^{*}_{OK}(x_0)$ is the estimate of the variable $\theta$ (rainfall in this study) at the target position $x_0$; $\omega_i^{OK}$ is the kriging weight associated with the sample location $x_i$ with respect to $x_0$; and $n$ is the number of sampling points.
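The weights in formula (13) are commonly obtained by solving the ordinary kriging system, in which the semivariances between sample points are bordered by a Lagrange multiplier that enforces the constraint that the weights sum to one. The sketch below is a minimal pure-Python illustration of that standard construction, not the implementation used in this study; the helper names, the zero-nugget exponential semivariogram, and the station values are all assumptions.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(points, values, target, gamma):
    """Estimate the value at target as a weighted sum of sample values (formula 13)."""
    n = len(points)
    # Semivariances between all sample pairs, plus a row and column of ones
    # for the Lagrange multiplier that forces the weights to sum to 1.
    a = [[0.0 if i == j else gamma(math.dist(points[i], points[j]))
          for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Right-hand side: semivariances between each sample and the target.
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values)), weights

# Hypothetical exponential semivariogram with zero nugget, sill 1 and range 5.
gamma = lambda h: 1 - math.exp(-3 * h / 5)
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
classes = [1.0, 2.0, 3.0]
estimate, weights = ordinary_kriging(stations, classes, (0.4, 0.4), gamma)
```

Repeating the estimate over a grid of target positions produces the interpolated surface that is drawn as a map.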
3. RESULT AND DISCUSSION
3.1 Data Categories
The rainfall data (RR) in this study were divided into six categories, each of which forms a rainfall class. The division of data categories can be seen in Table 1.
Table 1. Data Categories
Data Range | Category | Class
RR ≥ 0 | Cloudy | 0
0.5 < RR ≤ 20 | Light rain | 1
20 < RR ≤ 50 | Moderate rain | 2
50 < RR ≤ 100 | Heavy rain | 3
100 < RR ≤ 150 | Very heavy rain | 4
RR > 150 | Extreme rain | 5
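The categorization in Table 1 can be sketched as a simple threshold lookup. Table 1 gives the range of class 0 only as RR ≥ 0, so the upper bound of 0.5 used for class 0 below is an assumption inferred from the lower bound of class 1; the function name is hypothetical, and missing or negative values are not handled.

```python
def rainfall_class(rr):
    """Map a rainfall amount RR to a class from 0 (cloudy) to 5 (extreme rain)."""
    # Upper bounds of classes 0..4; the 0.5 bound for class 0 is an assumption.
    upper_bounds = [0.5, 20, 50, 100, 150]
    for label, bound in enumerate(upper_bounds):
        if rr <= bound:
            return label
    return 5  # RR > 150

# Hypothetical rainfall amounts.
labels = [rainfall_class(rr) for rr in (0.0, 12.5, 35.0, 80.0, 120.0, 200.0)]
```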
3.2 Data Time-Series
After categorization, the data is transformed into time-series form. The data is formed by giving a time range as input and a year as the boundary between training data and test data. The time range used as input is two to six months in the monthly model and three to seven days in the daily model. The boundary year between training and test data is set to 2020, so the training period is 2010 to 2019 and the test period is 2020 to March 31, 2022, for each location.
In the form of time-series data, each row contains a location id and several data attributes at that location as a feature, as well as a rainfall class as a target. The number of attributes used as features depends on the time range entered. So, we get a formula like the following (14):
$$ n_{atr} = attribute \times n_{time} \tag{14} $$
Where $n_{atr}$ is the number of attributes in the row; $attribute$ is the total number of attributes in the dataset; and $n_{time}$ is the time range given as input.
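The construction of time-series rows and formula (14) can be sketched with a sliding window. This is a simplified illustration for a single location: the function name and the toy data are hypothetical, and the location id and the 2020 train/test boundary described above are left out.

```python
def make_windows(rows, targets, n_time):
    """Build (features, target) samples from n_time consecutive periods.

    rows: one list of attribute values per period (day or month).
    targets: the rainfall class of each period.
    """
    samples = []
    for end in range(n_time, len(rows)):
        # Flatten the attributes of the previous n_time periods into one feature row.
        features = [value for row in rows[end - n_time:end] for value in row]
        samples.append((features, targets[end]))
    return samples

# Hypothetical data: 6 periods with 3 attributes each.
rows = [[t, t + 0.1, t + 0.2] for t in range(6)]
targets = [0, 1, 1, 2, 0, 1]
samples = make_windows(rows, targets, n_time=2)
n_atr = len(samples[0][0])  # 3 attributes * 2 periods
```

Each sample has attribute × n_time features, as in formula (14), and its target is the rainfall class of the period that follows the window.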
3.3 Data Split
Data splitting divides the time-series data into several parts in order to prepare several models before applying the ANN method. The data is divided into five parts, namely A, B, C, D, and the combination of all data, as detailed in Table 2:
Table 2. Data Splitting
Model | Data Range
A | The first quarter of index data
B | The second quarter of index data
C | The third quarter of index data
D | The fourth quarter of index data
Combined | All of the data
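One way to read Table 2 is that the rows of the time-series data are cut into four consecutive quarters by row index. The sketch below follows that reading, which is an assumption, since the text does not define "index data" more precisely; the function name is hypothetical.

```python
def split_quarters(rows):
    """Split rows into parts A-D (consecutive quarters by index) plus Combined."""
    n = len(rows)
    # Integer boundaries so that every row lands in exactly one quarter.
    bounds = [n * i // 4 for i in range(5)]
    parts = {name: rows[bounds[i]:bounds[i + 1]]
             for i, name in enumerate("ABCD")}
    parts["Combined"] = list(rows)
    return parts

# Hypothetical data: eight rows identified by their index.
parts = split_quarters(list(range(8)))
```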
3.4 Selection Feature
After obtaining attributes for all data models, feature selection is carried out to avoid multicollinearity, or conditions where several independent variables in a model are correlated. Therefore, feature selection is needed to remove irrelevant and redundant variables. Feature selection can reduce the dimension of the variable which not only reduces the resources for storing and processing data but also increases the interpretability of the selected variable [20].
3.5 Implementation of ANN Method
The ANN method is implemented to obtain a classification model. To create a model, the hyperparameters are defined first, starting with the configuration of the layers. The layers in the model consist of an input layer, hidden layers, and an output layer.
At the input layer, the input size must be set to match the data used. For each hidden layer, the number of neurons and the activation function need to be determined. For the output layer, a loss function must also be defined, which is used as the reference for updating the model parameters during training.
For the ANN model, the number of input neurons depends on the number of features used, so it differs between models. The first hidden layer has 64 neurons, the second hidden layer has 32 neurons, and the output layer has six neurons, matching the number of rainfall classes. The activation function used is ReLU and the optimization algorithm used is Adam. In addition, early stopping is applied to avoid overtraining.
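To make the layer configuration concrete, the number of trainable parameters of such a network can be counted. The sketch assumes fully connected layers with one bias per neuron, which the text does not state explicitly; the function name is hypothetical, and the input sizes of 29 and 16 features are the attribute counts reported for the best monthly and daily models in Section 3.6.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes: neurons per layer, from input to output.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Input -> 64 -> 32 -> 6 rainfall classes.
monthly_params = dense_parameter_count([29, 64, 32, 6])
daily_params = dense_parameter_count([16, 64, 32, 6])
```

For 29 input features this gives 1,920 + 2,080 + 198 = 4,198 parameters, and for 16 input features 1,088 + 2,080 + 198 = 3,366.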
3.6 Evaluation of ANN Model
After going through the modeling using the ANN method, an evaluation of the model was carried out using statistical metric measurements, namely Accuracy, Precision, Recall, and F1-Score. Model evaluation is carried out to get the best ANN model on a monthly and daily timeframe. The first evaluation carried out is to compare the accuracy of the model features that have been obtained using feature selection. This is done to find the best combination of attributes in a model according to the accuracy obtained. Some of the results from the comparison of the model's feature accuracy values can be seen in Figure 3.
Figure 3. Some Comparison of Model Feature Accuracy Values
After the first evaluation is applied to each model, the best-performing combination of attributes is obtained for each ANN model. Comparisons were then made between the best monthly ANN models, from the two-month model to the six-month model, and between the best daily ANN models, from the three-day model to the seven-day model. The accuracy values of the monthly and daily ANN models can be seen in Table 3.
Table 3. Accuracy on ANN Models
Model | A | B | C | D | Combined
2 Month Model | 0.8351 | 0.75 | 0.8655 | 0.7039 | 0.7238
3 Month Model | 0.7937 | 0.8449 | 0.8566 | 0.7306 | 0.7609
4 Month Model | 0.8233 | 0.8255 | 0.8470 | 0.7560 | 0.7691
5 Month Model | 0.7849 | 0.8673 | 0.8917 | 0.8035 | 0.8060
6 Month Model | 0.8139 | 0.8478 | 0.8791 | 0.7845 | 0.8211
3 Day Model | 0.7804 | 0.7536 | 0.8003 | 0.7497 | 0.7713
4 Day Model | 0.7813 | 0.7543 | 0.7991 | 0.7494 | 0.7771
5 Day Model | 0.7794 | 0.7553 | 0.8064 | 0.7441 | 0.7831
6 Day Model | 0.7862 | 0.7588 | 0.8110 | 0.7421 | 0.7799
7 Day Model | 0.7799 | 0.7513 | 0.8014 | 0.7510 | 0.7763
From Table 3, the best ANN model is obtained through comparisons between monthly ANN models and between daily ANN models. The best monthly ANN model is the 5C model with 29 attributes and an accuracy value of 0.891696751. The best daily ANN model is the 6C model with 16 attributes and an accuracy value of 0.810972412. For more details, the following are the results of statistical metrics on the 5C month model and 6C day model which can be seen in Table 4.
Table 4. Statistical Metrics on the Best ANN Models
Model | Accuracy | Precision | Recall | F1-Score
5 Month Model C | 0.8917 | 0.8695 | 0.8700 | 0.8680
6 Day Model C | 0.8110 | 0.7760 | 0.8025 | 0.7851
From the ANN implementation, it is known that the 5C month model, the model with five months input and one month output with data taken from the third quarter, is the best monthly model. It is known that the 6C day model, the model with six days of input and one day of output with data taken from the third quarter, is the best daily model. The combination of attributes in the 5C month model and 6C day model can be seen in Table 5.
Table 5. Attribute Combination in the Best ANN Models
Model | Attribute Combination
5 Month Model C | Tn_1, Tn_4, Tn_5, Tx_3, Tx_4, Tavg_3, Tavg_4, RH_avg_1, RH_avg_2, RH_avg_3, RH_avg_4, RH_avg_5, RR_1, RR_3, RR_4, RR_5, ss_1, ss_2, ss_3, ss_4, ss_5, ff_x_5, ddd_x_1, ddd_x_2, ddd_x_4, ddd_x_5, ff_avg_1, ff_avg_4, ff_avg_5
6 Day Model C | RH_avg_1, RH_avg_2, RH_avg_3, RH_avg_4, RH_avg_5, RH_avg_6, RR_6, ss_1, ss_2, ss_3, ss_4, ss_5, ss_6, ff_avg_4, ff_avg_5, ff_avg_6
After getting the best ANN model, the best monthly ANN model predicts the rainfall class for the next six months and the best daily ANN model predicts the rainfall class for the next seven days.
3.7 Implementation of Ordinary Kriging Method
In this study, after obtaining a classification model using the ANN method, the Ordinary Kriging spatial interpolation method was applied to perform a spatial estimate of monthly and daily rainfall at 27 locations on the island of Java, Indonesia. To perform spatial estimation, the coordinates (latitude and longitude) of each location point are added. The data used therefore contains the location id, location name, coordinates, and predicted rainfall class.
In applying the Ordinary Kriging spatial method, a semivariogram model is needed to support the estimation of rainfall. There are three semivariogram models used in this research as a comparison, namely the spherical model, the exponential model, and the gaussian model. In addition to determining the semivariogram model, it is necessary to determine the values of the parameters for theoretical semivariogram calculations. The parameters that need to be specified are range, sill, and nugget. After determining the semivariogram model and its parameters, the Ordinary Kriging method can be applied.
3.8 Evaluation of Semivariogram Model
After performing the spatial estimation of rainfall using the Ordinary Kriging method, the semivariogram model was evaluated using the Root Mean Square Error (RMSE) metric. Model evaluation is carried out to get the best semivariogram model on data that has gone through spatial estimation. The comparison of the semivariogram model based on the RMSE value can be seen in Tables 6 and 7.
Table 6. Comparison of RMSE Metrics in Monthly Semivariogram Model
Month | Spherical | Exponential | Gaussian
April | 0.5479 | 0.5489 | 0.5470
May | 0.5722 | 0.5711 | 0.5519
June | 0.5607 | 0.5569 | 0.5617
July | 0.5366 | 0.5367 | 0.5348
August | 0.5450 | 0.5765 | 0.5581
September | 0.6133 | 0.6156 | 0.6328
Table 7. Comparison of RMSE Metrics in Daily Semivariogram Model
Date | Spherical | Exponential | Gaussian
1 Apr 2022 | 0.5781 | 0.5772 | 0.5877
2 Apr 2022 | 0.5394 | 0.5472 | 0.5365
3 Apr 2022 | 0.5561 | 0.5607 | 0.5602
4 Apr 2022 | 1.2104 | 1.2104 | 1.2792
5 Apr 2022 | 0.3513 | 0.3689 | 0.3327
6 Apr 2022 | 0.3218 | 0.3321 | 0.3244
7 Apr 2022 | 0.4466 | 0.4589 | 0.4310
From the comparison results in Tables 6 and 7, the Gaussian semivariogram model is taken as the best in this case study. For the monthly model, the Gaussian model most often has the lowest RMSE value, in three of the six months (April, May, and July). For the daily model, the Gaussian and Spherical models dominate: the Gaussian model has the lowest RMSE on three of the seven days (2, 5, and 7 April), while the Spherical model has the lowest RMSE on 3 and 6 April and ties with the Exponential model on 4 April. The lowest RMSE value in the daily model overall, 0.3218, was produced by the Spherical model on April 6, 2022.
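The comparison can be reproduced by counting, for each row of Tables 6 and 7, which semivariogram model has the lowest RMSE. The values below are copied from the two tables; the helper names are hypothetical, and a tie is credited to every model that shares the lowest value.

```python
MODELS = ("Spherical", "Exponential", "Gaussian")

monthly_rmse = {
    "April": (0.5479, 0.5489, 0.5470),
    "May": (0.5722, 0.5711, 0.5519),
    "June": (0.5607, 0.5569, 0.5617),
    "July": (0.5366, 0.5367, 0.5348),
    "August": (0.5450, 0.5765, 0.5581),
    "September": (0.6133, 0.6156, 0.6328),
}
daily_rmse = {
    "1 Apr 2022": (0.5781, 0.5772, 0.5877),
    "2 Apr 2022": (0.5394, 0.5472, 0.5365),
    "3 Apr 2022": (0.5561, 0.5607, 0.5602),
    "4 Apr 2022": (1.2104, 1.2104, 1.2792),
    "5 Apr 2022": (0.3513, 0.3689, 0.3327),
    "6 Apr 2022": (0.3218, 0.3321, 0.3244),
    "7 Apr 2022": (0.4466, 0.4589, 0.4310),
}

def count_wins(table):
    """Count how often each model has the lowest RMSE (ties count for each model)."""
    wins = {name: 0 for name in MODELS}
    for row in table.values():
        lowest = min(row)
        for name, value in zip(MODELS, row):
            if value == lowest:
                wins[name] += 1
    return wins

monthly_wins = count_wins(monthly_rmse)
daily_wins = count_wins(daily_rmse)
```

With ties counted this way, the Gaussian model has the lowest RMSE in three months and on three days, and the Spherical model in two months and on three days (including the tie on 4 April).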
3.9 Rainfall Map Results
In this research, historical weather and climate data are used to make prediction maps based on rainfall classification, using the ANN machine learning method and the Ordinary Kriging spatial interpolation method. The ANN method is applied first to obtain a classification model, and the Ordinary Kriging method is then applied to perform the spatial interpolation. The last step is to display the results as rainfall maps.
To display a map of rainfall on Java Island, shapefile data describing the Java Island map is needed. Once this data is available, ArcMap software is used to do the mapping. The results of the Ordinary Kriging spatial interpolation are clipped with the Java Island data to show a rainfall prediction map of the island. Figures 4 and 5 show the resulting monthly and daily rainfall maps.
Figure 4. Rainfall Monthly Prediction Map (a) April (b) May (c) June (d) July (e) August (f) September
Figure 5. Rainfall Daily Prediction Map (a) 1st April (b) 2nd April (c) 3rd April (d) 4th April (e) 5th April (f) 6th April (g) 7th April
Figures 4 and 5 show that the pink areas on the map have the highest predicted probability of rain, while the green areas are predicted to be cloudy with the lowest probability of rain. In addition, several maps contain values of the moderate rain class (class 2), and some even approach the extreme rain class (class 5).
In April (Figure 4 (a)), there is a relatively large pink area in the east of Java, which means rain is likely there. In May (Figure 4 (b)), two areas have the greatest possibility of rain, namely Bogor and its surroundings and Nganjuk and its surroundings. In June (Figure 4 (c)), no areas have a high probability of rain. In July (Figure 4 (d)), there is a relatively large pink area in the west of Java. In August (Figure 4 (e)), two areas have the possibility of light to moderate rain, namely Bogor and its surroundings, and Pasuruan, Malang, and Banyuwangi. In September (Figure 4 (f)), several areas have the possibility of light to moderate rain.
On April 1 (Figure 5 (a)), pink areas are shown on the map from central Java to the east, which means these areas have the possibility of light rain. On April 2 (Figure 5 (b)) and April 3 (Figure 5 (c)), many areas have the possibility of rain, namely the orange, brown, and pink areas. On April 4 (Figure 5 (d)), most areas have a low probability of rain, marked in green, but there are areas with a possibility of heavy to extreme rain in Sumenep and its surroundings. On April 5 (Figure 5 (e)), many areas have the possibility of rain, namely the brown and pink areas. On April 6 (Figure 5 (f)), several areas have the possibility of rain, namely the pink areas. On April 7 (Figure 5 (g)), Central Java has the possibility of light rain.
4. CONCLUSION
This research built rainfall prediction maps using the ANN machine learning approach and the Ordinary Kriging spatial interpolation algorithm. Before the ANN method is applied, data splitting and feature selection during preprocessing are important steps, because they can reduce potential multicollinearity while also increasing the model's accuracy. Determining the hyperparameters when building the ANN model is likewise critical. With these steps, the best monthly and daily ANN models were the 5C month model, with an accuracy of 89 percent, and the 6C day model, with an accuracy of 81 percent. After the ANN models were built, their predictions were interpolated with Ordinary Kriging. The semivariogram model and its parameters must be set before Ordinary Kriging is applied, because they affect the resulting rainfall estimates. The candidate semivariogram models were therefore compared using the RMSE metric, and the Gaussian model was selected as the best semivariogram model for this case study because it has the lowest RMSE value.
The results of this research are rainfall prediction maps for the six months and the seven days following the last date in the dataset, March 31, 2022. The monthly prediction maps show that in April and July, most of Java Island is predicted to experience light rain. In May, August, and September, several areas on Java Island are predicted to experience light to moderate rain. In June, Java Island is predicted to experience sunny and cloudy days. The daily prediction maps show that on the 1st of April, most of Central Java is predicted to experience light rain. On the 2nd, 3rd, and 4th of April, Java Island is predicted to experience light to moderate rain. On the 4th of April, most of Java Island has a low probability of rain, but there are areas with a possibility of heavy to extreme rain in Sumenep and its surroundings. On the 6th of April, several areas of Java Island are predicted to experience light rain. On the 7th of April, Central Java is predicted to experience light rain.
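The semivariogram selection step summarized above can be illustrated with a minimal sketch. It is not the implementation used in this research: the station coordinates, rainfall values, and semivariogram parameters (nugget, sill, range) below are hypothetical, distances are taken as plain Euclidean distances in degrees, and leave-one-out cross-validation is assumed as the way the RMSE is computed. The function names `semivariogram`, `ordinary_kriging`, and `loo_rmse` are introduced here for illustration only.

```python
import numpy as np

def semivariogram(h, model, nugget, sill, range_):
    """Semivariance at lag distance h; gamma(0) is defined as 0."""
    h = np.asarray(h, dtype=float)
    psill = sill - nugget  # partial sill
    if model == "gaussian":
        g = nugget + psill * (1.0 - np.exp(-(h / range_) ** 2))
    elif model == "exponential":
        g = nugget + psill * (1.0 - np.exp(-h / range_))
    elif model == "spherical":
        r = np.minimum(h / range_, 1.0)
        g = nugget + psill * (1.5 * r - 0.5 * r ** 3)
    else:
        raise ValueError(f"unknown semivariogram model: {model}")
    return np.where(h == 0.0, 0.0, g)

def ordinary_kriging(xy, z, xy0, **vg):
    """Ordinary Kriging estimate at point xy0 from samples (xy, z)."""
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # Kriging system: semivariances bordered by the unbiasedness
    # constraint (the weights must sum to 1, via a Lagrange multiplier).
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = semivariogram(d, **vg)
    a[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = semivariogram(np.linalg.norm(xy - xy0, axis=1), **vg)
    weights = np.linalg.solve(a, b)[:n]
    return float(weights @ z)

def loo_rmse(xy, z, **vg):
    """Leave-one-out RMSE: predict each station from all the others."""
    idx = np.arange(len(z))
    errors = [ordinary_kriging(xy[idx != i], z[idx != i], xy[i], **vg) - z[i]
              for i in idx]
    return float(np.sqrt(np.mean(np.square(errors))))

# Hypothetical stations roughly spanning Java (longitude, latitude).
gen = np.random.default_rng(0)
stations = np.column_stack([gen.uniform(105.0, 115.0, 30),
                            gen.uniform(-8.5, -6.0, 30)])
# Hypothetical smooth rainfall field (mm) plus measurement noise.
rain = 10.0 + 3.0 * np.sin(stations[:, 0]) + gen.normal(0.0, 0.5, 30)

# Assumed parameters; in practice they are fitted to the empirical
# semivariogram of the data.
params = dict(nugget=0.1 * rain.var(), sill=rain.var(), range_=2.0)
scores = {m: loo_rmse(stations, rain, model=m, **params)
          for m in ("gaussian", "exponential", "spherical")}
best = min(scores, key=scores.get)
print(scores, "-> lowest RMSE:", best)
```

Which model scores lowest depends on the data and on the fitted parameters, so this sketch only shows the selection procedure; it does not reproduce the result that the Gaussian model was best for this case study.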
REFERENCES
[1] A. Y. Barrera-Animas, L. O. Oyedele, M. Bilal, T. D. Akinosho, J. M. D. Delgado, and L. A. Akanbi, “Rainfall prediction:
A comparative analysis of modern machine learning algorithms for time-series forecasting,” Machine Learning with Applications, vol. 7, p. 100204, 2022, doi: https://doi.org/10.1016/j.mlwa.2021.100204.
[2] D. Z. Haq et al., “Long Short-Term Memory Algorithm for Rainfall Prediction Based on El-Nino and IOD Data,” Procedia Computer Science, vol. 179, pp. 829–837, 2021, doi: https://doi.org/10.1016/j.procs.2021.01.071.
[3] W. M. Ridwan, M. Sapitang, A. Aziz, K. F. Kushiar, A. N. Ahmed, and A. El-Shafie, “Rainfall forecasting model using machine learning methods: Case study Terengganu, Malaysia,” Ain Shams Engineering Journal, vol. 12, no. 2, pp. 1651–
1663, 2021, doi: https://doi.org/10.1016/j.asej.2020.09.011.
[4] C. Z. Basha, N. Bhavana, P. Bhavya, and S. V, “Rainfall Prediction using Machine Learning & Deep Learning Techniques,” in 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), 2020, pp. 92–97. doi: 10.1109/ICESC48915.2020.9155896.
[5] P. Kanchan, “Rainfall Analysis and Forecasting Using Deep Learning Technique,” Journal of Informatics Electrical and Electronics Engineering (JIEEE), vol. 2, pp. 1–11, Jul. 2021, doi: 10.54060/JIEEE/002.02.015.
[6] Y. Xiang, L. Gou, L. He, S. Xia, and W. Wang, “A SVR–ANN combined model based on ensemble EMD for rainfall prediction,” Applied Soft Computing, vol. 73, pp. 874–883, 2018, doi: https://doi.org/10.1016/j.asoc.2018.09.018.
[7] N. Khalili, S. R. Khodashenas, K. Davary, M. M. Baygi, and F. Karimaldini, “Prediction of rainfall using artificial neural networks for synoptic station of Mashhad: a case study,” Arabian Journal of Geosciences, vol. 9, no. 13, p. 624, 2016, doi: 10.1007/s12517-016-2633-1.
[8] A. Stanley Raj, D. Hudson Oliver, Y. Srinivas, and J. Viswanath, “Wavelet based analysis on rainfall and water table depth forecasting using Neural Networks in Kanyakumari district, Tamil Nadu, India,” Groundwater for Sustainable Development, vol. 5, pp. 178–186, 2017, doi: https://doi.org/10.1016/j.gsd.2017.06.009.
[9] H. D. Purnomo, K. D. Hartomo, and S. Y. J. Prasetyo, “Artificial Neural Network for Monthly Rainfall Rate Prediction,”
IOP Conference Series: Materials Science and Engineering, vol. 180, p. 12057, Mar. 2017, doi: 10.1088/1757- 899x/180/1/012057.
[10] L. C. P. Velasco, R. P. Serquiña, M. S. A. Abdul Zamad, B. F. Juanico, and J. C. Lomocso, “Week-ahead Rainfall Forecasting Using Multilayer Perceptron Neural Network,” Procedia Computer Science, vol. 161, pp. 386–397, 2019, doi: https://doi.org/10.1016/j.procs.2019.11.137.
[11] N. Mishra, H. Soni, S. Sharma, and A. Upadhyay, “Development and Analysis of Artificial Neural Network Models for Rainfall Prediction by Using Time-Series Data,” International Journal of Intelligent Systems and Applications, vol. 10, pp. 16–23, Jul. 2018, doi: 10.5815/ijisa.2018.01.03.
[12] N. Chutsagulprom, K. Chaisee, B. Wongsaijai, P. Inkeaw, and C. Oonariya, “Spatial interpolation methods for estimating monthly rainfall distribution in Thailand,” Theoretical and Applied Climatology, vol. 148, no. 1, pp. 317–328, 2022, doi:
10.1007/s00704-022-03927-7.
[13] Z. A. N. D. X. X. A. N. D. Z. L. Li Yang AND Baorong, “Application of a semivariogram based on a deep neural network to Ordinary Kriging interpolation of elevation data,” PLOS ONE, vol. 17, no. 4, pp. 1–12, Jul. 2022, doi:
10.1371/journal.pone.0266942.
[14] S. Adhikary, N. Muttil, and A. Yilmaz, “Ordinary kriging and genetic programming for spatial estimation of rainfall in the Middle Yarra River catchment, Australia,” Hydrology Research, vol. 47, Jul. 2016, doi: 10.2166/nh.2016.196.
[15] A. Chakrabarty, S. Mannan, and T. Cagin, “Chapter 8 - Inherently Safer Design,” in Multiscale Modeling for Process Safety Applications, A. Chakrabarty, S. Mannan, and T. Cagin, Eds. Boston: Butterworth-Heinemann, 2016, pp. 339–396.
doi: https://doi.org/10.1016/B978-0-12-396975-0.00008-5.
[16] A. Malekian and N. Chitsaz, “Concepts, procedures, and applications of artificial neural network models in streamflow forecasting,” in Advances in Streamflow Forecasting: From Traditional to Modern Approaches, 2021, pp. 115–147. doi:
10.1016/B978-0-12-820673-7.00003-2.
[17] A. Kulkarni, D. Chong, and F. A. Batarseh, “5 - Foundations of data imbalance and solutions for a data democracy,” in Data Democracy, F. A. Batarseh and R. Yang, Eds. Academic Press, 2020, pp. 83–106. doi: https://doi.org/10.1016/B978- 0-12-818366-3.00005-8.
[18] G. Rozalia, H. Yasin, and D. Ispriyanti, “PENERAPAN METODE ORDINARY KRIGING PADA PENDUGAAN KADAR NO 2 DI UDARA (Studi Kasus: Pencemaran Udara di Kota Semarang),” JURNAL GAUSSIAN, vol. 5, no. 1, pp.
113–121, 2016, [Online]. Available: http://ejournal-s1.undip.ac.id/index.php/gaussian
[19] C. C. Nwokike, B. C. Offorha, M. Obubu, C. B. Ugoala, and H. I. Ukomah, “Comparing SANN and SARIMA for forecasting frequency of monthly rainfall in Umuahia,” Sci Afr, vol. 10, p. e00621, 2020, doi:
https://doi.org/10.1016/j.sciaf.2020.e00621.
[20] G. Xia et al., “Feature selection, artificial neural network prediction and experimental testing for predicting breakage rate of maize kernels based on mechanical properties,” Journal of Food Process Engineering, vol. 44, no. 2, p. e13621, 2021, doi: https://doi.org/10.1111/jfpe.13621.