2.3 Machine Learning for Evapotranspiration
2.3.2 Support Vector Machine
As shown in Table 2.1, a considerable number of studies have examined the use of the ANN as an ET0 estimation tool. These studies have generally focused on the following aspects:
• Minimisation of mandatory inputs
• Generalisation of the ANN for wider applications
• Application of novel input features
• Improvement of the ANN's forecasting ability
Progress in these four aspects could substantially improve the prediction of ET0 by yielding a more general model that requires less climate data. In addition, a longer forecasting horizon is an important prerequisite for a proactive water management strategy. However, the ANN alone appears insufficient to provide the solution. The following subsections therefore discuss other machine learning models that have been employed in ET0 estimation.
and accuracy can be optimised at the same time.
Since ET0 estimation is a regression problem, support vector regression (SVR) is typically used. In an SVR, a loss (or cost) function defines the allowable deviation of the estimates from the target output, and the estimating function is fitted within that tolerance (Raghavendra and Deka, 2014). The working principle of the SVM is shown in Figure 2.2.
Figure 2.2: Working Principle of the SVM (Shrestha and Shukla, 2015)
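The idea of an allowable deviation can be illustrated with a minimal sketch. It assumes the commonly used epsilon-insensitive loss, which the text above does not name explicitly. The function name, the tolerance of 0.1 and the ET0 values are hypothetical and serve only as an illustration.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Return the epsilon-insensitive loss for a single estimate.

    Deviations no larger than epsilon cost nothing; larger deviations
    are penalised linearly by the amount they exceed epsilon.
    """
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Hypothetical ET0 values in mm/day, for illustration only.
observed = [4.2, 5.0, 3.1]
estimated = [4.25, 5.6, 2.6]

losses = [
    epsilon_insensitive_loss(o, e, epsilon=0.1)
    for o, e in zip(observed, estimated)
]
# The first estimate lies inside the tolerance band and incurs no loss;
# the other two are penalised only for the part beyond the band.
```

A wider band tolerates more error without penalty, so the choice of epsilon trades accuracy against the simplicity of the fitted function.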
The SVM has been popular in hydrological applications, including ET0 estimation (Raghavendra and Deka, 2014). The SVM is highly robust, capable of solving complex problems, less vulnerable to overfitting and able to describe the model more compactly (Zendehboudi, Baseer and Saidur, 2018). The network structure of the SVM is illustrated in Figure 2.3.
Figure 2.3: Network Structure of the SVM
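The structure in Figure 2.3 can be read as a weighted sum of kernel evaluations between the input vector X and a set of stored vectors X1 to X4. The sketch below is one hedged way to express this. The bias term, the kernel parameters (gamma, degree, coef0) and all numerical values are assumptions introduced for illustration and do not come from the figure.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel: (x . z + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree


def svm_output(x, support_vectors, weights, bias, kernel):
    """Output Y as a bias plus the weighted sum of K(X, Xi)."""
    return bias + sum(
        w * kernel(x, sv) for w, sv in zip(weights, support_vectors)
    )


# Hypothetical scaled inputs and weights, for illustration only.
support_vectors = [(0.2, 0.4), (0.6, 0.1), (0.9, 0.8), (0.5, 0.5)]
weights = [0.7, -0.3, 1.1, 0.4]
bias = 3.0

x_new = (0.5, 0.5)
y_rbf = svm_output(x_new, support_vectors, weights, bias, rbf_kernel)
y_poly = svm_output(x_new, support_vectors, weights, bias, polynomial_kernel)
```

Swapping the kernel argument changes how similarity to the stored vectors is measured while the rest of the structure stays the same.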
The SVM has long been applied to ET0 estimation, encouraged by its ability to learn complex relationships between input features and ET0 and thereby produce accurate predictions. Table 2.2 summarises some research works that used the SVM in ET0 studies.
[Figure 2.3 labels: input vectors X; kernel nodes K(X,X1), K(X,X2), K(X,X3), K(X,X4); weights; output Y]
Table 2.2: Summary of Research using the SVM in ET0 Study

Approach: SVM; Adaptive Neuro-Fuzzy Inference System (ANFIS); MLR; Multiple Non-Linear Regression (MNLR)
Key Findings:
• Mean monthly Tmax, Tmin, Tmean, RH, u and Rs were used to train SVM and ANFIS in cold mountainous areas based on the PM model
• Different combinations of input meteorological variables were tested
• The optimum membership function and kernel function were determined for ANFIS and SVM, respectively
• For SVM, the best kernel function was found to be RBF
• SVM and ANFIS performed better than MLR, MNLR and other empirical models
Reference: (Tabari, et al., 2012)

Approach: LS-SVM; MLP
Key Findings:
• Daily Tmean, Rs, u and RH were used to train the models using the PM model as reference in a warm temperate region
• Different combinations of meteorological variables were tested
• When all inputs were available, the least squares SVM had the best performance
• Where u and RH data were lacking, the HS and PT models performed better than the machine learning models
Reference: (Kisi, 2012)

Approach: SVM; BPNN; Genetic Programming
Key Findings:
• Monthly Tmax, Tmin, u, ea and sunshine hours were used to train the models based on the PM model in a moderately continental area
• Wavelet transformation and the firefly algorithm were combined with SVM to forecast future ET0
• Both SVM variants performed better than BPNN and genetic programming, with the SVM using discrete wavelet transformed data achieving the highest estimation accuracy
Reference: (Gocić, et al., 2015)

Approach: SVM; Multivariate Adaptive Regression Splines (MARS); Gene Expression Programming (GEP)
Key Findings:
• Monthly Tmax, Tmin, Tmean, RH, u and Rs data were used to train the models based on the PM model in arid and semi-arid regions
• Different combinations of meteorological variables (based on data types) were tested
• Irrespective of the data types, MARS had the best performance, followed by RBF-based SVM, GEP and polynomial-based SVM
Reference: (Mehdizadeh, Behmanesh and Khalili, 2017)

Approach: LS-SVM
Key Findings:
• Monthly Tmax and Tmin were used to train the model using the HS model as reference in a subtropical climate area
• The objective of the study was to estimate evapotranspiration for future periods
Reference: (Kundu, Khare and Mondal, 2017)

Approach: SVM; Categorical Boosting (CatBoost); RF
Key Findings:
• Daily Tmax, Tmin, u, Rs and RH data were used to train the models using the PM model as reference in a subtropical monsoon region
• Different combinations of inputs were tested
• RF overfitted easily
• SVM had the best performance
• However, CatBoost incurred a lower computational cost than SVM
Reference: (Huang, et al., 2019)

Approach: SVM; MLP; MLR
Key Findings:
• The machine learning models were trained on ET0 values calculated from the PM model
• MLP had the best performance when temperature, radiation and humidity data were fed to the model
• However, when the humidity data were absent, the error rate of the MLP doubled
• SVM was more robust towards the reduction of input meteorological variables
Reference: (Kaya, et al., 2021)
Based on the literature review, it can be inferred that the SVM is a reliable model for estimating ET0. However, the performance of the SVM depends strongly on the choice of kernel function and on the quality of the input data (Raghavendra and Deka, 2014), although some studies have found the SVM to be more robust towards the reduction of input meteorological variables than ANN-based models. Since the construction of the model depends purely on the data provided, extrapolation of the SVM is likely to produce poor results. This means that the SVM has to be calibrated or trained using local data to ensure its reliability and relevance. In addition, the SVM can be computationally expensive when modelling problems with high non-linearity and dimensionality, thus consuming a significant amount of time. Hence, there remains considerable scope for research on optimising the SVM for ET0 estimation.
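The extrapolation concern can be illustrated with a minimal sketch for the RBF kernel specifically, which the reviewed studies often found to perform best. Far from the training data every kernel term decays towards zero, so the output collapses to a constant regardless of the input. The one-dimensional setting, the function name, the bias term and all numbers below are hypothetical assumptions for illustration, not results from any cited study.

```python
import math


def rbf_svr_predict(x, support_points, weights, bias, gamma=1.0):
    """One-dimensional RBF kernel expansion: bias + sum_i w_i * K(x, x_i)."""
    return bias + sum(
        w * math.exp(-gamma * (x - s) ** 2)
        for w, s in zip(weights, support_points)
    )


# Hypothetical model whose stored points lie between 0 and 3.
support_points = [0.0, 1.0, 2.0, 3.0]
weights = [0.8, -0.5, 1.2, 0.6]
bias = 4.0

# Inside the range covered by the stored points, the kernels contribute.
inside = rbf_svr_predict(1.5, support_points, weights, bias)

# Far outside that range, every kernel term has decayed to about zero,
# so the prediction is essentially the bias alone.
far_outside = rbf_svr_predict(50.0, support_points, weights, bias)
```

Under these assumptions the model carries no information about inputs unlike those it was trained on, which is consistent with the need for calibration using local data.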