

2.5 Data Fusion

2.5.2 Bayesian Modelling Approaches

Model ensembles that utilise Bayes' rule are known as Bayesian modelling. According to Höge, Guthke and Nowak (2019), Bayesian modelling can be classified into two main approaches: the “winner-takes-all” approach, in which the single most relevant model is selected, and the “team-of-rivals” approach, in which the models are averaged. The choice between the two should be based on the goal and the nature of the problem. For instance, if one is convinced that a true model exists to explain a problem, one should consider Bayesian model selection (BMS) to identify that model, provided the true model is present among the candidate models. Alternatively, one may wish to explain the problem with different models; should one not want to miss out on any of the hypotheses, one should opt for Bayesian model averaging (BMA) to combine the characteristics or favourable traits of each model.

In the BMS, the “true” model is selected based on the probability that it is true, and this probability is updated via the Bayesian model evidence. However, the algorithm can be indecisive when two models perform similarly. Bayesian model evidence can be presented in different forms, such as model weight ratios and the Bayes factor (Kass and Raftery, 1995). Applying the Bayesian model evidence can be difficult when the problem is highly non-linear, requires complex computation or involves high dimensionality (Höge, Guthke and Nowak, 2019).
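As a minimal sketch of the selection step (the evidence values below are hypothetical placeholders for illustration, not figures from any cited study), the Bayes factor and the posterior model probabilities follow directly from the model evidences:

```python
# Bayesian model selection (BMS) sketch: pick the model whose posterior
# probability, derived from its evidence p(D | M_k), is largest.
# The evidence values here are hypothetical placeholders.
evidences = {"M1": 0.012, "M2": 0.003}   # p(D | M_k), assumed known

# Bayes factor of M1 over M2: how strongly the data favour M1.
bayes_factor = evidences["M1"] / evidences["M2"]

# Posterior model probabilities under a uniform prior p(M_k) = 1/2.
total = sum(evidences.values())
posterior = {m: e / total for m, e in evidences.items()}

# "Winner-takes-all": select the model with the highest posterior.
winner = max(posterior, key=posterior.get)
print(winner, round(posterior[winner], 2))   # M1 0.8
```

The indecisive case mentioned above corresponds to the Bayes factor being close to 1, where neither model is clearly favoured.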

As for the BMA, the weight factors of the individual models are maintained. Practically, the BMA can be viewed as an intermediate phase of the BMS: when the true model cannot be identified, the BMA estimates the final output based on the weightage of each model. It should be noted that both the BMA and the BMS aim to identify the final true model; the approaches differ because of limitations in data size and in the set of candidate models (Höge, Guthke and Nowak, 2019). From these two main principles, the Bayesian modelling approach branches into many distinct algorithms. For instance, the Bayesian joint probability (Zhao, Wang and Schepen, 2019) and the Bayesian regression (Khoshravesh, Sefidkouhi and Valipour, 2015) have been used on different occasions for ET0 estimation.
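The averaging step itself reduces to a weighted sum of the member models' predictions, with the weights given by the posterior model probabilities. A minimal sketch (the weights and ET0 predictions are hypothetical numbers, not taken from the cited studies):

```python
# Bayesian model averaging (BMA) sketch: combine member predictions using
# posterior model weights. All values are illustrative placeholders.
weights = {"M1": 0.8, "M2": 0.2}        # posterior p(M_k | D), sum to 1
predictions = {"M1": 4.1, "M2": 3.5}    # each model's ET0 estimate (mm/day)

assert abs(sum(weights.values()) - 1.0) < 1e-9

# BMA output: weighted average over all candidate models.
bma_estimate = sum(weights[m] * predictions[m] for m in weights)
print(round(bma_estimate, 2))   # 3.98
```

BMS is recovered as the limiting case where one weight approaches 1 and the others approach 0.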

Bayesian modelling approaches have long been applied to fine-tune the performance of machine learning models. Chen, et al. (2015) applied the BMA to machine learning models and conventional empirical models. The ensemble was done using two distinct strategies: (i) an ensemble of all models and (ii) an ensemble of the best models. The results showed that the ensemble of the best models produced ET0 estimation of the highest accuracy at both the regional and global scales. The findings of this study also showed that the accuracy of the ensemble depended on its constituent models: ensembling only the best models resulted in better performance, whereas including some of the poor models caused the accuracy to deteriorate.

Bayesian regression has also been used to estimate ET0 (Khoshravesh, Sefidkouhi and Valipour, 2015). In the study, Bayesian regression was compared with a multivariable fractional polynomial model and robust regression. When temperature and radiation data were the only meteorological variables fed into the models, the multivariable fractional polynomial model outperformed the Bayesian regression, but the difference was insignificant. Therefore, Bayesian regression still has the potential to be fine-tuned when other meteorological variables are used.
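To make the idea concrete, a one-coefficient Bayesian linear regression with a Gaussian prior has a closed-form posterior mean. The sketch below regresses a hypothetical ET0 series on a single predictor; the data, prior variance and noise variance are all assumed for illustration and do not come from the cited study:

```python
# Bayesian linear regression sketch (single coefficient, known noise):
#   y_i = w * x_i + eps_i,  eps_i ~ N(0, sigma2),  prior w ~ N(0, tau2).
# The posterior over w is Gaussian with the closed-form mean below.
xs = [1.0, 2.0, 3.0]      # predictor, e.g. scaled radiation (hypothetical)
ys = [2.0, 4.0, 6.0]      # ET0 observations (hypothetical, mm/day)
sigma2 = 1.0              # assumed observation-noise variance
tau2 = 10.0               # assumed prior variance on w

sxy = sum(x * y for x, y in zip(xs, ys))   # 28.0
sxx = sum(x * x for x in xs)               # 14.0

# Posterior precision and mean: the prior shrinks the estimate toward 0.
post_precision = sxx / sigma2 + 1.0 / tau2
post_mean = (sxy / sigma2) / post_precision   # slightly below the true slope 2

print(round(post_mean, 3))
```

The shrinkage term 1/tau2 is what distinguishes this from ordinary least squares; as tau2 grows, the posterior mean approaches the least-squares slope.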

Separately, Zhao, Wang and Schepen (2019) used Bayesian joint probability as their approach to forecast the ET0 over Australia. They noted that raw forecasting by the global climate model could produce highly biased results.

Hence, the authors attempted to integrate the Yeo-Johnson transformation, bivariate normal distribution and Schaake Shuffle to enhance the model's forecasting ability. The combination of the aforementioned methods represented a form of Bayesian joint probability. The ensemble model was able to forecast ET0 up to two weeks in advance with satisfactory precision and accuracy.
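The Yeo-Johnson step mentioned above is a power transform that makes skewed data more Gaussian before the bivariate normal model is fitted. A minimal sketch of the transform itself (the choice of λ here is arbitrary for illustration; in practice λ is estimated from the data):

```python
import math

def yeo_johnson(y: float, lam: float) -> float:
    """Yeo-Johnson power transform of a single value y with parameter lam."""
    if y >= 0.0:
        if lam != 0.0:
            return ((y + 1.0) ** lam - 1.0) / lam
        return math.log(y + 1.0)
    # Negative branch: the transform stays defined for y < 0,
    # unlike the Box-Cox transform it generalises.
    if lam != 2.0:
        return -(((-y + 1.0) ** (2.0 - lam)) - 1.0) / (2.0 - lam)
    return -math.log(-y + 1.0)

# lam = 1 leaves the data unchanged; other values reshape the distribution.
print(yeo_johnson(3.0, 1.0))             # 3.0
print(round(yeo_johnson(3.0, 0.5), 3))   # 2.0
```

The Schaake Shuffle, by contrast, is a reordering step that restores realistic rank correlation across forecast ensemble members; it acts on positions, not values, so it has no comparable closed-form expression.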

Although Bayesian modelling approaches have been utilised several times for ET0 prediction and estimation, reports on the details of the ensembles remain very limited. For the case of the BMA, the weights of the individual models present in the ensemble remained unknown until He, et al. (2020) presented the results systematically using the Bayesian three-cornered hat method. In the study, the authors adopted different datasets retrieved from remote sensing satellites and land surface models. The influence of the individual datasets on the resultant ET varied across seasons and land covers. It was found that the Bayesian-based ensemble could produce better estimation of ET than simple averaging. The influence of other models on the ET0 estimation is still not clearly understood, and further investigations in this direction would yield valuable contributions to the scientific community.

2.5.3 Boosting Algorithm

Boosting is a technique in which prediction accuracy is enhanced by combining the outputs of several weak learners (Hassan, et al., 2017). Unlike the BMA, the boosting algorithm works stepwise: learners are added one at a time to minimise the cost function. The first learner searches for a solution with optimum loss; each subsequent learner is then added to the ensemble to reduce the residuals of its predecessors. Numerous variants of boosting algorithms have been proposed, each with its own novel characteristics.
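The stepwise residual-fitting idea can be sketched in a few lines. Below, a hypothetical gradient-boosting loop (squared loss, decision stumps as weak learners, made-up toy data) adds one stump per round, each fitted to the residuals left by the ensemble so far:

```python
# Gradient boosting sketch with decision stumps on a 1-D toy dataset.
# Each round fits a stump to the current residuals, then adds a damped
# copy of it to the ensemble (learning rate lr), shrinking the residuals.

def fit_stump(xs, residuals):
    """Fit a one-split regression tree minimising squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_rounds=100, lr=0.1):
    """Add stumps one at a time, each reducing its predecessors' residuals."""
    pred = [0.0] * len(xs)
    learners = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in learners)

# Toy data: a step function the ensemble recovers almost exactly.
model = boost([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 3.0, 3.0])
print(round(model(1.0), 2), round(model(4.0), 2))   # 1.0 3.0
```

Production libraries such as XGBoost and CatBoost follow this same additive scheme but use gradient information, regularisation and far stronger tree learners.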

The most well-known boosting methods include gradient boosting (Friedman, 2001), adaptive boosting (Freund and Schapire, 1997), XGBoost (Chen and Guestrin, 2016) and CatBoost (Prokhorenkova, et al., 2018).

The application of the boosting algorithm in ET0 estimation can be found in several studies. Fan, et al. (2018) applied gradient boosting to decision trees (GBDT) to estimate ET0. The authors also compared the XGBoost and the GBDT with the SVM, ELM, M5Tree and RF. Two sets of input meteorological variables, namely (i) the complete set (Tmax, Tmin, u, RH and Rs) and (ii) a temperature- and radiation-based set, were used to perform the ET0 estimation. All the models exhibited similar performance. The authors opined that the SVM and ELM provided better accuracy and stability, whereas the tree-based models offered greater computational efficiency, particularly the XGBoost, which showed accuracy comparable with that of the SVM and ELM.


In a separate study, the CatBoost was compared with the RF and the SVM (Huang, et al., 2019). The advantage of CatBoost over the RF was that, instead of randomly generating a set of predictors, CatBoost generates one predictor after another, taking into account the error of the previous tree. The authors concluded that although the CatBoost did not show significant improvement in accuracy and stability compared with the SVM, it was still highly recommended for minimising computational cost and time.

Based on the findings from the literature, it can be inferred that the boosting algorithm does not stand alone as a model itself, as on its own it does not produce estimations of higher accuracy and precision. However, owing to its greedy, stagewise search, it can improve integration efficiency and help overcome the problem of overfitting. These advantages of the boosting algorithm should be considered when constructing a hybrid model.