Metaheuristic Approach - Data Fusion


2.5 Data Fusion

2.5.5 Metaheuristic Approach

Table 2.5: Data Fusion Methods and the Models Involved in Remote Sensing-Based ET0 Estimations

| Data Fusion Method | Models | Data Sources | Reference |
| --- | --- | --- | --- |
| STARFM | ALEXI, DisALEXI | Landsat, MODIS | (Cammalleri, et al., 2013; Cammalleri, et al., 2014; Knipper, et al., 2018; Semmens, et al., 2016) |
| STARFM | SEBS | ASTER, MODIS | (Li, et al., 2017) |
| Simple Taylor Skill | RS-PM, Shuttleworth-Wallace, PT-JPL, Modified PT, SIM | Landsat | (Yao, et al., 2017) |
| Ensemble Kalman Filter | Distribution Time Variant Gain Model | MODIS | (Zou, et al., 2017) |
| ESTARFM | SEBS, PM model | Landsat, MODIS | (Ma, et al., 2018) |

* ALEXI: Atmospheric-Land Exchange Inverse, DisALEXI: Disaggregated Atmospheric-Land Exchange Inverse, SEBS: Surface Energy Balance System, RS-PM: Remote Sensing Penman-Monteith, PT-JPL: Priestley-Taylor-Jet Propulsion Laboratory, SIM: Simple Hybrid ET, STARFM: Spatial and Temporal Adaptive Reflectance Fusion Model, ESTARFM: Enhanced STARFM

algorithms (such as back-propagation) only involve the adjustment of weights and biases, which are case-specific. The fine-tuning process can be time-consuming, especially when the search space (the possible range of solutions) is huge. Taking the ANN as an example, the number of neurons in each hidden layer can in principle range from one to infinity, and the same applies to the number of hidden layers. Grid search and trial-and-error methods become impractical when dealing with an extremely large search space. Consequently, metaheuristic approaches have diversified rapidly in recent years. In essence, the metaheuristic approach refers to the formulation of objective function(s) to represent the problems of interest. Subsequently, a set of algorithms is used to minimise or maximise the objective function(s) in order to obtain an optimum solution with respect to the constraints and conditions provided (Yang, Bekdaş and Nigdeli, 2016).

The metaheuristic approach can be classified into several categories: evolutionary computing, swarm intelligence and iterative-based algorithms (Goh and Lee, 2019). Evolutionary computing involves the design of a symbolic regression to fit the input dataset through techniques such as genetic programming and genetic algorithms. According to Hebbalaguppae Krishnashetty, et al. (2021), genetic programming supports simultaneous searching to perform fine-tuning and represents the problems as a linear program. The evolution involves the detection of feature importance (analogous to the fittest genes) in the dataset and the crossover and mutation that occur in each iteration. Jing, et al. (2019) performed an extensive review of the implementation of evolutionary computing in ET-related research studies. They found that a lack of variability has caused evolutionary computing to lag behind other metaheuristic approaches, such as swarm intelligence and iterative-based algorithms.
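The selection, crossover and mutation cycle described above can be illustrated with a minimal sketch. It is not taken from any of the cited studies: the toy `sphere` objective, the tournament selection, the uniform crossover, the Gaussian mutation and all parameter values are assumptions chosen only for illustration. In an ET0 setting, the objective would instead be the validation error of a machine learning model for a given set of hyper-parameters.

```python
import random


def sphere(x):
    """Toy objective (hypothetical): sum of squares, minimised at the origin."""
    return sum(v * v for v in x)


def tournament(population, objective, rng):
    """Pick two distinct individuals at random and return the fitter one."""
    a, b = rng.sample(population, 2)
    return a if objective(a) < objective(b) else b


def genetic_algorithm(objective, dim=3, pop_size=30, generations=60,
                      bounds=(-5.0, 5.0), mutation_rate=0.2, sigma=0.3, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    population = [[rng.uniform(lo, hi) for _ in range(dim)]
                  for _ in range(pop_size)]
    best = min(population, key=objective)
    history = [objective(best)]  # best objective value per generation

    for _ in range(generations):
        children = [best[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1 = tournament(population, objective, rng)
            p2 = tournament(population, objective, rng)
            # Uniform crossover: each gene comes from either parent.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            # Gaussian mutation on a random subset of genes, clipped to bounds.
            child = [min(hi, max(lo, g + rng.gauss(0.0, sigma)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = min(population, key=objective)
        history.append(objective(best))

    return best, history[-1], history


best, best_val, history = genetic_algorithm(sphere)
print("best objective value:", best_val)
```

Because the best individual is copied unchanged into every new generation, the recorded best value can never get worse from one generation to the next.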

Swarm intelligence, on the other hand, approaches optimisation problems from the perspective that intelligence can be obtained via the interactions among the individuals within a population or swarm (Janga Reddy and Nagesh Kumar, 2020). This mirrors the development of human society, where growth becomes more rapid as the interactions among individuals (exchange of information) increase. Since the development of swarm intelligence is generally nature-inspired, many swarm-based optimisation algorithms attempt to imitate the behaviour of different organisms, such as particle swarm optimisation (PSO), ant colony optimisation (ACO) and artificial bee colony (ABC) (Dorigo and Blum, 2005; Karaboga and Basturk, 2007; Kennedy and Eberhart, 1995). In recent years, the integration of swarm-based optimisation algorithms into machine learning for ET estimation has increased drastically. On the one hand, this is due to the efficiency of swarm intelligence in tackling various optimisation problems (Pham, et al., 2021); on the other hand, the variation and diversity of swarm intelligence continue to increase year on year. Many new swarm-based optimisation algorithms have been introduced in the past few years (Rostami, et al., 2021), not to mention the improvements and modifications to existing

method among the researchers, Tang, Liu and Pan (2021) still reported the limitations of swarm intelligence in their publication. They claimed that, because a high number of search agents is deployed in a large search space, the computational cost of swarm intelligence can be incredibly high, which is unnecessary for simple problems. Besides, older algorithms are more likely to converge prematurely. This issue is being addressed by many developers of swarm-based optimisation algorithms in recent extensions of the swarm intelligence class (Mirjalili and Lewis, 2016).
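To make the notion of interacting search agents concrete, the following is a minimal sketch of a global-best particle swarm. It assumes the commonly used inertia-weight variant rather than the exact 1995 formulation, and the `sphere` objective and the values of `w`, `c1` and `c2` are illustrative assumptions, not settings from the cited works. Each particle is pulled towards its own best position and towards the best position found by the whole swarm, which is the information exchange referred to above.

```python
import random


def sphere(x):
    """Toy objective (hypothetical): sum of squares, minimised at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim=2, n_particles=20, iterations=100,
        bounds=(-5.0, 5.0), w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [objective(p) for p in pos]     # personal best values
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    history = [gbest_val]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # own memory
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # shared memory
                # Move the particle and clip it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)

    return gbest, gbest_val, history


gbest, gbest_val, history = pso(sphere)
print("best objective value:", gbest_val)
```

The cost concern raised above is visible in the loop structure: every iteration evaluates the objective once per particle, so the number of evaluations grows with both the swarm size and the number of iterations.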

Unlike evolutionary computing and swarm intelligence, iterative-based algorithms have been the subject of fewer studies. This could be due to the recent trend of focusing on swarm intelligence. Essentially, iterative-based optimisation works on the improvement of the objective function(s) via the neighbourhood search technique (Şen, Dönmez and Yıldırım, 2020). Examples of iterative-based algorithms include simulated annealing (SA) and the black hole algorithm (Hatamlou, 2013; Kirkpatrick, Gelatt and Vecchi, 1983). The use of iterative-based algorithms in machine learning applications, particularly those related to ET estimation, is still very limited, and they could be a promising hybridisation technique in the near future.
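The neighbourhood search idea can be sketched with a small simulated annealing routine. This is an illustrative sketch under stated assumptions: the `sphere` objective, the uniform neighbourhood move, the geometric cooling schedule and every parameter value are hypothetical choices, not taken from the cited references. A single solution is perturbed at each step; a worse neighbour is still accepted with a probability that shrinks as the temperature falls.

```python
import math
import random


def sphere(x):
    """Toy objective (hypothetical): sum of squares, minimised at the origin."""
    return sum(v * v for v in x)


def simulated_annealing(objective, x0, step=0.5, t0=1.0, cooling=0.95,
                        iterations=500, seed=0):
    rng = random.Random(seed)
    current, current_val = list(x0), objective(x0)
    best, best_val = current[:], current_val
    t = t0

    for _ in range(iterations):
        # Neighbourhood move: perturb every coordinate by a small random amount.
        candidate = [v + rng.uniform(-step, step) for v in current]
        cand_val = objective(candidate)
        delta = cand_val - current_val
        # Always accept improvements; accept worse moves with prob. exp(-delta/t).
        if delta < 0 or rng.random() < math.exp(-delta / t):
            current, current_val = candidate, cand_val
            if current_val < best_val:
                best, best_val = current[:], current_val
        t = max(t * cooling, 1e-12)  # geometric cooling with a small floor

    return best, best_val


start = [3.0, -2.0]
best, best_val = simulated_annealing(sphere, start)
print("start:", sphere(start), "best:", best_val)
```

Unlike the population-based methods, only one candidate is evaluated per iteration, which keeps the cost per step low at the expense of exploring the search space less broadly.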

The use of metaheuristic approaches involves the improvement of the objective function(s), which is the mathematical representation of the problems of interest. Over the years, the mean square error (MSE) has been used extensively as the objective function for machine learning-related applications, in which the hyper-parameters of machine learning models are tuned to minimise the prediction error. Nevertheless, modern metaheuristic approaches can handle more than one objective function concurrently, resulting in so-called multi-objective optimisation. This method has been deployed in many hydrological studies and has yielded positive results. For instance, Yadav, Chatterjee and Equeenuddin (2021) used the genetic algorithm to optimise the ANN for suspended sediment yield modelling, with the error variance and bias acting as competing objective functions. The selection of objective functions had a significant impact on the performance of the ANN, in which overfitting and bias could be resolved. The successful adoption of multi-objective optimisation using metaheuristic approaches in hydrology suggests that similar findings could be replicated for ET-related problems. This can be one of the research directions for future studies.
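The difference between a single MSE objective and two competing objectives can be shown with a short sketch. It does not reproduce the procedure of the cited study: the observations, the three candidate prediction sets and the helper names are hypothetical. The sketch scores each candidate by absolute bias and error variance, and keeps the candidates that no other candidate beats on both objectives (the Pareto front). It also uses the identity MSE = bias² + error variance, which is why a single MSE value cannot distinguish a biased but consistent model from an unbiased but noisy one.

```python
def bias_and_variance(y_true, y_pred):
    """Return (|mean error|, population variance of the errors)."""
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]
    bias = sum(errors) / n
    variance = sum((e - bias) ** 2 for e in errors) / n
    return abs(bias), variance


def mse(y_true, y_pred):
    """Mean square error of the predictions."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)


def dominates(a, b):
    """True if score a is no worse than b everywhere and better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(scores):
    """Indices of the scores that are not dominated by any other score."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(o, s) for j, o in enumerate(scores) if j != i)]


# Hypothetical observations and three hypothetical candidate models.
y_true = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "A": [1.5, 2.5, 3.5, 4.5],  # constant offset: biased, zero error variance
    "B": [0.5, 2.5, 2.5, 4.5],  # errors cancel out: unbiased, but noisy
    "C": [2.0, 2.0, 4.0, 4.0],  # both biased and noisy
}
names = list(candidates)
scores = [bias_and_variance(y_true, candidates[k]) for k in names]
front = [names[i] for i in pareto_front(scores)]
print(front)  # → ['A', 'B']
```

Candidates A and B have the same MSE (0.25) but trade bias against variance in opposite ways, so both remain on the front, while C is worse than each of them on one objective and no better on the other.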

The metaheuristic approach (mostly single-objective optimisation) has been applied to ET modelling in the past few years. The optimisation algorithms, regardless of class, were used to tune the hyper-parameters of the machine learning models. Some of the important findings have been summarised in Table 2.3 of Section 2.3.

Chapter 2 provides a detailed discussion of ET and its observation/estimation methods, the application of machine learning models in estimating ET0, the data requirements of machine learning models for ET0 estimation, and the study of data fusion or ensemble models to compute ET0. Individual empirical and machine learning models for ET0 estimation were studied and reported based on previous research works. Furthermore, a comprehensive analysis of the types of datasets is presented. A case study was performed in arid and semi-arid regions to identify the priority ranking of input meteorological variables. It was established that solar radiation and temperature emerged as the two most important factors for accurate and precise ET0 estimation. Numerous ways of integrating data fusion techniques with base machine learning models were also studied and discussed.

Careful identification of the research gap or rationale is crucial to continuing this research work. In this context, several gaps were identified. Firstly, past studies have shown that areas with different climate patterns could have different priority rankings of meteorological variables. However, a comprehensive study in an equatorial climate has been lacking. Malaysia, which largely depends on agricultural production, becomes an interesting area of study due to its equatorial monsoon climate. Therefore, it is imperative to determine the most influential meteorological factor on ET0 in such a region as an effort to cut down the number of meteorological variables that have to be monitored for ET0 estimation.

The accuracy of machine learning models in ET0 estimation would be impaired when the input meteorological variables are limited, which is undesirable. Therefore, the second research gap identified is the lack of a robust machine learning model that is resilient to the reduction of input meteorological variables. In view of this, a data fusion or ensemble model is proposed to solve the problem by resampling or by combining the effects of multiple models. In fact, data fusion has not been widely used to assemble machine learning models with ground observation data. This gap could be filled by this study, in which different ensemble techniques are compared to enhance the performance of base machine learning models.

Thirdly, the literature review reveals that machine learning models need to be trained locally for local use. This means that a spatially robust model for ET0 estimation is still absent. Hence, a model with broad spatial applicability and lower data requirements would be advantageous, since data would not need to be collected over a long period before proceeding with the modelling work. The outlined research gaps align well with the problem statement in Section 1.2, in which the qualitative and quantitative data hunger of machine learning models needs to be addressed and resolved.


From the document ROBUST DATA FUSION TECHNIQUES INTEGRATED MACHINE LEARNING MODELS FOR ESTIMATING (pages 85-92)