PDF eres.library.adelaide.edu.au

Determining the optimal level of complexity required to model a given problem is one of the most difficult tasks in developing an ANN. The state-of-the-art approach taken in this study is summarized in Section 3.3, along with the model development steps identified as requiring further analysis. Furthermore, the development of a physically based model requires that all relevant physical processes be represented within it.

Figure 2.1 Layer structure of a multi-layer perceptron.

Training

Advantages

Although there may be some advantages to using simple linear models in terms of implementation and interpretation, these models are limited by their inability to capture the complex non-linear nature of hydrological problems (Zhang et al., 1998). ANNs therefore have an advantage over both conceptual and Box-Jenkins time series models, which lack this generalization ability (Zealand et al., 1999). In fact, the non-linear and flexible functional form of ANNs makes it possible to model any continuous function to an arbitrary degree of accuracy (Zhang et al., 1998); as such, ANNs are often considered 'universal function approximators' (Cheng and Titterington, 1994; Bishop, 1995; ASCE Task Committee, 2000b).
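To make this functional form concrete, the sketch below evaluates a multi-layer perceptron with the layer structure of Figure 2.1. It assumes a single hidden layer with hyperbolic tangent activations and a linear output node; these activation choices, the function name `mlp_forward` and all weight values are illustrative assumptions, not the configuration used in this study.

```python
import math

def mlp_forward(x, w_ih, b_h, w_ho, b_o):
    """One-hidden-layer MLP (assumed tanh hidden nodes, linear output node).

    x    : list of input values
    w_ih : w_ih[j][i] is the weight from input i to hidden node j
    b_h  : bias of each hidden node
    w_ho : weight from each hidden node to the output node
    b_o  : output bias
    """
    # Each hidden node applies the non-linear tanh to a weighted sum of inputs.
    hidden = [math.tanh(b + sum(w * xi for w, xi in zip(row, x)))
              for row, b in zip(w_ih, b_h)]
    # The linear output node sums the weighted hidden-node outputs.
    return b_o + sum(v * h for v, h in zip(w_ho, hidden))

# Hypothetical weights for 2 inputs and 2 hidden nodes.
y = mlp_forward(x=[1.0, 1.0],
                w_ih=[[0.5, -0.5], [1.0, 1.0]],
                b_h=[0.0, 0.0],
                w_ho=[3.0, 1.0],
                b_o=0.5)
print(y)  # approximately 1.464 (= 0.5 + tanh(2.0))
```

The non-linearity enters only through the hidden-node activations; adding hidden nodes adds more of these non-linear basis functions, which is what gives the network its flexibility.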

Issues and Limitations

Furthermore, the linear approximation requires the calculation of the Hessian matrix (the matrix of second derivatives of the error function with respect to the weights), which can be difficult to compute because of near-singularities. Their results showed that the classical methods and the bootstrapping methods were of about equal accuracy.
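Why near-singularities arise can be illustrated with a deliberately redundant toy model, assumed here for illustration rather than taken from this study: in ŷ = w0·w1·x only the product w0·w1 is identifiable, much as symmetries between hidden nodes make some ANN weight combinations interchangeable. The sketch estimates the Hessian of the sum-of-squares error by central finite differences at a minimum; its determinant is numerically zero, so the inverse Hessian required by the linearized approach does not exist. The helper names are hypothetical.

```python
def sse(w, xs, ys):
    """Sum-of-squares error of the redundant toy model y_hat = w0 * w1 * x."""
    return sum((w[0] * w[1] * x - y) ** 2 for x, y in zip(xs, ys))

def numerical_hessian(f, w, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at point w."""
    n = len(w)
    hess = [[0.0] * n for _ in range(n)]

    def f_shifted(i, di, j, dj):
        v = list(w)
        v[i] += di
        v[j] += dj  # when i == j the same weight is shifted twice (step 2h)
        return f(v)

    for i in range(n):
        for j in range(n):
            hess[i][j] = (f_shifted(i, h, j, h) - f_shifted(i, h, j, -h)
                          - f_shifted(i, -h, j, h) + f_shifted(i, -h, j, -h)) / (4 * h * h)
    return hess

xs = [1.0, 2.0, 3.0]
ys = [2.0 * x for x in xs]   # data generated with w0 * w1 = 2
w_hat = [1.0, 2.0]           # one of infinitely many minima (any w0 * w1 = 2)
H = numerical_hessian(lambda w: sse(w, xs, ys), w_hat)
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
print(H)    # approximately [[112, 56], [56, 28]]
print(det)  # approximately 0, so H cannot be inverted
```

With noisy data and many weights the determinant is rarely exactly zero, but the same mechanism produces very small eigenvalues and an ill-conditioned inverse.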

BAYESIAN METHODS

Background to Bayesian Methodology
Use of Bayesian Methods in Water Resources Modelling
Bayesian Neural Networks
Training and Prediction
Model Selection
Limitations of Current Bayesian Neural Network Practices

Approximate Bayesian methods may also be inappropriate because of the Gaussian approximation they use for the posterior weight distribution. Before inferences can be made about the posterior weight distribution p(w|y), choices must be made about the likelihood function p(y|w) and the prior weight distribution p(w). However, the complexity of the weight distributions prevents Gibbs sampling from being used to sample the weights (Titterington, 2004).
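Because Gibbs sampling is ruled out, one general-purpose alternative is random-walk Metropolis sampling, which only needs the unnormalized posterior p(y|w)p(w). The sketch below is a minimal illustration under stated assumptions, not the framework developed in this research: a hypothetical one-hidden-node network ŷ = v·tanh(u·x), a Gaussian likelihood with known noise standard deviation, and an independent zero-mean Gaussian prior on each weight. The step size, chain length and burn-in are arbitrary choices.

```python
import math
import random

def predict(w, x):
    """Hypothetical one-hidden-node network: y = v * tanh(u * x)."""
    u, v = w
    return v * math.tanh(u * x)

def log_posterior(w, xs, ys, noise_sd=0.2, prior_sd=3.0):
    """Unnormalized log p(w|y) = log p(y|w) + log p(w), constants dropped.
    Assumes Gaussian errors with known noise_sd and a N(0, prior_sd^2) prior."""
    log_lik = -sum((y - predict(w, x)) ** 2 for x, y in zip(xs, ys)) / (2 * noise_sd ** 2)
    log_prior = -sum(wi ** 2 for wi in w) / (2 * prior_sd ** 2)
    return log_lik + log_prior

def metropolis(xs, ys, w0, n_iter=6000, step=0.05, burn_in=1000, seed=1):
    """Random-walk Metropolis: propose w' = w + N(0, step^2) and accept it
    with probability min(1, p(w'|y) / p(w|y))."""
    rng = random.Random(seed)
    w, lp = list(w0), log_posterior(w0, xs, ys)
    samples, accepted = [], 0
    for it in range(n_iter):
        proposal = [wi + rng.gauss(0.0, step) for wi in w]
        lp_new = log_posterior(proposal, xs, ys)
        if math.log(1.0 - rng.random()) < lp_new - lp:  # 1 - U lies in (0, 1]
            w, lp = proposal, lp_new
            accepted += 1
        if it >= burn_in:
            samples.append(list(w))
    return samples, accepted / n_iter

# Hypothetical synthetic data generated from u = 1, v = 2.
data_rng = random.Random(0)
xs = [-2.0 + 4.0 * i / 19 for i in range(20)]
ys = [2.0 * math.tanh(x) + data_rng.gauss(0.0, 0.2) for x in xs]

samples, acc_rate = metropolis(xs, ys, w0=[0.5, 0.5])
# Posterior predictive mean at x = 1: average the network output over samples.
y_mean = sum(predict(w, 1.0) for w in samples) / len(samples)
print(f"acceptance rate = {acc_rate:.2f}, predictive mean at x=1: {y_mean:.3f}")
```

Predictions are averaged over the sampled weights rather than taken from a single weight vector, which is what allows predictive uncertainty to be quantified.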

Figure 2.3 Evidence incorporating Occam’s razor.

CONCLUSION

Most available ANN software does not allow for Bayesian analysis and, because the available techniques are complex and difficult to program, Bayesian ANN training has not been adopted by water resources modelers. This difficulty of implementation, and the resulting lack of uptake, indicates the need for a Bayesian ANN development framework that provides accurate results while being relatively simple to code and implement. Therefore, the objective of this research is to develop a relatively simple Bayesian framework that can be applied to ANNs in the field of water resources modeling, where the primary goal of the procedure is neither statistical optimality nor optimal efficiency, but good results and ease of programming and use.

State-of-the-Art Deterministic ANN Methodology

INTRODUCTION

Choice of Performance Criteria
  Review of current practice
Choice of Data Sets
Data Pre-Processing
Determination of ANN Inputs
  Review of current practice
Determination of ANN Architecture
  Review of current practice
ANN Training
ANN Validation

MAE uses the absolute values of the residuals and therefore places less weight on fitting extreme values in the data than RMSE, which squares the residuals. Nevertheless, r² continues to be one of the most commonly used criteria for evaluating the performance of an ANN. Alternatively, performance measures that take into account the parsimony of the model can be considered.
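The criteria discussed here can be computed as in the minimal sketch below. It assumes r² is the squared Pearson correlation between observations and predictions and CE is the Nash-Sutcliffe coefficient of efficiency; these are common definitions, assumed rather than quoted from this study, and the data are hypothetical.

```python
import math

def rmse(obs, pred):
    """Root mean squared error: squaring emphasizes large residuals."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error: every residual is weighted linearly."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r_squared(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)

def ce(obs, pred):
    """Coefficient of efficiency: 1 - SSE / total sum of squares about the mean."""
    mo = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst

obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 5.0]   # a single error of 1.0 on the largest value
print(rmse(obs, pred), mae(obs, pred), r_squared(obs, pred), ce(obs, pred))
# rmse = 0.5, mae = 0.25, r2 is about 0.966, ce = 0.8
```

The single error on the largest observation doubles RMSE relative to MAE, illustrating why MAE is less sensitive to the fit at extremes.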

$$\mathrm{BIC} = -2\log L(\hat{w}) + d\log N \qquad (3.8)$$

where $\log L(\hat{w})$ is the maximized log-likelihood function, d is the dimension of the weight vector (i.e. the number of weights in the model) and N is the number of data points used in training. AIC and BIC will also be used to evaluate the generalizability of ANN models based only on the training data results. The approach proposed by Bowden (2003) considers only predictive accuracy and therefore ignores the physical reliability of the model.
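If the residuals are assumed to be independent and Gaussian, with variance estimated as SSE/N (an assumption of this sketch), the maximized log-likelihood has a closed form. AIC, taken here in its standard form −2 log L(ŵ) + 2d, and BIC (Eq. 3.8) can then be computed directly from the training residuals and the number of weights d. The data and weight counts below are hypothetical.

```python
import math

def gaussian_log_likelihood(obs, pred):
    """Maximized log-likelihood assuming i.i.d. Gaussian errors whose
    variance is estimated as SSE / N (requires SSE > 0)."""
    n = len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sigma2 = sse / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)

def aic(obs, pred, d):
    """AIC = -2 log L + 2d, with d the number of ANN weights."""
    return -2.0 * gaussian_log_likelihood(obs, pred) + 2.0 * d

def bic(obs, pred, d):
    """BIC = -2 log L + d log N, with N the number of training data points."""
    return -2.0 * gaussian_log_likelihood(obs, pred) + d * math.log(len(obs))

obs = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1]
pred = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Same fit, two hypothetical model sizes: the larger model is penalized more.
for d in (2, 4):
    print(d, round(aic(obs, pred, d), 3), round(bic(obs, pred, d), 3))
```

For a fixed fit, both criteria increase with d, so a larger network is preferred only if its improvement in likelihood outweighs the penalty.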

Traditionally, data have been arbitrarily partitioned without considering the statistical properties of the respective data sets (Maier and Dandy, 2000a).

(3.9)

where K is the number of inputs, and μ and σ are the mean and standard deviation of the input or output variable, respectively. Furthermore, in the current research, analytical measures will be used to help select the optimal SOM network size.
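A simple check that a partition has not produced unrepresentative subsets is to compare summary statistics, such as the mean and standard deviation, across the subsets. The sketch below uses a random 60/20/20 split of a hypothetical skewed series. The proportions, data and function names are assumptions, and this is not the SOM-based division method referred to in this study.

```python
import random
import statistics

def random_split(data, fractions=(0.6, 0.2, 0.2), seed=42):
    """Randomly partition data into training, testing and validation subsets."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_train = int(fractions[0] * len(data))
    n_test = int(fractions[1] * len(data))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])

def describe(subset):
    """Mean and sample standard deviation of a subset."""
    return statistics.mean(subset), statistics.stdev(subset)

data = [float(i ** 2) for i in range(50)]  # hypothetical skewed series
train, test, valid = random_split(data)
for name, subset in (("train", train), ("test", test), ("valid", valid)):
    mean, sd = describe(subset)
    print(f"{name}: n={len(subset)} mean={mean:.1f} sd={sd:.1f}")
```

Large differences between the subset statistics would indicate that the model is being trained and evaluated on data with different characteristics.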

To overcome this, Shi (2000) proposed a distribution transformation method that transforms the input data into uniform distributions on the range [0,1] using the cumulative distribution functions (CDFs) of the input variables. It is more important for the distribution of the data to be approximately symmetric and not heavy-tailed than for it to be normally distributed (Masters, 1993). A further limitation of EBMLP is that the results depend on many user-defined parameters.
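A CDF-based transformation of this kind can be sketched with the empirical CDF of a reference (e.g. training) sample, F(x) = (number of reference values ≤ x)/n. Using the empirical CDF is an assumption of this sketch; a fitted parametric CDF could be used instead, and the data and function name are hypothetical.

```python
from bisect import bisect_right

def ecdf_transform(reference, values):
    """Map values onto [0, 1] using the empirical CDF of a reference sample:
    F(x) = (number of reference points <= x) / n."""
    ref = sorted(reference)
    n = len(ref)
    return [bisect_right(ref, v) / n for v in values]

sample = [3.0, 1.0, 2.0, 10.0]   # hypothetical right-skewed input variable
print(ecdf_transform(sample, sample))            # [0.75, 0.25, 0.5, 1.0]
print(ecdf_transform(sample, [0.0, 2.5, 50.0]))  # [0.0, 0.5, 1.0]
```

Because the transformation is monotonic, the ordering of the data is preserved, while the long right tail (the value 10.0) is compressed onto the end of the unit interval.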

SUMMARY OF APPROACH ADOPTED AND FURTHER

Knowing this fact does not provide great confidence in ANN predictions; however, if it can be demonstrated that a trained ANN has captured knowledge of the underlying system, the model can be applied with greater confidence. Therefore, in this research, the ANNs developed will be validated by evaluating their out-of-sample performance on an independent validation data set and by assessing the modeled relationship through the relative contributions of the model inputs to the predicted output. The relative contributions of the inputs as modeled by the ANN can then be compared with expert or a priori knowledge of the system, with correlation or mutual information measures, or, when there is little or no a priori knowledge of the system, with other data mining methods.

If the assumptions are not met (e.g. the model residuals are significantly non-Gaussian), non-linear transformations of the data, including the logarithmic, inverse and square root transformations, will be considered in an attempt to improve the model. As there is currently no generally accepted method for assessing generalizability when choosing the optimal ANN size, this issue will be explored further in this research.

ANN Training: This research will compare the optimization performance of back-propagation algorithms, GA and SCE-UA.
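One simple way to judge whether a transformation helps is to compare the skewness of a variable before and after it is applied. The sketch below does this for the logarithmic, square root and inverse transformations of a hypothetical positively skewed series. Using skewness as the symmetry check is an assumption here; formal normality tests on the residuals could be used instead.

```python
import math

def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5 using population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

data = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]  # hypothetical, right-skewed
transforms = {
    "none": data,
    "log": [math.log(v) for v in data],
    "sqrt": [math.sqrt(v) for v in data],
    "inverse": [1.0 / v for v in data],
}
for name, values in transforms.items():
    print(f"{name:8s} skewness = {skewness(values):+.3f}")
```

For this geometric series the logarithm yields equally spaced values and hence zero skewness, while the square root only partly reduces it.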

Choice of performance criteria and ANN validation: The commonly used performance criteria RMSE, MAE, r² and CE will all be used to evaluate the performance of the trained ANNs developed in this research. The AIC and BIC will also be used to evaluate the generalizability of the ANN models based only on the training data results, while considering the complexity of the models. To validate the physical credibility of the models developed, the relationship modeled by the ANNs will be assessed by evaluating the relative contributions of the model inputs in predicting the output.

However, because there is currently no widely accepted method for quantifying the relative importance of ANN inputs, this study will examine a number of input importance measures to determine which measure, if any, is most appropriate for assessing the relationship modeled by an ANN.

FURTHER INVESTIGATION OF DETERMINISTIC ANN DEVELOPMENT METHODS USING SYNTHETIC DATA

Synthetic Data Sets
ANN Training - Comparison of Training Algorithms

The signal-to-noise ratio of the "measured" data set was λ = 3.39, indicating that the noise level in the data is moderate (i.e. the strength of the signal is more than three times the strength of the noise). Since the strength of the signal is almost 24 times the strength of the noise, the data are considered to contain little noise. Again, the distribution of the data is approximately symmetric and did not require a non-linear transformation.
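The sketch below shows how a "measured" series with a chosen signal-to-noise ratio could be generated from a "true" series. It assumes λ is defined as the ratio of the standard deviation of the signal to that of the noise; this definition, the Gaussian noise, the sine-wave signal and the function names are all assumptions, not the generating procedure used in this study.

```python
import math
import random
import statistics

def signal_to_noise(signal, noise):
    """Assumed definition: lambda = sd(signal) / sd(noise)."""
    return statistics.pstdev(signal) / statistics.pstdev(noise)

def add_noise(signal, target_lambda, seed=0):
    """Add Gaussian noise scaled so that sd(signal) / sd(noise) = target_lambda."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    scale = statistics.pstdev(signal) / (target_lambda * statistics.pstdev(raw))
    noise = [scale * e for e in raw]
    measured = [s + e for s, e in zip(signal, noise)]
    return measured, noise

true_y = [math.sin(0.1 * t) for t in range(500)]  # hypothetical "true" series
measured_y, noise = add_noise(true_y, target_lambda=3.39)
print(round(signal_to_noise(true_y, noise), 2))  # 3.39
```

Under this definition, λ = 3.39 means the noise standard deviation is a little under one third of the signal standard deviation.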

If the epoch size is set equal to the size of the training set (that is, the entire training set is processed between weight updates), the network operates in batch mode. For incremental training, which was used in this study, the learning rate must be slowly reduced during training to ensure convergence of the BP algorithm (Sarle, 2002). The size of the mating pool should be the same as that of the initial population; therefore, each chromosome participates in two (random) tournaments.
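One way to satisfy this requirement, assumed here rather than taken from the text, is binary tournament selection in two passes. In each pass the population is shuffled and each consecutive pair competes; with an even population of size N, each pass yields N/2 winners, so the mating pool has N members and every chromosome competes in exactly two tournaments. Fitness is expressed as an error to be minimized, and the function name is hypothetical.

```python
import random

def binary_tournament_selection(errors, seed=0):
    """Fill a mating pool the same size as the population using binary
    tournaments; each chromosome competes in exactly two tournaments.

    errors: objective value of each chromosome (lower is better).
    Returns the indices of the selected chromosomes.
    """
    n = len(errors)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even population size")
    rng = random.Random(seed)
    pool = []
    for _ in range(2):                # two passes: n/2 + n/2 = n winners
        order = list(range(n))
        rng.shuffle(order)            # random pairing for this pass
        for a, b in zip(order[0::2], order[1::2]):
            pool.append(a if errors[a] <= errors[b] else b)
    return pool

errors = [0.9, 0.1, 0.5, 0.7, 0.3, 0.8]   # hypothetical population of 6
pool = binary_tournament_selection(errors)
print(len(pool), pool.count(1), pool.count(0))  # 6 2 0
```

Whatever the random pairing, the best chromosome wins both of its tournaments and the worst wins none, so selection pressure is applied without any scaling of the fitness values.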

Conversely, the second offspring chromosome is produced by retaining the original genes up to the crossover point and averaging the parent genes from the crossover point to the end of the chromosome, as shown in Figure 3.14. The parameter τ should be chosen according to the magnitude of the optimization variables and the sensitivity of the objective function to these variables. At periodic stages of the evolution, the entire population is shuffled before points are reassigned to complexes (i.e. the communities are mixed and new communities are formed).
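The second-offspring rule can be written as a short list operation. In this sketch, the parent whose original genes are kept before the crossover point is passed explicitly as `keep_parent`, because which parent supplies those genes is an assumption here. The function name is hypothetical.

```python
def averaging_offspring(keep_parent, other_parent, crossover_point):
    """Keep keep_parent's genes before the crossover point and average the
    two parents' genes from the crossover point to the end."""
    if len(keep_parent) != len(other_parent):
        raise ValueError("parents must have the same length")
    head = list(keep_parent[:crossover_point])
    tail = [(a + b) / 2.0 for a, b in zip(keep_parent[crossover_point:],
                                          other_parent[crossover_point:])]
    return head + tail

child = averaging_offspring([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], 2)
print(child)  # [1.0, 2.0, 5.0, 6.0]
```

Averaging real-valued genes produces offspring that lie between the parents in the averaged positions, rather than simply exchanging gene values.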

A schematic of the SCE-UA algorithm is shown in Figure 3.15, which describes the main steps performed during the algorithm.
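The partitioning and shuffling steps can be sketched as follows, using the usual SCE-UA rule: points are ranked by objective value and dealt out to the p complexes in turn, so the k-th ranked point goes to complex k mod p. The competitive complex evolution performed within each complex is omitted, and the function names and objective are hypothetical.

```python
def partition_into_complexes(points, objective, p):
    """Rank points (best first) and deal them round-robin into p complexes."""
    ranked = sorted(points, key=objective)
    return [ranked[k::p] for k in range(p)]

def shuffle_complexes(complexes, objective):
    """Merge all complexes into one population and re-partition it."""
    merged = [x for complex_ in complexes for x in complex_]
    return partition_into_complexes(merged, objective, len(complexes))

def objective(x):
    """Hypothetical objective function to be minimized."""
    return (x - 3.0) ** 2

points = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
initial = partition_into_complexes(points, objective, p=2)
print(initial)  # [[3.0, 4.0, 5.0], [2.0, 1.0, 0.0]]
# Each complex would be evolved independently here (CCE step omitted).
shuffled = shuffle_complexes(initial, objective)
```

Dealing points in rank order gives every complex a similar spread of good and poor points, and periodic shuffling lets information found in one complex spread to the others.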

Figure 3.9 Probability density of response variable y t for data set I.


Determination of ANN Architecture - Assessment of Model Selection Criteria
ANN Validation - Assessment of Input Importance Measures
Results
Evaluation of Best Models
Conclusions

This method is based on the sum of the products of the input-hidden and hidden-output connection weights. For dataset II, the true variance $\sigma_y^2$ and the MAE of the noise in the training data were 0.983 and 0.797, respectively. The estimated values of $\hat{\sigma}_y^2$ and MAE for the models containing 1 hidden node came very close to these true values, indicating good generalizability of the models.
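The basic connection weight calculation can be sketched as follows: for each input i, the products of its input-hidden weights and the corresponding hidden-output weights are summed over the hidden nodes. Expressing each input's contribution as a percentage of the summed absolute contributions is an assumption of this sketch. The modified version assessed in this study is not reproduced, and the weights and function names are hypothetical.

```python
def connection_weights(w_ih, w_ho):
    """Connection weight of each input i: sum over hidden nodes h of
    w_ih[i][h] * w_ho[h].

    w_ih[i][h]: weight from input i to hidden node h
    w_ho[h]   : weight from hidden node h to the output node
    """
    return [sum(w * v for w, v in zip(row, w_ho)) for row in w_ih]

def relative_importance(cw):
    """Relative importance (%) of each input from its absolute connection weight."""
    total = sum(abs(c) for c in cw)
    return [100.0 * abs(c) / total for c in cw]

w_ih = [[1.0, 1.0], [0.5, -0.5], [-1.0, 0.0]]  # hypothetical: 3 inputs, 2 hidden nodes
w_ho = [1.0, 2.0]
cw = connection_weights(w_ih, w_ho)
ri = relative_importance(cw)
print(cw)  # [3.0, -0.5, -1.0]
```

The sign of each connection weight indicates whether the input tends to increase or decrease the output, and contributions of opposite sign through different hidden nodes partly cancel, as for the second input here.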

This table shows that, while similar results were obtained using each of the input importance measures, the best overall results were obtained using the modified connection weight approach. Moreover, the overall accuracy of both modified input importance measures was higher than that of the original methods. The performance of the best models developed for each synthetic dataset was evaluated against the 'measured' and 'true' training, test and validation data.

Nevertheless, the results presented in Table 3.26 show that the model has good generalizability on each of the three data subsets. The RI values of the inputs of the 1-hidden-node ANN, estimated using the modified connection weight method, are presented in Table 3.27, together with the corresponding RI estimates based on PMI. Figure 3.26 shows a plot of the time series of the model's predictions against the "measured" and "true" data for.

Scatter plots of the 5-hidden-node ANN model predictions against the “measured” and “true” data are shown in Figure 3.27, with the corresponding model performance results shown in Table 3.30. Scatter plots of the 6-hidden-node ANN model predictions against the “measured” and “true” data are shown in Figure 3.28, with the corresponding model performance results shown in Table 3.31. The estimated RI values of the inputs of the 5- and 6-hidden-node ANNs are given in Table 3.32, together with the PMI-based estimates.