

CHAPTER 5 MODELING METHODOLOGY

5.3 ARTIFICIAL NEURAL NETWORK MODEL

In this study, a three-layer feed-forward artificial neural network (ANN) model with four input neurons, seven hidden neurons and one output neuron was used, as shown in Figure 5.7. Dissolved oxygen concentration in water bodies is influenced by a number of physical, chemical and biological factors. An attempt has been made to obtain a relationship connecting biochemical oxygen demand, total phosphorus, conductivity and alkalinity with dissolved oxygen concentration.

5.3.1 DATA NORMALIZATION

The input and output quantities were first normalized to the range 0.1 to 0.9 with the following equations:

nPi = 0.1 + 0.8 * (Pi - Pmin,i) / (Pmax,i - Pmin,i)

nt = 0.1 + 0.8 * (t - tmin) / (tmax - tmin)

where nPi is the normalized value, Pmax,i and Pmin,i are the maximum and minimum values of the ith node in the input layer, and nt, tmax and tmin are the corresponding values for the output layer, taken over all the feed data vectors. When training was completed, the simulation results were de-normalized by reversing this mapping.
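This mapping and its inverse can be written directly in plain MATLAB, as in the minimal sketch below; the variable names P, t and y are illustrative, with P holding the input vectors column-wise, t the target vector and y a simulated (normalized) output.

% Normalize inputs P (4 x N) and target t (1 x N) to the range 0.1-0.9.
Pmin = min(P, [], 2);                 % per-input minima
Pmax = max(P, [], 2);                 % per-input maxima
N    = size(P, 2);
nP   = 0.1 + 0.8 * (P - repmat(Pmin, 1, N)) ./ repmat(Pmax - Pmin, 1, N);
nt   = 0.1 + 0.8 * (t - min(t)) / (max(t) - min(t));

% De-normalize a simulated output y by reversing the mapping.
tsim = min(t) + (y - 0.1) * (max(t) - min(t)) / 0.8;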

5.3.2 TRAINING ALGORITHM USED

A tangent sigmoid transfer function was selected between the input layer and the hidden layer, and a logarithmic sigmoid function was used between the hidden layer and the output layer. Levenberg–Marquardt was selected as the training method, as it is reported to have the fastest convergence for medium-sized neural networks (Karul et al., 2000; MathWorks, 2000).

The Levenberg–Marquardt algorithm is an iterative technique that locates a local minimum of a multivariate function expressed as a sum of squares of several non-linear real-valued functions. It interpolates between the Gauss–Newton algorithm (GNA) and the method of gradient descent (Lourakis and Argyros, 2005). It uses an approximation to the Hessian matrix in the following Newton-like weight update:

xk+1 = xk - [J^T J + µI]^(-1) J^T e

where xk is the vector of neural network weights, J is the Jacobian matrix of the network errors with respect to the weights and biases, µ is a scalar that controls the learning process, and e is the residual error vector. When the scalar µ is zero, the update above is simply Newton's method using the approximate Hessian matrix. When µ is large, it becomes gradient descent with a small step size. Newton's method is faster and more accurate near an error minimum, so the aim is to shift towards Newton's method as quickly as possible (MathWorks, 2000; Daliakopoulos et al., 2005). Thus, the value of µ is decreased after each successful step (a reduction in the performance function) and is increased only when a tentative step would increase the performance function. In this way, the performance function is always reduced at each iteration of the algorithm.
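The update and the adaptation of µ can be sketched in plain MATLAB as follows; residfun and jacfun are hypothetical function handles returning the error vector e and the Jacobian J at the current weights x, and the factors 0.1 and 10 used to adjust µ are illustrative defaults, not values taken from this study.

function [x, mu] = lm_step(residfun, jacfun, x, mu)
% One Levenberg-Marquardt iteration implementing the update above.
e    = residfun(x);                                % residual error vector
J    = jacfun(x);                                  % Jacobian at current weights
dx   = -(J'*J + mu*eye(length(x))) \ (J'*e);       % Newton-like weight change
xnew = x + dx;
enew = residfun(xnew);
if enew'*enew < e'*e     % performance function reduced: accept, decrease mu
    x  = xnew;
    mu = 0.1 * mu;
else                     % step would increase performance function: reject
    mu = 10 * mu;        % increase mu (shift towards gradient descent)
end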

The Neural Network Toolbox 6 and MATLAB release 12 (The MathWorks Inc., USA, 2000) were used for the development of the neural network models. A MATLAB code was written that loaded the data file and trained, tested and validated the networks; the weight and bias matrices and the performance measures were saved in separate text files so that they could be used in Microsoft Excel as well as for further studies such as sensitivity analysis and optimization.
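A sketch of this workflow is given below, assuming the classic newff interface of the Neural Network Toolbox; the file name wq_data.txt and the column order are illustrative.

% Load normalized data: columns BOD, TP, conductivity, alkalinity, DO.
data = load('wq_data.txt');
P = data(:, 1:4)';                 % 4 x N input matrix
T = data(:, 5)';                   % 1 x N target vector

% 4-7-1 network: tansig hidden layer, logsig output, trainlm training.
net = newff(minmax(P), [7 1], {'tansig', 'logsig'}, 'trainlm');
net = train(net, P, T);
Y   = sim(net, P);                 % simulated (normalized) DO

% Save weight and bias matrices as text for use in Excel and later work.
W1 = net.IW{1,1};  b1 = net.b{1};
W2 = net.LW{2,1};  b2 = net.b{2};
save('W1.txt', 'W1', '-ascii');  save('b1.txt', 'b1', '-ascii');
save('W2.txt', 'W2', '-ascii');  save('b2.txt', 'b2', '-ascii');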

5.3.3 GENERALIZATION

Out of the available data, 75 data sets were used for training, 15 for validation and 15 for testing. The performance of the neural network model was evaluated with the root mean square error (RMSE), the coefficient of determination (R2), and the Nash efficiency (E) between the modeled output and the measured values for the training, validation and testing data sets.
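These measures can be computed as in the sketch below; obs and pred denote the measured and modeled dissolved oxygen values for one data set.

function [rmse, R2, E] = performance(obs, pred)
% Performance measures used in this study; obs = measured values,
% pred = modeled values (vectors of equal length).
rmse = sqrt(mean((obs - pred).^2));                             % RMSE
r    = corrcoef(obs(:), pred(:));
R2   = r(1,2)^2;                                                % coefficient of determination
E    = 1 - sum((obs - pred).^2) / sum((obs - mean(obs)).^2);    % Nash efficiency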

The best-fitting ANN model should give good predictions for the training data, the validation data and the testing data. A well-trained network should give the output with the least error for the training as well as the validation data. Once that is done, its performance on a third set of data (the testing data), which is included in neither the training data nor the validation data, should be checked. If the testing error behaves in the same manner as that for the training and validation data, the model can be taken as the best fit (Engin et al., 2005). An overfitted neural network typically imitates the data in the training set very successfully but gives poor estimates for data not included in the training. For good generalization, overfitting should be prevented either by early stopping, which involves stopping the training when the errors for the validation set begin to rise, or by regularization, which involves modifying the performance function and assigning only the minimum number of hidden layer neurons that is just sufficient for learning the system (Karul et al., 2000; Moatar et al., 1999; Burien et al., 2001). The second method is used in this study to avoid overfitting.

As there is no universal formula for fixing the required number of hidden neurons, a trial-and-error approach was applied to select the best ANN architecture. The trial started with 4 hidden neurons, and more neurons were gradually added during learning until the optimal result was achieved on the validation data. In this way, the neural network architecture with 7 hidden neurons, with the tangent sigmoidal function (tansig) for the hidden layer and the logarithmic sigmoidal function (logsig) for the output layer, was finalized. The neural network topology used for dissolved oxygen prediction is shown in Figure 5.7.
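One possible form of this search is sketched below, again assuming the classic newff interface; Ptr, Ttr and Pval, Tval are hypothetical training and validation sets, and the upper limit of 12 hidden neurons is illustrative.

best = Inf;
for h = 4:12                                      % grow the hidden layer
    net = newff(minmax(Ptr), [h 1], {'tansig', 'logsig'}, 'trainlm');
    net = train(net, Ptr, Ttr);
    err = mean((sim(net, Pval) - Tval).^2);       % validation MSE
    if err < best                                 % keep the best architecture
        best = err;  besth = h;  bestnet = net;
    end
end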

Tansig(n) calculates the output according to

nh = 2/(1 + exp(-2*n)) - 1,  with  n = W1*P + b1,

where nh is the output from the hidden layer, W1 is the weight matrix connecting the input and the hidden layer, P is the input matrix and b1 is the bias vector of the hidden layer.

The final output can be expressed in the form

tm = logsig(W2*nh + b2) = 1/(1 + exp(-(W2*(2/(1 + exp(-2*(W1*P + b1))) - 1) + b2))),

where W2 and b2 are the weight and bias matrices connecting the hidden layer and the output layer.
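With the saved matrices, this composite expression can be evaluated outside the toolbox, as in the sketch below; P again holds the normalized inputs column-wise.

% Manual 4-7-1 forward pass using the saved weight and bias matrices.
N  = size(P, 2);
nh = 2 ./ (1 + exp(-2 * (W1*P + repmat(b1, 1, N)))) - 1;   % tansig hidden layer
tm = 1 ./ (1 + exp(-(W2*nh + repmat(b2, 1, N))));          % logsig output layer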

Figure 5.7 Neural network topology used in the study

[Figure: input layer (BOD, TP, conductivity, alkalinity), hidden layer (7 neurons), output layer (DO)]