A STUDY TO IMPROVE A LEARNING ALGORITHM OF NEURAL NETWORKS

Cong Huu Nguyen¹, Thanh Nga Thi Nguyen², Ngoc Van Dong³

¹Thai Nguyen University, ²College of Technology - TNU, ³Ha Noi Vocational College of Electrical and Mechanical Engineering

ABSTRACT

Since the middle of the twentieth century, the study of optimization algorithms, especially alongside the development of digital computers, has increasingly become an important branch of mathematics. Nowadays, these mathematical tools are applied in practice to neural network training. In the search for an optimal algorithm that minimizes the convergence time of the solution or avoids weak minima and local minima, the starting point is to study the characteristics of the error surface. For a complex error surface such as a cleft error surface, whose contours are stretched and bent to form a cleft and a cleft axis, the older algorithms cannot cope. This paper proposes an algorithm that improves the convergence of the solution and the ability to escape from undesired areas of the error surface.

Keywords: neural networks, special error surface, local minima, optimization, algorithms

BACKGROUND

In the process of finding an optimal algorithm to minimize the convergence time of the solution, or to avoid weak minima and local minima, the problems start from studying the characteristics of the error surface and taking it as the starting point for improving or proposing a new training algorithm. When discussing neural networks, the quality of the trained network is usually considered (supervised learning). This leads to a quality function and to the concept of the network quality surface. Sometimes the quality surface is also called by other terms: the error surface or the performance surface. Figure 1 shows such an error surface. There are some special things to note about this surface, for example that the slope changes drastically over the parameter space. For this reason, it is difficult to choose an appropriate step for learning algorithms such as the steepest descent algorithm or conjugate gradient. In some areas the error surface is very flat, allowing a large learning rate, while other regions with steep slopes require a small learning rate. Other methods, such as momentum rules or the adaptive learning rate VLBP (Variable Learning Rate Backpropagation) algorithm, are not effective for this problem [5].


Thus, complex quality surfaces make the process of finding the optimal weights more difficult, and the search can still become blocked at the cleft axis before reaching the minimum point if the quality surface has a cleft form. A possible strategy to solve this problem is the following: after reaching the neighbourhood of the cleft axis by a gradient method with a step computed by line minimization (or with specified learning steps), we move along the bottom of the narrow cleft through a gradually asymptotic geometry, assuming that this geometry is a line or approximately a quadratic curve.

The objective of this paper is to study and apply the cleft algorithm to calculate the learning step for finding the optimal weights of neural networks in order to solve the control problem.

CLEFT-OVERSTEP ALGORITHM FOR NEURAL NETWORK TRAINING

Cleft-overstep principle

Consider the unconstrained minimization problem:

J(u) \to \min, \quad u \in E^n \qquad (1)

where u is the minimizing vector in an n-dimensional Euclidean space and J(u) is the target function, which satisfies

\lim_{\|u\| \to \infty} J(u) = \infty \qquad (2)

The optimization algorithm for problem (1) has the iteration equation

u^{k+1} = u^k + \alpha_k s^k, \quad k = 0, 1, 2, \ldots \qquad (3)

where u^k and u^{k+1} are the starting point and the ending point of the k-th iteration step, s^k is the vector that defines the direction of change of the variables in the n-dimensional space, and α_k is the step length.

α_k is determined according to the cleft-overstep principle and is called a "cleft-overstep" step; equation (3) is then called the cleft-overstep algorithm.

The basic difference between the cleft-overstep method and other methods lies in the principle of step adjustment. According to this principle, the step length of the searching point at each iteration is not smaller than the smallest step length at which the target function reaches its (local) minimum value along the moving direction at that iteration.

The searching optimization trajectory of the cleft-overstep principle creates a geometric picture in which the searching point "oversteps" the cleft bottom at each iteration step. To specify the cleft-overstep principle, we examine a function of one numeric variable at each iteration step [4]:

h(\alpha) = J(u^k + \alpha s^k) \qquad (4)

Suppose that s^k is the search direction of the target function at the point u^k. According to condition (2), there is a smallest value α* > 0 at which h(α) reaches a minimum:

\alpha^* = \arg\min_{\alpha > 0} h(\alpha) \qquad (5)

If J(u), and hence h(α), is continuously differentiable, we can define the cleft-overstep step as follows:

\alpha^v > \alpha^* > 0, \quad h(\alpha^v) < h(0) \qquad (6)

(α^v is the overstep step, meaning that it oversteps the cleft.)

The variation of the function h(α) as the optimization trajectory moves from the starting point u^k to the ending point u^{k+1} is illustrated in Figure 2. We can see that when the value α increases from 0, passes through the minimum point α* of h(α) and reaches the value α^v, the corresponding optimization trajectory moves parallel to s^k according to u^{k+1} = u^k + α s^k, k = 0, 1, ..., and takes a step of length α = α^v > α*. The graph also shows that, along the moving direction, the target function first decreases from the point u^k, but by the time the point u^{k+1} is reached it has turned to increase.

If we use moving steps according to condition (5), we may be trapped at the cleft axis and the corresponding optimization algorithm is also trapped at that point. Conversely, if the optimization process follows condition (6), the searching point is not allowed to settle at the cleft bottom before the optimal solution is obtained and, simultaneously, it always draws a trajectory that oversteps the cleft bottom. In order to obtain an effective and stable iteration process, condition (6) is replaced by condition (7).

\alpha^v > \alpha^* = \arg\min_{\alpha > 0} h(\alpha), \quad \left| h(\alpha^v) - h^* \right| \le \lambda \left| h^0 - h^* \right| \qquad (7)

where 0 < λ < 1 is called the overstep coefficient, h* = h(α*) and h^0 = h(0).

Determining the cleft-overstep step

Figure 1: Cleft-similar error surface


Figure 2: Determining the cleft-overstep step

Choosing the length of the learning step in the cleft problem is of great importance. If this length is too short, the running time on the computer will be long. If it is too long, there may be difficulties in the searching process because it is hard to follow the curvature of the cleft. Therefore, an adaptive learning step for the cleft problem is essential in the search for the optimal solution. In this section, we propose a simple yet effective way to find the cleft-overstep step. Suppose that J(u) is continuous and satisfies the condition lim J(u) = ∞ when ||u|| → ∞, and that at iteration k the point u^k and the moving vector s^k have been determined. We need to determine the step length α^v which satisfies condition (7).

If h* in (7) is replaced by an estimate ĥ, with ĥ ≈ h* and ĥ ≥ h*, we still obtain a cleft-overstep step by definition. Therefore, to simplify the programming, we take the smallest value of h that has actually been computed at each iteration instead of determining h* accurately. This also significantly reduces the number of evaluations of the objective function. We have identified the following algorithm (see the flowchart in Figure 3):
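A minimal C++ sketch of this step search is given below. It is an illustration only, not the authors' program: the function name findCleftOverstepStep, the initial trial step, the doubling factor and the pull-back rule are assumptions, and h(α) = J(u^k + α·s^k) must be supplied by the caller. The sketch enlarges the step until the target function starts to rise, which brackets the minimum α*, and then pulls the step back only as far as needed to satisfy the relaxed condition (7), with the smallest value of h computed so far standing in for h*.

```cpp
#include <algorithm>
#include <functional>

// Sketch of the cleft-overstep step search (illustrative only; names, the
// doubling factor and the pull-back rule are assumptions, not the authors' code).
// h(alpha) = J(u^k + alpha * s^k) is supplied by the caller.
double findCleftOverstepStep(const std::function<double(double)>& h,
                             double alpha0 = 1e-3,   // initial trial step (assumed)
                             double lambda = 0.5)    // overstep coefficient, 0 < lambda < 1
{
    const double h0 = h(0.0);            // target function value at the current point
    double lo = 0.0, hi = alpha0;
    double prev = h0;
    double hMin = h0;                    // cheap estimate of h* (smallest value seen so far)

    // Phase 1: enlarge the step until h starts to rise, bracketing alpha* in (lo, hi].
    // Condition (2) guarantees that h eventually rises, so the loop terminates.
    while (true) {
        double cur = h(hi);
        hMin = std::min(hMin, cur);
        if (cur > prev) break;           // the trial point has passed the cleft bottom
        prev = cur;
        lo = hi;
        hi *= 2.0;                       // doubling strategy (assumed)
    }

    // Phase 2: hi oversteps the bracketed minimum but may violate the relaxed
    // condition (7), h(alpha) - h* <= lambda * (h0 - h*), with hMin standing in for h*.
    double alpha = hi;
    while (h(alpha) - hMin > lambda * (h0 - hMin) && alpha - lo > 1e-12) {
        alpha = lo + 0.5 * (alpha - lo); // bisect back toward the bracket
    }
    return alpha;                        // the cleft-overstep step alpha^v
}
```

In the training program, h would be bound to the network error along the chosen search direction, for example through a function object that captures the current weights and direction.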

PROGRAM AND RESULTS

To illustrate the above remarks, we present a neural network training method that uses the backpropagation procedure with the learning step calculated by the cleft-overstep principle. The example: for a given input vector, the neural network has to answer which character it is. The software builds a network with 35 input neurons, 5 middle (hidden) layer neurons and 10 output layer neurons. The sigmoid is used as the activation function; a characteristic of this function is that it easily produces a cleft error surface [1].


Figure 3: Diagram of algorithm that determines the cleft-overstep learning step

The network training algorithm is based on the backpropagation procedure combined with the learning step calculated by the cleft-overstep principle. The cleft-overstep algorithm has been presented in section 2.

Figure 4: Structure of the neural network for recognition (input layer, hidden layer, output layer)

Thus, with the use of steepest descent methods to update the weights of the network, we need the information related to the partial derivative of the error function with respect to each weight; that is, we must determine the update formulas and update algorithm for the weights in the hidden layer and the output layer. For a given sample set, we calculate the derivative of the error function by summing the derivatives over each sample in that set. The analysis and the derivatives are based on the chain rule. The slope of the tangent to the error curve in the cross-section along the w-axis is the partial derivative of the error function J with respect to that weight, denoted ∂J/∂w; using the chain rule we have:

\frac{\partial J}{\partial w} = \frac{\partial J}{\partial z} \cdot \frac{\partial z}{\partial v} \cdot \frac{\partial v}{\partial w} \qquad (8)

Adjust the weights of the output layer:

Define: b, the weights of the output layer; z, the output of the output layer; t, the desired target value; y_j, the output of neuron j in the hidden layer; v, the total weighted input of the output neuron, v = Σ_j b_j·y_j, so ∂v/∂b_j = y_j (ignoring the index of the neurons in the output layer).

We use J = 0.5·(z − t)², so ∂J/∂z = (z − t).

The activation function of the output layer neuron is the sigmoid, z = g(v), with dz/dv = z(1 − z).

We have:

\frac{\partial J}{\partial b} = \frac{\partial J}{\partial z} \cdot \frac{\partial z}{\partial v} \cdot \frac{\partial v}{\partial b} = (z - t)\, z\, (1 - z)\, y \qquad (9)

From this we obtain the update formula for the output layer's weights as follows (ignoring the indices):

\Delta b = \alpha\, (z - t)\, z\, (1 - z)\, y \qquad (10)

We will use formula (10) [4] in the procedure DIEUCHINHTRONGSO() to adjust the weights of the output layer; the learning rate α is calculated according to the cleft-overstep principle.
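As an illustration of formula (10), the output-layer adjustment might look like the following sketch. This is not the authors' DIEUCHINHTRONGSO procedure; the function and variable names are hypothetical, and the usual gradient-descent sign convention (subtracting the adjustment) is assumed.

```cpp
// Sketch of the output-layer update of formula (10) for one output neuron
// (illustrative only, not the authors' DIEUCHINHTRONGSO procedure).
//   delta_b_j = alpha * (z - t) * z * (1 - z) * y_j
void updateOutputWeights(double b[], const double y[], int nHidden,
                         double z, double t, double alpha)
{
    double delta = (z - t) * z * (1.0 - z);   // common factor from equation (9)
    for (int j = 0; j < nHidden; ++j)
        b[j] -= alpha * delta * y[j];         // move against the gradient (assumed sign convention)
}
```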

Adjust the weights of the hidden layer:

The derivative of the objective function with respect to a weight of the hidden layer is calculated by the chain rule:

\frac{\partial J}{\partial a} = \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial u} \cdot \frac{\partial u}{\partial a}

Define: a, the weights of the hidden layer; y, the output of the neuron in the hidden layer; x_i, the components of the input vector of the input layer; u, the total weighted input, u = Σ_{i=0}^{N−1} a_i·x_i, so ∂u/∂a_i = x_i; k, the index of a neuron in the output layer. We then have the derivative of the objective with respect to a weight of the hidden layer:

\frac{\partial J}{\partial a_i} = \left[ \sum_{k=0}^{K-1} (z_k - t_k)\, z_k\, (1 - z_k)\, b_k \right] y\, (1 - y)\, x_i \qquad (11)

From this, the adjustment formula for a weight of the hidden layer is:

\Delta a_i = \alpha \left[ \sum_{k=0}^{K-1} (z_k - t_k)\, z_k\, (1 - z_k)\, b_k \right] y\, (1 - y)\, x_i \qquad (12)

In this formula, the index i denotes the i-th neuron of the input layer and the index k denotes the k-th neuron of the output layer.

We will use formula (12) [4] in the procedure DIEUCHINHTRONGSO() to adjust the weights of the hidden layer; the learning rate α is calculated according to the cleft-overstep principle.
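Similarly, a sketch of the hidden-layer adjustment of formulas (11) and (12) is given below; again the names are hypothetical and the descent sign convention is assumed, with the learning rate α supplied by the cleft-overstep step search.

```cpp
// Sketch of the hidden-layer update of formulas (11)-(12) for one hidden neuron
// (illustrative only; names are hypothetical).
//   a[i] : weight from input i to this hidden neuron, y : its sigmoid output,
//   x[i] : input component, b[k] : weight from this hidden neuron to output k,
//   z[k] : output of output neuron k, t[k] : its target, K : number of outputs.
void updateHiddenWeights(double a[], const double x[], int nInputs,
                         const double b[], const double z[], const double t[],
                         int K, double y, double alpha)
{
    // Error back-propagated to this hidden neuron: the bracketed sum in equation (11).
    double backErr = 0.0;
    for (int k = 0; k < K; ++k)
        backErr += (z[k] - t[k]) * z[k] * (1.0 - z[k]) * b[k];

    double delta = backErr * y * (1.0 - y);   // multiply by the hidden sigmoid derivative
    for (int i = 0; i < nInputs; ++i)
        a[i] -= alpha * delta * x[i];         // equation (12) with the usual descent sign
}
```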

The network structure

The sigmoid activation function,

g(x) = \frac{1}{1 + \exp(-x)} \qquad (13)

is used; it tends to produce a narrow cleft in the network quality surface.
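For reference, the sigmoid of equation (13) and the forward pass through the 35-5-10 recognition network can be sketched as follows. This is an illustrative sketch only: the function names are assumptions, and the weight matrices are stored one row per neuron (5 x 35 and 10 x 5), i.e. transposed with respect to the 35 x 5 and 5 x 10 sizes listed in the example below.

```cpp
#include <cmath>
#include <vector>

// Sigmoid activation of equation (13).
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Forward pass of the 35-5-10 recognition network (illustrative sketch only).
// w1[j][i] connects input i to hidden neuron j; w2[k][j] connects hidden neuron j to output k.
std::vector<double> forward(const std::vector<double>& x,                  // 35 inputs (0/1 pixels)
                            const std::vector<std::vector<double>>& w1,    // 5 x 35 (transposed W1)
                            const std::vector<std::vector<double>>& w2)    // 10 x 5 (transposed W2)
{
    std::vector<double> y(w1.size()), z(w2.size());
    for (std::size_t j = 0; j < w1.size(); ++j) {        // hidden layer
        double u = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) u += w1[j][i] * x[i];
        y[j] = sigmoid(u);
    }
    for (std::size_t k = 0; k < w2.size(); ++k) {        // output layer
        double v = 0.0;
        for (std::size_t j = 0; j < y.size(); ++j) v += w2[k][j] * y[j];
        z[k] = sigmoid(v);
    }
    return z;                                            // 10 outputs, one per digit 0..9
}
```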

Example

The task is recognizing characters that are the digits 0, 1, ..., 9 [1].

We compare the convergence of the three learning step methods: the cleft-overstep principle, the fixed step, and the gradually descending step.

Table 1: Input samples (the digit characters {0 1 2 3 4 5 6 7 8 9})

We use a 5 x 7 = 35 matrix to encode each character. Each input vector x is therefore a vector of size 35 x 1, with components taking the value 0 or 1. Thus, we select an input layer with 35 inputs. To distinguish ten characters, the output layer has 10 neurons. For the hidden layer, five neurons are selected. Consequently:

Hidden layer weight matrix W1,1: size 35 x 5
Output layer weight matrix W2,1: size 5 x 10
Input vector x: size 35 x 1
Hidden layer output vector y: size 5 x 1
Output layer output vector z: size 10 x 1

After compiling with Visual C++, we run the program and train the network in turn with the three methods: fixed learning step, gradually descending step and cleft-overstep. With each method we train 20 times. The results are given in the following table:

Run | Fixed step 0.2 | Gradually descending step (from 1) | Cleft-overstep
1 | Fail | Fail | Fail
2 | 7902 | 3634 | 23
3 | 7213 | 2416 | 50
4 | Fail | 2908 | 34
5 | 12570 | 2748 | 31
6 | 9709 | 3169 | 42
7 | Fail | 2315 | 43
8 | 9173 | 2375 | 33
9 | Fail | Fail | 34
10 | 8410 | 2820 | 33
11 | 10333 | 2618 | 32
12 | 12467 | 2327 | 39
13 | Fail | 3238 | 44
14 | 9631 | 2653 | Fail
15 | 12930 | 2652 | 31
16 | 10607 | Fail | 53
17 | Fail | 2792 | 31
18 | 7965 | 2322 | 42
19 | 11139 | 2913 | 42
20 | Fail | 2689 | 33
Average | 10003 iterations, 7 failures/20 | 2740 iterations, 3 failures/20 | 35 iterations, 2 failures/20

(Entries are the number of training iterations to convergence; "Fail" means the run did not converge.)

Comments:

We have trained the network by three different methods and have found that learning by the cleft-overstep principle has a much higher convergence speed, and the number of failures is also reduced.

One drawback of the cleft-overstep principle is that the computing time per iteration is long; this is because we set the constant FD to a small value (1e-4). However, the total network training time is still favourable.

CONCLUSION

In this paper, the authors have successfully proposed the use of the "cleft-overstep" algorithm to improve the training of neural networks with special error surfaces, and have illustrated it through an application to handwritten character recognition.

Through research and experimentation, the results obtained show that, for a neural network structure whose error surface forms a deep cleft, using the backpropagation algorithm together with the "cleft-overstep" principle to train the network gives higher accuracy and a faster convergence speed than the plain gradient method.

The "cleft-overstep" algorithm can be applied to train any neural network structure that has such a special error surface. Thus, the results of this study can be applied to many other problems in the fields of telecommunications, control and information technology.

Further research is needed on determining the search direction vector in the "cleft-overstep" algorithm and on changing the assessment criterion of the quality function in order to reduce the computational complexity on the computer [6]. However, the results of this study have initially confirmed the correctness of the proposed algorithm and revealed possibilities for practical applications.

REFERENCES

[1]. Cong Huu Nguyen, Thanh Nga Thi Nguyen, Phuong Huy Nguyen (2011), "Research on the application of genetic algorithm combined with the "cleft-overstep" algorithm for improving learning process of MLP neural network with special error surface", Natural Computation (ICNC), 2011 Seventh International Conference on, 26-28 July 2011, pp. 222-221.

[2]. Maciej Lawrynczuk (2010), "Training of neural models for predictive control", Institute of Control and Computation Engineering, Faculty of Electronics and Information Technology, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland, Neurocomputing, vol. 73.

[3]. Thuc Nguyen Dinh and Hai Hoang Duc, Artificial Intelligence - Neural Networks: Methods and Applications, Educational Publisher, Ha Noi.

[4]. Nguyen Van Manh and Bui Minh Tri, "Method of 'cleft-overstep' by perpendicular direction for solving the unconstrained nonlinear optimization problem", Acta Mathematica Vietnamica, vol. 15, no. 2, 1990.

[5]. Hagan, M.T., H.B. Demuth and M.H. Beale, Neural Network Design, PWS Publishing Company, Boston, 1996.

[6]. R.K. Al Seyab, Y. Cao (2007), "Nonlinear system identification for predictive control using continuous time recurrent neural networks and automatic differentiation", School of Engineering, Cranfield University, College Road, Cranfield, Bedford MK43 0AL, UK, ScienceDirect.

SUMMARY

RESEARCH ON IMPROVING THE LEARNING ALGORITHM OF NEURAL NETWORKS

Nguyễn Hữu Công¹, Nguyễn Thị Thanh Nga², Đồng Văn Ngọc³

¹Thai Nguyen University, ²College of Technology - TNU, ³Ha Noi Vocational College of Electrical and Mechanical Engineering

Since the middle of the twentieth century, the study of optimization algorithms, especially with the development of digital computing, has increasingly become an important field of mathematics. Nowadays, these mathematical tools are applied to the training of neural networks. The search for an optimal algorithm that minimizes the convergence time of the solution or avoids weak minima and local minima always starts from studying the characteristics of the error (quality) surface. For complex error surfaces such as cleft surfaces, whose contours are stretched and bent to form a cleft and a cleft axis, the old algorithms cannot cope. This paper proposes an algorithm to improve the convergence of the solution and the ability to escape from undesired regions on special error surfaces.

Keywords: neural networks, special error surface, local minima, optimization, learning algorithm

Received: 7/5/2012; reviewed: 26/5/2012; accepted for publication: 12/6/2012
