
The Conjugate Gradient Method

Typically, the steepest descent method takes only a few iterations to bring a far-off starting point into the "optimum region," but then takes hundreds of iterations to make very little progress toward the solution.

Scaling

The aforementioned discussion leads to the concept of scaling the design variables so that the Hessian matrix is well conditioned. For the function $f = x_1^2 + a\,x_2^2$, we can define new variables $y_1$ and $y_2$ as

$$y_1 = x_1, \qquad y_2 = \sqrt{a}\,x_2$$

which leads to the function in $y$ variables given by $g = y_1^2 + y_2^2$. The Hessian of the new function has the best conditioning possible, with contours corresponding to circles.

In general, consider a function $f = f(\mathbf{x})$. If we scale the variables as

$$\mathbf{x} = T\,\mathbf{y} \tag{3.21}$$

then the new function is $g(\mathbf{y}) \equiv f(T\mathbf{y})$, and its gradient and Hessian are given by

$$\nabla g = T^{\mathsf T}\,\nabla f \tag{3.22}$$

$$\nabla^2 g = T^{\mathsf T}\,\nabla^2 f\; T \tag{3.23}$$

Usually $T$ is chosen as a diagonal matrix. The idea is to choose $T$ such that the Hessian of $g$ has a condition number close to unity.
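As a quick numerical check of this idea, the following sketch (a minimal illustration; the value of $a$ is an arbitrary choice) computes the Hessian condition number before and after the transformation:

```python
import numpy as np

a = 100.0                             # coefficient in f = x1^2 + a*x2^2

H = np.diag([2.0, 2.0 * a])           # Hessian of f in the x variables
T = np.diag([1.0, 1.0 / np.sqrt(a)])  # scaling x = T y, so that y2 = sqrt(a)*x2

Hg = T.T @ H @ T                      # Hessian of g from Eq. (3.23)

print(np.linalg.cond(H))              # 100.0 : elongated elliptical contours
print(np.linalg.cond(Hg))             # 1.0   : circular contours
```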

Example 3.8

Consider $f = (x_1 - 2)^4 + (x_1 - 2x_2)^2$, $\mathbf{x}_0 = (0, 3)^{\mathsf T}$. Perform one iteration of the steepest descent method.

We have $f(\mathbf{x}_0) = 52$ and $\nabla f(\mathbf{x}) = [4(x_1 - 2)^3 + 2(x_1 - 2x_2),\; -4(x_1 - 2x_2)]^{\mathsf T}$. Thus, $\mathbf{d}_0 = -\nabla f(\mathbf{x}_0) = [44, -24]^{\mathsf T}$. Normalizing the direction vector to make it a unit vector, we have $\mathbf{d}_0 = [0.8779, -0.4789]^{\mathsf T}$. The solution of the line search problem, minimize $f(\alpha) = f(\mathbf{x}_0 + \alpha\,\mathbf{d}_0)$ with $\alpha > 0$, yields $\alpha_0 = 3.0841$. Thus, the new point is $\mathbf{x}_1 = \mathbf{x}_0 + \alpha_0\mathbf{d}_0 = [2.707, 1.523]^{\mathsf T}$, with $f(\mathbf{x}_1) = 0.365$. The second iteration would now proceed by evaluating $\nabla f(\mathbf{x}_1)$, and so on.
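The numbers in this example are easy to reproduce. The sketch below (assuming SciPy's bounded scalar minimizer for the line search, with an arbitrary upper bound on $\alpha$) performs the same iteration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

f = lambda x: (x[0] - 2)**4 + (x[0] - 2*x[1])**2

def grad(x):
    # gradient of f as derived in Example 3.8
    return np.array([4*(x[0] - 2)**3 + 2*(x[0] - 2*x[1]),
                     -4*(x[0] - 2*x[1])])

x0 = np.array([0.0, 3.0])
d0 = -grad(x0)                        # [44, -24]
d0 = d0 / np.linalg.norm(d0)          # normalized: [0.8779, -0.4789]

# line search: minimize f(x0 + alpha*d0) over alpha > 0
res = minimize_scalar(lambda a: f(x0 + a*d0), bounds=(0.0, 10.0),
                      method="bounded")
x1 = x0 + res.x * d0
print(res.x, x1, f(x1))               # ~3.0841, ~[2.707, 1.523], ~0.365
```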

3.6 The Conjugate Gradient Method

The conjugate gradient method retains the simplicity of steepest descent while greatly improving its convergence, and it is applicable to general functions. Conjugate gradient methods were first presented in [Fletcher and Powell, 1963].

Consider the problem of minimizing a quadratic function

$$\text{minimize } q(\mathbf{x}) = \tfrac{1}{2}\,\mathbf{x}^{\mathsf T} A\,\mathbf{x} + \mathbf{c}^{\mathsf T}\mathbf{x} \tag{3.24}$$

where we assume $A$ is symmetric and positive definite. We define conjugate directions, or directions that are mutually conjugate with respect to $A$, as vectors that satisfy

$$\mathbf{d}_i^{\mathsf T} A\,\mathbf{d}_j = 0, \quad i \ne j, \; 0 \le i, j \le n-1 \tag{3.25}$$

The method of conjugate directions is as follows. We start with an initial point $\mathbf{x}_0$ and a set of conjugate directions $\mathbf{d}_0, \mathbf{d}_1, \ldots, \mathbf{d}_{n-1}$. We minimize $q(\mathbf{x})$ along $\mathbf{d}_0$ to obtain $\mathbf{x}_1$; then, from $\mathbf{x}_1$, we minimize $q(\mathbf{x})$ along $\mathbf{d}_1$ to obtain $\mathbf{x}_2$; and lastly, we minimize $q(\mathbf{x})$ along $\mathbf{d}_{n-1}$ to obtain $\mathbf{x}_n$. The point $\mathbf{x}_n$ is the minimum solution; that is, the minimum of the quadratic function is found in $n$ searches. In the algorithm that follows, the gradients of $q$ are used to generate the conjugate directions.

We denote $\mathbf{g}$ to be the gradient of $q$, with $\mathbf{g}_k = \nabla q(\mathbf{x}_k) = A\mathbf{x}_k + \mathbf{c}$. Let $\mathbf{x}_k$ be the current point, with $k$ an iteration index. The first direction $\mathbf{d}_0$ is chosen as the steepest descent direction, $-\mathbf{g}_0$. We proceed to find a new point $\mathbf{x}_{k+1}$ by minimizing $q(\mathbf{x})$ along $\mathbf{d}_k$. Thus

$$\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{d}_k \tag{3.26}$$

where $\alpha_k$ is obtained from the line search problem: minimize $f(\alpha) = q(\mathbf{x}_k + \alpha\,\mathbf{d}_k)$.

Setting $\mathrm{d}q(\alpha)/\mathrm{d}\alpha = 0$ yields

$$\alpha_k = -\,\frac{\mathbf{d}_k^{\mathsf T}\mathbf{g}_k}{\mathbf{d}_k^{\mathsf T} A\,\mathbf{d}_k} \tag{3.27}$$

Also, the exact line search condition $\mathrm{d}q(\alpha)/\mathrm{d}\alpha = 0$ yields

$$\mathbf{d}_k^{\mathsf T}\mathbf{g}_{k+1} = 0 \tag{3.28}$$

Now the key step: we choose $\mathbf{d}_{k+1}$ to be of the form

$$\mathbf{d}_{k+1} = -\mathbf{g}_{k+1} + \beta_k \mathbf{d}_k \tag{3.29}$$

This represents a "deflection" of the steepest descent direction $-\mathbf{g}_{k+1}$, as illustrated in Fig. 3.6. Requiring $\mathbf{d}_{k+1}$ to be conjugate to $\mathbf{d}_k$, or $\mathbf{d}_{k+1}^{\mathsf T} A\,\mathbf{d}_k = 0$, gives

$$-\mathbf{g}_{k+1}^{\mathsf T} A\,\mathbf{d}_k + \beta_k\,\mathbf{d}_k^{\mathsf T} A\,\mathbf{d}_k = 0$$

Figure 3.6. Conjugate directions for a quadratic in two variables, showing the points $\mathbf{x}_0$, $\mathbf{x}_1$ and the directions $\mathbf{d}_0$, $\mathbf{d}_1$ on the contours of $f$.

From (3.26), $\mathbf{d}_k = (\mathbf{x}_{k+1} - \mathbf{x}_k)/\alpha_k$. Thus, $A\,\mathbf{d}_k = (\mathbf{g}_{k+1} - \mathbf{g}_k)/\alpha_k$. The preceding equation now gives

$$\beta_k = \frac{\mathbf{g}_{k+1}^{\mathsf T}(\mathbf{g}_{k+1} - \mathbf{g}_k)}{\alpha_k\,\mathbf{d}_k^{\mathsf T} A\,\mathbf{d}_k} \tag{3.30}$$

Using (3.28) and (3.29) with $k$ replaced by $k-1$, we get $\mathbf{d}_k^{\mathsf T}\mathbf{g}_k = -\mathbf{g}_k^{\mathsf T}\mathbf{g}_k$. Thus, (3.27) gives

$$\alpha_k = \frac{\mathbf{g}_k^{\mathsf T}\mathbf{g}_k}{\mathbf{d}_k^{\mathsf T} A\,\mathbf{d}_k} \tag{3.31}$$

Substituting for $\alpha_k$ from the preceding equation into (3.30) yields

$$\beta_k = \frac{\mathbf{g}_{k+1}^{\mathsf T}(\mathbf{g}_{k+1} - \mathbf{g}_k)}{\mathbf{g}_k^{\mathsf T}\mathbf{g}_k} \tag{3.32}$$

ALGORITHM: We may now implement the conjugate gradient method as follows. Starting with $k = 0$, an initial point $\mathbf{x}_0$, and $\mathbf{d}_0 = -\nabla q(\mathbf{x}_0)$, we perform a line search, that is, determine $\alpha_k$ from (3.31), and then obtain $\mathbf{x}_{k+1}$ from (3.26). Then $\beta_k$ is obtained from (3.32), and the next direction $\mathbf{d}_{k+1}$ is given by (3.29).
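A minimal sketch of this algorithm for the quadratic (3.24) is given below (the function and variable names are illustrative, not from the text):

```python
import numpy as np

def conjugate_gradient(A, c, x0, tol=1e-10):
    """Minimize q(x) = 0.5 x^T A x + c^T x, A symmetric positive definite."""
    x = np.asarray(x0, dtype=float)
    g = A @ x + c                              # gradient g_k = A x_k + c
    d = -g                                     # first direction: steepest descent
    for _ in range(len(x)):                    # at most n searches for a quadratic
        alpha = (g @ g) / (d @ A @ d)          # step length, Eq. (3.31)
        x = x + alpha * d                      # new point, Eq. (3.26)
        g_new = A @ x + c
        if np.linalg.norm(g_new) < tol:
            break
        beta = g_new @ (g_new - g) / (g @ g)   # Eq. (3.32)
        d = -g_new + beta * d                  # deflected direction, Eq. (3.29)
        g = g_new
    return x
```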

Since the calculation of $\beta_k$ in (3.32) is independent of the matrix $A$ and vector $\mathbf{c}$ defining the quadratic function, it was natural to apply the preceding method to nonquadratic functions. However, a numerical line search must then be performed to find $\alpha_k$, instead of using the closed-form formula in (3.31). Of course, the finite convergence in $n$ steps is valid only for quadratic functions. Further, in the case of general functions, a restart is made every $n$ iterations, wherein a steepest descent step is taken. The use of (3.32) is referred to as the Polak–Ribière algorithm.

If we consider

$$\mathbf{g}_{k+1}^{\mathsf T}\mathbf{g}_k = \mathbf{g}_{k+1}^{\mathsf T}(-\mathbf{d}_k + \beta_{k-1}\mathbf{d}_{k-1}) = \beta_{k-1}\,\mathbf{g}_{k+1}^{\mathsf T}\mathbf{d}_{k-1} = \beta_{k-1}\,(\mathbf{g}_k + \alpha_k A\,\mathbf{d}_k)^{\mathsf T}\mathbf{d}_{k-1} = 0$$

where the final terms vanish by the line search condition (3.28) and the conjugacy of $\mathbf{d}_k$ and $\mathbf{d}_{k-1}$, then the numerator of (3.32) reduces to $\mathbf{g}_{k+1}^{\mathsf T}\mathbf{g}_{k+1}$, and we obtain

$$\beta_k = \frac{\mathbf{g}_{k+1}^{\mathsf T}\mathbf{g}_{k+1}}{\mathbf{g}_k^{\mathsf T}\mathbf{g}_k} \tag{3.33}$$

which is the Fletcher–Reeves version [Fletcher and Reeves 1964].
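For general functions, either update can be combined with the numerical line search and periodic restart described above. A sketch follows (assuming SciPy's bounded scalar minimizer for the line search, with an arbitrary search interval; the names are illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nonlinear_cg(f, grad, x0, variant="PR", max_iter=200, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    n = len(x)
    g = grad(x)
    d = -g
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # numerical line search replaces the closed-form step (3.31)
        alpha = minimize_scalar(lambda a: f(x + a*d),
                                bounds=(0.0, 10.0), method="bounded").x
        x = x + alpha * d
        g_new = grad(x)
        if (k + 1) % n == 0:
            d = -g_new                 # restart: steepest descent every n steps
        else:
            if variant == "PR":        # Polak-Ribiere, Eq. (3.32)
                beta = g_new @ (g_new - g) / (g @ g)
            else:                      # Fletcher-Reeves, Eq. (3.33)
                beta = (g_new @ g_new) / (g @ g)
            d = -g_new + beta * d
        g = g_new
    return x
```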

Example 3.9

Consider $f = x_1^2 + 4x_2^2$, $\mathbf{x}_0 = (1, 1)^{\mathsf T}$. We will perform two iterations of the conjugate gradient algorithm. The first step is the steepest descent iteration. Thus,

$$\mathbf{d}_0 = -\nabla f(\mathbf{x}_0) = -(2, 8)^{\mathsf T}$$

In this example, the direction vectors are not normalized to be unit vectors, although this is done in the programs:

$$f(\alpha) = f(\mathbf{x}_0 + \alpha\,\mathbf{d}_0) = (1 - 2\alpha)^2 + 4(1 - 8\alpha)^2$$

which yields $\alpha_0 = 0.1307692$ and $\mathbf{x}_1 = \mathbf{x}_0 + \alpha_0\mathbf{d}_0 = (0.7384615, -0.0461538)^{\mathsf T}$. For the next iteration, we compute

$$\beta_0 = \frac{\lVert\nabla f(\mathbf{x}_1)\rVert^2}{\lVert\nabla f(\mathbf{x}_0)\rVert^2} = \frac{2.3176}{68} = 0.0340828$$

$$\mathbf{d}_1 = -\nabla f(\mathbf{x}_1) + \beta_0\mathbf{d}_0 = \begin{bmatrix} -1.476923 \\ 0.369231 \end{bmatrix} + 0.0340828 \begin{bmatrix} -2 \\ -8 \end{bmatrix} = \begin{bmatrix} -1.54508 \\ 0.09656 \end{bmatrix}$$

$$f(\alpha) = f(\mathbf{x}_1 + \alpha\,\mathbf{d}_1) = (0.7384615 - 1.54508\alpha)^2 + 4(-0.0461538 + 0.09656\alpha)^2$$

which yields $\alpha_1 = 0.477941$ and

$$\mathbf{x}_2 = \mathbf{x}_1 + \alpha_1\mathbf{d}_1 = \begin{bmatrix} 0.7384615 \\ -0.0461538 \end{bmatrix} + 0.477941 \begin{bmatrix} -1.54508 \\ 0.09656 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

As expected from theory, convergence is reached after $n = 2$ searches.
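This two-step convergence can be confirmed with the conjugate_gradient sketch given after the algorithm above, noting that $f = x_1^2 + 4x_2^2$ corresponds to $A = \mathrm{diag}(2, 8)$ and $\mathbf{c} = \mathbf{0}$ in (3.24):

```python
import numpy as np

A = np.diag([2.0, 8.0])   # f = x1^2 + 4 x2^2 written in the form (3.24)
c = np.zeros(2)

x_star = conjugate_gradient(A, c, np.array([1.0, 1.0]))
print(x_star)             # -> approximately [0. 0.] after n = 2 searches
```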

Solution of Simultaneous Equations in Finite Element Analysis

In finite element analysis, the equilibrium condition can be obtained by minimizing the potential energy

$$\Pi = \tfrac{1}{2}\,\mathbf{Q}^{\mathsf T} K\,\mathbf{Q} - \mathbf{Q}^{\mathsf T}\mathbf{F} \tag{3.34}$$

with respect to the displacement vector $\mathbf{Q} = [Q_1, Q_2, \ldots, Q_n]^{\mathsf T}$, where $n$ is the number of degrees of freedom in the model and $N$ the number of nodes. For a one-dimensional system we have $n = N$, while for a three-dimensional system $n = 3N$. $K$ is a positive definite stiffness matrix and $\mathbf{F}$ is a load vector. The popular approach is to apply the necessary conditions and solve the system of simultaneous equations

$$K\mathbf{Q} = \mathbf{F} \tag{3.35}$$

While the basic idea is to use Gaussian elimination, the special structure of $K$ and the manner in which it is formed are exploited while solving (3.35); thus, we have banded, skyline, and frontal solvers. However, the conjugate gradient method applied to the function $\Pi$ is also attractive and is used in some codes, especially when $K$ is dense, that is, when $K$ does not contain a large number of zero elements, which makes sparse Gaussian elimination solvers unattractive. The main attraction is that the method only requires the two vectors $\nabla\Pi(\mathbf{Q}_k)$ and $\nabla\Pi(\mathbf{Q}_{k+1})$, from which $\beta_k$ in (3.33) is computed. Furthermore, computation of $\nabla\Pi = K\mathbf{Q} - \mathbf{F}$ does not require the entire $K$ matrix, as it can be assembled from "element matrices." A brief example is given in what follows to illustrate the main idea.

Consider a one-dimensional problem in elasticity as shown in Fig. 3.7a. The bar is discretized using finite elements as shown in Fig. 3.7b. The model consists of $NE$ elements and $N = NE + 1$ nodes. The displacement at node $I$ is denoted by $Q_I$. Each finite element is two-noded as shown in Fig. 3.7c, and the "element displacement vector" of the $j$th element is denoted by $\mathbf{q}^{(j)} = [Q_j, Q_{j+1}]^{\mathsf T}$. The key point is that the $(N \times N)$ global stiffness matrix $K$ consists of $(2 \times 2)$ element stiffness matrices $\mathbf{k}$ as shown in Fig. 3.7d. Thus, evaluation of $\nabla\Pi = K\mathbf{Q} - \mathbf{F}$ does not require forming the entire $K$ matrix; instead, for each element $j$, $j = 1, 2, \ldots, NE$, we can compute the $(2 \times 1)$ vector $\mathbf{k}^{(j)}\mathbf{q}^{(j)}$ and place it in the $j$th and $(j+1)$th locations of $\nabla\Pi$. This is done for each element, with overlapping entries being added. Thus, we assemble the gradient vector $\nabla\Pi$ without forming the global stiffness matrix $K$.

Computation of the step size $\alpha_k$ requires computation of $\mathbf{d}^{\mathsf T} K\,\mathbf{d}$, where $\mathbf{d}$ is a conjugate direction. This computation can also be done at the element level as $\sum_{j=1}^{NE} \mathbf{d}^{(j)\mathsf T}\mathbf{k}^{(j)}\mathbf{d}^{(j)}$, where $\mathbf{d}^{(j)} = [d_j, d_{j+1}]^{\mathsf T}$.
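A sketch of these element-level computations for the bar of Fig. 3.7 follows (the uniform element stiffness and the absence of boundary conditions are simplifying assumptions made here for illustration):

```python
import numpy as np

NE = 4                                # number of elements; N = NE + 1 nodes
ke = np.array([[ 1.0, -1.0],
               [-1.0,  1.0]])         # assumed (2x2) element stiffness k^(j)

def grad_Pi(Q, F):
    """Assemble grad(Pi) = K Q - F element by element, never forming K."""
    KQ = np.zeros_like(Q, dtype=float)
    for j in range(NE):
        KQ[j:j+2] += ke @ Q[j:j+2]    # place k^(j) q^(j) in rows j, j+1
    return KQ - F

def dKd(d):
    """Compute d^T K d as a sum of element-level (2x2) products."""
    return sum(d[j:j+2] @ ke @ d[j:j+2] for j in range(NE))
```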

Preconditioning the $K$ matrix so that it has a good condition number is important in this context, since for a very large $K$ it is necessary to obtain a near-optimal solution rapidly.
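One simple option, in the spirit of the diagonal scaling $T$ discussed earlier in this section, is Jacobi (diagonal) preconditioning. The sketch below illustrates the idea on an explicitly stored $K$; in a matrix-free setting, the diagonal of $K$ would itself be assembled from the element matrices:

```python
import numpy as np

def jacobi_scale(K, F):
    """Scale K Q = F so that the scaled matrix has unit diagonal.

    Equivalent to choosing T = diag(1/sqrt(K_ii)) in Eq. (3.21);
    solve the scaled system for y, then recover Q = T y.
    """
    T = np.diag(1.0 / np.sqrt(np.diag(K)))
    return T @ K @ T, T @ F, T
```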


Figure 3.7. One-dimensional finite elements: (a) a bar in elasticity along the $x$-axis; (b) the finite element model with nodes $1, 2, \ldots, N$ and elements $1, 2, \ldots, NE$; (c) a two-noded element connecting nodes $I$ and $I+1$, with $(2 \times 2)$ element stiffness matrix $\mathbf{k}$ having entries $k_{11}, k_{12}, k_{21}, k_{22}$; (d) assembly of the element stiffness matrices into the global matrix $K$, with overlapping entries (e.g., $k_{22}^{(1)} + k_{11}^{(2)}$) added.