
The Conjugate Gradient Method

Typically, the steepest descent method takes only a few iterations to bring a far-off starting point into the "optimum region," but then takes hundreds of iterations to make very little progress toward the solution.

Scaling

The aforementioned discussion leads to the concept of scaling the design variables so that the Hessian matrix is well conditioned. For the function $f = x_1^2 + a\,x_2^2$, we can define new variables $y_1$ and $y_2$ as

$$y_1 = x_1, \qquad y_2 = \sqrt{a}\,x_2$$

which leads to the function in $y$ variables given by $g = y_1^2 + y_2^2$. The Hessian of the new function has the best conditioning possible, with contours corresponding to circles.

In general, consider a function $f = f(\mathbf{x})$. If we scale the variables as

$$\mathbf{x} = T\,\mathbf{y} \tag{3.21}$$

then the new function is $g(\mathbf{y}) \equiv f(T\mathbf{y})$, and its gradient and Hessian are given by

$$\nabla g = T^{\mathsf T}\,\nabla f \tag{3.22}$$

$$\nabla^2 g = T^{\mathsf T}\,\nabla^2 f\; T \tag{3.23}$$

Usually $T$ is chosen as a diagonal matrix. The idea is to choose $T$ such that the Hessian of $g$ has a condition number close to unity.
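As a quick numerical check of this idea, the following sketch (a minimal illustration; the value of $a$ is an arbitrary choice) computes the Hessian condition number before and after the transformation:

```python
import numpy as np

a = 100.0                             # coefficient in f = x1^2 + a*x2^2

H = np.diag([2.0, 2.0 * a])           # Hessian of f in the x variables
T = np.diag([1.0, 1.0 / np.sqrt(a)])  # scaling x = T y, so that y2 = sqrt(a)*x2

Hg = T.T @ H @ T                      # Hessian of g from Eq. (3.23)

print(np.linalg.cond(H))              # 100.0 : elongated elliptical contours
print(np.linalg.cond(Hg))             # 1.0   : circular contours
```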

Example 3.8

Consider $f = (x_1 - 2)^4 + (x_1 - 2x_2)^2$, $\mathbf{x}_0 = (0, 3)^{\mathsf T}$. Perform one iteration of the steepest descent method.

We have $f(\mathbf{x}_0) = 52$ and $\nabla f(\mathbf{x}) = [4(x_1 - 2)^3 + 2(x_1 - 2x_2),\; -4(x_1 - 2x_2)]^{\mathsf T}$. Thus, $\mathbf{d}_0 = -\nabla f(\mathbf{x}_0) = [44, -24]^{\mathsf T}$. Normalizing the direction vector to make it a unit vector, we have $\mathbf{d}_0 = [0.8779, -0.4789]^{\mathsf T}$. The solution of the line search problem, minimize $f(\alpha) = f(\mathbf{x}_0 + \alpha\,\mathbf{d}_0)$ with $\alpha > 0$, yields $\alpha_0 = 3.0841$. Thus, the new point is $\mathbf{x}_1 = \mathbf{x}_0 + \alpha_0\mathbf{d}_0 = [2.707, 1.523]^{\mathsf T}$, with $f(\mathbf{x}_1) = 0.365$. The second iteration would now proceed by evaluating $\nabla f(\mathbf{x}_1)$, and so on.
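The numbers in this example are easy to reproduce. The sketch below (assuming SciPy's bounded scalar minimizer for the line search, with an arbitrary upper bound on $\alpha$) performs the same iteration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

f = lambda x: (x[0] - 2)**4 + (x[0] - 2*x[1])**2

def grad(x):
    # gradient of f as derived in Example 3.8
    return np.array([4*(x[0] - 2)**3 + 2*(x[0] - 2*x[1]),
                     -4*(x[0] - 2*x[1])])

x0 = np.array([0.0, 3.0])
d0 = -grad(x0)                        # [44, -24]
d0 = d0 / np.linalg.norm(d0)          # normalized: [0.8779, -0.4789]

# line search: minimize f(x0 + alpha*d0) over alpha > 0
res = minimize_scalar(lambda a: f(x0 + a*d0), bounds=(0.0, 10.0),
                      method="bounded")
x1 = x0 + res.x * d0
print(res.x, x1, f(x1))               # ~3.0841, ~[2.707, 1.523], ~0.365
```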

3.6 The Conjugate Gradient Method

The conjugate gradient method retains the simplicity of steepest descent while greatly improving its convergence, and it is applicable to general functions. Conjugate gradient methods were first presented in [Fletcher and Powell, 1963].

Consider the problem of minimizing a quadratic function

$$\text{minimize } q(\mathbf{x}) = \tfrac{1}{2}\,\mathbf{x}^{\mathsf T} A\,\mathbf{x} + \mathbf{c}^{\mathsf T}\mathbf{x} \tag{3.24}$$

where we assume $A$ is symmetric and positive definite. We define conjugate directions, or directions that are mutually conjugate with respect to $A$, as vectors that satisfy

$$\mathbf{d}_i^{\mathsf T} A\,\mathbf{d}_j = 0, \quad i \ne j, \; 0 \le i, j \le n-1 \tag{3.25}$$

The method of conjugate directions is as follows. We start with an initial point $\mathbf{x}_0$ and a set of conjugate directions $\mathbf{d}_0, \mathbf{d}_1, \ldots, \mathbf{d}_{n-1}$. We minimize $q(\mathbf{x})$ along $\mathbf{d}_0$ to obtain $\mathbf{x}_1$; then, from $\mathbf{x}_1$, we minimize $q(\mathbf{x})$ along $\mathbf{d}_1$ to obtain $\mathbf{x}_2$; and lastly, we minimize $q(\mathbf{x})$ along $\mathbf{d}_{n-1}$ to obtain $\mathbf{x}_n$. The point $\mathbf{x}_n$ is the minimum solution; that is, the minimum of the quadratic function is found in $n$ searches. In the algorithm that follows, the gradients of $q$ are used to generate the conjugate directions.

We denote $\mathbf{g}$ to be the gradient of $q$, with $\mathbf{g}_k = \nabla q(\mathbf{x}_k) = A\mathbf{x}_k + \mathbf{c}$. Let $\mathbf{x}_k$ be the current point, with $k$ an iteration index. The first direction $\mathbf{d}_0$ is chosen as the steepest descent direction, $-\mathbf{g}_0$. We proceed to find a new point $\mathbf{x}_{k+1}$ by minimizing $q(\mathbf{x})$ along $\mathbf{d}_k$. Thus

$$\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{d}_k \tag{3.26}$$

where $\alpha_k$ is obtained from the line search problem: minimize $f(\alpha) = q(\mathbf{x}_k + \alpha\,\mathbf{d}_k)$.

Setting $\mathrm{d}q(\alpha)/\mathrm{d}\alpha = 0$ yields

$$\alpha_k = -\,\frac{\mathbf{d}_k^{\mathsf T}\mathbf{g}_k}{\mathbf{d}_k^{\mathsf T} A\,\mathbf{d}_k} \tag{3.27}$$

Also, the exact line search condition $\mathrm{d}q(\alpha)/\mathrm{d}\alpha = 0$ yields

$$\mathbf{d}_k^{\mathsf T}\mathbf{g}_{k+1} = 0 \tag{3.28}$$

Now the key step: we choose $\mathbf{d}_{k+1}$ to be of the form

$$\mathbf{d}_{k+1} = -\mathbf{g}_{k+1} + \beta_k \mathbf{d}_k \tag{3.29}$$

This represents a "deflection" of the steepest descent direction $-\mathbf{g}_{k+1}$, as illustrated in Fig. 3.6. Requiring $\mathbf{d}_{k+1}$ to be conjugate to $\mathbf{d}_k$, or $\mathbf{d}_{k+1}^{\mathsf T} A\,\mathbf{d}_k = 0$, gives

$$-\mathbf{g}_{k+1}^{\mathsf T} A\,\mathbf{d}_k + \beta_k\,\mathbf{d}_k^{\mathsf T} A\,\mathbf{d}_k = 0$$

Figure 3.6. Conjugate directions for a quadratic in two variables, showing the points $\mathbf{x}_0$, $\mathbf{x}_1$ and the directions $\mathbf{d}_0$, $\mathbf{d}_1$ on the contours of $f$.

From (3.26), $\mathbf{d}_k = (\mathbf{x}_{k+1} - \mathbf{x}_k)/\alpha_k$. Thus, $A\,\mathbf{d}_k = (\mathbf{g}_{k+1} - \mathbf{g}_k)/\alpha_k$. The preceding equation now gives

$$\beta_k = \frac{\mathbf{g}_{k+1}^{\mathsf T}(\mathbf{g}_{k+1} - \mathbf{g}_k)}{\alpha_k\,\mathbf{d}_k^{\mathsf T} A\,\mathbf{d}_k} \tag{3.30}$$

Using (3.28) and (3.29) with $k$ replaced by $k-1$, we get $\mathbf{d}_k^{\mathsf T}\mathbf{g}_k = -\mathbf{g}_k^{\mathsf T}\mathbf{g}_k$. Thus, (3.27) gives

$$\alpha_k = \frac{\mathbf{g}_k^{\mathsf T}\mathbf{g}_k}{\mathbf{d}_k^{\mathsf T} A\,\mathbf{d}_k} \tag{3.31}$$

Substituting for $\alpha_k$ from the preceding equation into (3.30) yields

$$\beta_k = \frac{\mathbf{g}_{k+1}^{\mathsf T}(\mathbf{g}_{k+1} - \mathbf{g}_k)}{\mathbf{g}_k^{\mathsf T}\mathbf{g}_k} \tag{3.32}$$

ALGORITHM: We may now implement the conjugate gradient method as follows. Starting with $k = 0$, an initial point $\mathbf{x}_0$, and $\mathbf{d}_0 = -\nabla q(\mathbf{x}_0)$, we perform a line search, that is, determine $\alpha_k$ from (3.31), and then obtain $\mathbf{x}_{k+1}$ from (3.26). Then $\beta_k$ is obtained from (3.32), and the next direction $\mathbf{d}_{k+1}$ is given by (3.29).
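A minimal sketch of this algorithm for the quadratic (3.24) is given below (the function and variable names are illustrative, not from the text):

```python
import numpy as np

def conjugate_gradient(A, c, x0, tol=1e-10):
    """Minimize q(x) = 0.5 x^T A x + c^T x, A symmetric positive definite."""
    x = np.asarray(x0, dtype=float)
    g = A @ x + c                              # gradient g_k = A x_k + c
    d = -g                                     # first direction: steepest descent
    for _ in range(len(x)):                    # at most n searches for a quadratic
        alpha = (g @ g) / (d @ A @ d)          # step length, Eq. (3.31)
        x = x + alpha * d                      # new point, Eq. (3.26)
        g_new = A @ x + c
        if np.linalg.norm(g_new) < tol:
            break
        beta = g_new @ (g_new - g) / (g @ g)   # Eq. (3.32)
        d = -g_new + beta * d                  # deflected direction, Eq. (3.29)
        g = g_new
    return x
```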

Since the calculation of $\beta_k$ in (3.32) is independent of the matrix $A$ and vector $\mathbf{c}$ defining the quadratic function, it was natural to apply the preceding method to nonquadratic functions. However, a numerical line search must then be performed to find $\alpha_k$, instead of using the closed-form formula in (3.31). Of course, the finite convergence in $n$ steps is valid only for quadratic functions. Further, in the case of general functions, a restart is made every $n$ iterations, wherein a steepest descent step is taken. The use of (3.32) is referred to as the Polak–Ribière algorithm.

If we consider

$$\mathbf{g}_{k+1}^{\mathsf T}\mathbf{g}_k = \mathbf{g}_{k+1}^{\mathsf T}(-\mathbf{d}_k + \beta_{k-1}\mathbf{d}_{k-1}) = \beta_{k-1}\,\mathbf{g}_{k+1}^{\mathsf T}\mathbf{d}_{k-1} = \beta_{k-1}\,(\mathbf{g}_k + \alpha_k A\,\mathbf{d}_k)^{\mathsf T}\mathbf{d}_{k-1} = 0$$

where the final terms vanish by the line search condition (3.28) and the conjugacy of $\mathbf{d}_k$ and $\mathbf{d}_{k-1}$, then the numerator of (3.32) reduces to $\mathbf{g}_{k+1}^{\mathsf T}\mathbf{g}_{k+1}$, and we obtain

$$\beta_k = \frac{\mathbf{g}_{k+1}^{\mathsf T}\mathbf{g}_{k+1}}{\mathbf{g}_k^{\mathsf T}\mathbf{g}_k} \tag{3.33}$$

which is the Fletcher–Reeves version [Fletcher and Reeves 1964].
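For general functions, either update can be combined with the numerical line search and periodic restart described above. A sketch follows (assuming SciPy's bounded scalar minimizer for the line search, with an arbitrary search interval; the names are illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nonlinear_cg(f, grad, x0, variant="PR", max_iter=200, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    n = len(x)
    g = grad(x)
    d = -g
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # numerical line search replaces the closed-form step (3.31)
        alpha = minimize_scalar(lambda a: f(x + a*d),
                                bounds=(0.0, 10.0), method="bounded").x
        x = x + alpha * d
        g_new = grad(x)
        if (k + 1) % n == 0:
            d = -g_new                 # restart: steepest descent every n steps
        else:
            if variant == "PR":        # Polak-Ribiere, Eq. (3.32)
                beta = g_new @ (g_new - g) / (g @ g)
            else:                      # Fletcher-Reeves, Eq. (3.33)
                beta = (g_new @ g_new) / (g @ g)
            d = -g_new + beta * d
        g = g_new
    return x
```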

Example 3.9

Consider $f = x_1^2 + 4x_2^2$, $\mathbf{x}_0 = (1, 1)^{\mathsf T}$. We will perform two iterations of the conjugate gradient algorithm. The first step is the steepest descent iteration. Thus,

$$\mathbf{d}_0 = -\nabla f(\mathbf{x}_0) = -(2, 8)^{\mathsf T}$$

In this example, the direction vectors are not normalized to be unit vectors, although this is done in the programs:

$$f(\alpha) = f(\mathbf{x}_0 + \alpha\,\mathbf{d}_0) = (1 - 2\alpha)^2 + 4(1 - 8\alpha)^2$$

which yields $\alpha_0 = 0.1307692$ and $\mathbf{x}_1 = \mathbf{x}_0 + \alpha_0\mathbf{d}_0 = (0.7384615, -0.0461538)^{\mathsf T}$. For the next iteration, we compute

$$\beta_0 = \frac{\lVert\nabla f(\mathbf{x}_1)\rVert^2}{\lVert\nabla f(\mathbf{x}_0)\rVert^2} = \frac{2.3176}{68} = 0.0340828$$

$$\mathbf{d}_1 = -\nabla f(\mathbf{x}_1) + \beta_0\mathbf{d}_0 = \begin{bmatrix} -1.476923 \\ 0.369231 \end{bmatrix} + 0.0340828 \begin{bmatrix} -2 \\ -8 \end{bmatrix} = \begin{bmatrix} -1.54508 \\ 0.09656 \end{bmatrix}$$

$$f(\alpha) = f(\mathbf{x}_1 + \alpha\,\mathbf{d}_1) = (0.7384615 - 1.54508\alpha)^2 + 4(-0.0461538 + 0.09656\alpha)^2$$

which yields $\alpha_1 = 0.477941$ and

$$\mathbf{x}_2 = \mathbf{x}_1 + \alpha_1\mathbf{d}_1 = \begin{bmatrix} 0.7384615 \\ -0.0461538 \end{bmatrix} + 0.477941 \begin{bmatrix} -1.54508 \\ 0.09656 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

As expected from theory, convergence is reached after $n = 2$ searches.
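This two-step convergence can be confirmed with the conjugate_gradient sketch given after the algorithm above, noting that $f = x_1^2 + 4x_2^2$ corresponds to $A = \mathrm{diag}(2, 8)$ and $\mathbf{c} = \mathbf{0}$ in (3.24):

```python
import numpy as np

A = np.diag([2.0, 8.0])   # f = x1^2 + 4 x2^2 written in the form (3.24)
c = np.zeros(2)

x_star = conjugate_gradient(A, c, np.array([1.0, 1.0]))
print(x_star)             # -> approximately [0. 0.] after n = 2 searches
```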

Solution of Simultaneous Equations in Finite Element Analysis

In finite element analysis, the equilibrium condition can be obtained by minimizing the potential energy

$$\Pi = \tfrac{1}{2}\,\mathbf{Q}^{\mathsf T} K\,\mathbf{Q} - \mathbf{Q}^{\mathsf T}\mathbf{F} \tag{3.34}$$

with respect to the displacement vector $\mathbf{Q} = [Q_1, Q_2, \ldots, Q_n]^{\mathsf T}$, where $n$ is the number of degrees of freedom in the model and $N$ the number of nodes. For a one-dimensional system we have $n = N$, while for a three-dimensional system $n = 3N$. $K$ is a positive definite stiffness matrix and $\mathbf{F}$ is a load vector. The popular approach is to apply the necessary conditions and solve the system of simultaneous equations

$$K\mathbf{Q} = \mathbf{F} \tag{3.35}$$

While the basic idea is to use Gaussian elimination, the special structure of $K$ and the manner in which it is formed are exploited while solving (3.35); thus, we have banded, skyline, and frontal solvers. However, the conjugate gradient method applied to the function $\Pi$ is also attractive and is used in some codes, especially when $K$ is dense, that is, when $K$ does not contain a large number of zero elements, which makes sparse Gaussian elimination solvers unattractive. The main attraction is that the method only requires the two vectors $\nabla\Pi(\mathbf{Q}_k)$ and $\nabla\Pi(\mathbf{Q}_{k+1})$, from which $\beta_k$ in (3.33) is computed. Furthermore, computation of $\nabla\Pi = K\mathbf{Q} - \mathbf{F}$ does not require the entire $K$ matrix, as it can be assembled from "element matrices." A brief example is given in what follows to illustrate the main idea.

Consider a one-dimensional problem in elasticity as shown in Fig. 3.7a. The bar is discretized using finite elements as shown in Fig. 3.7b. The model consists of $NE$ elements and $N = NE + 1$ nodes. The displacement at node $I$ is denoted by $Q_I$. Each finite element is two-noded as shown in Fig. 3.7c, and the "element displacement vector" of the $j$th element is denoted by $\mathbf{q}^{(j)} = [Q_j, Q_{j+1}]^{\mathsf T}$. The key point is that the $(N \times N)$ global stiffness matrix $K$ consists of $(2 \times 2)$ element stiffness matrices $\mathbf{k}$ as shown in Fig. 3.7d. Thus, evaluation of $\nabla\Pi = K\mathbf{Q} - \mathbf{F}$ does not require forming the entire $K$ matrix; instead, for each element $j$, $j = 1, 2, \ldots, NE$, we can compute the $(2 \times 1)$ vector $\mathbf{k}^{(j)}\mathbf{q}^{(j)}$ and place it in the $j$th and $(j+1)$th locations of $\nabla\Pi$. This is done for each element, with overlapping entries being added. Thus, we assemble the gradient vector $\nabla\Pi$ without forming the global stiffness matrix $K$.

Computation of the step size $\alpha_k$ requires computation of $\mathbf{d}^{\mathsf T} K\,\mathbf{d}$, where $\mathbf{d}$ is a conjugate direction. This computation can also be done at the element level as $\sum_{j=1}^{NE} \mathbf{d}^{(j)\mathsf T}\mathbf{k}^{(j)}\mathbf{d}^{(j)}$, where $\mathbf{d}^{(j)} = [d_j, d_{j+1}]^{\mathsf T}$.
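A sketch of these element-level computations for the bar of Fig. 3.7 follows (the uniform element stiffness and the absence of boundary conditions are simplifying assumptions made here for illustration):

```python
import numpy as np

NE = 4                                # number of elements; N = NE + 1 nodes
ke = np.array([[ 1.0, -1.0],
               [-1.0,  1.0]])         # assumed (2x2) element stiffness k^(j)

def grad_Pi(Q, F):
    """Assemble grad(Pi) = K Q - F element by element, never forming K."""
    KQ = np.zeros_like(Q, dtype=float)
    for j in range(NE):
        KQ[j:j+2] += ke @ Q[j:j+2]    # place k^(j) q^(j) in rows j, j+1
    return KQ - F

def dKd(d):
    """Compute d^T K d as a sum of element-level (2x2) products."""
    return sum(d[j:j+2] @ ke @ d[j:j+2] for j in range(NE))
```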

Preconditioning the $K$ matrix so that it has a good condition number is important in this context, since for a very large $K$ it is necessary to obtain a near-optimal solution rapidly.
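One simple option, in the spirit of the diagonal scaling $T$ discussed earlier in this section, is Jacobi (diagonal) preconditioning. The sketch below illustrates the idea on an explicitly stored $K$; in a matrix-free setting, the diagonal of $K$ would itself be assembled from the element matrices:

```python
import numpy as np

def jacobi_scale(K, F):
    """Scale K Q = F so that the scaled matrix has unit diagonal.

    Equivalent to choosing T = diag(1/sqrt(K_ii)) in Eq. (3.21);
    solve the scaled system for y, then recover Q = T y.
    """
    T = np.diag(1.0 / np.sqrt(np.diag(K)))
    return T @ K @ T, T @ F, T
```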


Figure 3.7. One-dimensional finite elements: (a) a bar in elasticity along the $x$-axis; (b) the finite element model with nodes $1, 2, \ldots, N$ and elements $1, 2, \ldots, NE$; (c) a two-noded element connecting nodes $I$ and $I+1$, with $(2 \times 2)$ element stiffness matrix $\mathbf{k}$ having entries $k_{11}, k_{12}, k_{21}, k_{22}$; (d) assembly of the element stiffness matrices into the global matrix $K$, with overlapping entries (e.g., $k_{22}^{(1)} + k_{11}^{(2)}$) added.