
Choosing $u = \delta^k$ and $v = H^k \gamma^k$, we get $a\,u^T \gamma^k = 1$ and $b\,v^T \gamma^k = -1$, which determine $a$ and $b$. The DFP update is now given by

$$H_{k+1}^{DFP} = H - \frac{H\gamma\gamma^T H}{\gamma^T H \gamma} + \frac{\delta\delta^T}{\delta^T \gamma} \qquad (3.47)$$

The superscript $k$ has been omitted in the matrices on the right-hand side of the preceding equation. We note that $H$ remains symmetric. Further, it can be shown that $H$ remains positive definite (assuming $H^0$ is selected to be positive definite). Thus, the direction vector $d = -H^k \nabla f(x^k)$ is a descent direction at every step. Further, when applied to quadratic functions $q(x) = \frac{1}{2}x^T A x + c^T x$, we have $H^n = A^{-1}$. This means that, for quadratic functions, the update formula results in an exact inverse of the Hessian matrix after $n$ iterations, which implies convergence at the end of $n$ iterations. We saw that the conjugate gradient method also possesses this property. For large problems, the storage and update of $H$ may be a disadvantage of quasi-Newton methods as compared to the conjugate gradient method.
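As a concrete illustration, the update of Eq. (3.47) can be coded in a few lines. The following is a minimal sketch in Python/NumPy; it is our own illustration, not the text's program DFP, and the name dfp_update is ours:

```python
import numpy as np

def dfp_update(H, delta, gamma):
    """DFP update of the inverse-Hessian approximation H, Eq. (3.47).

    delta = x_{k+1} - x_k (the step); gamma = grad f(x_{k+1}) - grad f(x_k).
    H must be symmetric positive definite; the update preserves this
    provided delta^T gamma > 0 (guaranteed by an exact line search).
    """
    Hg = H @ gamma
    return (H
            - np.outer(Hg, Hg) / (gamma @ Hg)             # - H g g^T H / (g^T H g)
            + np.outer(delta, delta) / (delta @ gamma))   # + d d^T / (d^T g)
```

The search direction at each step is then $d = -H\nabla f(x)$.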

The program DFP provided in the text uses the quadratic fit algorithm for line search.

Another quasi-Newton update formula was suggested by Broyden, Fletcher, Goldfarb, and Shanno, known as the BFGS formula [F3]

$$H_{k+1}^{BFGS} = H - \frac{\delta\gamma^T H + H\gamma\delta^T}{\delta^T\gamma} + \left(1 + \frac{\gamma^T H \gamma}{\delta^T\gamma}\right)\frac{\delta\delta^T}{\delta^T\gamma} \qquad (3.48)$$

The BFGS update is better suited than the DFP update when using approximate line search procedures. Section 3.9 contains a discussion of approximate line search.
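A corresponding sketch of the BFGS update of Eq. (3.48), in the same spirit as the DFP listing above (again our own illustration, with names of our choosing):

```python
import numpy as np

def bfgs_update(H, delta, gamma):
    """BFGS update of the inverse-Hessian approximation H, Eq. (3.48)."""
    dg = delta @ gamma                 # delta^T gamma
    Hg = H @ gamma                     # H gamma
    return (H
            - (np.outer(delta, Hg) + np.outer(Hg, delta)) / dg
            + (1.0 + (gamma @ Hg) / dg) * np.outer(delta, delta) / dg)
```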

Example 3.11

Consider the quadratic function $f = x_1^2 + 10x_2^2$. We will now perform two iterations using the DFP update, starting at $(1, 1)$. The first iteration is a steepest descent step since $H^0 =$ identity matrix. Thus, $d^0 = (-2, -20)^T$, $\alpha_0 = 0.05045$, and $x^1 = (0.899, -0.0089)^T$. In program DFP, the direction vector is normalized to be a unit vector; this is not done here. The gradient at the new point is $\nabla f(x^1) = (1.798, -0.180)^T$. Thus, $\gamma^0 = \nabla f(x^1) - \nabla f(x^0) = (-0.2018, -20.180)^T$, $\delta^0 = x^1 - x^0 = (-0.101, -1.0089)^T$, and Eq. (3.47) gives

$$H^1 = \begin{bmatrix} 1.0005 & -0.005 \\ -0.005 & 0.0505 \end{bmatrix}$$

followed by $d^1 = (-1.80, 0.018)^T$, $\alpha_1 = 0.5$, $x^2 = (0.0, 0.0)^T$. The tolerance on the norm of the gradient vector causes the program to terminate with $(0, 0)$ as the final solution. However, if we were to complete this step, we would find

$$H^2 = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.05 \end{bmatrix}$$

which is the inverse of the Hessian of the quadratic objective function.
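The numbers in this example can be reproduced with a short standalone script. The following sketch is our own; it uses an exact line search, which is available in closed form for a quadratic:

```python
import numpy as np

def grad(x):                           # gradient of f = x1^2 + 10 x2^2
    return np.array([2.0 * x[0], 20.0 * x[1]])

def dfp_update(H, delta, gamma):       # Eq. (3.47)
    Hg = H @ gamma
    return (H - np.outer(Hg, Hg) / (gamma @ Hg)
              + np.outer(delta, delta) / (delta @ gamma))

A = np.diag([2.0, 20.0])               # Hessian of the quadratic
x, H = np.array([1.0, 1.0]), np.eye(2)
for _ in range(2):
    d = -H @ grad(x)                              # search direction
    alpha = -(grad(x) @ d) / (d @ A @ d)          # exact step for a quadratic
    x_new = x + alpha * d
    H = dfp_update(H, x_new - x, grad(x_new) - grad(x))
    x = x_new
print(x)    # approximately (0, 0)
print(H)    # approximately [[0.5, 0], [0, 0.05]], the inverse Hessian
```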

Example 3.12

Programs STEEPEST (steepest descent), FLREEV (conjugate gradient method of Fletcher–Reeves), and DFP (quasi-Newton method based on the DFP update), which are provided in the text, are used in this example. Three functions are considered for minimization:

(i) Rosenbrock’s function (the minimum is in a parabolic valley):

$$f = 100(x_2 - x_1^2)^2 + (1 - x_1)^2, \qquad x^0 = (-1.2, 1)^T$$

(ii) Wood’s function (this problem has several local minima):

$$f = 100(x_2 - x_1^2)^2 + (1 - x_1)^2 + 90(x_4 - x_3^2)^2 + (1 - x_3)^2 + 10.1\left[(x_2 - 1)^2 + (x_4 - 1)^2\right] + 19.8(x_2 - 1)(x_4 - 1), \qquad x^0 = (-3, -1, -3, -1)^T$$

(iii) Powell's function (the Hessian matrix is singular at the optimum):

$$f = (x_1 + 10x_2)^2 + 5(x_3 - x_4)^2 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4, \qquad x^0 = (-3, -1, 0, 1)^T$$
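For readers who wish to experiment outside the programs provided in the text, the three test functions are easy to transcribe. The sketch below is our own Python version; the call to scipy.optimize.minimize (assuming SciPy is available) is one convenient stand-in for the text's optimizers:

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def wood(x):
    return (100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
            + 90.0 * (x[3] - x[2]**2)**2 + (1.0 - x[2])**2
            + 10.1 * ((x[1] - 1.0)**2 + (x[3] - 1.0)**2)
            + 19.8 * (x[1] - 1.0) * (x[3] - 1.0))

def powell(x):
    return ((x[0] + 10.0 * x[1])**2 + 5.0 * (x[2] - x[3])**2
            + (x[1] - 2.0 * x[2])**4 + 10.0 * (x[0] - x[3])**4)

# BFGS with finite-difference gradients, from the starting points of Example 3.12
for f, x0 in [(rosenbrock, [-1.2, 1.0]),
              (wood, [-3.0, -1.0, -3.0, -1.0]),
              (powell, [-3.0, -1.0, 0.0, 1.0])]:
    res = minimize(f, x0, method="BFGS")
    print(f.__name__, res.x, res.fun)
```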

The reader is encouraged to run the computer programs and verify the results in Table 3.1. Note that the following changes need to be made when switching from one problem to another in the code:

(i) N (start of the code, = number of variables)
(ii) Starting point (start of the code)
(iii) F = . . . (function definition in Subroutine GETFUN)
(iv) DF(1) = . . . , DF(N) = . . . (gradient definition in Subroutine GRADIENT)

The reader may also experiment with various parameters at the start of the code. However, Table 3.1 uses the same parameters for all methods and problems. In Table 3.1, NF = number of function calls and NG = number of gradient calls.

Table 3.1. Results associated with Example 3.12 (line search based on quadratic fit in all three programs).

Problem        Program    NF     NG     x                                        f
Rosenbrock's   STEEPEST   8815   4359   (0.996, 0.991)^T                         1.91e-5
Rosenbrock's   FLREEV     141    28     (1.000, 1.000)^T                         4.53e-11
Rosenbrock's   DFP        88     18     (1.000, 1.000)^T                         3.70e-20
Wood's         STEEPEST   5973   2982   (1.002, 1.003, 0.998, 0.997)^T           9.34e-6
Wood's         FLREEV     128    26     (1.000, 1.000, 1.000, 1.000)^T           8.14e-11
Wood's         DFP        147    40     (1.000, 1.000, 1.000, 1.000)^T           6.79e-11
Powell's       STEEPEST   3103   893    (0.051, 0.005, 0.02515, 0.02525)^T       1.35e-5
Powell's       FLREEV     158    30     (0.030, 0.003, 0.011104, 0.01106)^T      1.66e-6
Powell's       DFP        52     21     (0.00108, 0.0000105, 0.0034, 0.0034)^T   1.28e-9

Example 3.13 (“Mesh-Based Optimization”)

An important class of optimization problems is introduced through this example. We use the term "mesh-based optimization" to refer to problems that require the determination of a function, say $y(x)$, on a mesh. Thus, the number of variables depends on how finely the mesh is discretized. The brachistochrone problem involves finding the path, or shape of the curve, such that an object sliding from rest and accelerated by gravity travels from one point to another in the least time. While this problem can be solved analytically, the minimum-time path being a cycloid, we will adopt a discretized approach to illustrate mesh-based optimization, which is useful in other problems.

Referring to Fig. E3.13a, there are $n$ discretized points in the mesh, and the heights $y_i$ at points $2, 3, \ldots, n-1$ are the variables. Thus,

$$y = [y_1, y_2, \ldots, y_n]^T, \qquad x = [y_2, y_3, \ldots, y_{n-1}]^T$$

Importantly, a coarse mesh with, say, $n = 5$ points is first used, and the corresponding optimum heights are determined. Linear interpolation of the coarse mesh solution is then used to obtain the starting point for a fine mesh. The Matlab function interp1 makes this task simple indeed. In this way, optimum solutions for a mesh of any size $n$ are obtainable. Directly attempting a large value of $n$ generally fails; that is, the optimizer cannot handle it.
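In Python, NumPy's interp plays the role of Matlab's interp1. A minimal sketch of the refinement step follows; the coarse-mesh heights shown are illustrative placeholders, not computed results:

```python
import numpy as np

x_coarse = np.linspace(0.0, 1.0, 5)               # 5-point mesh
y_coarse = np.array([1.0, 0.5, 0.25, 0.1, 0.0])   # hypothetical coarse optimum
x_fine = np.linspace(0.0, 1.0, 21)                # 21-point mesh
y_start = np.interp(x_fine, x_coarse, y_coarse)   # starting point for the fine mesh
```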

Consider a line segment in the path as shown in Fig. E3.13a. From mechanics, we have

$$-\mu m g (\cos\theta)(\Delta L) = \tfrac{1}{2} m \left(v_{i+1}^2 - v_i^2\right) + m g (y_{i+1} - y_i) \qquad \text{(a)}$$

where $m$ is the mass, $v$ refers to velocity, and $g > 0$ is the acceleration due to gravity. For frictionless sliding, $\mu = 0$. Further, the time $t_i$ taken to traverse this segment is given by

$$t_i = \frac{(v_{i+1} - v_i)\,\Delta L}{g\,(y_i - y_{i+1})}, \qquad \text{where } \Delta L = \sqrt{h^2 + (y_{i+1} - y_i)^2} \qquad \text{(b)}$$

Equations (a) and (b) provide the needed expressions. The objective function to be minimized is $T = \sum_{i=1}^{n-1} t_i$. A starting shape (say, linear) is first assumed.

Since the object starts from rest, we have $v_1 = 0$. This allows computation of $v_2$ from Eq. (a), and $t_1$ from Eq. (b). This is repeated for $i = 2, \ldots, n-1$, which then defines the objective function $f =$ total time $T$.
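A sketch of this computation in Python (our own, for the frictionless case $\mu = 0$; the heights are assumed strictly decreasing so that each segment drop $y_i - y_{i+1}$ is positive):

```python
import numpy as np

def total_time(y, h, g=9.81):
    """Total travel time T = sum of t_i over the piecewise linear path.

    y : heights at the n equally spaced mesh points (y[0] is the start)
    h : horizontal spacing between adjacent mesh points
    """
    T, v = 0.0, 0.0                               # starts from rest: v_1 = 0
    for i in range(len(y) - 1):
        drop = y[i] - y[i + 1]                    # assumed > 0 (descending path)
        dL = np.sqrt(h**2 + drop**2)              # segment length, Eq. (b)
        v_next = np.sqrt(v**2 + 2.0 * g * drop)   # Eq. (a) with mu = 0
        T += (v_next - v) * dL / (g * drop)       # Eq. (b)
        v = v_next
    return T

y_linear = np.linspace(1.0, 0.0, 5)               # linear starting shape, n = 5
print(total_time(y_linear, h=0.25))
```

Passing total_time to any minimizer, with the interior heights $y_2, \ldots, y_{n-1}$ as the variables, completes the discretized formulation.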

Figure E3.13a. Discretized path for the brachistochrone problem: mesh points $1, 2, \ldots, i, i+1, \ldots, n-1, n$ with heights $y_i$, horizontal spacing $h$, a piecewise linear path with segments of length $\Delta L$ inclined at angle $\theta$, velocities $v_i$ and $v_{i+1}$, mass $m$, and gravitational acceleration $g$.