
Second-Order Optimality Conditions

In the document Foundations of Optimization (Pages 54–57)

and the above formula for $H_n$ gives
$$\sum_{j=1}^{n} x_j^2 = \frac{n(n-1)}{2}.$$
Thus, the minimum value of $f$ is
$$\frac{1}{4}\,n(n-1)(1+\ln 2) - \frac{1}{2}\sum_{j=1}^{n} j\ln j.$$

2.3 Second-Order Optimality Conditions

Definition 2.11. An $n \times n$ matrix $A$ is called positive semidefinite if
$$\langle Ad, d\rangle \ge 0 \quad \text{for all } d \in \mathbb{R}^n.$$
It is called positive definite if
$$\langle Ad, d\rangle > 0 \quad \text{for all } d \in \mathbb{R}^n,\ d \ne 0.$$

Note that if $A$ is positive semidefinite, then $a_{ii} = \langle Ae_i, e_i\rangle \ge 0$, and if $A$ is positive definite, then $a_{ii} > 0$. Similarly, choosing $d = te_i + e_j$ gives $q(t) := a_{ii}t^2 + 2a_{ij}t + a_{jj} \ge 0$ for all $t \in \mathbb{R}$. Recall that the quadratic function $q(t)$ is nonnegative (positive) if and only if its discriminant $\Delta = 4(a_{ij}^2 - a_{ii}a_{jj})$ is nonpositive (negative). Thus, $a_{ii}a_{jj} - a_{ij}^2 \ge 0$ if $A$ is positive semidefinite, and $a_{ii}a_{jj} - a_{ij}^2 > 0$ if $A$ is positive definite.
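The tests above can be checked numerically. The following is a minimal sketch (the function name, the chosen matrix, and the numpy dependency are illustrative choices, not from the text) comparing the eigenvalue characterization of positive definiteness with the $2 \times 2$ principal-minor test $a_{ii}a_{jj} - a_{ij}^2 > 0$ derived above:

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """True if the symmetric matrix A satisfies <Ad, d> > 0 for all d != 0,
    i.e., all eigenvalues of A are positive."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

# Illustrative symmetric matrix with a11*a22 - a12^2 = 3 > 0 and a11 > 0.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

print(is_positive_definite(A))               # True
print(A[0, 0] * A[1, 1] - A[0, 1] ** 2 > 0)  # True: the 2x2 test agrees
```

For symmetric matrices, `numpy.linalg.eigvalsh` returns real eigenvalues, so the eigenvalue test is a reliable numerical substitute for checking $\langle Ad, d\rangle > 0$ directly.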

Theorem 2.12. (Second-order necessary condition for a local minimizer) Let $f : U \to \mathbb{R}$ be twice Gâteaux differentiable on an open set $U \subseteq \mathbb{R}^n$ in the sense that there exist a vector $\nabla f(x)$ and a symmetric matrix $Hf(x)$ such that for all $h \in \mathbb{R}^n$,
$$f(x+th) = f(x) + t\langle \nabla f(x), h\rangle + \frac{t^2}{2}\langle Hf(x)h, h\rangle + o(t^2). \tag{2.1}$$
(This condition is satisfied if $f$ has continuous second-order partial derivatives, that is, if $f \in C^2$.)

If $x \in U$ is a local minimizer of $f$, then the matrix $Hf(x)$ is positive semidefinite.

Proof. The first-order necessary condition implies $\nabla f(x) = 0$. Since $x$ is a local minimizer, we have $f(x+th) \ge f(x)$ if $|t|$ is small enough. Then, (2.1) gives
$$\frac{t^2}{2}\langle Hf(x)h, h\rangle + o(t^2) \ge 0.$$
Dividing by $t^2$ and letting $t \to 0$ gives
$$\langle Hf(x)h, h\rangle \ge 0 \quad \text{for all } h \in \mathbb{R}^n,$$
proving that $Hf(x)$ is positive semidefinite. $\square$

We remark that the converse does not hold; see Exercise 9 on page 56.
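Since Exercise 9 is not reproduced here, the following hedged illustration shows how the converse can fail; the specific function $f(x,y) = x^2 + y^3$ is my own choice and not necessarily the one in Exercise 9:

```python
import numpy as np

# f(x, y) = x^2 + y^3 has a critical point at the origin (gradient (2x, 3y^2)
# vanishes there) with positive semidefinite Hessian diag(2, 0), yet the
# origin is not a local minimizer, since f(0, y) = y^3 < 0 for y < 0.

def f(x, y):
    return x**2 + y**3

H = np.array([[2.0, 0.0],
              [0.0, 0.0]])                 # Hessian of f at the origin

print(np.all(np.linalg.eigvalsh(H) >= 0))  # True: H is positive semidefinite
print(f(0.0, -0.1) < f(0.0, 0.0))          # True: origin is not a local min
```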

However, we have the following theorem.

Theorem 2.13. (Second-order sufficient condition for a local minimizer) Let $f : U \to \mathbb{R}$ be $C^2$ on an open set $U \subseteq \mathbb{R}^n$. If $x \in U$ is a critical point and $Hf(x)$ is positive definite, then $x$ is a strict local minimizer of $f$ on $U$.

Proof. Define $A := Hf(x)$. Since $g(d) := \langle Ad, d\rangle > 0$ for all $d$ on the unit sphere $S := \{d \in \mathbb{R}^n : \|d\| = 1\}$ and $S$ is compact, it follows that there exists $\alpha > 0$ such that $g(d) \ge \alpha > 0$ for all $d \in S$. Since $g$ is homogeneous, we have $g(d) \ge \alpha\|d\|^2$ for all $d \in \mathbb{R}^n$.

Let $\|d\|$ be sufficiently small. It follows from the multivariate Taylor's formula (Corollary 1.24) and the fact $\nabla f(x) = 0$ that
$$f(x+d) = f(x) + \langle \nabla f(x), d\rangle + \frac{1}{2}\langle Ad, d\rangle + o(\|d\|^2) \ge f(x) + \|d\|^2\left(\frac{\alpha}{2} + \frac{o(\|d\|^2)}{\|d\|^2}\right) > f(x).$$

This proves that $x$ is a strict local minimizer of $f$. $\square$

The positive definiteness condition on $A$ is really needed. Exercise 9 describes a problem in which a critical point $x$ has $Hf(x)$ positive semidefinite, but $x$ is actually a saddle point.

However, a global positive semidefiniteness condition on $Hf(x)$ has strong implications.

Theorem 2.14. (Second-order sufficient condition for a global minimizer) Let $f : U \to \mathbb{R}$ be a function with positive semidefinite Hessian on an open convex set $U \subseteq \mathbb{R}^n$. If $x \in U$ is a critical point, then $x$ is a global minimizer of $f$ on $U$.

Proof. Let $y \in U$. It follows from the multivariate Taylor's formula (Theorem 1.23) that there exists a point $z \in (x, y)$ such that
$$f(y) = f(x) + \langle \nabla f(x), y-x\rangle + \frac{1}{2}(y-x)^T Hf(z)(y-x).$$
Since $\nabla f(x) = 0$ and $Hf(z)$ is positive semidefinite, we have $f(y) \ge f(x)$ for all $y \in U$. Thus, $x$ is a global minimizer of $f$ on $U$. $\square$

Remark 2.15. We remark that a function with a positive semidefinite Hessian is a convex function. If the Hessian is positive definite at every point, then the function is strictly convex. In this case, the function $f$ has at most one critical point, which is the unique global minimizer. Chapter 4 treats convex (not necessarily differentiable) functions in detail.
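Theorem 2.14 can be illustrated with a convex quadratic, whose Hessian is constant and positive semidefinite. The sketch below (the matrix $Q$, the vector $b$, and the numpy dependency are illustrative choices) verifies numerically that the critical point minimizes $f$ over a large random sample:

```python
import numpy as np

# f(x) = 0.5 x^T Q x + b^T x with Q positive definite, so Hf = Q everywhere
# and the unique critical point solves Qx = -b. By Theorem 2.14 it is a
# global minimizer; we spot-check this on random points.

Q = np.array([[2.0, 1.0],
              [1.0, 3.0]])           # symmetric, det = 5 > 0: positive definite
b = np.array([1.0, -2.0])

def f(x):
    return 0.5 * x @ Q @ x + b @ x

x_star = np.linalg.solve(Q, -b)      # critical point: grad f(x) = Qx + b = 0

rng = np.random.default_rng(0)
samples = rng.standard_normal((1000, 2)) * 10.0
print(all(f(y) >= f(x_star) - 1e-9 for y in samples))  # True
```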

Theorem 2.16. (Second-order sufficient condition for a saddle point) Let $f : U \to \mathbb{R}$ be twice Gâteaux differentiable on an open set $U \subseteq \mathbb{R}^n$ in the sense of (2.1). If $x \in U$ is a critical point and $Hf(x)$ is indefinite, that is, it has at least one positive and one negative eigenvalue, then $x$ is a saddle point of $f$ on $U$.

Proof. Define $A := Hf(x)$. If $\lambda > 0$ is an eigenvalue of $A$ with a corresponding eigenvector $d \in \mathbb{R}^n$, $\|d\| = 1$, then $\langle Ad, d\rangle = \langle \lambda d, d\rangle = \lambda$, and it follows from Corollary 1.24 that for sufficiently small $t > 0$,
$$f(x+td) = f(x) + t\langle \nabla f(x), d\rangle + \frac{t^2}{2}\langle Ad, d\rangle + o(t^2) = f(x) + \frac{t^2}{2}\lambda + o(t^2) > f(x).$$
Similarly, if $\lambda < 0$ is an eigenvalue of $A$ with a corresponding eigenvector $d$, $\|d\| = 1$, then $f(x+td) < f(x)$ for small enough $t > 0$. This proves that $x$ is a saddle point. $\square$

Definition 2.17. Let $f : U \to \mathbb{R}$ be a $C^2$ function on an open set $U \subseteq \mathbb{R}^n$. A critical point $x \in U$ is called nondegenerate if the Hessian matrix $D^2 f(x)$ is nonsingular.

A well-known result, Morse's lemma [202], states that if $x_0$ is a nondegenerate critical point, then the Hessian $D^2 f(x_0)$ determines the behavior of $f$ around $x_0$. More precisely, it states that if $f : U \to \mathbb{R}$ is at least $C^{2+k}$ ($k \ge 1$) on an open set $U \subseteq \mathbb{R}^n$, and if $x_0 \in U$ is a nondegenerate critical point of $f$, then there exist open neighborhoods $V \ni x_0$ and $W \ni 0$ in $\mathbb{R}^n$ and a one-to-one and onto $C^k$ map $\varphi : V \to W$ such that
$$f(x) = f(x_0) + \frac{1}{2}\langle D^2 f(x_0)\varphi(x), \varphi(x)\rangle.$$
This is the content of Theorem 2.32 on page 49. See also Corollary 2.33.

We end this section by noting that the second-order tests considered above, and especially Morse's lemma, give conclusive information about a critical point except when the Hessian matrix is degenerate. In these degenerate cases, nothing can be deduced about the critical point in general: it could be a local minimizer, a local maximizer, or a saddle point. For example, the origin $(x, y) = (0, 0)$ is a critical point of the function $f(x, y) = x^3 - 3xy^2$ (the real part of the complex function $(x+iy)^3$), with $D^2 f(0,0) = 0$. It is a saddle point, and the graph of this function is called a monkey saddle. A computer plot of the graph of $f$ will reveal that this saddle differs from the familiar horse saddle in that there is also a third depression for the tail of the monkey.
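In place of a plot, the monkey-saddle behavior can be seen by sampling $f$ on a small circle around the origin; since $f(r\cos\theta, r\sin\theta) = r^3\cos 3\theta$, the values alternate sign six times per revolution (three rises and three depressions). A small sketch (radius, sample count, and offset are arbitrary choices; numpy assumed):

```python
import numpy as np

# Sample f(x, y) = x^3 - 3 x y^2 on a circle of radius 0.1 about the
# degenerate critical point (0, 0). The small angular offset keeps samples
# away from the exact zeros of cos(3*theta).

def f(x, y):
    return x**3 - 3.0 * x * y**2

theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False) + 0.005
values = f(0.1 * np.cos(theta), 0.1 * np.sin(theta))

sign_changes = int(np.count_nonzero(np.diff(np.sign(values)) != 0))
print(sign_changes)  # 6: three positive and three negative sectors
```

Two sign changes would correspond to an ordinary (horse) saddle; six is the signature of the monkey saddle, and the Hessian at the origin detects none of it.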

Example 2.18. Consider the family of problems
$$\min f(x, y) := x^2 + y^2 + \beta xy + x + 2y.$$

We have
$$\nabla f(x, y) = \begin{pmatrix} 2x + \beta y + 1 \\ \beta x + 2y + 2 \end{pmatrix}, \qquad Hf(x, y) = \begin{pmatrix} 2 & \beta \\ \beta & 2 \end{pmatrix}.$$
We have $\nabla f(x, y) = 0$ if and only if
$$2x + \beta y = -1, \qquad \beta x + 2y = -2.$$
If $\beta \ne \pm 2$, then the unique solution to the above equations is $(x, y) = (2\beta - 2,\ \beta - 4)/(4 - \beta^2)$. If $\beta = 2$, the above equations become $2x + 2y = -1$ and $2x + 2y = -2$, which are inconsistent. Similarly, if $\beta = -2$, we also obtain an inconsistent system of equations. Therefore, no critical points exist for $\beta = \pm 2$.

The eigenvalues of $A := Hf(x, y)$ can be calculated explicitly: the characteristic polynomial of $A$ is
$$\det(A - \lambda I) = (2 - \lambda)^2 - \beta^2 = 0,$$
which has solutions $\lambda = 2 \pm \beta$. These are the eigenvalues of $A$. Thus, the eigenvalues of $A$ are positive for $-2 < \beta < 2$. In this case, the critical point $(x, y)$ calculated above is a global minimizer of $f$ by Theorem 2.13 and Corollary 2.20 below. In the case $|\beta| > 2$, one eigenvalue of $A$ is positive and the other negative, so that the corresponding critical point $z := (x, y)$ is a saddle point by Theorem 2.16.

Finally, let us consider the behavior of $f$ when $\beta = \pm 2$, when it has no critical point. If $\beta = 2$, then $f(x, y) = (x+y)^2 + x + 2y$; thus $f(x, -x) = -x$, and $f(x, -x) \to \mp\infty$ as $x \to \pm\infty$. When $\beta = -2$, $f$ exhibits similar behavior.
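The formulas of Example 2.18 can be spot-checked numerically for a sample parameter value; the choice $\beta = 1$ below (and the numpy dependency) is illustrative:

```python
import numpy as np

# Check Example 2.18 at beta = 1 (so |beta| < 2): the critical point should
# be ((2*beta - 2)/(4 - beta^2), (beta - 4)/(4 - beta^2)) and the Hessian
# eigenvalues should be 2 - beta and 2 + beta, both positive.

beta = 1.0
H = np.array([[2.0, beta],
              [beta, 2.0]])

# Solve the critical-point system: 2x + beta*y = -1, beta*x + 2y = -2.
x, y = np.linalg.solve(H, np.array([-1.0, -2.0]))

expected = ((2*beta - 2) / (4 - beta**2), (beta - 4) / (4 - beta**2))
print(np.allclose((x, y), expected))                             # True
print(np.allclose(np.linalg.eigvalsh(H), [2 - beta, 2 + beta]))  # True
print(bool(np.all(np.linalg.eigvalsh(H) > 0)))                   # True: minimizer
```

Repeating the check with $|\beta| > 2$ would show one negative eigenvalue, matching the saddle-point case of Theorem 2.16.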
