
2 Proximal Subgradients

[...] second-order Taylor expansion with remainder, which means there exists a neighborhood $B(x;\eta)$ of $x$ so that for every $y \in B(x;\eta)$ we have

\[ f(y) = f(x) + \langle f'(x), y - x \rangle + \tfrac{1}{2} \langle f''(z)(y - x), y - x \rangle, \]

where $z$ is some element on the line segment connecting $x$ and $y$. We note that if the norms of $f''(y)$ are bounded over $y \in B(x;\eta)$ by the constant $2\sigma > 0$, then this implies

\[ f(y) \ge f(x) + \langle f'(x), y - x \rangle - \sigma \|y - x\|^2 \tag{3} \]

for all $y \in B(x;\eta)$.

If it should also happen that $f'' : X \to L(X, X)$ is continuous on $U$, then $f$ is said to be twice continuously differentiable on $U$, and we write $f \in C^2(U)$, or simply $f \in C^2$ if $U = X$. We note that if $f \in C^2(U)$, then for each $x \in U$ there exist a neighborhood $B(x;\eta)$ and a constant $\sigma$ so that (3) holds, since the continuity of $f''$ at $x$ implies that the norms of $f''$ are bounded in a neighborhood of $x$.
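The bound (3) can be checked numerically in a simple case. The following is a minimal sketch, assuming $X = \mathbb{R}$ and $f = \sin$, for which $|f''| \le 1$, so $2\sigma = 1$ is an admissible bound. The helper names `quadratic_lower_bound` and `holds_on_grid` are hypothetical, and a grid check is only a sanity test, not a proof.

```python
import math

def quadratic_lower_bound(f, df, x, sigma):
    """Return h(y) = f(x) + f'(x)(y - x) - sigma*(y - x)**2, the right side of (3) on the real line."""
    fx, dfx = f(x), df(x)
    return lambda y: fx + dfx * (y - x) - sigma * (y - x) ** 2

def holds_on_grid(f, h, x, eta, n=2001, tol=1e-12):
    """Check f(y) >= h(y) at n evenly spaced sample points of [x - eta, x + eta]."""
    ys = [x - eta + 2 * eta * k / (n - 1) for k in range(n)]
    return all(f(y) >= h(y) - tol for y in ys)

x0 = 0.7

# |sin''| <= 1 everywhere, so sigma = 0.5 satisfies the hypothesis "norms bounded by 2*sigma".
h = quadratic_lower_bound(math.sin, math.cos, x0, sigma=0.5)
bound_holds = holds_on_grid(math.sin, h, x0, eta=1.0)

# With sigma = 0 the candidate bound is the tangent line, which lies above sin where sin is concave.
tangent = quadratic_lower_bound(math.sin, math.cos, x0, sigma=0.0)
tangent_holds = holds_on_grid(math.sin, tangent, x0, eta=1.0)
```

In this sketch `bound_holds` comes out true and `tangent_holds` false, which illustrates why the quadratic term $-\sigma\|y - x\|^2$ is needed in (3) for functions that are not convex.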

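[...] for all $y \in B(x;\eta)$ and for all $\alpha \ge f(y)$. This in turn implies

\[ \big\langle (\zeta, -1), (y, \alpha) - (x, f(x)) \big\rangle \le \sigma \big\| (y, \alpha) - (x, f(x)) \big\|^2 \]

for all points $(y, \alpha) \in \operatorname{epi}(f)$ near $(x, f(x))$. In view of Proposition 1.5, this implies that $(\zeta, -1) \in N^P_{\operatorname{epi} f}(x, f(x))$.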

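Let us now turn to the "only if" part. To this end, suppose that $(\zeta, -1) \in N^P_{\operatorname{epi} f}(x, f(x))$. Then by Proposition 1.3 there exists $\delta > 0$ such that

\[ (x, f(x)) \in \operatorname{proj}_{\operatorname{epi} f}\big( (x, f(x)) + \delta(\zeta, -1) \big). \]

This evidently implies

\[ \| \delta(\zeta, -1) \|^2 \le \big\| (x, f(x)) + \delta(\zeta, -1) - (y, \alpha) \big\|^2 \]

for all $(y, \alpha) \in \operatorname{epi} f$; see Figure 1.4. Upon taking $\alpha = f(y)$, the last inequality yields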

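\[ \delta^2 \|\zeta\|^2 + \delta^2 \le \| x - y + \delta\zeta \|^2 + \big( f(x) - f(y) - \delta \big)^2, \]

which can be rewritten as

\[ \big( f(y) - f(x) + \delta \big)^2 \ge \delta^2 + 2\delta \langle \zeta, y - x \rangle - \| x - y \|^2. \tag{5} \]

It is clear that the right-hand side of (5) is positive for all $y$ sufficiently near $x$, say for $y \in B(x;\eta)$. By shrinking $\eta > 0$ if necessary, we can also ensure (by the lower semicontinuity of $f$) that $y \in B(x;\eta)$ implies

\[ f(y) - f(x) + \delta > 0. \]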

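Hence taking square roots of (5) gives us that

\[ f(y) \ge g(y) := f(x) - \delta + \big\{ \delta^2 + 2\delta \langle \zeta, y - x \rangle - \| x - y \|^2 \big\}^{1/2} \tag{6} \]

for all $y \in B(x;\eta)$. Direct calculations show that $g'(x) = \zeta$ and that $g''$ exists and is bounded, say by $2\sigma > 0$, on a neighborhood of $x$ (Exercise 2.4). If $\eta$ is shrunk further if necessary, we have (as noted above in connection with the inequality (3))

\[ g(y) \ge g(x) + \langle \zeta, y - x \rangle - \sigma \| y - x \|^2 \quad \forall y \in B(x;\eta). \]

But then by (6), and since $f(x) = g(x)$, we see that

\[ f(y) \ge f(x) + \langle \zeta, y - x \rangle - \sigma \| y - x \|^2 \quad \forall y \in B(x;\eta), \]

which is (4) as required.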


FIGURE 1.4. $\zeta$ belongs to $\partial_P f(x)$.

The definition of proximal subgradients via proximal normals to an epigraph is a geometric approach, and the characterization in Theorem 2.5 can also be interpreted geometrically. The proximal subgradient inequality (4) asserts that near $x$, $f(\cdot)$ majorizes the quadratic function

\[ h(y) := f(x) + \langle \zeta, y - x \rangle - \sigma \| y - x \|^2, \]

with equality at $y = x$ (since obviously $h(x) = f(x)$). It is worth noting that this is equivalent to saying that $y \mapsto f(y) - h(y)$ has a local minimum at $y = x$ with minimum value equal to 0. Put into purely heuristic terms, the content of Theorem 2.5 is that the existence of such a parabola $h$ which "locally fits under" the epigraph of $f$ at $(x, f(x))$ is equivalent to the existence of a ball in $X \times \mathbb{R}$ touching the epigraph nonhorizontally at that point; this is, in essence, what the proof of the theorem shows. See Figure 1.4.
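The "parabola fitting under the epigraph" picture can be illustrated numerically. The sketch below assumes $X = \mathbb{R}$ and uses the nonconvex function $f(y) = |y| - y^2$ at $x = 0$ with $\zeta = 1$, an example chosen here for illustration and not taken from the text. The helpers `gap` and `fits_under` are hypothetical names, and sampling a grid is a sanity check rather than a proof.

```python
def gap(f, x, zeta, sigma):
    """Return the function y -> f(y) - h(y), where h(y) = f(x) + zeta*(y - x) - sigma*(y - x)**2."""
    fx = f(x)
    return lambda y: f(y) - (fx + zeta * (y - x) - sigma * (y - x) ** 2)

def fits_under(f, x, zeta, sigma, eta, n=2001, tol=1e-12):
    """Check that f - h is nonnegative at n evenly spaced sample points of [x - eta, x + eta]."""
    d = gap(f, x, zeta, sigma)
    ys = [x - eta + 2 * eta * k / (n - 1) for k in range(n)]
    return all(d(y) >= -tol for y in ys)

def f(y):
    # Nonconvex example: a kink at 0 plus a downward-curving quadratic term.
    return abs(y) - y * y

# With sigma = 1 the parabola h(y) = y - y**2 stays below f near 0, touching it at y = 0.
ok_with_curvature = fits_under(f, 0.0, 1.0, sigma=1.0, eta=0.5)

# With sigma = 0 the line h(y) = y rises above f for small positive y.
ok_without_curvature = fits_under(f, 0.0, 1.0, sigma=0.0, eta=0.5)

# The gap vanishes at the base point, matching h(x) = f(x).
gap_at_x = gap(f, 0.0, 1.0, 1.0)(0.0)
```

Here the check succeeds for $\sigma = 1$ and fails for $\sigma = 0$: for $y > 0$ the gap is $f(y) - h(y) = -y^2 < 0$ when $\sigma = 0$, so the quadratic term in $h$ is what allows $\zeta = 1$ to qualify.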

The description of proximal subgradients contained in Theorem 2.5 is generally more useful in analyzing lower semicontinuous functions than is a direct appeal to the definition. The first corollary below illustrates this, and relates $\partial_P f$ to classical differentiability. It also states that for convex functions, the inequality (4) holds globally in an even simpler form; this is the functional analogue of the simplified proximal normal inequality for convex sets (Proposition 1.10).

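2.6. Corollary. Let $f \in \mathcal{F}$ and $U \subset X$ be open.

(a) Assume that $f$ is Gâteaux differentiable at $x \in U$. Then

\[ \partial_P f(x) \subseteq \{ f'_G(x) \}. \]

(b) If $f \in C^2(U)$, then

\[ \partial_P f(x) = \{ f'(x) \} \]

for all $x \in U$.

(c) If $f$ is convex, then $\zeta \in \partial_P f(x)$ iff

\[ f(y) \ge f(x) + \langle \zeta, y - x \rangle \quad \forall y \in X. \tag{7} \]

Proof.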

(a) Suppose $f$ has a Gâteaux derivative at $x$ and that $\zeta \in \partial_P f(x)$. For any $v \in X$, if we write $y = x + tv$, the proximal subgradient inequality (4) implies that there exists $\sigma > 0$ such that

\[ \frac{f(x + tv) - f(x)}{t} - \langle \zeta, v \rangle \ge -t\sigma \|v\|^2 \]

for all sufficiently small positive $t$. Upon letting $t \downarrow 0$ we obtain

\[ \langle f'_G(x) - \zeta, v \rangle \ge 0. \]

Since $v$ was arbitrary, the conclusion $\zeta = f'_G(x)$ follows.

(b) If $f \in C^2(U)$ and $x \in U$, then we have $f'(x) \in \partial_P f(x)$ by Theorem 2.5, since (3) implies (4) if $\zeta$ is set equal to $f'(x)$. That $\partial_P f(x)$ contains only $f'(x)$ follows from part (a).

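(c) Obviously if $\zeta$ satisfies (7), then (4) holds with $\sigma = 0$ and any $\eta > 0$, so that $\zeta \in \partial_P f(x)$. Conversely, suppose $\zeta \in \partial_P f(x)$, and $\sigma$ and $\eta$ are chosen as in (4). Let $y \in X$. Then for any $t$ in $(0, 1)$ sufficiently small so that $(1 - t)x + ty \in B(x;\eta)$, we have by the convexity of $f$ and (4) (where we substitute $(1 - t)x + ty$ for $y$) that

\[ (1 - t) f(x) + t f(y) \ge f\big( (1 - t)x + ty \big) \ge f(x) + t \langle \zeta, y - x \rangle - t^2 \sigma \| y - x \|^2. \]

Simplifying and dividing by $t$, we conclude

\[ f(y) \ge f(x) + \langle \zeta, y - x \rangle - t\sigma \| y - x \|^2. \]

Letting $t \downarrow 0$ yields (7).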
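Part (c) can be illustrated with the convex function $f(y) = |y|$ on $X = \mathbb{R}$ at $x = 0$, where the global inequality (7) reads $|y| \ge \zeta y$ and so holds exactly for $\zeta \in [-1, 1]$. The sketch below is an illustration under those assumptions: the helper name `satisfies_global_inequality` is hypothetical, and the finite sampling interval stands in for "all $y \in X$".

```python
def satisfies_global_inequality(f, x, zeta, radius=100.0, n=4001, tol=1e-12):
    """Sampled check of (7): f(y) >= f(x) + zeta*(y - x) on [x - radius, x + radius]."""
    fx = f(x)
    ys = [x - radius + 2 * radius * k / (n - 1) for k in range(n)]
    return all(f(y) >= fx + zeta * (y - x) - tol for y in ys)

# Candidate slopes at x = 0 for f(y) = |y|; only those in [-1, 1] should pass.
candidates = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
accepted = [z for z in candidates if satisfies_global_inequality(abs, 0.0, z)]
```

The accepted slopes are those in $[-1, 1]$, and no quadratic correction term is needed, in line with the statement that for convex functions (4) holds with $\sigma = 0$ and globally.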

The containment in Corollary 2.6(a) is the best possible conclusion under the stated assumptions, since even when $X = \mathbb{R}$ and $f$ is continuously differentiable, the nonemptiness of the proximal subdifferential is not assured. The already familiar $C^1$ function $f(x) = -|x|^{3/2}$ admits no proximal subgradient at $x = 0$ (see Exercise 1.7).
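To see concretely why $f(x) = -|x|^{3/2}$ has no proximal subgradient at $0$, note that $f'(0) = 0$, so by Corollary 2.6(a) the only candidate is $\zeta = 0$. Inequality (4) would then require $-|y|^{3/2} \ge -\sigma y^2$ near $0$, that is, $|y|^{-1/2} \le \sigma$, which fails whenever $0 < |y| < 1/\sigma^2$. The sketch below exhibits one such point for several values of $\sigma$; the helper name `violation_point` and the specific choice $y = 1/(2\sigma^2)$ are illustrative assumptions.

```python
def f(y):
    # The C^1 function from the text; ** binds tighter than unary minus.
    return -abs(y) ** 1.5

def violation_point(sigma):
    """Return a point y with 0 < y < 1/sigma**2, where f(y) >= -sigma*y**2 must fail."""
    return 0.5 / sigma ** 2

sigmas = (1.0, 10.0, 100.0, 1000.0)
points = [violation_point(s) for s in sigmas]

# For every sigma, the candidate inequality with zeta = 0 is violated at the chosen point.
all_violate = all(f(y) < -s * y * y for s, y in zip(sigmas, points))

# The violating points shrink toward 0 as sigma grows, so no neighborhood of 0 works.
points_shrink = all(a > b for a, b in zip(points, points[1:]))
```

However large $\sigma$ is taken, a violating point exists arbitrarily close to $0$, so no pair $(\sigma, \eta)$ can satisfy (4).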

The first part of the following corollary has already been observed (Exercise 2.3). Despite its simplicity, it is the fundamental fact that generates proximal subgradients on many occasions. The second part says that the "first-order" necessary condition for a minimum is also sufficient in the case of convex functions, which is a principal reason for their importance.

2.7. Corollary. Suppose $f \in \mathcal{F}$.

(a) If $f$ has a local minimum at $x$, then $0 \in \partial_P f(x)$.

(b) Conversely, if $f$ is convex and $0 \in \partial_P f(x)$, then $x$ is a global minimum of $f$.

Proof.

(a) The definition of a local minimum says there exists $\eta > 0$ so that

\[ f(y) \ge f(x) \quad \forall y \in B(x;\eta), \]

which is the proximal subgradient inequality with $\zeta = 0$ and $\sigma = 0$. Thus Theorem 2.5 implies that $0 \in \partial_P f(x)$.

(b) Under these hypotheses, (7) holds with $\zeta = 0$. Thus $f(y) \ge f(x)$ for all $y \in X$, which says that $x$ is a global minimum of $f$.

The proximal subdifferential is a "one-sided" object suitable to the analysis of lower semicontinuous functions. For a theory applicable to upper semicontinuous functions $f$, the proximal superdifferential $\partial^P f(x)$ is the appropriate object, and can be defined simply as $-\partial_P(-f)(x)$. In the subsequent development, analogues for upper semicontinuous functions will usually not be stated because they require only evident modifications, such as replacing "sub" by "super," "$\ge$" by "$\le$," "minimum" by "maximum," and "convex" by "concave." Nonetheless, we will have occasional use for supergradients.

From the document Nonsmooth Analysis and Control Theory (pages 46-50)
