Exercise - Nonsmooth Analysis and Control Theory

2 Proximal Subgradients

second-order Taylor expansion with remainder, which means there exists a neighborhood $B(x;\eta)$ of $x$ so that for every $y \in B(x;\eta)$ we have

$$f(y) = f(x) + \langle f'(x), y - x\rangle + \tfrac{1}{2}\langle f''(z)(y - x), y - x\rangle,$$

where $z$ is some element on the line segment connecting $x$ and $y$. We note that if the norms of $f''(y)$ are bounded over $y \in B(x;\eta)$ by the constant $2\sigma > 0$, then this implies

$$f(y) \ge f(x) + \langle f'(x), y - x\rangle - \sigma\|y - x\|^2 \tag{3}$$

for all $y \in B(x;\eta)$.

If it should also happen that $f'' \colon X \to L(X, X)$ is continuous on $U$, then $f$ is said to be twice continuously differentiable on $U$, and we write $f \in C^2(U)$, or simply $f \in C^2$ if $U = X$. We note that if $f \in C^2(U)$, then for each $x \in U$ there exists a neighborhood $B(x;\eta)$ and a constant $\sigma$ so that (3) holds, since the continuity of $f''$ at $x$ implies that the norms of $f''$ are bounded in a neighborhood of $x$.
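As a quick numerical sanity check of inequality (3), the sketch below samples a grid in $B(x;\eta)$ for the illustrative choice $f = \sin$ on $X = \mathbb{R}$. This choice is an assumption made here, not an example from the text. Since $|f''| \le 1$ everywhere, the bound $2\sigma = 1$ applies. The helper name `quadratic_lower_bound_holds` is hypothetical.

```python
import math

def quadratic_lower_bound_holds(f, df, x, sigma, eta, samples=2001):
    """Grid check of (3): f(y) >= f(x) + f'(x)(y - x) - sigma*(y - x)**2 on [x - eta, x + eta]."""
    for i in range(samples):
        y = x - eta + 2.0 * eta * i / (samples - 1)
        lower = f(x) + df(x) * (y - x) - sigma * (y - x) ** 2
        if f(y) < lower - 1e-12:  # small tolerance for rounding
            return False
    return True

# |sin''| = |sin| <= 1 = 2*sigma, so sigma = 0.5 is admissible at every x.
holds_with_bound = quadratic_lower_bound_holds(math.sin, math.cos, x=0.3, sigma=0.5, eta=1.0)

# With sigma = 0.1 the bound 2*sigma = 0.2 is smaller than |sin''(pi/2)| = 1,
# and the inequality fails near x = pi/2.
holds_with_small_sigma = quadratic_lower_bound_holds(
    math.sin, math.cos, x=math.pi / 2, sigma=0.1, eta=1.0
)
```

A grid check can only refute the inequality, never prove it; the passing case here is backed by the Taylor argument above.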

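for all $y \in B(x;\eta)$ and for all $\alpha \ge f(y)$. This in turn implies

$$\langle (\zeta, -1), (y, \alpha) - (x, f(x)) \rangle \le \sigma \|(y, \alpha) - (x, f(x))\|^2$$

for all points $(y, \alpha) \in \operatorname{epi} f$ near $(x, f(x))$. In view of Proposition 1.5, this implies that $(\zeta, -1) \in N^P_{\operatorname{epi} f}(x, f(x))$.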

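Let us now turn to the “only if” part. To this end, suppose that $(\zeta, -1) \in N^P_{\operatorname{epi} f}(x, f(x))$. Then by Proposition 1.3 there exists $\delta > 0$ such that

$$(x, f(x)) \in \operatorname{proj}_{\operatorname{epi} f}\bigl((x, f(x)) + \delta(\zeta, -1)\bigr).$$

This evidently implies

$$\|\delta(\zeta, -1)\|^2 \le \|(x, f(x)) + \delta(\zeta, -1) - (y, \alpha)\|^2$$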

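for all $(y, \alpha) \in \operatorname{epi} f$; see Figure 1.4. Upon taking $\alpha = f(y)$, the last inequality yields

$$\delta^2\|\zeta\|^2 + \delta^2 \le \|x - y + \delta\zeta\|^2 + \bigl(f(x) - f(y) - \delta\bigr)^2,$$

which can be rewritten as

$$\bigl(f(y) - f(x) + \delta\bigr)^2 \ge \delta^2 + 2\delta\langle \zeta, y - x\rangle - \|x - y\|^2. \tag{5}$$

It is clear that the right-hand side of (5) is positive for all $y$ sufficiently near $x$, say for $y \in B(x;\eta)$. By shrinking $\eta > 0$ if necessary, we can also ensure (by the lower semicontinuity of $f$) that $y \in B(x;\eta)$ implies

$$f(y) - f(x) + \delta > 0.$$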

Hence taking square roots of (5) gives us that

$$f(y) \ge g(y) := f(x) - \delta + \bigl\{\delta^2 + 2\delta\langle \zeta, y - x\rangle - \|x - y\|^2\bigr\}^{1/2} \tag{6}$$

for all $y \in B(x;\eta)$. Direct calculations show that $g'(x) = \zeta$ and that $g''$ exists and is bounded, say by $2\sigma > 0$, on a neighborhood of $x$ (Exercise 2.4).

Shrinking $\eta$ further if necessary, we have (as noted above in connection with the inequality (3))

$$g(y) \ge g(x) + \langle \zeta, y - x\rangle - \sigma\|y - x\|^2 \quad \forall y \in B(x;\eta).$$

But then by (6), and since $f(x) = g(x)$, we see that

$$f(y) \ge f(x) + \langle \zeta, y - x\rangle - \sigma\|y - x\|^2 \quad \forall y \in B(x;\eta),$$

which is (4) as required.
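The auxiliary function $g$ from (6) can be explored numerically. The sketch below takes $X = \mathbb{R}$ and the illustrative data $f(y) = |y|$, $x = 0$, $\zeta = 0.5$, $\delta = 1$; these values are assumptions chosen here (for this $f$, the point $(0,0)$ is the closest point of the epigraph to $(0.5, -1)$). It checks that $g(x) = f(x)$, that a central finite difference of $g$ at $x$ is close to $\zeta$, and that $f \ge g$ on a grid. The name `make_g` is hypothetical.

```python
import math

def make_g(fx, zeta, delta, x):
    """Build g from (6) for X = R, given the value fx = f(x)."""
    def g(y):
        radicand = delta ** 2 + 2.0 * delta * zeta * (y - x) - (x - y) ** 2
        return fx - delta + math.sqrt(radicand)
    return g

# Illustrative data (assumed, not from the text).
f = abs
x, zeta, delta = 0.0, 0.5, 1.0
g = make_g(f(x), zeta, delta, x)

# Central finite difference approximating g'(x); it should be close to zeta.
step = 1e-6
slope_at_x = (g(x + step) - g(x - step)) / (2.0 * step)

# Check f >= g on [-0.5, 0.5], where the radicand stays positive.
grid = [x - 0.5 + i / 1000.0 for i in range(1001)]
f_dominates_g = all(f(y) >= g(y) - 1e-12 for y in grid)
```

The finite difference only approximates $g'(x)$; the exact identity $g'(x) = \zeta$ is the content of Exercise 2.4.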


FIGURE 1.4. $\zeta$ belongs to $\partial_P f(x)$.

The definition of proximal subgradients via proximal normals to an epigraph is a geometric approach, and the characterization in Theorem 2.5 can also be interpreted geometrically. The proximal subgradient inequality (4) asserts that near $x$, $f(\cdot)$ majorizes the quadratic function

$$h(y) := f(x) + \langle \zeta, y - x\rangle - \sigma\|y - x\|^2,$$

with equality at $y = x$ (since obviously $h(x) = f(x)$). It is worth noting that this is equivalent to saying that $y \mapsto f(y) - h(y)$ has a local minimum at $y = x$ with minimum value equal to 0. Put into purely heuristic terms, the content of Theorem 2.5 is that the existence of such a parabola $h$ which “locally fits under” the epigraph of $f$ at $(x, f(x))$ is equivalent to the existence of a ball in $X \times \mathbb{R}$ touching the epigraph nonhorizontally at that point; this is, in essence, what the proof of the theorem shows. See Figure 1.4.
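The "parabola fitting under the epigraph" picture can be illustrated by sampling the gap $f(y) - h(y)$. The sketch below uses the assumed example $f(y) = |y|$ at $x = 0$ on $X = \mathbb{R}$; the helper `gap_values` and all parameter values are hypothetical choices made here. For $\zeta = 0.5$ the gap has minimum value 0, attained at $y = x$. For $\zeta = 1.5$ the gap becomes negative for the particular $\sigma$ and $\eta$ tried.

```python
def gap_values(f, x, zeta, sigma, eta, samples=2001):
    """Sample y -> f(y) - h(y) on [x - eta, x + eta],
    where h(y) = f(x) + zeta*(y - x) - sigma*(y - x)**2."""
    values = []
    for i in range(samples):
        y = x - eta + 2.0 * eta * i / (samples - 1)
        h = f(x) + zeta * (y - x) - sigma * (y - x) ** 2
        values.append((y, f(y) - h))
    return values

# zeta = 0.5: the parabola (here a line, sigma = 0) fits under |y| near 0.
inside = gap_values(abs, 0.0, zeta=0.5, sigma=0.0, eta=1.0)
min_gap_inside = min(gap for _, gap in inside)

# zeta = 1.5: even a strongly curved parabola pokes through the graph near 0.
outside = gap_values(abs, 0.0, zeta=1.5, sigma=10.0, eta=0.01)
min_gap_outside = min(gap for _, gap in outside)
```

The second computation refutes (4) only for one pair $(\sigma, \eta)$; showing that $\zeta = 1.5$ is not a proximal subgradient requires the same failure for every such pair.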

The description of proximal subgradients contained in Theorem 2.5 is generally more useful in analyzing lower semicontinuous functions than is a direct appeal to the definition. The first corollary below illustrates this, and relates $\partial_P f$ to classical differentiability. It also states that for convex functions, the inequality (4) holds globally in an even simpler form; this is the functional analogue of the simplified proximal normal inequality for convex sets (Proposition 1.10).

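2.6. Corollary. Let $f \in \mathcal{F}$ and $U \subset X$ be open.

(a) Assume that $f$ is Gâteaux differentiable at $x \in U$. Then

$$\partial_P f(x) \subseteq \{f'_G(x)\}.$$

(b) If $f \in C^2(U)$, then

$$\partial_P f(x) = \{f'(x)\} \quad \text{for all } x \in U.$$

(c) If $f$ is convex, then $\zeta \in \partial_P f(x)$ if and only if

$$f(y) \ge f(x) + \langle \zeta, y - x\rangle \quad \forall y \in X. \tag{7}$$

Proof.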

(a) Suppose $f$ has a Gâteaux derivative at $x$ and that $\zeta \in \partial_P f(x)$. For any $v \in X$, if we write $y = x + tv$, the proximal subgradient inequality (4) implies that there exists $\sigma > 0$ such that

$$\frac{f(x + tv) - f(x)}{t} - \langle \zeta, v\rangle \ge -t\sigma\|v\|^2$$

for all sufficiently small positive $t$. Upon letting $t \downarrow 0$ we obtain

$$\langle f'_G(x) - \zeta, v\rangle \ge 0.$$

Since $v$ was arbitrary, the conclusion $\zeta = f'_G(x)$ follows.

(b) If $f \in C^2(U)$ and $x \in U$, then we have $f'(x) \in \partial_P f(x)$ by Theorem 2.5, since (3) implies (4) if $\zeta$ is set equal to $f'(x)$. That $\partial_P f(x)$ contains only $f'(x)$ follows from part (a).

(c) Obviously if $\zeta$ satisfies (7), then (4) holds with $\sigma = 0$ and any $\eta > 0$, so that $\zeta \in \partial_P f(x)$. Conversely, suppose $\zeta \in \partial_P f(x)$, and $\sigma$ and $\eta$ are chosen as in (4). Let $y \in X$. Then for any $t$ in $(0,1)$ sufficiently small so that $(1-t)x + ty \in B(x;\eta)$, we have by the convexity of $f$ and (4) (where we substitute $(1-t)x + ty$ for $y$) that

$$(1-t)f(x) + tf(y) \ge f\bigl((1-t)x + ty\bigr) \ge f(x) + t\langle \zeta, y - x\rangle - t^2\sigma\|y - x\|^2.$$

Simplifying and dividing by $t$, we conclude

$$f(y) \ge f(x) + \langle \zeta, y - x\rangle - t\sigma\|y - x\|^2.$$

Letting $t \downarrow 0$ yields (7).

The containment in Corollary 2.6(a) is the best possible conclusion under the stated assumptions, since even when $X = \mathbb{R}$ and $f$ is continuously differentiable, the nonemptiness of the proximal subdifferential is not assured. The already familiar $C^1$ function $f(x) = -|x|^{3/2}$ admits no proximal subgradient at $x = 0$ (see Exercise 1.7).
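The failure of (4) for $f(x) = -|x|^{3/2}$ at $x = 0$ can be made concrete. For any candidate $\zeta$ and any $\sigma > 0$, $\eta > 0$, the sketch below produces an explicit point $y \in B(0;\eta)$ violating the proximal subgradient inequality. The witness $|y| = \min\{\eta/2,\ 1/(4\sigma^2)\}$, with the sign of $y$ matching that of $\zeta$, is one convenient choice made here, not a construction from the text; it works because $\sigma y^2 \le \tfrac{1}{2}|y|^{3/2}$ whenever $|y| \le 1/(4\sigma^2)$. The function names are hypothetical.

```python
def f(y):
    """The C^1 function f(y) = -|y|^(3/2)."""
    return -abs(y) ** 1.5

def violating_point(zeta, sigma, eta):
    """Return y in B(0; eta) with f(y) < zeta*y - sigma*y**2 (requires sigma > 0, eta > 0)."""
    t = min(eta / 2.0, 1.0 / (4.0 * sigma ** 2))
    # Pick the sign of y so that zeta*y >= 0, which only helps the violation.
    return t if zeta >= 0 else -t

def violates(zeta, sigma, eta):
    """True if the witness lies in B(0; eta) and breaks inequality (4) at x = 0."""
    y = violating_point(zeta, sigma, eta)
    return abs(y) < eta and f(y) < zeta * y - sigma * y ** 2

# Try several candidate subgradients, curvatures, and radii.
results = [
    violates(zeta, sigma, eta)
    for zeta in (-2.0, 0.0, 0.3)
    for sigma in (0.5, 10.0, 1000.0)
    for eta in (1.0, 1e-3)
]
```

Every entry of `results` being true is consistent with $\partial_P f(0) = \emptyset$, although a finite sample of parameters is of course not a proof.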

The first part of the following corollary has already been observed (Exercise 2.3). Despite its simplicity, it is the fundamental fact that generates proximal subgradients on many occasions. The second part says that the “first-order” necessary condition for a minimum is also sufficient in the case of convex functions, which is a principal reason for their importance.

2.7. Corollary. Suppose $f \in \mathcal{F}$.

(a) If $f$ has a local minimum at $x$, then $0 \in \partial_P f(x)$.

(b) Conversely, if $f$ is convex and $0 \in \partial_P f(x)$, then $x$ is a global minimum of $f$.

Proof.

(a) The definition of a local minimum says there exists $\eta > 0$ so that

$$f(y) \ge f(x) \quad \forall y \in B(x;\eta),$$

which is the proximal subgradient inequality with $\zeta = 0$ and $\sigma = 0$. Thus Theorem 2.5 implies that $0 \in \partial_P f(x)$.

(b) Under these hypotheses, (7) holds with $\zeta = 0$. Thus $f(y) \ge f(x)$ for all $y \in X$, which says that $x$ is a global minimum of $f$.

The proximal subdifferential is a “one-sided” object suitable to the analysis of lower semicontinuous functions. For a theory applicable to upper semicontinuous functions $f$, the proximal superdifferential $\partial^P f(x)$ is the appropriate object, and can be defined simply as $-\partial_P(-f)(x)$. In the subsequent development, analogues for upper semicontinuous functions will usually not be stated because they require only evident modifications, such as replacing “sub” by “super,” “≤” by “≥,” “minimum” by “maximum,” and “convex” by “concave.” Nonetheless, we will have occasional use for supergradients.
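The relation $\partial^P f(x) = -\partial_P(-f)(x)$ translates directly into code: a supergradient test for $f$ is a subgradient test for $-f$ with $\zeta$ negated. The sketch below applies this on $X = \mathbb{R}$ to the assumed example $f(y) = -|y|$ at $x = 0$. Both function names are hypothetical, and the grid test checks inequality (4) only for the given $\sigma$ and $\eta$.

```python
def satisfies_subgradient_inequality(f, x, zeta, sigma, eta, samples=2001):
    """Grid check of (4): f(y) >= f(x) + zeta*(y - x) - sigma*(y - x)**2 on [x - eta, x + eta]."""
    for i in range(samples):
        y = x - eta + 2.0 * eta * i / (samples - 1)
        if f(y) < f(x) + zeta * (y - x) - sigma * (y - x) ** 2 - 1e-12:
            return False
    return True

def satisfies_supergradient_inequality(f, x, zeta, sigma, eta, samples=2001):
    """zeta is tested as a supergradient of f by testing -zeta as a subgradient of -f."""
    return satisfies_subgradient_inequality(lambda y: -f(y), x, -zeta, sigma, eta, samples)

def concave_kink(y):
    """Illustrative upper semicontinuous (in fact continuous) function f(y) = -|y|."""
    return -abs(y)

# 0.5 passes the supergradient test at 0 (with sigma = 0) ...
super_ok = satisfies_supergradient_inequality(concave_kink, 0.0, 0.5, sigma=0.0, eta=1.0)

# ... but fails the subgradient test there, even with sigma = 5 on a small ball.
sub_ok = satisfies_subgradient_inequality(concave_kink, 0.0, 0.5, sigma=5.0, eta=0.01)
```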
