1.12. Exercise. Let $f$ and $g$ be locally Lipschitz. Suppose that $x_0$ is a normal point satisfying $g(x_0) \le 0$. Then for some $K$, for all $\beta$ sufficiently near $0$, the inequality $g(x) + \beta \le 0$ admits a solution $x_\beta$ satisfying $\|x_\beta - x_0\| \le K \max\{0, \beta\}$.
1.13. Exercise. Show that for $n = 1$, $x_0 = 0$, the function $h(x) := -|x|$ is not normal in the sense of the word that prevails in Theorem 1.9, but that the same function $g(x) := -|x|$ is normal as the term is employed in Exercise 1.12. Note that the conclusion of Exercise 1.12 holds while that of Theorem 1.9 fails.
The general case of mixed equality and inequality constraints is that of minimizing $f(x)$ over the points $x \in \mathbb{R}^n$ satisfying $h(x) = 0$, $g(x) \le 0$, where $h\colon \mathbb{R}^n \to \mathbb{R}^m$ and $g\colon \mathbb{R}^n \to \mathbb{R}^p$ are given functions and where a vector inequality such as $g(x) \le 0$ is understood component-wise. We will assume that all these functions are locally Lipschitz, and that the set
\[
\{ x \in \mathbb{R}^n : f(x) \le r, \ g(x) \le s, \ \|h(x)\| \le t \}
\]
is bounded for each $r, t$ in $\mathbb{R}$ and $s$ in $\mathbb{R}^p$. Let $P(\alpha, \beta)$ denote the problem of minimizing $f(x)$ subject to $h(x) + \alpha = 0$, $g(x) + \beta \le 0$, and let $V(\alpha, \beta)$ be the corresponding value function.
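To fix ideas, here is a one-dimensional illustration of the value function (an example we add here; it is not part of the original development): take $n = p = 1$, no equality constraints, $f(x) := x^2$, and $g(x) := x$. The perturbed constraint $g(x) + \beta \le 0$ reads $x \le -\beta$, whence
\[
V(\beta) = \begin{cases} 0 & \text{if } \beta \le 0, \\ \beta^2 & \text{if } \beta > 0, \end{cases}
\]
that is, $V(\beta) = \bigl( \max\{0, \beta\} \bigr)^2$. The value function remains well behaved near $\beta = 0$ even as the constraint becomes active there; the results below quantify this kind of stability.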
The Lagrangian of the problem is the function $L\colon \mathbb{R}^n \times \mathbb{R}^p \times \mathbb{R}^m \to \mathbb{R}$ defined by
\[
L(x, \gamma, \zeta) := f(x) + \langle \gamma, g(x) \rangle + \langle \zeta, h(x) \rangle.
\]
For any point $x$ feasible for $P(0,0)$, the set $M(x)$ of multipliers for $x$ consists of those $(\gamma, \zeta) \in \mathbb{R}^p \times \mathbb{R}^m$ satisfying
\[
\gamma \ge 0, \quad \langle \gamma, g(x) \rangle = 0, \quad 0 \in \partial_L L(\cdot, \gamma, \zeta)(x).
\]
The point $x$ is now called normal if
\[
\gamma \ge 0, \quad \langle \gamma, g(x) \rangle = 0, \quad 0 \in \partial_L \bigl( \langle \gamma, g(\cdot) \rangle + \langle \zeta, h(\cdot) \rangle \bigr)(x) \implies \gamma = 0, \ \zeta = 0,
\]
and otherwise is called abnormal. We suppose that $V(0,0) < \infty$. In light of the preceding, the genesis of the following theorem should be clear. Its detailed proof is a substantial exercise.
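To see the distinction at work, consider the following simple contrast (our illustration, with $n = p = 1$ and no equality constraints). If $g(x) := x^2$ and $x = 0$, then for every $\gamma \ge 0$ we have $\langle \gamma, g(0) \rangle = 0$ and
\[
0 \in \partial_L \langle \gamma, g(\cdot) \rangle(0) = \{ 2\gamma \cdot 0 \} = \{ 0 \},
\]
so $\gamma = 1$ satisfies the hypotheses of the implication: the point is abnormal. If instead $g(x) := x$, then $\partial_L \langle \gamma, g(\cdot) \rangle(0) = \{ \gamma \}$, and $0 \in \{ \gamma \}$ forces $\gamma = 0$: the point is normal.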
1.15. Corollary. Let $x_0$ be a point satisfying $h(x_0) = 0$, $g(x_0) \le 0$, and suppose that $x_0$ is normal. Then for some $K > 0$, for all $\alpha$ and $\beta$ near $0$, there exists a point $x_{\alpha,\beta}$ such that
\[
h(x_{\alpha,\beta}) + \alpha = 0, \quad g(x_{\alpha,\beta}) + \beta \le 0, \quad \|x_{\alpha,\beta} - x_0\| \le K \Bigl( \|\alpha\| + \sum_{i=1}^{p} \max\{0, \beta_i\} \Bigr).
\]
We remark that the value function approach illustrated above has proven to be a robust and effective line of attack on many optimization problems more complex than the finite-dimensional ones considered here, including a variety of problems in control theory.
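As a concrete check of the corollary (our illustration), take $n = 2$, $m = p = 1$, $h(x_1, x_2) := x_1$, $g(x_1, x_2) := x_2$, and $x_0 = (0, 0)$. The point is normal, since $0 \in \partial_L (\zeta x_1 + \gamma x_2)(0,0) = \{ (\zeta, \gamma) \}$ forces $\gamma = \zeta = 0$. The point
\[
x_{\alpha,\beta} := \bigl( -\alpha, \ -\max\{0, \beta\} \bigr)
\]
satisfies $h(x_{\alpha,\beta}) + \alpha = 0$ and $g(x_{\alpha,\beta}) + \beta = \beta - \max\{0, \beta\} \le 0$, with $\|x_{\alpha,\beta} - x_0\| \le |\alpha| + \max\{0, \beta\}$, so the conclusion holds with $K = 1$.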
2 The Mean Value Inequality
In this section we will prove a “multidirectional” Mean Value Theorem, a result that is novel even in the context of smooth multivariable calculus, and one that is a cornerstone of nonsmooth analysis. It is instructive to begin by recalling the statement of the classical Mean Value Theorem, which plays a fundamental role in analysis by relating the values of a function to its derivative at an intermediate point. For the purpose of comparisons to come, it is also worth noting the optimization-based proof of this familiar result. Recall that given two points $x$ and $y \in X$, the open line segment connecting them is denoted $(x, y)$:
\[
(x, y) := \{ tx + (1-t)y : 0 < t < 1 \}.
\]
If $t = 0$ and $t = 1$ are allowed, we obtain the closed line segment $[x, y]$.
Throughout this section, X is a Hilbert space.
2.1. Proposition. Suppose $x, y \in X$ and that $f \in \mathcal{F}$ is Gâteaux differentiable on a neighborhood of $[x, y]$. Then there exists $z \in (x, y)$ such that
\[
f(y) - f(x) = \langle f'_G(z), y - x \rangle. \tag{1}
\]
Proof. Define $g\colon [0,1] \to \mathbb{R}$ by
\[
g(t) := f\bigl( ty + (1-t)x \bigr) - t f(y) - (1-t) f(x).
\]
Then $g$ is continuous on $[0,1]$, differentiable on $(0,1)$, and satisfies $g(0) = g(1) = 0$. It follows that there exists a point $\bar{t} \in (0,1)$ that is either a maximum or a minimum of $g$ on $[0,1]$, and thus satisfies $g'(\bar{t}) = 0$. We calculate $g'(\bar{t})$ by referring directly to the definition of the Gâteaux derivative at $z := \bar{t}y + (1-\bar{t})x = x + \bar{t}(y - x)$, and conclude that
\[
0 = g'(\bar{t}) = \langle f'_G(z), y - x \rangle - \bigl( f(y) - f(x) \bigr).
\]
Hence (1) holds.
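A quick instance of (1) (our illustration): take $X = \mathbb{R}$, $f(x) := x^2$, $x = 0$, $y = 1$. Then
\[
f(1) - f(0) = 1 = \langle f'_G(z), 1 - 0 \rangle = 2z,
\]
which pins down the intermediate point $z = 1/2 \in (0, 1)$.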
A typical application of the Mean Value Theorem is to deduce monotonicity. For example: suppose that $\langle f'_G(z), y - x \rangle \le 0$ for all $z \in (x, y)$. Then it follows immediately that $f(y) \le f(x)$. In this, and in fact in most applications, it suffices to invoke the Mean Value Theorem in its inequality form; i.e., the form in which (1) is replaced by
\[
f(y) - f(x) \le \langle f'_G(z), y - x \rangle.
\]
When $f$ is nondifferentiable, the inequality form provides the appropriate model for generalization. To illustrate, consider the function $f(x) := -|x|$ on $\mathbb{R}$. There exist $z \in (-1, 1)$ and $\zeta \in \partial_P f(z)$ such that
\[
0 = f(1) - f(-1) \le \langle \zeta, 1 - (-1) \rangle = 2\zeta,
\]
but equality (i.e., $\zeta = 0$) never holds. In a nonsmooth setting, it turns out as well that some tolerance must be present in the resulting inequality. For example, if $f(x_1, x_2) := -|x_2|$, then $f(-1, 0) = f(1, 0)$, but the difference ($0$) between these two values of $f$ cannot be estimated by $\partial_P f(z)$ for any $z$ on the line segment between $(-1, 0)$ and $(1, 0)$, since $\partial_P f(z)$ is empty for all such $z$. We will need to allow $z$ to stray a bit from the line segment.
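To make the first example concrete, we can compute the proximal subdifferential of $f(x) = -|x|$ directly (a verification we supply for illustration):
\[
\partial_P f(z) = \begin{cases} \{-1\} & \text{if } z > 0, \\ \{1\} & \text{if } z < 0, \\ \emptyset & \text{if } z = 0. \end{cases}
\]
Thus any $z \in (-1, 0)$ with $\zeta = 1$ gives $0 \le 2\zeta = 2$, confirming the inequality form, while $2\zeta = \pm 2$ for every available $\zeta$, so equality is never attained. In the second example, a $\zeta = (\zeta_1, \zeta_2) \in \partial_P f(z_1, 0)$ would require $-|x_2| \ge \zeta_2 x_2 - \sigma \bigl( x_2^2 + (x_1 - z_1)^2 \bigr)$ near $(z_1, 0)$, which fails along $x_1 = z_1$ for small $x_2$ of appropriate sign; hence $\partial_P f$ is indeed empty on the segment.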
Besides nondifferentiability, the Mean Value Inequality that we will present possesses a novel, perhaps surprising feature: multidirectionality; i.e., an estimate that is maintained uniformly over many directions. This new aspect greatly enhances the range of applications of the result, even in a smooth setting. Let us describe it in a particular two-dimensional case, in which we seek to compare the value of a continuous Gâteaux differentiable function $f$ at $(0,0)$ to the set of its values on the line segment
\[
Y := \{ (1, t) : 0 \le t \le 1 \}.
\]
For each $t \in [0,1]$, the usual Mean Value Theorem applied to the two points $(0,0)$ and $y_t := (1, t)$ asserts the existence of a point $z_t$ lying in the open segment connecting $(0,0)$ and $y_t$ such that
\[
f(y_t) - f(0) \le \langle f'_G(z_t), y_t \rangle.
\]
We deduce
\[
\min_{y \in Y} f(y) - f(0) \le \langle f'_G(z_t), y_t \rangle. \tag{2}
\]
The new Mean Value Inequality asserts that for some $z$ in the triangle which is the convex hull of $\{0\} \cup Y$, we have
\[
\min_{y \in Y} f(y) - f(0) \le \langle f'_G(z), y_t \rangle \quad \forall t \in [0,1]. \tag{3}
\]
While these conclusions appear quite similar, there is an important difference: (3) holds for the same given $z$, uniformly for all $y_t$, in contrast to (2), where $z_t$ depended on $y_t$.
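A quick sanity check (our illustration): take $f(x_1, x_2) := x_1$. Then $\min_{y \in Y} f(y) - f(0) = 1$, while $f'_G(z) = (1, 0)$ at every $z$, so
\[
\langle f'_G(z), y_t \rangle = \langle (1, 0), (1, t) \rangle = 1 \quad \forall t \in [0,1];
\]
any $z$ in the triangle satisfies (3), uniformly in $t$, with equality.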
To further underline the difference, consider now the case in which $Y$ is the closed unit ball $B$ in $X$. For a continuous Gâteaux differentiable function $f$, the new Mean Value Inequality asserts that for some $z \in B$, we have
\[
\inf_B f - f(0) \le \langle f'_G(z), y \rangle \quad \forall y \in B.
\]
Assume now that $\|f'_G\| \ge 1$ on $B$. Then taking $y = -f'_G(z)/\|f'_G(z)\|$ in the inequality leads to
\[
\inf_B f - f(0) \le -1
\]
(this is where we exploit the uniformity of the conclusion). Applied to $-f$, the same reasoning yields $\sup_B f - f(0) \ge 1$, whence
\[
\sup_B f - \inf_B f \ge 2.
\]
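The estimate is sharp, as the linear case shows (our illustration): if $f(x) := \langle a, x \rangle$ with $\|a\| = 1$, then $f'_G \equiv a$, so $\|f'_G\| = 1$ on $B$, and
\[
\sup_B f - \inf_B f = 1 - (-1) = 2,
\]
so the conclusion cannot be improved.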
The reader is invited to ponder how the usual Mean Value Theorem is inadequate to yield this conclusion. Further, let us note that the failure of $f'$ to exist even at a single point can make a difference and will need to be taken account of: witness $f(x) := \|x\|$, for which we have
\[
\sup_B f - \inf_B f = 1,
\]
in contrast to the above, even though $\|f'(x)\| = 1$ for all $x \ne 0$.
A Smooth Finite-Dimensional Version
The general theorem that we will present features nonsmoothness, infinite dimensions, and multidirectionality. The structure of its proof can be conveyed much more easily in the absence of the first two factors. Accordingly, we will begin by proving a version of the theorem for a differentiable function on $\mathbb{R}^n$. A preliminary result we will require is the following.
2.2. Proposition. Suppose $Y \subset X$ is closed and convex, that $f \in \mathcal{F}$ attains its minimum over $Y$ at $\bar{y}$, and that $f$ has directional derivatives in all directions at $\bar{y}$. Then
\[
f'(\bar{y}; y - \bar{y}) \ge 0 \quad \forall y \in Y. \tag{4}
\]
In particular, if $f$ is Gâteaux differentiable at $\bar{y}$, then
\[
\langle f'_G(\bar{y}), y - \bar{y} \rangle \ge 0 \quad \forall y \in Y.
\]
Proof. Let $y \in Y$, and define $g\colon [0,1] \to \mathbb{R}$ by $g(t) = f\bigl( \bar{y} + t(y - \bar{y}) \bigr)$. Note that $\bar{y} + t(y - \bar{y}) = ty + (1-t)\bar{y}$ belongs to $Y$ for each $t \in [0,1]$, since $Y$ is convex. Since $\bar{y}$ minimizes $f$ over $Y$, we have $g(0) \le g(t)$ for all $t$. Thus by letting $t \downarrow 0$ and using the assumption that the directional derivative $f'(\bar{y}; y - \bar{y})$ exists, we conclude
\[
0 \le \lim_{t \downarrow 0} \frac{g(t) - g(0)}{t} = f'(\bar{y}; y - \bar{y}).
\]
Therefore (4) holds, since $y \in Y$ is arbitrary.
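A one-dimensional instance of (4) (our illustration): let $X = \mathbb{R}$, $Y = [0, \infty)$, and $f(x) := x$, which attains its minimum over $Y$ at $\bar{y} = 0$. Then
\[
f'(0; y - 0) = y \ge 0 \quad \forall y \in Y,
\]
even though $0$ is not an unconstrained critical point: the proposition is a variational inequality at the boundary, not a stationarity condition.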
For $Y \subset X$ and $x \in X$, we denote by $[x, Y]$ the convex hull of $\{x\} \cup Y$:
\[
[x, Y] := \{ tx + (1-t)y : t \in [0,1], \ y \in Y \}. \tag{5}
\]
In particular, if $Y = \{y\}$, then $[x, Y]$ is simply the closed line segment connecting $x$ and $y$. Note that below we admit the possibility that $x \in Y$.

2.3. Theorem. Suppose $Y \subset \mathbb{R}^n$ is closed, bounded, and convex. Let $x \in \mathbb{R}^n$, and let $f \in \mathcal{F}$ be Gâteaux differentiable on a neighborhood of $[x, Y]$. Then there exists $z \in [x, Y]$ such that
\[
\min_{y \in Y} f(y) - f(x) \le \langle f'(z), y - x \rangle \quad \forall y \in Y. \tag{6}
\]
Proof. Set
\[
r := \min_{y \in Y} f(y) - f(x), \tag{7}
\]
and let $U$ be an open neighborhood of $[x, Y]$ on which $f$ is Gâteaux differentiable. Define $g\colon [0,1] \times U \to \mathbb{R}$ by
\[
g(t, y) = f\bigl( x + t(y - x) \bigr) - rt. \tag{8}
\]
Since $g$ is lower semicontinuous and $[x, Y]$ is compact, we have $r > -\infty$, and there exists $(\bar{t}, \bar{y}) \in [0,1] \times Y$ at which $g$ attains its minimum over $[0,1] \times Y$. We will show that (6) holds with $z$ given by
\[
z := x + \bar{t}(\bar{y} - x) \in [x, Y], \tag{9}
\]
unless $\bar{t} = 1$, in which case we set $z = x$. The multidirectional inequality (6) will be deduced from necessary conditions for the attainment of a minimum.
Let us first assume that $\bar{t} \in (0, 1)$. Then the function $t \mapsto g(t, \bar{y})$ attains its minimum over $t \in [0,1]$ at $t = \bar{t}$, and thus has a vanishing derivative there. We calculate this derivative using (8) and (9) to conclude
\[
0 = \frac{\partial}{\partial t}\Big|_{t = \bar{t}} g(t, \bar{y}) = \langle f'(z), \bar{y} - x \rangle - r. \tag{10}
\]
On the other hand, the function $y \mapsto g(\bar{t}, y)$ attains its minimum over $y \in Y$ at $\bar{y}$. The necessary conditions in Proposition 2.2 state that
\[
\Bigl\langle \frac{\partial}{\partial y}\Big|_{y = \bar{y}} g(\bar{t}, y), \ y - \bar{y} \Bigr\rangle \ge 0 \quad \forall y \in Y. \tag{11}
\]
It is easily calculated using (8) and (9) again that
\[
\frac{\partial}{\partial y}\Big|_{y = \bar{y}} g(\bar{t}, y) = \bar{t} f'(z).
\]
Substituting this value into (11) and dividing by $\bar{t} > 0$ gives us that
\[
\langle f'(z), y - \bar{y} \rangle \ge 0 \quad \forall y \in Y. \tag{12}
\]
We now combine (10) and (12) to conclude that
\[
r \le \langle f'(z), \bar{y} - x \rangle + \langle f'(z), y - \bar{y} \rangle = \langle f'(z), y - x \rangle,
\]
which holds for all $y \in Y$. This statement is precisely (6) and finishes the proof of the theorem in this instance.
The case $\bar{t} = 0$ is handled more simply than above. For each $(t, y) \in (0, 1] \times Y$, we note from (8) and the definition of a minimum that
\[
f(x) = g(0, \bar{y}) \le g(t, y) = f\bigl( x + t(y - x) \bigr) - rt. \tag{13}
\]
After dividing (13) by $t$ and rearranging terms, we have for all $y \in Y$ that
\[
r \le \lim_{t \downarrow 0} \frac{1}{t} \Bigl[ f\bigl( x + t(y - x) \bigr) - f(x) \Bigr] = \langle f'(z), y - x \rangle,
\]
since $z = x$, and this is the same as (6).
Finally we consider the case in which $\bar{t} = 1$. We claim $g(1, \bar{y}) = f(x)$. Indeed, $g(1, \bar{y}) \le g(0, \bar{y}) = f(x)$, while on the other hand,
\[
g(1, \bar{y}) = f(\bar{y}) - r = f(\bar{y}) - \min_{y \in Y} f(y) + f(x) \ge f(x).
\]
It now follows that $f(x)$ is the minimum value of $g$ on $[0,1] \times Y$. But $f(x) = g(0, y)$ for any $y \in Y$, and thus $(0, y)$ is also a minimizer of $g$. Thus if $\bar{t} = 1$, then we equally could have chosen the minimizer with $\bar{t} = 0$, and we finish by deferring to that case, treated above.
2.4. Exercise.
(a) Prove that the point $z$ in Theorem 2.3 also satisfies
\[
f(z) \le \min_{w \in [x, Y]} f(w) + |r|,
\]
where $r$ is given by (7).
(b) Consider the special case of Theorem 2.3 in which $n = 2$, $x = (0, 0)$, and
\[
Y = \{ (1, t) : -1 \le t \le 1 \}.
\]
Suppose that $f(0,0) = 0$ and that $f(1, t) = 1$ $\forall t \in [-1, 1]$. Prove that some point $(u, v)$ in the triangle $[0, Y]$ satisfies
\[
f_x(u, v) \ge 1 + |f_y(u, v)|.
\]
(c) With $x$ and $Y$ as in part (b), and with $f(x, y) := x - y + xy$, find all the points $z$ that satisfy the conclusion of Theorem 2.3.
The General Case of the Mean Value Inequality
We now wish to extend the theorem to the case in which Y is a closed, bounded, convex subset of the Hilbert space X, and f is merely lower semicontinuous.
The first technical difficulty in proving the theorem in this greater generality is the same as that previously encountered in the proof of the Density Theorem. Namely, a lower semicontinuous function that is bounded below may not attain its minimum over a closed bounded set. We may note that the proof of Theorem 2.3 above is valid in infinite dimensions without modification if in addition $Y$ is assumed to be compact, or $f$ is assumed to be weakly lower semicontinuous. This is because these hypotheses imply the existence of the minimizer $(\bar{t}, \bar{y})$. But the additional assumptions are not actually required to prove the theorem in infinite dimensions, as we will see. The idea is to follow the above proof but to obtain the minimizer via a minimization principle (Theorem 1.4.1). However, there is a price to pay for this device to work effectively: the statement of the result itself needs a slight modification, a certain slackening of the inequality in the conclusion.
The result is not true without this modification, and it is essentially required in the proof to avoid the possibility that $\bar{t} = 1$. (We may check that if $\bar{t} = 1$, no relevant information is obtained from the necessary conditions.) In the above proof, the case $\bar{t} = 1$ was handled by deferring to the case $\bar{t} = 0$, but after adding the perturbation term, this is no longer possible.
The second technical difficulty arises in replacing derivatives by proximal subgradients. When $f$ is nondifferentiable, Proposition 2.2 is no longer available to us, and in the minimization process we have recourse instead to a smooth penalty term to replace the constraint that $z$ lie in $[x, Y]$. This will now allow the minimizer $z$ to fail to lie in $[x, Y]$, although it can be brought arbitrarily near to that set (an example will show that this tolerance is necessary). Further, the quantity $r$ defined by (7) must be replaced by one that is more stable under such approximations, namely
\[
\hat{r} := \lim_{\delta \downarrow 0} \inf_{w \in Y + \delta B} f(w) - f(x). \tag{14}
\]
2.5. Exercise. If $X$ is finite dimensional, show that the $\hat{r}$ defined by (14) reduces to the $r$ defined by (7). (Examples can be adduced to show that this may fail in infinite dimensions.)
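Here is one such infinite-dimensional example (our construction, offered for illustration): in $X = \ell^2$ with orthonormal basis $\{e_n\}$, let $Y$ be the closed convex hull of $\{ e_n : n \ge 1 \}$ (so $0 \in Y$ and $\|y\| \le 1$ on $Y$), let $x = 0$, and define
\[
f(w) := \begin{cases} 0 & \text{if } w \in Y, \\ -1 & \text{if } w = (1 + 1/n)e_n \text{ for some } n, \\ +\infty & \text{otherwise.} \end{cases}
\]
The points $(1 + 1/n)e_n$ lie outside $Y$, are pairwise at distance greater than $\sqrt{2}$ (so $f$ is lower semicontinuous), and lie within $1/n$ of $Y$. Hence $r = \inf_Y f - f(0) = 0$, whereas $\inf_{Y + \delta B} f = -1$ for every $\delta > 0$, giving $\hat{r} = -1 < r$.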
2.6. Theorem (Mean Value Inequality). Let $Y$ be a closed, convex, bounded subset of $X$, and let $x \in \operatorname{dom} f$, where $f \in \mathcal{F}$. Suppose that $\hat{r}$ defined above is not $-\infty$, and let any $\bar{r} < \hat{r}$ and $\varepsilon > 0$ be given. Then there exist $z \in [x, Y] + \varepsilon B$ and $\zeta \in \partial_P f(z)$ such that
\[
\bar{r} < \langle \zeta, y - x \rangle \quad \forall y \in Y.
\]
Further, we can choose $z$ to satisfy
\[
f(z) < \inf_{[x, Y]} f + |\bar{r}| + \varepsilon.
\]
We remark that the case $\hat{r} = \infty$ is not excluded; it can only arise if $Y \cap \operatorname{dom} f = \emptyset$.
Proof of the Mean Value Inequality
Proof. Without loss of generality, we assume that $x = 0$ and $f(x) = 0$. Suppose $\bar{r} < \hat{r}$ and $\varepsilon > 0$. We will first fix some constants that will be featured in the estimates to come; the hypotheses of the theorem justify the existence of such choices. We write $\|Y\|$ for the quantity $\sup\{ \|y\| : y \in Y \}$. Let $r \in \mathbb{R}$ and positive constants $\delta$, $M$, $k$ be chosen so that
\[
\bar{r} < r < \hat{r}, \quad |r| < |\bar{r}| + \varepsilon, \tag{15}
\]
\[
\delta \le \varepsilon \quad \text{and} \quad y \in Y + \delta B \implies f(y) \ge r + \delta, \tag{16}
\]
\[
z \in [0, Y] + \delta B \implies f(z) \ge -M, \tag{17}
\]
\[
k > \frac{1}{\delta^2} \bigl( M + |r| + 1 + 2\|Y\| + \delta \bigr). \tag{18}
\]
Now we define $g\colon [0,1] \times X \times X \to (-\infty, \infty]$ by
\[
g(t, y, z) := f(z) + k\|ty - z\|^2 - tr.
\]
To simplify the notation somewhat, let us set
\[
S := Y \times \bigl( [0, Y] + \delta B \bigr),
\]
and write $s = (y, z)$ for elements of $S$. We can easily verify that
\[
t \mapsto \inf_{s \in S} g(t, s) \tag{19}
\]
is continuous on $[0,1]$, and thus attains its infimum at some point $\bar{t} \in [0,1]$. The choice of a large $k$ value forces $\bar{t}$ away from $1$, which is the assertion of the first claim.
Claim 1. $\bar{t} < 1$.
Proof. Let $s = (y, z) \in S$, and note that if $\|y - z\| \le \delta$, then $f(z) \ge r + \delta$ by (16). A lower bound for $g(1, s)$ can be obtained as follows:
\[
g(1, s) = f(z) + k\|y - z\|^2 - r \ge
\begin{cases}
f(z) + k\|y - z\|^2 - r & \text{if } \|y - z\| \le \delta, \\
f(z) + k\delta^2 - r & \text{if } \|y - z\| > \delta,
\end{cases}
\]
\[
\ge \min\bigl\{ r + \delta - r, \ -M + \bigl( M + |r| + 1 \bigr) - r \bigr\} > 0.
\]
On the other hand, we have an upper bound for the function in (19) by setting $t = 0$ and noting that
\[
\inf_{s \in S} g(0, s) \le g(0, y, 0) = f(0) + k\|0\|^2 = 0. \tag{20}
\]
Thus the infimum of $t \mapsto \inf_{s \in S} g(t, s)$ taken over $t \in [0,1]$ cannot occur at $t = 1$, and we have $\bar{t} < 1$.
We now address the difficulty that the minimum of $s \mapsto g(\bar{t}, s)$ taken over $s \in S$ may not be attained at any point in $S$. We resort to a minimization principle to obtain a minimum of $g(\bar{t}, \cdot)$ perturbed by a linear function. Note that $g(\bar{t}, \cdot) \in \mathcal{F}$ and is bounded below on the closed bounded set $S$. By Theorem 1.4.1, for each small $\eta > 0$, there exist $\xi := (\xi_y, \xi_z) \in X \times X$ and $\bar{s} := (\bar{y}, \bar{z}) \in S$ so that $\|\xi\| < \eta$ and the function
\[
s \mapsto g(\bar{t}, s) + \langle \xi, s \rangle \tag{21}
\]
attains its minimum over $s \in S$ at $\bar{s}$. Of course $\xi$ and $\bar{s}$ depend on $\eta$, as does the choice of $\zeta$ appearing in the conclusions of the theorem, the latter being defined as
\[
\zeta := 2k(\bar{t}\bar{y} - \bar{z}) - \xi_z.
\]
We will see that the conclusions of the theorem hold for these choices of $\zeta$ and $\bar{z}$, provided that $\eta$ is chosen sufficiently small.
Claim 2. If $\eta$ is small enough, we have $\bar{z} \in [0, Y] + \delta B$.
Proof. Suppose on the contrary that $d_{[0,Y]}(\bar{z}) \ge \delta$. Observe first that
\[
\langle \xi, s \rangle \le \eta \bigl( 2\|Y\| + \delta \bigr) \quad \forall s \in S. \tag{22}
\]
Then since $\bar{t}\bar{y} \in [0, Y]$, we have by (22) and (17) that
\[
g(\bar{t}, \bar{s}) + \langle \xi, \bar{s} \rangle = f(\bar{z}) + k\|\bar{t}\bar{y} - \bar{z}\|^2 - r\bar{t} + \langle \xi, \bar{s} \rangle \ge -M + k\delta^2 - |r| - \eta \bigl( 2\|Y\| + \delta \bigr),
\]
which by the choice of $k$ in (18) is greater than $1$ if $\eta$ is sufficiently small. On the other hand, recall that $\bar{s}$ achieves the infimum in the function (21), and that $\bar{t}$ minimizes (19). Therefore if $\eta \le \bigl( 2\|Y\| + \delta \bigr)^{-1}/2$, we have by (20) and (22)
\[
g(\bar{t}, \bar{s}) + \langle \xi, \bar{s} \rangle \le \inf_{s \in S} g(\bar{t}, s) + \eta \bigl( 2\|Y\| + \delta \bigr) \le \inf_{s \in S} g(0, s) + \eta \bigl( 2\|Y\| + \delta \bigr) \le \eta \bigl( 2\|Y\| + \delta \bigr) \le 1/2.
\]
Thus we conclude $\bar{z} \in [0, Y] + \delta B$, as claimed.
Claim 3. $\zeta \in \partial_P f(\bar{z})$.
Proof. Since $z \mapsto f(z) + k\|\bar{t}\bar{y} - z\|^2 + \langle \xi_z, z \rangle$ has a minimum over $[0, Y] + \delta B$ at $z = \bar{z}$, and $\bar{z}$ is a point in the interior of this set (Claim 2), we have that
\[
\zeta = -\Bigl( k\|\bar{t}\bar{y} - (\cdot)\|^2 + \langle \xi_z, (\cdot) \rangle \Bigr)'(\bar{z}) \in \partial_P f(\bar{z}).
\]
We next turn to proving the upper bound on $f(\bar{z})$.
Claim 4. The estimate $f(\bar{z}) \le \inf_{z \in [0,Y]} f(z) + |\bar{r}| + \varepsilon$ holds, provided $\eta$ is small enough.
Proof. Using (22) and the fact that $\bar{t}$ minimizes (19), we calculate
\[
f(\bar{z}) - \bar{t}r \le g(\bar{t}, \bar{s}) + \langle \xi, \bar{s} \rangle + \eta \bigl( 2\|Y\| + \delta \bigr) = \inf_{s \in S} \bigl[ g(\bar{t}, s) + \langle \xi, s \rangle \bigr] + \eta \bigl( 2\|Y\| + \delta \bigr)
\]
\[
\le \inf_{s \in S} g(\bar{t}, s) + 2\eta \bigl( 2\|Y\| + \delta \bigr) \le \inf_{t \in [0,1]} \inf_{(y,z) \in S} \Bigl[ f(z) + k\|ty - z\|^2 - tr \Bigr] + 2\eta \bigl( 2\|Y\| + \delta \bigr)
\]
\[
\le \inf_{z \in [0,Y]} f(z) + \max\{0, -r\} + 2\eta \bigl( 2\|Y\| + \delta \bigr).
\]
The final inequality is justified because any $z \in [0, Y]$ may be written as $z = ty$ with $t \in [0,1]$ and $y \in Y$; this case is included in the next to last infimum, which is therefore an infimum over a larger set, and for it the penalty term vanishes while $-tr \le \max\{0, -r\}$. Hence
\[
f(\bar{z}) \le \inf_{z \in [0,Y]} f(z) + \max\{\bar{t}, 1 - \bar{t}\}|r| + 2\eta \bigl( 2\|Y\| + \delta \bigr) \le \inf_{z \in [0,Y]} f(z) + |\bar{r}| + \varepsilon,
\]
provided $\eta$ is sufficiently small, where the second relation in (15) has been used.
Claim 5. Suppose $\bar{t} = 0$, and $0 < \eta < \bigl[ (r - \bar{r})/6\bigl( 2\|Y\| + \delta \bigr) \bigr]^2$. Then $\bar{r} < \langle \zeta, y \rangle$ for all $y \in Y$.
Proof. In the present case with $\bar{t} = 0$, the definition of $\zeta$ reduces to $\zeta = -2k\bar{z} - \xi_z$. Let us note (using (22)) that
\[
\inf_{s \in S} g(0, s) \ge \inf_{s \in S} \bigl[ g(0, s) + \langle \xi, s \rangle \bigr] - \eta \bigl( 2\|Y\| + \delta \bigr) = f(\bar{z}) + k\|\bar{z}\|^2 + \langle \xi, (\bar{y}, \bar{z}) \rangle - \eta \bigl( 2\|Y\| + \delta \bigr) \ge f(\bar{z}) + k\|\bar{z}\|^2 - 2\eta \bigl( 2\|Y\| + \delta \bigr). \tag{23}
\]
Now we let $t := \min\bigl\{ \sqrt{\eta}, \ (r - \bar{r})/3k\bigl( \|Y\|^2 + \delta \bigr) \bigr\} > 0$. We have for any $y \in Y$ that
\[
\inf_{s \in S} g(t, s) \le f(\bar{z}) + k\|ty - \bar{z}\|^2 - tr. \tag{24}
\]
Since $\bar{t} = 0$ by definition provides a minimum of (19), it follows from (23) and (24) that for all $y \in Y$
\[
0 \le \inf_{s \in S} g(t, s) - \inf_{s \in S} g(0, s) \le k\|ty - \bar{z}\|^2 - tr - k\|\bar{z}\|^2 + 2\eta \bigl( 2\|Y\| + \delta \bigr)
\]
\[
= t\bigl( \langle -2k\bar{z}, y \rangle - r \bigr) + t^2 k\|y\|^2 + 2\eta \bigl( 2\|Y\| + \delta \bigr)
\]
\[
= t\bigl( \langle \zeta, y \rangle - r \bigr) + t^2 k\|y\|^2 + t\langle \xi_z, y \rangle + 2\eta \bigl( 2\|Y\| + \delta \bigr)
\]
\[
\le t\bigl( \langle \zeta, y \rangle - r \bigr) + t^2 k\|Y\|^2 + t\eta\|Y\| + 2\eta \bigl( 2\|Y\| + \delta \bigr).
\]
Dividing through by $t$ and using our choices of $t$ and $\eta$, we obtain
\[
\langle \zeta, y \rangle \ge r - tk\|Y\|^2 - \eta\|Y\| - \frac{2\eta}{t}\bigl( 2\|Y\| + \delta \bigr) > \bar{r},
\]
which concludes the proof of this claim.
To handle the situation where $\bar{t} > 0$, we first prove an estimate regarding $\bar{y}$.
Claim 6. If $\eta$ is sufficiently small, then $\langle \zeta, \bar{y} \rangle > (r + \bar{r})/2$.
Proof. In a manner similar to the beginning of the proof of Claim 5, we have
\[
\inf_{s \in S} g(\bar{t}, s) \ge f(\bar{z}) + k\|\bar{t}\bar{y} - \bar{z}\|^2 - 2\eta \bigl( 2\|Y\| + \delta \bigr), \tag{25}
\]
and
\[
\inf_{s \in S} g(1, s) \le f(\bar{z}) + k\|(1 - \bar{t})\bar{y} + (\bar{t}\bar{y} - \bar{z})\|^2 - r. \tag{26}
\]
Recall that $\bar{t}$ provides an infimum of the function in (19). By subtracting (25) from (26), we obtain
\[
0 \le \inf_{s \in S} g(1, s) - \inf_{s \in S} g(\bar{t}, s) \le k\|(1 - \bar{t})\bar{y} + (\bar{t}\bar{y} - \bar{z})\|^2 - r - k\|\bar{t}\bar{y} - \bar{z}\|^2 + 2\eta \bigl( 2\|Y\| + \delta \bigr)
\]
\[
= (1 - \bar{t})\bigl( \langle 2k(\bar{t}\bar{y} - \bar{z}), \bar{y} \rangle - r \bigr) + 2\eta \bigl( 2\|Y\| + \delta \bigr).
\]
This in turn implies
\[
\langle 2k(\bar{t}\bar{y} - \bar{z}), \bar{y} \rangle \ge r - \frac{2\eta}{1 - \bar{t}}\bigl( 2\|Y\| + \delta \bigr).
\]
Consequently we have
\[
\langle \zeta, \bar{y} \rangle = \langle 2k(\bar{t}\bar{y} - \bar{z}) - \xi_z, \bar{y} \rangle \ge \langle 2k(\bar{t}\bar{y} - \bar{z}), \bar{y} \rangle - \eta\|Y\| \ge r - \eta\Bigl( \frac{2}{1 - \bar{t}}\bigl( 2\|Y\| + \delta \bigr) + \|Y\| \Bigr).
\]
Hence $\langle \zeta, \bar{y} \rangle > (r + \bar{r})/2$ if $\eta$ is chosen small enough.
The proof of the theorem will ensue from the next claim.
Claim 7. If in addition to the requirements on $\eta$ in all of the earlier claims, we also have $\eta < \bar{t}(r - \bar{r})/8\|Y\|$, then
\[
\bar{r} < \langle \zeta, y \rangle \quad \forall y \in Y.
\]
Proof. We have that the function
\[
y \mapsto k\|\bar{t}y - \bar{z}\|^2 + \langle \xi_y, y \rangle
\]
achieves its infimum over $y \in Y$ at $y = \bar{y}$. Since $Y$ is convex, we have by Proposition 2.2 that for all $y \in Y$,
\[
\Bigl\langle 2k(\bar{t}\bar{y} - \bar{z}) + \frac{\xi_y}{\bar{t}}, \ y - \bar{y} \Bigr\rangle \ge 0, \quad \text{whence} \quad \langle 2k(\bar{t}\bar{y} - \bar{z}), y - \bar{y} \rangle \ge -\frac{2\eta}{\bar{t}}\|Y\|.
\]
Therefore
\[
\langle \zeta, y - \bar{y} \rangle = \langle 2k(\bar{t}\bar{y} - \bar{z}) - \xi_z, y - \bar{y} \rangle \ge -\frac{2\eta}{\bar{t}}\|Y\| - 2\eta\|Y\| \ge -\frac{4\eta}{\bar{t}}\|Y\| \ge \frac{\bar{r} - r}{2}.
\]
Finally, from this and Claim 6 we conclude that for any $y \in Y$,
\[
\langle \zeta, y \rangle \ge \langle \zeta, \bar{y} \rangle - \frac{r - \bar{r}}{2} > \frac{r + \bar{r}}{2} - \frac{r - \bar{r}}{2} = \bar{r},
\]
which completes the proof of the theorem.