1.12. Exercise. Let $f$ and $g$ be locally Lipschitz. Suppose that $x_0$ is a normal point satisfying $g(x_0) \le 0$. Then for some $K$, for all $\beta$ sufficiently near $0$, the inequality $g(x) + \beta \le 0$ admits a solution $x_\beta$ satisfying $\|x_\beta - x_0\| \le K \max\{0, \beta\}$.
1.13. Exercise. Show that for $n = 1$, $x_0 = 0$, the function $h(x) := -|x|$ is not normal in the sense of the word that prevails in Theorem 1.9, but that the same function $g(x) := -|x|$ is normal as the term is employed in Exercise 1.12. Note that the conclusion of Exercise 1.12 holds while that of Theorem 1.9 fails.
The general case of mixed equality and inequality constraints is that of minimizing $f(x)$ over the points $x \in \mathbb{R}^n$ satisfying $h(x) = 0$, $g(x) \le 0$, where $h\colon \mathbb{R}^n \to \mathbb{R}^m$ and $g\colon \mathbb{R}^n \to \mathbb{R}^p$ are given functions and where a vector inequality such as $g(x) \le 0$ is understood component-wise. We will assume that all these functions are locally Lipschitz, and that the set
\[
\{ x \in \mathbb{R}^n : f(x) \le r, \ g(x) \le s, \ \|h(x)\| \le t \}
\]
is bounded for each $r, t$ in $\mathbb{R}$ and $s$ in $\mathbb{R}^p$. Let $P(\alpha, \beta)$ denote the problem of minimizing $f(x)$ subject to $h(x) + \alpha = 0$, $g(x) + \beta \le 0$, and let $V(\alpha, \beta)$ be the corresponding value function.
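To fix ideas, here is a one-dimensional illustration of the value function (an example we add here; it is not part of the original development): take $n = p = 1$, no equality constraints, $f(x) := x^2$, and $g(x) := x$. The perturbed constraint $g(x) + \beta \le 0$ reads $x \le -\beta$, whence
\[
V(\beta) = \begin{cases} 0 & \text{if } \beta \le 0, \\ \beta^2 & \text{if } \beta > 0, \end{cases}
\]
that is, $V(\beta) = \bigl( \max\{0, \beta\} \bigr)^2$. The value function remains well behaved near $\beta = 0$ even as the constraint becomes active there; the results below quantify this kind of stability.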
The Lagrangian of the problem is the function $L\colon \mathbb{R}^n \times \mathbb{R}^p \times \mathbb{R}^m \to \mathbb{R}$ defined by
\[
L(x, \gamma, \zeta) := f(x) + \langle \gamma, g(x) \rangle + \langle \zeta, h(x) \rangle.
\]
For any point $x$ feasible for $P(0,0)$, the set $M(x)$ of multipliers for $x$ consists of those $(\gamma, \zeta) \in \mathbb{R}^p \times \mathbb{R}^m$ satisfying
\[
\gamma \ge 0, \quad \langle \gamma, g(x) \rangle = 0, \quad 0 \in \partial_L L(\cdot, \gamma, \zeta)(x).
\]
The point $x$ is now called normal if
\[
\gamma \ge 0, \quad \langle \gamma, g(x) \rangle = 0, \quad 0 \in \partial_L \bigl( \langle \gamma, g(\cdot) \rangle + \langle \zeta, h(\cdot) \rangle \bigr)(x) \implies \gamma = 0, \ \zeta = 0,
\]
and otherwise is called abnormal. We suppose that $V(0,0) < \infty$. In light of the preceding, the genesis of the following theorem should be clear. Its detailed proof is a substantial exercise.
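To see the distinction at work, consider the following simple contrast (our illustration, with $n = p = 1$ and no equality constraints). If $g(x) := x^2$ and $x = 0$, then for every $\gamma \ge 0$ we have $\langle \gamma, g(0) \rangle = 0$ and
\[
0 \in \partial_L \langle \gamma, g(\cdot) \rangle(0) = \{ 2\gamma \cdot 0 \} = \{ 0 \},
\]
so $\gamma = 1$ satisfies the hypotheses of the implication: the point is abnormal. If instead $g(x) := x$, then $\partial_L \langle \gamma, g(\cdot) \rangle(0) = \{ \gamma \}$, and $0 \in \{ \gamma \}$ forces $\gamma = 0$: the point is normal.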
1.15. Corollary. Let $x_0$ be a point satisfying $h(x_0) = 0$, $g(x_0) \le 0$, and suppose that $x_0$ is normal. Then for some $K > 0$, for all $\alpha$ and $\beta$ near $0$, there exists a point $x_{\alpha,\beta}$ such that
\[
h(x_{\alpha,\beta}) + \alpha = 0, \quad g(x_{\alpha,\beta}) + \beta \le 0, \quad \|x_{\alpha,\beta} - x_0\| \le K \Bigl( \|\alpha\| + \sum_{i=1}^{p} \max\{0, \beta_i\} \Bigr).
\]
We remark that the value function approach illustrated above has proven to be a robust and effective line of attack on many optimization problems more complex than the finite-dimensional ones considered here, including a variety of problems in control theory.
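As a concrete check of the corollary (our illustration), take $n = 2$, $m = p = 1$, $h(x_1, x_2) := x_1$, $g(x_1, x_2) := x_2$, and $x_0 = (0, 0)$. The point is normal, since $0 \in \partial_L (\zeta x_1 + \gamma x_2)(0,0) = \{ (\zeta, \gamma) \}$ forces $\gamma = \zeta = 0$. The point
\[
x_{\alpha,\beta} := \bigl( -\alpha, \ -\max\{0, \beta\} \bigr)
\]
satisfies $h(x_{\alpha,\beta}) + \alpha = 0$ and $g(x_{\alpha,\beta}) + \beta = \beta - \max\{0, \beta\} \le 0$, with $\|x_{\alpha,\beta} - x_0\| \le |\alpha| + \max\{0, \beta\}$, so the conclusion holds with $K = 1$.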
2 The Mean Value Inequality
In this section we will prove a “multidirectional” Mean Value Theorem, a result that is novel even in the context of smooth multivariable calculus, and one that is a cornerstone of nonsmooth analysis. It is instructive to begin by recalling the statement of the classical Mean Value Theorem, which plays a fundamental role in analysis by relating the values of a function to its derivative at an intermediate point. For the purpose of comparisons to come, it is also worth noting the optimization-based proof of this familiar result. Recall that given two points $x$ and $y \in X$, the open line segment connecting them is denoted $(x, y)$:
\[
(x, y) := \{ tx + (1-t)y : 0 < t < 1 \}.
\]
If $t = 0$ and $t = 1$ are allowed, we obtain the closed line segment $[x, y]$.
Throughout this section, X is a Hilbert space.
2.1. Proposition. Suppose $x, y \in X$ and that $f \in \mathcal{F}$ is Gâteaux differentiable on a neighborhood of $[x, y]$. Then there exists $z \in (x, y)$ such that
\[
f(y) - f(x) = \langle f'_G(z), y - x \rangle. \tag{1}
\]
Proof. Define $g\colon [0,1] \to \mathbb{R}$ by
\[
g(t) := f\bigl( ty + (1-t)x \bigr) - t f(y) - (1-t) f(x).
\]
Then $g$ is continuous on $[0,1]$, differentiable on $(0,1)$, and satisfies $g(0) = g(1) = 0$. It follows that there exists a point $\bar{t} \in (0,1)$ that is either a maximum or a minimum of $g$ on $[0,1]$, and thus satisfies $g'(\bar{t}) = 0$. We calculate $g'(\bar{t})$ by referring directly to the definition of the Gâteaux derivative at $z := \bar{t}y + (1-\bar{t})x = x + \bar{t}(y - x)$, and conclude that
\[
0 = g'(\bar{t}) = \langle f'_G(z), y - x \rangle - \bigl( f(y) - f(x) \bigr).
\]
Hence (1) holds.
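A quick instance of (1) (our illustration): take $X = \mathbb{R}$, $f(x) := x^2$, $x = 0$, $y = 1$. Then
\[
f(1) - f(0) = 1 = \langle f'_G(z), 1 - 0 \rangle = 2z,
\]
which pins down the intermediate point $z = 1/2 \in (0, 1)$.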
A typical application of the Mean Value Theorem is to deduce monotonicity. For example: suppose that $\langle f'_G(z), y - x \rangle \le 0$ for all $z \in (x, y)$. Then it follows immediately that $f(y) \le f(x)$. In this, and in fact in most applications, it suffices to invoke the Mean Value Theorem in its inequality form; i.e., the form in which (1) is replaced by
\[
f(y) - f(x) \le \langle f'_G(z), y - x \rangle.
\]
When $f$ is nondifferentiable, the inequality form provides the appropriate model for generalization. To illustrate, consider the function $f(x) := -|x|$ on $\mathbb{R}$. There exist $z \in (-1, 1)$ and $\zeta \in \partial_P f(z)$ such that
\[
0 = f(1) - f(-1) \le \langle \zeta, 1 - (-1) \rangle = 2\zeta,
\]
but equality (i.e., $\zeta = 0$) never holds. In a nonsmooth setting, it turns out as well that some tolerance must be present in the resulting inequality. For example, if $f(x_1, x_2) := -|x_2|$, then $f(-1, 0) = f(1, 0)$, but the difference ($0$) between these two values of $f$ cannot be estimated by $\partial_P f(z)$ for any $z$ on the line segment between $(-1, 0)$ and $(1, 0)$, since $\partial_P f(z)$ is empty for all such $z$. We will need to allow $z$ to stray a bit from the line segment.
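To make the first example concrete, we can compute the proximal subdifferential of $f(x) = -|x|$ directly (a verification we supply for illustration):
\[
\partial_P f(z) = \begin{cases} \{-1\} & \text{if } z > 0, \\ \{1\} & \text{if } z < 0, \\ \emptyset & \text{if } z = 0. \end{cases}
\]
Thus any $z \in (-1, 0)$ with $\zeta = 1$ gives $0 \le 2\zeta = 2$, confirming the inequality form, while $2\zeta = \pm 2$ for every available $\zeta$, so equality is never attained. In the second example, a $\zeta = (\zeta_1, \zeta_2) \in \partial_P f(z_1, 0)$ would require $-|x_2| \ge \zeta_2 x_2 - \sigma \bigl( x_2^2 + (x_1 - z_1)^2 \bigr)$ near $(z_1, 0)$, which fails along $x_1 = z_1$ for small $x_2$ of appropriate sign; hence $\partial_P f$ is indeed empty on the segment.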
Besides nondifferentiability, the Mean Value Inequality that we will present possesses a novel, perhaps surprising feature: multidirectionality; i.e., an estimate that is maintained uniformly over many directions. This new aspect greatly enhances the range of applications of the result, even in a smooth setting. Let us describe it in a particular two-dimensional case, in which we seek to compare the value of a continuous Gâteaux differentiable function $f$ at $(0,0)$ to the set of its values on the line segment
\[
Y := \{ (1, t) : 0 \le t \le 1 \}.
\]
For each $t \in [0,1]$, the usual Mean Value Theorem applied to the two points $(0,0)$ and $y_t := (1, t)$ asserts the existence of a point $z_t$ lying in the open segment connecting $(0,0)$ and $y_t$ such that
\[
f(y_t) - f(0) \le \langle f'_G(z_t), y_t \rangle.
\]
We deduce
\[
\min_{y \in Y} f(y) - f(0) \le \langle f'_G(z_t), y_t \rangle. \tag{2}
\]
The new Mean Value Inequality asserts that for some $z$ in the triangle which is the convex hull of $\{0\} \cup Y$, we have
\[
\min_{y \in Y} f(y) - f(0) \le \langle f'_G(z), y_t \rangle \quad \forall t \in [0,1]. \tag{3}
\]
While these conclusions appear quite similar, there is an important difference: (3) holds for the same given $z$, uniformly for all $y_t$, in contrast to (2), where $z_t$ depended on $y_t$.
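A quick sanity check (our illustration): take $f(x_1, x_2) := x_1$. Then $\min_{y \in Y} f(y) - f(0) = 1$, while $f'_G(z) = (1, 0)$ at every $z$, so
\[
\langle f'_G(z), y_t \rangle = \langle (1, 0), (1, t) \rangle = 1 \quad \forall t \in [0,1];
\]
any $z$ in the triangle satisfies (3), uniformly in $t$, with equality.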
To further underline the difference, consider now the case in which $Y$ is the closed unit ball $B$ in $X$. For a continuous Gâteaux differentiable function $f$, the new Mean Value Inequality asserts that for some $z \in B$, we have
\[
\inf_B f - f(0) \le \langle f'_G(z), y \rangle \quad \forall y \in B.
\]
Assume now that $\|f'_G\| \ge 1$ on $B$. Then taking $y = -f'_G(z)/\|f'_G(z)\|$ in the inequality leads to
\[
\inf_B f - f(0) \le -1
\]
(this is where we exploit the uniformity of the conclusion). Applied to $-f$, the same reasoning yields $\sup_B f - f(0) \ge 1$, whence
\[
\sup_B f - \inf_B f \ge 2.
\]
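The estimate is sharp, as the linear case shows (our illustration): if $f(x) := \langle a, x \rangle$ with $\|a\| = 1$, then $f'_G \equiv a$, so $\|f'_G\| = 1$ on $B$, and
\[
\sup_B f - \inf_B f = 1 - (-1) = 2,
\]
so the conclusion cannot be improved.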
The reader is invited to ponder how the usual Mean Value Theorem is inadequate to yield this conclusion. Further, let us note that the failure of $f'$ to exist even at a single point can make a difference and will need to be taken account of: witness $f(x) := \|x\|$, for which we have
\[
\sup_B f - \inf_B f = 1,
\]
in contrast to the above, even though $\|f'(x)\| = 1$ for all $x \ne 0$.
A Smooth Finite-Dimensional Version
The general theorem that we will present features nonsmoothness, infinite dimensions, and multidirectionality. The structure of its proof can be conveyed much more easily in the absence of the first two factors. Accordingly, we will begin by proving a version of the theorem for a differentiable function on $\mathbb{R}^n$. A preliminary result we will require is the following.
2.2. Proposition. Suppose $Y \subset X$ is closed and convex, that $f \in \mathcal{F}$ attains its minimum over $Y$ at $\bar{y}$, and that $f$ has directional derivatives in all directions at $\bar{y}$. Then
\[
f'(\bar{y}; y - \bar{y}) \ge 0 \quad \forall y \in Y. \tag{4}
\]
In particular, if $f$ is Gâteaux differentiable at $\bar{y}$, then
\[
\langle f'_G(\bar{y}), y - \bar{y} \rangle \ge 0 \quad \forall y \in Y.
\]
Proof. Let $y \in Y$, and define $g\colon [0,1] \to \mathbb{R}$ by $g(t) = f\bigl( \bar{y} + t(y - \bar{y}) \bigr)$. Note that $\bar{y} + t(y - \bar{y}) = ty + (1-t)\bar{y}$ belongs to $Y$ for each $t \in [0,1]$, since $Y$ is convex. Since $\bar{y}$ minimizes $f$ over $Y$, we have $g(0) \le g(t)$ for all $t$. Thus by letting $t \downarrow 0$ and using the assumption that the directional derivative $f'(\bar{y}; y - \bar{y})$ exists, we conclude
\[
0 \le \lim_{t \downarrow 0} \frac{g(t) - g(0)}{t} = f'(\bar{y}; y - \bar{y}).
\]
Therefore (4) holds, since $y \in Y$ is arbitrary.
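A one-dimensional instance of (4) (our illustration): let $X = \mathbb{R}$, $Y = [0, \infty)$, and $f(x) := x$, which attains its minimum over $Y$ at $\bar{y} = 0$. Then
\[
f'(0; y - 0) = y \ge 0 \quad \forall y \in Y,
\]
even though $0$ is not an unconstrained critical point: the proposition is a variational inequality at the boundary, not a stationarity condition.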
For $Y \subset X$ and $x \in X$, we denote by $[x, Y]$ the convex hull of $\{x\} \cup Y$:
\[
[x, Y] := \{ tx + (1-t)y : t \in [0,1], \ y \in Y \}. \tag{5}
\]
In particular, if $Y = \{y\}$, then $[x, Y]$ is simply the closed line segment connecting $x$ and $y$. Note that below we admit the possibility that $x \in Y$.

2.3. Theorem. Suppose $Y \subset \mathbb{R}^n$ is closed, bounded, and convex. Let $x \in \mathbb{R}^n$, and let $f \in \mathcal{F}$ be Gâteaux differentiable on a neighborhood of $[x, Y]$. Then there exists $z \in [x, Y]$ such that
\[
\min_{y \in Y} f(y) - f(x) \le \langle f'(z), y - x \rangle \quad \forall y \in Y. \tag{6}
\]
Proof. Set
\[
r := \min_{y \in Y} f(y) - f(x), \tag{7}
\]
and let $U$ be an open neighborhood of $[x, Y]$ on which $f$ is Gâteaux differentiable. Define $g\colon [0,1] \times U \to \mathbb{R}$ by
\[
g(t, y) = f\bigl( x + t(y - x) \bigr) - rt. \tag{8}
\]
Since $g$ is lower semicontinuous and $[x, Y]$ is compact, we have $r > -\infty$, and there exists $(\bar{t}, \bar{y}) \in [0,1] \times Y$ at which $g$ attains its minimum over $[0,1] \times Y$. We will show that (6) holds with $z$ given by
\[
z := x + \bar{t}(\bar{y} - x) \in [x, Y], \tag{9}
\]
unless $\bar{t} = 1$, in which case we set $z = x$. The multidirectional inequality (6) will be deduced from necessary conditions for the attainment of a minimum.
Let us first assume that $\bar{t} \in (0, 1)$. Then the function $t \mapsto g(t, \bar{y})$ attains its minimum over $t \in [0,1]$ at $t = \bar{t}$, and thus has a vanishing derivative there. We calculate this derivative using (8) and (9) to conclude
\[
0 = \frac{\partial}{\partial t}\Big|_{t = \bar{t}} g(t, \bar{y}) = \langle f'(z), \bar{y} - x \rangle - r. \tag{10}
\]
On the other hand, the function $y \mapsto g(\bar{t}, y)$ attains its minimum over $y \in Y$ at $\bar{y}$. The necessary conditions in Proposition 2.2 state that
\[
\Bigl\langle \frac{\partial}{\partial y}\Big|_{y = \bar{y}} g(\bar{t}, y), \ y - \bar{y} \Bigr\rangle \ge 0 \quad \forall y \in Y. \tag{11}
\]
It is easily calculated using (8) and (9) again that
\[
\frac{\partial}{\partial y}\Big|_{y = \bar{y}} g(\bar{t}, y) = \bar{t} f'(z).
\]
Substituting this value into (11) and dividing by $\bar{t} > 0$ gives us that
\[
\langle f'(z), y - \bar{y} \rangle \ge 0 \quad \forall y \in Y. \tag{12}
\]
We now combine (10) and (12) to conclude that
\[
r \le \langle f'(z), \bar{y} - x \rangle + \langle f'(z), y - \bar{y} \rangle = \langle f'(z), y - x \rangle,
\]
which holds for all $y \in Y$. This statement is precisely (6) and finishes the proof of the theorem in this instance.
The case $\bar{t} = 0$ is handled more simply than above. For each $(t, y) \in (0, 1] \times Y$, we note from (8) and the definition of a minimum that
\[
f(x) = g(0, \bar{y}) \le g(t, y) = f\bigl( x + t(y - x) \bigr) - rt. \tag{13}
\]
After dividing (13) by $t$ and rearranging terms, we have for all $y \in Y$ that
\[
r \le \lim_{t \downarrow 0} \frac{1}{t} \Bigl[ f\bigl( x + t(y - x) \bigr) - f(x) \Bigr] = \langle f'(z), y - x \rangle,
\]
since $z = x$, and this is the same as (6).
Finally we consider the case in which $\bar{t} = 1$. We claim $g(1, \bar{y}) = f(x)$. Indeed, $g(1, \bar{y}) \le g(0, \bar{y}) = f(x)$, while on the other hand,
\[
g(1, \bar{y}) = f(\bar{y}) - r = f(\bar{y}) - \min_{y \in Y} f(y) + f(x) \ge f(x).
\]
It now follows that $f(x)$ is the minimum value of $g$ on $[0,1] \times Y$. But $f(x) = g(0, y)$ for any $y \in Y$, and thus $(0, y)$ is also a minimizer of $g$. Thus if $\bar{t} = 1$, then we equally could have chosen the minimizer with $\bar{t} = 0$, and we finish by deferring to that case, treated above.
2.4. Exercise.
(a) Prove that the point $z$ in Theorem 2.3 also satisfies
\[
f(z) \le \min_{w \in [x, Y]} f(w) + |r|,
\]
where $r$ is given by (7).
(b) Consider the special case of Theorem 2.3 in which $n = 2$, $x = (0, 0)$, and
\[
Y = \{ (1, t) : -1 \le t \le 1 \}.
\]
Suppose that $f(0,0) = 0$ and that $f(1, t) = 1$ $\forall t \in [-1, 1]$. Prove that some point $(u, v)$ in the triangle $[0, Y]$ satisfies
\[
f_x(u, v) \ge 1 + |f_y(u, v)|.
\]
(c) With $x$ and $Y$ as in part (b), and with $f(x, y) := x - y + xy$, find all the points $z$ that satisfy the conclusion of Theorem 2.3.
The General Case of the Mean Value Inequality
We now wish to extend the theorem to the case in which Y is a closed, bounded, convex subset of the Hilbert space X, and f is merely lower semicontinuous.
The first technical difficulty in proving the theorem in this greater generality is the same as that previously encountered in the proof of the Density Theorem. Namely, a lower semicontinuous function that is bounded below may not attain its minimum over a closed bounded set. We may note that the proof of Theorem 2.3 above is valid in infinite dimensions without modification if in addition $Y$ is assumed to be compact, or $f$ is assumed to be weakly lower semicontinuous. This is because these hypotheses imply the existence of the minimizer $(\bar{t}, \bar{y})$. But the additional assumptions are not actually required to prove the theorem in infinite dimensions, as we will see. The idea is to follow the above proof but to obtain the minimizer via a minimization principle (Theorem 1.4.1). However, there is a price to pay for this device to work effectively: the statement of the result itself needs a slight modification, a certain slackening of the inequality in the conclusion.
The result is not true without this modification, and it is essentially required in the proof to avoid the possibility that $\bar{t} = 1$. (We may check that if $\bar{t} = 1$, no relevant information is obtained from the necessary conditions.) In the above proof, the case $\bar{t} = 1$ was handled by deferring to the case $\bar{t} = 0$, but after adding the perturbation term, this is no longer possible.
The second technical difficulty arises in replacing derivatives by proximal subgradients. When $f$ is nondifferentiable, Proposition 2.2 is no longer available to us, and in the minimization process we have recourse instead to a smooth penalty term to replace the constraint that $z$ lie in $[x, Y]$. This will now allow the minimizer $z$ to fail to lie in $[x, Y]$, although it can be brought arbitrarily near to that set (an example will show that this tolerance is necessary). Further, the quantity $r$ defined by (7) must be replaced by one that is more stable under such approximations, namely
\[
\hat{r} := \lim_{\delta \downarrow 0} \inf_{w \in Y + \delta B} f(w) - f(x). \tag{14}
\]
2.5. Exercise. If $X$ is finite dimensional, show that the $\hat{r}$ defined by (14) reduces to the $r$ defined by (7). (Examples can be adduced to show that this may fail in infinite dimensions.)
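Here is one such infinite-dimensional example (our construction, offered for illustration): in $X = \ell^2$ with orthonormal basis $\{e_n\}$, let $Y$ be the closed convex hull of $\{ e_n : n \ge 1 \}$ (so $0 \in Y$ and $\|y\| \le 1$ on $Y$), let $x = 0$, and define
\[
f(w) := \begin{cases} 0 & \text{if } w \in Y, \\ -1 & \text{if } w = (1 + 1/n)e_n \text{ for some } n, \\ +\infty & \text{otherwise.} \end{cases}
\]
The points $(1 + 1/n)e_n$ lie outside $Y$, are pairwise at distance greater than $\sqrt{2}$ (so $f$ is lower semicontinuous), and lie within $1/n$ of $Y$. Hence $r = \inf_Y f - f(0) = 0$, whereas $\inf_{Y + \delta B} f = -1$ for every $\delta > 0$, giving $\hat{r} = -1 < r$.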
2.6. Theorem (Mean Value Inequality). Let $Y$ be a closed, convex, bounded subset of $X$, and let $x \in \operatorname{dom} f$, where $f \in \mathcal{F}$. Suppose that $\hat{r}$ defined above is not $-\infty$, and let any $\bar{r} < \hat{r}$ and $\varepsilon > 0$ be given. Then there exist $z \in [x, Y] + \varepsilon B$ and $\zeta \in \partial_P f(z)$ such that
\[
\bar{r} < \langle \zeta, y - x \rangle \quad \forall y \in Y.
\]
Further, we can choose $z$ to satisfy
\[
f(z) < \inf_{[x, Y]} f + |\bar{r}| + \varepsilon.
\]
We remark that the case $\hat{r} = \infty$ is not excluded; it can only arise if $Y \cap \operatorname{dom} f = \emptyset$.
Proof of the Mean Value Inequality
Proof. Without loss of generality, we assume that $x = 0$ and $f(x) = 0$. Suppose $\bar{r} < \hat{r}$ and $\varepsilon > 0$. We will first fix some constants that will be featured in the estimates to come; the hypotheses of the theorem justify the existence of such choices. We write $\|Y\|$ for the quantity $\sup\{ \|y\| : y \in Y \}$. Let $r \in \mathbb{R}$ and positive constants $\delta$, $M$, $k$ be chosen so that
\[
\bar{r} < r < \hat{r}, \quad |r| < |\bar{r}| + \varepsilon, \tag{15}
\]
\[
\delta \le \varepsilon \quad \text{and} \quad y \in Y + \delta B \implies f(y) \ge r + \delta, \tag{16}
\]
\[
z \in [0, Y] + \delta B \implies f(z) \ge -M, \tag{17}
\]
\[
k > \frac{1}{\delta^2} \bigl( M + |r| + 1 + 2\|Y\| + \delta \bigr). \tag{18}
\]
Now we define $g\colon [0,1] \times X \times X \to (-\infty, \infty]$ by
\[
g(t, y, z) := f(z) + k\|ty - z\|^2 - tr.
\]
To simplify the notation somewhat, let us set
\[
S := Y \times \bigl( [0, Y] + \delta B \bigr),
\]
and write $s = (y, z)$ for elements of $S$. We can easily verify that
\[
t \mapsto \inf_{s \in S} g(t, s) \tag{19}
\]
is continuous on $[0,1]$, and thus attains its infimum at some point $\bar{t} \in [0,1]$. The choice of a large $k$ value forces $\bar{t}$ away from $1$, which is the assertion of the first claim.
Claim 1. $\bar{t} < 1$.
Proof. Let $s = (y, z) \in S$, and note that if $\|y - z\| \le \delta$, then $f(z) \ge r + \delta$ by (16). A lower bound for $g(1, s)$ can be obtained as follows:
\[
g(1, s) = f(z) + k\|y - z\|^2 - r \ge
\begin{cases}
f(z) + k\|y - z\|^2 - r & \text{if } \|y - z\| \le \delta, \\
f(z) + k\delta^2 - r & \text{if } \|y - z\| > \delta,
\end{cases}
\]
\[
\ge \min\bigl\{ r + \delta - r, \ -M + \bigl( M + |r| + 1 \bigr) - r \bigr\} > 0.
\]
On the other hand, we have an upper bound for the function in (19) by setting $t = 0$ and noting that
\[
\inf_{s \in S} g(0, s) \le g(0, y, 0) = f(0) + k\|0\|^2 = 0. \tag{20}
\]
Thus the infimum of $t \mapsto \inf_{s \in S} g(t, s)$ taken over $t \in [0,1]$ cannot occur at $t = 1$, and we have $\bar{t} < 1$.
We now address the difficulty that the minimum of $s \mapsto g(\bar{t}, s)$ taken over $s \in S$ may not be attained at any point in $S$. We resort to a minimization principle to obtain a minimum of $g(\bar{t}, \cdot)$ perturbed by a linear function. Note that $g(\bar{t}, \cdot) \in \mathcal{F}$ and is bounded below on the closed bounded set $S$. By Theorem 1.4.1, for each small $\eta > 0$, there exist $\xi := (\xi_y, \xi_z) \in X \times X$ and $\bar{s} := (\bar{y}, \bar{z}) \in S$ so that $\|\xi\| < \eta$ and the function
\[
s \mapsto g(\bar{t}, s) + \langle \xi, s \rangle \tag{21}
\]
attains its minimum over $s \in S$ at $\bar{s}$. Of course $\xi$ and $\bar{s}$ depend on $\eta$, as does the choice of $\zeta$ appearing in the conclusions of the theorem, the latter being defined as
\[
\zeta := 2k(\bar{t}\bar{y} - \bar{z}) - \xi_z.
\]
We will see that the conclusions of the theorem hold for these choices of $\zeta$ and $\bar{z}$, provided that $\eta$ is chosen sufficiently small.
Claim 2. If $\eta$ is small enough, we have $\bar{z} \in [0, Y] + \delta B$.
Proof. Suppose on the contrary that $d_{[0,Y]}(\bar{z}) \ge \delta$. Observe first that
\[
\langle \xi, s \rangle \le \eta \bigl( 2\|Y\| + \delta \bigr) \quad \forall s \in S. \tag{22}
\]
Then since $\bar{t}\bar{y} \in [0, Y]$, we have by (22) and (17) that
\[
g(\bar{t}, \bar{s}) + \langle \xi, \bar{s} \rangle = f(\bar{z}) + k\|\bar{t}\bar{y} - \bar{z}\|^2 - r\bar{t} + \langle \xi, \bar{s} \rangle \ge -M + k\delta^2 - |r| - \eta \bigl( 2\|Y\| + \delta \bigr),
\]
which by the choice of $k$ in (18) is greater than $1$ if $\eta$ is sufficiently small. On the other hand, recall that $\bar{s}$ achieves the infimum in the function (21), and that $\bar{t}$ minimizes (19). Therefore if $\eta \le \bigl( 2\|Y\| + \delta \bigr)^{-1}/2$, we have by (20) and (22)
\[
g(\bar{t}, \bar{s}) + \langle \xi, \bar{s} \rangle \le \inf_{s \in S} g(\bar{t}, s) + \eta \bigl( 2\|Y\| + \delta \bigr) \le \inf_{s \in S} g(0, s) + \eta \bigl( 2\|Y\| + \delta \bigr) \le \eta \bigl( 2\|Y\| + \delta \bigr) \le 1/2.
\]
Thus we conclude $\bar{z} \in [0, Y] + \delta B$, as claimed.
Claim 3. $\zeta \in \partial_P f(\bar{z})$.
Proof. Since $z \mapsto f(z) + k\|\bar{t}\bar{y} - z\|^2 + \langle \xi_z, z \rangle$ has a minimum over $[0, Y] + \delta B$ at $z = \bar{z}$, and $\bar{z}$ is a point in the interior of this set (Claim 2), we have that
\[
\zeta = -\Bigl( k\|\bar{t}\bar{y} - (\cdot)\|^2 + \langle \xi_z, (\cdot) \rangle \Bigr)'(\bar{z}) \in \partial_P f(\bar{z}).
\]
We next turn to proving the upper bound on $f(\bar{z})$.
Claim 4. The estimate $f(\bar{z}) \le \inf_{z \in [0,Y]} f(z) + |\bar{r}| + \varepsilon$ holds, provided $\eta$ is small enough.
Proof. Using (22) and the fact that $\bar{t}$ minimizes (19), we calculate
\[
f(\bar{z}) - \bar{t}r \le g(\bar{t}, \bar{s}) + \langle \xi, \bar{s} \rangle + \eta \bigl( 2\|Y\| + \delta \bigr) = \inf_{s \in S} \bigl[ g(\bar{t}, s) + \langle \xi, s \rangle \bigr] + \eta \bigl( 2\|Y\| + \delta \bigr)
\]
\[
\le \inf_{s \in S} g(\bar{t}, s) + 2\eta \bigl( 2\|Y\| + \delta \bigr) \le \inf_{t \in [0,1]} \inf_{(y,z) \in S} \Bigl[ f(z) + k\|ty - z\|^2 - tr \Bigr] + 2\eta \bigl( 2\|Y\| + \delta \bigr)
\]
\[
\le \inf_{z \in [0,Y]} f(z) + \max\{0, -r\} + 2\eta \bigl( 2\|Y\| + \delta \bigr).
\]
The final inequality is justified because any $z \in [0, Y]$ may be written as $z = ty$ with $t \in [0,1]$ and $y \in Y$; this case is included in the next to last infimum, which is therefore an infimum over a larger set, and for it the penalty term vanishes while $-tr \le \max\{0, -r\}$. Hence
\[
f(\bar{z}) \le \inf_{z \in [0,Y]} f(z) + \max\{\bar{t}, 1 - \bar{t}\}|r| + 2\eta \bigl( 2\|Y\| + \delta \bigr) \le \inf_{z \in [0,Y]} f(z) + |\bar{r}| + \varepsilon,
\]
provided $\eta$ is sufficiently small, where the second relation in (15) has been used.
Claim 5. Suppose $\bar{t} = 0$, and $0 < \eta < \bigl[ (r - \bar{r})/6\bigl( 2\|Y\| + \delta \bigr) \bigr]^2$. Then $\bar{r} < \langle \zeta, y \rangle$ for all $y \in Y$.
Proof. In the present case with $\bar{t} = 0$, the definition of $\zeta$ reduces to $\zeta = -2k\bar{z} - \xi_z$. Let us note (using (22)) that
\[
\inf_{s \in S} g(0, s) \ge \inf_{s \in S} \bigl[ g(0, s) + \langle \xi, s \rangle \bigr] - \eta \bigl( 2\|Y\| + \delta \bigr) = f(\bar{z}) + k\|\bar{z}\|^2 + \langle \xi, (\bar{y}, \bar{z}) \rangle - \eta \bigl( 2\|Y\| + \delta \bigr) \ge f(\bar{z}) + k\|\bar{z}\|^2 - 2\eta \bigl( 2\|Y\| + \delta \bigr). \tag{23}
\]
Now we let $t := \min\bigl\{ \sqrt{\eta}, \ (r - \bar{r})/3k\bigl( \|Y\|^2 + \delta \bigr) \bigr\} > 0$. We have for any $y \in Y$ that
\[
\inf_{s \in S} g(t, s) \le f(\bar{z}) + k\|ty - \bar{z}\|^2 - tr. \tag{24}
\]
Since $\bar{t} = 0$ by definition provides a minimum of (19), it follows from (23) and (24) that for all $y \in Y$
\[
0 \le \inf_{s \in S} g(t, s) - \inf_{s \in S} g(0, s) \le k\|ty - \bar{z}\|^2 - tr - k\|\bar{z}\|^2 + 2\eta \bigl( 2\|Y\| + \delta \bigr)
\]
\[
= t\bigl( \langle -2k\bar{z}, y \rangle - r \bigr) + t^2 k\|y\|^2 + 2\eta \bigl( 2\|Y\| + \delta \bigr)
\]
\[
= t\bigl( \langle \zeta, y \rangle - r \bigr) + t^2 k\|y\|^2 + t\langle \xi_z, y \rangle + 2\eta \bigl( 2\|Y\| + \delta \bigr)
\]
\[
\le t\bigl( \langle \zeta, y \rangle - r \bigr) + t^2 k\|Y\|^2 + t\eta\|Y\| + 2\eta \bigl( 2\|Y\| + \delta \bigr).
\]
Dividing through by $t$ and using our choices of $t$ and $\eta$, we obtain
\[
\langle \zeta, y \rangle \ge r - tk\|Y\|^2 - \eta\|Y\| - \frac{2\eta}{t}\bigl( 2\|Y\| + \delta \bigr) > \bar{r},
\]
which concludes the proof of this claim.
To handle the situation where $\bar{t} > 0$, we first prove an estimate regarding $\bar{y}$.
Claim 6. If $\eta$ is sufficiently small, then $\langle \zeta, \bar{y} \rangle > (r + \bar{r})/2$.
Proof. In a manner similar to the beginning of the proof of Claim 5, we have
\[
\inf_{s \in S} g(\bar{t}, s) \ge f(\bar{z}) + k\|\bar{t}\bar{y} - \bar{z}\|^2 - 2\eta \bigl( 2\|Y\| + \delta \bigr), \tag{25}
\]
and
\[
\inf_{s \in S} g(1, s) \le f(\bar{z}) + k\|(1 - \bar{t})\bar{y} + (\bar{t}\bar{y} - \bar{z})\|^2 - r. \tag{26}
\]
Recall that $\bar{t}$ provides an infimum of the function in (19). By subtracting (25) from (26), we obtain
\[
0 \le \inf_{s \in S} g(1, s) - \inf_{s \in S} g(\bar{t}, s) \le k\|(1 - \bar{t})\bar{y} + (\bar{t}\bar{y} - \bar{z})\|^2 - r - k\|\bar{t}\bar{y} - \bar{z}\|^2 + 2\eta \bigl( 2\|Y\| + \delta \bigr)
\]
\[
= (1 - \bar{t})\bigl( \langle 2k(\bar{t}\bar{y} - \bar{z}), \bar{y} \rangle - r \bigr) + 2\eta \bigl( 2\|Y\| + \delta \bigr).
\]
This in turn implies
\[
\langle 2k(\bar{t}\bar{y} - \bar{z}), \bar{y} \rangle \ge r - \frac{2\eta}{1 - \bar{t}}\bigl( 2\|Y\| + \delta \bigr).
\]
Consequently we have
\[
\langle \zeta, \bar{y} \rangle = \langle 2k(\bar{t}\bar{y} - \bar{z}) - \xi_z, \bar{y} \rangle \ge \langle 2k(\bar{t}\bar{y} - \bar{z}), \bar{y} \rangle - \eta\|Y\| \ge r - \eta\Bigl( \frac{2}{1 - \bar{t}}\bigl( 2\|Y\| + \delta \bigr) + \|Y\| \Bigr).
\]
Hence $\langle \zeta, \bar{y} \rangle > (r + \bar{r})/2$ if $\eta$ is chosen small enough.
The proof of the theorem will ensue from the next claim.
Claim 7. If in addition to the requirements on $\eta$ in all of the earlier claims, we also have $\eta < \bar{t}(r - \bar{r})/8\|Y\|$, then
\[
\bar{r} < \langle \zeta, y \rangle \quad \forall y \in Y.
\]
Proof. We have that the function
\[
y \mapsto k\|\bar{t}y - \bar{z}\|^2 + \langle \xi_y, y \rangle
\]
achieves its infimum over $y \in Y$ at $y = \bar{y}$. Since $Y$ is convex, we have by Proposition 2.2 that for all $y \in Y$,
\[
\Bigl\langle 2k(\bar{t}\bar{y} - \bar{z}) + \frac{\xi_y}{\bar{t}}, \ y - \bar{y} \Bigr\rangle \ge 0, \quad \text{whence} \quad \langle 2k(\bar{t}\bar{y} - \bar{z}), y - \bar{y} \rangle \ge -\frac{2\eta}{\bar{t}}\|Y\|.
\]
Therefore
\[
\langle \zeta, y - \bar{y} \rangle = \langle 2k(\bar{t}\bar{y} - \bar{z}) - \xi_z, y - \bar{y} \rangle \ge -\frac{2\eta}{\bar{t}}\|Y\| - 2\eta\|Y\| \ge -\frac{4\eta}{\bar{t}}\|Y\| \ge \frac{\bar{r} - r}{2}.
\]
Finally, from this and Claim 6 we conclude that for any $y \in Y$,
\[
\langle \zeta, y \rangle \ge \langle \zeta, \bar{y} \rangle - \frac{r - \bar{r}}{2} > \frac{r + \bar{r}}{2} - \frac{r - \bar{r}}{2} = \bar{r},
\]
which completes the proof of the theorem.