Proof. Let $x^* \in C$ be a local minimizer of $f$ on $C$. If $x \in C$, then the line segment $[x^*, x]$ lies in $C$. For $t \in (0,1)$, the point $x_t := x^* + t(x - x^*) = (1-t)x^* + tx$ lies in $C$, and since $x^*$ is a local minimizer, $f(x^*) \le f(x_t)$ if $t$ is close to $0$. We have
$$ f(x^*) \le f(x_t) \le (1-t) f(x^*) + t f(x), $$
where the last inequality follows from the convexity of $f$. Subtracting $(1-t)f(x^*)$ from both sides and dividing by $t > 0$ gives $f(x^*) \le f(x)$ for all $x \in C$, that is, $x^*$ is a global minimizer of $f$ on $C$.
If $f$ is strictly convex and $x_1^*$ and $x_2^*$ are two distinct global minimizers of $f$ on $C$, then
$$ f\left( \frac{x_1^* + x_2^*}{2} \right) < \frac{1}{2} f(x_1^*) + \frac{1}{2} f(x_2^*) = f(x_1^*) = f(x_2^*) = f^*, $$
a contradiction. The theorem is proved.
A slightly different proof runs as follows: if $x \in C$ satisfies $f(x) < f(x^*)$, then
$$ f(x^* + t(x - x^*)) = f((1-t)x^* + tx) \le (1-t) f(x^*) + t f(x) < f(x^*) $$
for all $t \in (0,1]$, that is, $f(z) < f(x^*)$ for all $z \in (x^*, x]$. Since the segment $(x^*, x] \subseteq C$ contains points arbitrarily near $x^*$, this clearly contradicts the assumption that $x^*$ is a local minimizer of $f$. ⊓⊔
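Before moving on, the theorem can be illustrated numerically. The following is a minimal sketch in Python, using NumPy and SciPy; the convex function $f(x) = \|Mx - d\|^2$, the box $C = [0,1]^4$, and the choice of solver are illustrative assumptions, not part of the text. Since every local minimizer is global, local solvers started from different points should all reach the same optimal value.

```python
# Numerical illustration (a sketch, not from the text): for a convex f on a
# convex set C, every local minimizer is global, so local solvers started
# from different points should all reach the same optimal value. The data
# M, d and the box C = [0,1]^4 are arbitrary illustrative choices.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 4))
d = rng.standard_normal(6)

def f(x):
    r = M @ x - d
    return r @ r                      # f(x) = ||Mx - d||^2 is convex

bounds = [(0.0, 1.0)] * 4             # the convex set C = [0,1]^4

values = [minimize(f, rng.uniform(0.0, 1.0, 4), bounds=bounds).fun
          for _ in range(10)]
print(max(values) - min(values))      # ~0: every run finds the global value
```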
Now we consider the minimization of a differentiable function $f$ on a convex set $C$. We obtain an important first-order necessary condition, called a variational inequality (the inequality (4.16) below), that a local minimizer $x^* \in C$ of $f$ must satisfy. If $f$ is a convex function, then the variational inequality is a sufficient condition as well, so that it provides a characterization of a global minimizer of the convex function $f$ on $C$.

Theorem 4.33. Let $C$ be a convex set in $\mathbb{R}^n$, and let $f$ be a Gâteaux differentiable function on an open set containing $C$.
(a) (First-order necessary condition for a local minimizer) If $x^* \in C$ is a local minimizer of $f$ on $C$, then
$$ \langle \nabla f(x^*), x - x^* \rangle \ge 0 \quad \text{for all } x \in C. \tag{4.16} $$
(b) (First-order sufficient condition for a global minimizer) If $f$ is convex and (4.16) is satisfied at $x^* \in C$, then $x^*$ is a global minimizer of $f$ on $C$.
Proof. To prove (a), pick a point $x \in C$. Since $C$ is convex, $[x^*, x] \subseteq C$, and since $x^*$ is a local minimizer of $f$ on $C$, we have $f(x^* + t(x - x^*)) \ge f(x^*)$ when $t > 0$ is close to zero. Thus,
$$ \langle \nabla f(x^*), x - x^* \rangle = \lim_{t \searrow 0} \frac{f(x^* + t(x - x^*)) - f(x^*)}{t} \ge 0. $$
To prove (b), suppose $x^* \in C$ satisfies the variational inequality (4.16). We have
$$ f(x) \ge f(x^*) + \langle \nabla f(x^*), x - x^* \rangle \ge f(x^*) \quad \text{for all } x \in C, $$
where the first inequality follows from the convexity of $f$, and the second one from (4.16). ⊓⊔
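As a sanity check, the variational inequality (4.16) can be tested numerically at a computed minimizer. The sketch below (Python with NumPy/SciPy; the particular quadratic $f$ and the box $C = [0,1]^2$ are illustrative assumptions) minimizes a convex function over a box and verifies $\langle \nabla f(x^*), x - x^* \rangle \ge 0$ at random feasible points.

```python
# Sketch checking the variational inequality (4.16) at a computed minimizer;
# the quadratic f, the box C = [0,1]^2, and the solver are illustrative
# assumptions, not part of the text.
import numpy as np
from scipy.optimize import minimize

Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])            # positive definite, so f is convex
c = np.array([-4.0, 1.0])

f = lambda x: 0.5 * x @ Q @ x + c @ x
grad = lambda x: Q @ x + c

res = minimize(f, np.zeros(2), jac=grad, bounds=[(0, 1), (0, 1)])
x_star = res.x

rng = np.random.default_rng(1)
xs = rng.uniform(0.0, 1.0, size=(1000, 2))            # random points of C
print(((xs - x_star) @ grad(x_star) >= -1e-6).all())  # True: (4.16) holds
```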
It is clear from the proof above that Theorem 4.33 is valid in very general spaces, including normed linear spaces.
4.5.1 Examples of Variational Inequalities
Example 4.34. Let $f$ be a Gâteaux differentiable function in a neighborhood of a convex set $C \subseteq \mathbb{R}^n$. If $C$ has nonempty interior, and $x^* \in \operatorname{int}(C)$ is a local minimizer of $f$, then $\nabla f(x^*) = 0$, as we have seen in Chapter 2 (Theorem 2.7). This equation also follows from the variational inequality: since $x^*$ is an interior point, $x = x^* - t \nabla f(x^*)$ lies in $C$ for all sufficiently small $t > 0$, and choosing this $x$ in (4.16) gives $-t \|\nabla f(x^*)\|^2 \ge 0$, that is, $\nabla f(x^*) = 0$.
Example 4.35. Consider a differentiable function $f : [a, b] \to \mathbb{R}$. If $x^* \in (a, b)$ is a local minimizer, then the preceding example shows that $f'(x^*) = 0$, the familiar condition from elementary calculus. If $x^* = a$ is a local minimizer, then $x - x^* = x - a \ge 0$ in the variational inequality, so we can deduce only that $f'(a) \ge 0$. Thus, the condition $f'(a) \ge 0$ is the first-order necessary condition for $a$ to be a local minimizer of $f$ on $[a, b]$. A similar argument shows that if $x^* = b$ is a local minimizer of $f$ on $[a, b]$, then $f'(b) \le 0$. We see that the variational inequality gives something new, even in the one-dimensional case.
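A minimal sketch of the endpoint case in Python follows; the function $f(x) = (x - 1/2)^2$ and the interval $[1, 2]$ are illustrative choices. The minimizer lands at the endpoint $a = 1$, where $f'(a) \ge 0$ holds although $f'(x^*) = 0$ fails.

```python
# Sketch of the endpoint condition: f(x) = (x - 1/2)^2 on [1, 2] attains its
# minimum at the left endpoint a = 1, where f'(1) = 1 >= 0, while the
# interior condition f'(x*) = 0 fails. Function and interval are
# illustrative choices, not from the text.
from scipy.optimize import minimize_scalar

f = lambda x: (x - 0.5) ** 2
fprime = lambda x: 2.0 * (x - 0.5)

res = minimize_scalar(f, bounds=(1.0, 2.0), method='bounded')
print(res.x)            # ~1.0: the minimizer is the endpoint a = 1
print(fprime(res.x))    # ~1.0 >= 0, as the variational inequality predicts
```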
Example 4.36. Consider the minimization of a differentiable function on an affine subspace,
$$ \min\ f(x) \quad \text{s.t.}\ Ax = b, $$
where $f : \mathbb{R}^n \to \mathbb{R}$, $A$ is an $m \times n$ matrix, and $b \in \mathbb{R}^m$. Define $C = \{x \in \mathbb{R}^n : Ax = b\}$. If $x^* \in C$ is a local minimizer of $f$ on $C$, then it satisfies the variational inequality
$$ \langle \nabla f(x^*), x - x^* \rangle \ge 0 \quad \text{for all } x \in C. $$
Since $\{z = x - x^* : x \in C\} = \{z : Az = 0\} = N(A)$, the variational inequality becomes
$$ \langle \nabla f(x^*), z \rangle \ge 0 \quad \text{for all } z \in N(A). $$
If $z \in N(A)$, so is $-z$, and the above inequality reduces to the equality
$$ \langle \nabla f(x^*), z \rangle = 0 \quad \text{for all } z \in N(A). $$
We know from linear algebra that this is equivalent to the inclusion $\nabla f(x^*) \in N(A)^\perp = R(A^T)$. Consequently,
$$ \nabla f(x^*) \in R(A^T) \tag{4.17} $$
is a necessary condition for $x^*$ to be a local minimizer. If $f$ is convex, then (4.17) is a necessary and sufficient condition for $x^*$ to be a global minimizer of $f$ over $C$.

The condition (4.17) can be put in the form
$$ \Pi_{N(A)} \nabla f(x^*) = 0, $$
which states that the component of $\nabla f(x^*)$ along the feasible set $C = \{x : Ax = b\}$ is zero. This resembles the first-order optimality condition in unconstrained optimization, and should make it easier to remember (4.17).
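The condition (4.17) is easy to test numerically. The sketch below (Python with NumPy/SciPy; the data $A$, $b$, and the function $f(x) = \|x - p\|^2$ are illustrative assumptions) computes the minimizer of $f$ over $\{x : Ax = b\}$ in closed form and checks that $\nabla f(x^*)$ is orthogonal to $N(A)$, i.e., $\Pi_{N(A)} \nabla f(x^*) = 0$.

```python
# Sketch of the condition (4.17): minimize a convex quadratic subject to
# Ax = b and check that the projection of grad f(x*) onto N(A) vanishes.
# The data A, b, p below are arbitrary choices for illustration.
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 0.0]])
b = np.array([1.0, 0.0])
p = np.array([3.0, -2.0, 5.0])

# f(x) = ||x - p||^2; its minimizer over {Ax = b} is the projection of p,
# computable in closed form: x* = p - A^T (A A^T)^{-1} (A p - b).
x_star = p - A.T @ np.linalg.solve(A @ A.T, A @ p - b)
grad = 2 * (x_star - p)

N = null_space(A)                 # orthonormal basis of N(A)
print(N.T @ grad)                 # ~0: grad f(x*) is orthogonal to N(A),
                                  # i.e., grad f(x*) lies in R(A^T)
```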
Example 4.37. Consider the quadratic program
$$ \min\ f(x) := \frac{1}{2} \langle Qx, x \rangle + c^T x \quad \text{s.t.}\ x \ge 0, $$
where $Q$ is an $n \times n$ symmetric matrix and $c \in \mathbb{R}^n$.

If $Q$ is positive definite, then the objective function $f(x)$ is coercive, and thus there exists a unique global minimizer $x^*$ of $f$ over the nonnegative orthant $\{x \in \mathbb{R}^n : x \ge 0\}$.
Let $x^* \ge 0$ be a local minimizer of $f$ on the nonnegative orthant. Since $\nabla f(x^*) = Qx^* + c$, the variational inequality becomes
$$ \langle Qx^* + c, x - x^* \rangle \ge 0 \quad \text{for all } x \ge 0 \text{ in } \mathbb{R}^n. \tag{4.18} $$
Choosing $x = 2x^*$ in (4.18) gives $\langle Qx^* + c, x^* \rangle \ge 0$, while $x = 0$ gives $\langle Qx^* + c, x^* \rangle \le 0$; hence $\langle Qx^* + c, x^* \rangle = 0$. Substituting this in (4.18) implies that $\langle Qx^* + c, x \rangle \ge 0$ for all $x \ge 0$, which in turn yields $Qx^* + c \ge 0$. Therefore, (4.18) implies the conditions
$$ Qx^* + c \ge 0, \quad x^* \ge 0, \quad \text{and} \quad \langle Qx^* + c, x^* \rangle = 0. \tag{4.19} $$
Conversely, it is easy to verify that (4.19) implies (4.18).

Therefore, the two inequalities and the equation in (4.19) are the first-order necessary conditions for a local minimizer of a quadratic function $f$ over the nonnegative orthant. If, moreover, $f$ is a convex quadratic function, then (4.19) characterizes a global minimizer of $f$ over the same orthant by virtue of Theorem 4.33.
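The conditions (4.19) can likewise be verified on a small instance. In the sketch below (Python with NumPy/SciPy; $Q$ and $c$ are illustrative data with $Q$ positive definite), a bound-constrained solver finds $x^*$, and the two inequalities and the complementarity equation of (4.19) are checked directly.

```python
# Sketch verifying (4.19) numerically: solve the QP over the nonnegative
# orthant with a bound-constrained solver, then check Qx* + c >= 0,
# x* >= 0, and <Qx* + c, x*> = 0. Q and c are arbitrary illustrative data.
import numpy as np
from scipy.optimize import minimize

Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])           # symmetric positive definite
c = np.array([1.0, -2.0])            # makes one bound active at the solution

f = lambda x: 0.5 * x @ Q @ x + c @ x
grad = lambda x: Q @ x + c

res = minimize(f, np.ones(2), jac=grad, bounds=[(0, None), (0, None)])
x, g = res.x, grad(res.x)

print(x)               # x* >= 0 (here x*_1 = 0: the constraint is active)
print(g)               # Qx* + c >= 0
print(x @ g)           # complementarity: ~0
```

Note that one bound is active at the solution, so complementarity holds nontrivially: $x_1^* = 0$ while $(Qx^* + c)_1 > 0$.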
Remark 4.38. The problem of finding a point $x^*$ satisfying (4.19), where $Q$ is an arbitrary $n \times n$ matrix, is called a linear complementarity problem (LCP). Note that if $Q$ is not symmetric, then (4.19) cannot be associated with an optimization problem, but it may come from a saddle point problem, for example.
Example 4.39. Consider the maximization problem
$$ \max\ g(x) := x_1^{\alpha_1} \cdots x_n^{\alpha_n} \quad \text{s.t.}\ x_1 + \cdots + x_n = 1, \quad x_i \ge 0, \ i = 1, \ldots, n, $$
where each $\alpha_i > 0$.
Since each $x_i^*$ must clearly be positive at a local maximizer, and since $\ln$ is increasing, maximizing $g$ is equivalent to minimizing $-\ln g$; we can therefore reformulate the problem:
$$ \min\ f(x) := -\alpha_1 \ln x_1 - \cdots - \alpha_n \ln x_n \quad \text{s.t.}\ x_1 + \cdots + x_n = 1. $$
We have $\nabla f(x) = (-\alpha_1/x_1, \ldots, -\alpha_n/x_n)^T$, and the constraint set has the form $C = \{x : Ax = 1\}$, where $A = [1, \ldots, 1]$; thus it follows from (4.17) that $\alpha_i / x_i^* = \lambda$ $(i = 1, \ldots, n)$ for some multiplier $\lambda$. Therefore, $x_i^* = \alpha_i / \lambda$ and
$$ 1 = \sum_{i=1}^{n} x_i^* = \sum_{i=1}^{n} \frac{\alpha_i}{\lambda}, $$
giving $\lambda = \sum_{i=1}^{n} \alpha_i$ and
$$ x_i^* = \frac{\alpha_i}{\sum_{k=1}^{n} \alpha_k}, \quad i = 1, \ldots, n. $$
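The closed-form solution can be checked against random feasible points. The following sketch (Python with NumPy; the weights $\alpha = (1, 2, 3)$ are an illustrative choice) samples the simplex and confirms that no sample beats $x^* = \alpha / \sum_k \alpha_k$.

```python
# Sketch checking the closed form x*_i = alpha_i / sum(alpha) against random
# feasible points of the simplex; the weights alpha are illustrative.
import numpy as np

rng = np.random.default_rng(2)
alpha = np.array([1.0, 2.0, 3.0])

g = lambda x: np.prod(x ** alpha)     # g(x) = x1^a1 ... xn^an

x_star = alpha / alpha.sum()          # the maximizer derived above
samples = rng.dirichlet(np.ones(3), size=10000)   # random simplex points

vals = np.array([g(x) for x in samples])
print(g(x_star))                      # the optimal value
print(vals.max() <= g(x_star))        # True: no sample beats x*
```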
Optimization is often a useful tool for proving inequalities. For example, if $\alpha_k = 1/n$ for all $k = 1, \ldots, n$ in the above problem, then $x^* = (1/n, \ldots, 1/n)$, and the optimal objective value of the maximization problem is $g(x^*) = 1/n$. This proves that $g(x) \le 1/n$ whenever $x \ge 0$ and $x_1 + \cdots + x_n = 1$. Since both the objective function and the function $h(x) := x_1 + \cdots + x_n$ are homogeneous of first degree, that is, $g(tx) = t g(x)$ and $h(tx) = t h(x)$ for $t \ge 0$, applying the bound $g \le h/n$ on the simplex to $x / h(x)$ (the case $h(x) = 0$ being trivial) proves the inequality
$$ \sqrt[n]{x_1 x_2 \cdots x_n} \le \frac{x_1 + x_2 + \cdots + x_n}{n} \quad \text{for all } x_i \ge 0, \ i = 1, \ldots, n, $$
which is precisely the arithmetic–geometric mean inequality. Moreover, since the maximizer of $g$ over the feasible set is unique, we see that the arithmetic–geometric mean inequality becomes an equality if and only if $x_1 = x_2 = \cdots = x_n$.
Example 4.40. Finally, we consider the minimization of a differentiable function on a convex polyhedron,
$$ \min\ f(x) \quad \text{s.t.}\ Ax \le a, \quad Bx = b, $$
where $f : \mathbb{R}^n \to \mathbb{R}$, $A$ and $B$ are $m \times n$ and $p \times n$ matrices, respectively, $a \in \mathbb{R}^m$, and $b \in \mathbb{R}^p$. Define $C = \{x \in \mathbb{R}^n : Ax \le a, \ Bx = b\}$. If $x^* \in C$ is a local minimizer of $f$ on $C$, then it satisfies the variational inequality
$$ \langle \nabla f(x^*), x - x^* \rangle \ge 0 \quad \text{for all } x \in C, $$
or equivalently the implication
$$ Ax \le a, \ Bx = b \implies \langle \nabla f(x^*), x \rangle \ge \langle \nabla f(x^*), x^* \rangle. $$
It is not a trivial matter to rewrite this system of potentially infinitely many conditions (one condition for each $x \in C$) in a compact, manageable form, but it is possible. Note that the implication above is equivalent to stating that the linear inequality system
$$ Ax \le a, \quad Bx = b, \quad \langle \nabla f(x^*), x \rangle < \langle \nabla f(x^*), x^* \rangle $$
is inconsistent. It follows from Theorem 3.17 that there exist multipliers $y \in \mathbb{R}^m$, $z \in \mathbb{R}^p$ such that
$$ -\nabla f(x^*) = A^T y + B^T z, \quad y \ge 0. \tag{4.20} $$
These optimality conditions are referred to as the Karush–Kuhn–Tucker (KKT) conditions for the problem of minimization of $f$ over the set $C = \{x : Ax \le a, \ Bx = b\}$. This topic will be discussed in great detail in Chapter 9.
If $f$ is a convex function, then the KKT conditions (4.20) are, of course, necessary and sufficient conditions for a global minimizer of $f$ over $C$.
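The KKT conditions (4.20) can be verified by hand on a tiny instance. In the sketch below (Python with NumPy; the data $p$, $A$, $a$, and the function $f(x) = \|x - p\|^2$ are illustrative assumptions, with no equality constraints), the minimizer is the projection of $p$ onto a half-plane, and the multiplier $y \ge 0$ is recovered by least squares.

```python
# Sketch of the KKT conditions (4.20) on a tiny instance: minimize
# ||x - p||^2 subject to x1 + x2 <= 1 (so A = [1 1], a = 1, no equality
# constraints). The minimizer is the projection of p onto the half-plane,
# and the multiplier y can be read off by hand. All data are illustrative.
import numpy as np

p = np.array([2.0, 2.0])
A = np.array([[1.0, 1.0]])
a = np.array([1.0])

# Projection of p onto {x : A x <= a}; here the constraint is active, so
# x* = p - A^T (A A^T)^{-1} (A p - a).
x_star = p - A.T @ np.linalg.solve(A @ A.T, A @ p - a)
grad = 2 * (x_star - p)                  # grad f(x*) = (-3, -3)

# Solve -grad f(x*) = A^T y for the multiplier via least squares.
y, *_ = np.linalg.lstsq(A.T, -grad, rcond=None)
print(x_star)                            # [0.5, 0.5]
print(y)                                 # [3.0] >= 0: (4.20) holds
print(A.T @ y + grad)                    # ~0: -grad f(x*) = A^T y
```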