1. Let $f : I = (c, d) \to \mathbb{R}$ be an $n$-times differentiable function. Show that
$$f(x) = f(a) + f'(a)(x-a) + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + o((x-a)^n).$$
The point of the exercise is to prove the above equality without assuming that $f$ is $n$-times continuously differentiable, because if $f^{(n)}$ is continuous, then the equality follows readily from Theorem 1.1.
Hint: Prove the equality
$$\lim_{x\to a} \frac{f(x) - f(a) - f'(a)(x-a) - \cdots - \frac{f^{(n)}(a)}{n!}(x-a)^n}{(x-a)^n} = 0,$$
using induction on $n$, passing from $n$ to $n+1$ using L'Hospital's rule.
1.8 Exercises

2. This exercise gives a fairly simple approach to Taylor's formula in Cauchy's form (Theorem 1.5) using integration by parts. The idea is to write
$$f(b) - f(a) = \int_a^b f'(x)\,dx = -\int_a^b f'(x)\,d(b-x),$$
and then use integration by parts on the last integral. This gives
$$f(b) = f(a) - f'(x)(b-x)\big|_a^b + \int_a^b (b-x)f''(x)\,dx = f(a) + f'(a)(b-a) + \int_a^b (b-x)f''(x)\,dx,$$
which is Theorem 1.5 for $n = 2$.
(a) Use integration by parts on the last integral above to prove the theorem for $n = 3$.
(b) Use induction on $n$ to complete the proof of Theorem 1.5.
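The integral-remainder identity can be checked numerically. The following sketch is illustrative only: the choices $f = \exp$, $a = 0$, $b = 1$, $n = 3$, and the composite Simpson integrator are assumptions, not part of the exercise.

```python
import math

def simpson(g, a, b, m=2000):
    # Composite Simpson rule with m (even) subintervals.
    h = (b - a) / m
    s = g(a) + g(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

# For f = exp, every derivative is exp as well.
a, b, n = 0.0, 1.0, 3
taylor_part = sum(math.exp(a) * (b - a) ** k / math.factorial(k) for k in range(n))
remainder = simpson(lambda x: (b - x) ** (n - 1) / math.factorial(n - 1) * math.exp(x), a, b)
# Taylor polynomial of degree n-1 plus the integral remainder recovers f(b).
assert abs(math.exp(b) - (taylor_part + remainder)) < 1e-9
```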
3. This exercise outlines an interesting approach to Taylor’s formula in Cauchy’s form.
Let $f : J \to \mathbb{R}$ be a function on an open interval $J$, differentiable enough times. Consider the operations
$$A : f(x) \mapsto \int_a^x f(t)\,dt, \qquad B : f(x) \mapsto f'(x), \qquad I : f(x) \mapsto f(x).$$
Show that $BA(f(x)) = f(x)$, but $AB(f(x)) = f(x) - f(a)$, so that $AB \ne BA$, that is, $A$ and $B$ do not commute, when $f(a) \ne 0$. Obviously, $B^k(f(x)) = f^{(k)}(x)$. The formula for $A^k$ is more complicated. Show that
$$A^2(f(x)) = \int_a^x \int_a^s f(t)\,dt\,ds = \int_a^x \int_t^x f(t)\,ds\,dt = \int_a^x (x-t)f(t)\,dt,$$
where the second equality follows from Fubini's theorem for multiple integrals. More generally, show that
$$A^k(f(x)) = \int_a^x \frac{(x-t)^{k-1}}{(k-1)!}\, f(t)\,dt,$$
a formula due to Cauchy.
Observe that
$$\sum_{k=0}^{n-1} A^k(I - AB)B^k = \sum_{k=0}^{n-1} \left( A^k B^k - A^{k+1} B^{k+1} \right) = I - A^n B^n.$$
Noting that $(I - AB)(f(x)) = f(a)$, show that the above telescoping formula gives
$$\sum_{k=0}^{n-1} \frac{(x-a)^k}{k!}\, f^{(k)}(a) = f(x) - \int_a^x \frac{(x-t)^{n-1}}{(n-1)!}\, f^{(n)}(t)\,dt,$$
which is precisely Taylor's formula in Cauchy's form.
This problem is taken from [261], which contains simple derivations of certain other formulas in analysis.
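Cauchy's single-integral formula for $A^k$ can be checked numerically for $k = 2$. This is a sketch under assumed choices ($f = \cos$, $a = 0$, $x = 1$, Simpson-rule quadrature), not part of the exercise.

```python
import math

def simpson(g, a, b, m=2000):
    # Composite Simpson rule with m (even) subintervals.
    h = (b - a) / m
    s = g(a) + g(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

a, x = 0.0, 1.0
f = math.cos

# Iterated integral A^2(f)(x) = int_a^x int_a^s f(t) dt ds, computed by nesting.
iterated = simpson(lambda s: simpson(f, a, s, 200), a, x, 200)

# Cauchy's single-integral form with k = 2: int_a^x (x - t) f(t) dt.
cauchy = simpson(lambda t: (x - t) * f(t), a, x)

assert abs(iterated - cauchy) < 1e-6
assert abs(cauchy - (1 - math.cos(1.0))) < 1e-9  # exact value for f = cos, a = 0
```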
4. Here is an interesting approach, using determinants, to Taylor’s formula in Lagrange’s form.
Let $f(x)$, $\{f_i(x)\}_1^{n+2}$ be $(n+1)$-times continuously differentiable functions. Then
$$\begin{vmatrix}
f(x) & f_1(x) & \cdots & f_{n+2}(x) \\
f(0) & f_1(0) & \cdots & f_{n+2}(0) \\
f'(0) & f_1'(0) & \cdots & f_{n+2}'(0) \\
\vdots & \vdots & & \vdots \\
f^{(n)}(0) & f_1^{(n)}(0) & \cdots & f_{n+2}^{(n)}(0) \\
f^{(n+1)}(h) & f_1^{(n+1)}(h) & \cdots & f_{n+2}^{(n+1)}(h)
\end{vmatrix} = 0$$
for some $h$ strictly between 0 and $x$. To prove this, consider $x$ as a constant, and let $D^{(i)}(h)$ denote the function of $h$ obtained by replacing the last row of the determinant with $f^{(i)}(h), f_1^{(i)}(h), \ldots, f_{n+2}^{(i)}(h)$.
(a) Show that the derivative of $D^{(i)}(h)$ with respect to $h$ is $D^{(i+1)}(h)$ for $i = 0, 1, \ldots, n$, and that the determinant above is $D^{(n+1)}(h)$.
(b) Show that $D^{(0)}(0) = 0$ and $D^{(0)}(x) = 0$.
(c) Use Rolle's theorem to prove the existence of $h_1$ strictly between 0 and $x$ such that $D^{(1)}(h_1) = 0$. Also, show that $D^{(1)}(0) = 0$. Use Rolle's theorem again to prove the existence of $h_2$ strictly between 0 and $h_1$ such that $D^{(2)}(h_2) = 0$.
(d) Continue in this fashion to show that there exists a point $h$ strictly between 0 and $x$ such that $D^{(n+1)}(h) = 0$.
As an application, show that there exists a point $h$ strictly between 0 and $x$ such that
$$\begin{vmatrix}
f(x) & 1 & \frac{x}{1!} & \frac{x^2}{2!} & \cdots & \frac{x^n}{n!} & \frac{x^{n+1}}{(n+1)!} \\
f(0) & 1 & 0 & 0 & \cdots & 0 & 0 \\
f'(0) & 0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
f^{(n)}(0) & 0 & 0 & 0 & \cdots & 1 & 0 \\
f^{(n+1)}(h) & 0 & 0 & 0 & \cdots & 0 & 1
\end{vmatrix} = 0,$$
and that the above determinant is
$$f(x) - f(0) - f'(0)x - \frac{f''(0)}{2}x^2 - \cdots - \frac{f^{(n)}(0)}{n!}x^n - \frac{f^{(n+1)}(h)}{(n+1)!}x^{n+1}.$$

5. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function satisfying the inequality $|f(x)| \le \|x\|^2$. Show that $f$ is Fréchet differentiable at 0.
6. Define a function $f : \mathbb{R}^2 \to \mathbb{R}$ as follows:
$$f(x, y) = \begin{cases} x & \text{if } y = 0, \\ y & \text{if } x = 0, \\ 0 & \text{otherwise.} \end{cases}$$
Show that the partial derivatives
$$\frac{\partial f(0,0)}{\partial x} := \lim_{t\to 0} \frac{f(t,0) - f(0,0)}{t} \quad \text{and} \quad \frac{\partial f(0,0)}{\partial y} := \lim_{t\to 0} \frac{f(0,t) - f(0,0)}{t}$$
exist, but that $f$ is not Gâteaux differentiable at $(0,0)$.
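A small numerical illustration of this phenomenon (a sketch; the step size is an arbitrary choice):

```python
# f(x, 0) = x and f(0, y) = y, but f vanishes off the axes.
def f(x, y):
    if y == 0:
        return x
    if x == 0:
        return y
    return 0.0

t = 1e-6
# Both partial derivatives at the origin equal 1.
assert abs(f(t, 0) / t - 1) < 1e-9
assert abs(f(0, t) / t - 1) < 1e-9
# The directional derivative along d = (1, 1) is 0, not 1 + 1 = 2,
# so d -> f'(0; d) is not linear and f is not Gateaux differentiable.
assert f(t, t) / t == 0.0
```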
7. (Genocchi-Peano) Define the function $f : \mathbb{R}^2 \to \mathbb{R}$,
$$f(x, y) = \begin{cases} \dfrac{xy^2}{x^2 + y^4} & \text{if } (x, y) \ne (0,0), \\ 0 & \text{if } (x, y) = (0,0). \end{cases}$$
(a) Show that f is directionally differentiable at (0,0), that is, f has directional derivatives at the origin along all directions.
(b) Show that $f$ is not Gâteaux differentiable at the origin.
(c) Show that, even though $f$ is continuous when restricted to lines passing through the origin, $f$ is not continuous at the origin.
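The discontinuity in part (c) can be seen numerically along the parabola $x = y^2$ (an illustrative sketch; the sample points are arbitrary choices):

```python
def f(x, y):
    return x * y**2 / (x**2 + y**4) if (x, y) != (0, 0) else 0.0

# Along any line y = m*x, f(x, m*x) -> 0 as x -> 0 ...
for m in (0.5, 1.0, 3.0):
    assert abs(f(1e-6, m * 1e-6)) < 1e-3
# ... but along the parabola x = y**2 the value is identically 1/2,
# so f is not continuous at the origin.
for y in (0.1, 1e-3, 1e-6):
    assert abs(f(y**2, y) - 0.5) < 1e-12
```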
8. Define a function $f : \mathbb{R}^2 \to \mathbb{R}$ as follows:
$$f(x, y) = \begin{cases} \dfrac{2y\exp(-x^{-2})}{y^2 + \exp(-2x^{-2})} & \text{if } x \ne 0, \\ 0 & \text{otherwise.} \end{cases}$$
Show that $f$ is Gâteaux differentiable at $(0,0)$, but that $f$ is not continuous there.
9. Define a function $f : \mathbb{R}^2 \to \mathbb{R}$ as follows:
$$f(x, y) = \begin{cases} \dfrac{x^3 y}{x^4 + y^2} & \text{if } (x, y) \ne (0,0), \\ 0 & \text{if } (x, y) = (0,0). \end{cases}$$
Show that $f$ is Gâteaux differentiable but not Fréchet differentiable at $(0,0)$.
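Numerically, the Gâteaux derivative at the origin is the zero map, while the Fréchet remainder quotient stays away from 0 along the parabola $y = x^2$ (a sketch; directions and sample points are arbitrary choices):

```python
import math

def f(x, y):
    return x**3 * y / (x**4 + y**2) if (x, y) != (0, 0) else 0.0

# Gateaux: along every direction d, f(t*d)/t -> 0 as t -> 0.
for d in ((1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -3.0)):
    t = 1e-5
    assert abs(f(t * d[0], t * d[1]) / t) < 1e-3
# Frechet fails: along the parabola y = x**2 the remainder quotient
# |f(x, x**2) - 0| / ||(x, x**2)|| tends to 1/2, not 0.
for x in (1e-2, 1e-4, 1e-6):
    q = abs(f(x, x**2)) / math.hypot(x, x**2)
    assert abs(q - 0.5) < 1e-3
```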
10. Define a function $f : \mathbb{R}^2 \to \mathbb{R}$ as follows:
$$f(x, y) = \begin{cases} \dfrac{y(x^2 + y^2)^{3/2}}{(x^2 + y^2)^2 + y^2} & \text{if } (x, y) \ne (0,0), \\ 0 & \text{if } (x, y) = (0,0). \end{cases}$$
Show that $f$ is Gâteaux differentiable but not Fréchet differentiable at $(0,0)$.
11. Define a function $f : \mathbb{R}^2 \to \mathbb{R}$ as follows:
$$f(x, y) = \begin{cases} \dfrac{xy}{r} \sin\dfrac{1}{r} & \text{if } (x, y) \ne (0,0), \\ 0 & \text{if } (x, y) = (0,0), \end{cases}$$
where $r = \|(x, y)\| = (x^2 + y^2)^{1/2}$. Show that $\partial f/\partial x$ and $\partial f/\partial y$ exist at every point $(x, y) \in \mathbb{R}^2$, and the four functions $x \mapsto \partial f(x, b)/\partial x$, $y \mapsto \partial f(a, y)/\partial x$, $x \mapsto \partial f(x, b)/\partial y$, $y \mapsto \partial f(a, y)/\partial y$ are continuous for any $(a, b) \in \mathbb{R}^2$, but $f$ is not Fréchet differentiable at $(0,0)$.
12. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a function defined by the formula
$$f(x, y) = \begin{cases} \dfrac{xy(x^2 - y^2)}{x^2 + y^2} & \text{if } (x, y) \ne (0,0), \\ 0 & \text{if } (x, y) = (0,0). \end{cases}$$
Show that all four second-order partial derivatives $\partial^2 f/\partial x^2$, $\partial^2 f/\partial x\,\partial y$, $\partial^2 f/\partial y\,\partial x$, and $\partial^2 f/\partial y^2$ exist everywhere on $\mathbb{R}^2$, but $\partial^2 f/\partial x\,\partial y \ne \partial^2 f/\partial y\,\partial x$ at the point $(0,0)$.
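The inequality of the mixed partials at the origin can be observed with finite differences (a sketch; the step sizes are arbitrary choices, with the inner step much smaller than the outer one):

```python
def f(x, y):
    return x * y * (x**2 - y**2) / (x**2 + y**2) if (x, y) != (0, 0) else 0.0

e, h = 1e-9, 1e-4   # inner step much smaller than outer step

def fx(x, y):       # central-difference approximation of df/dx
    return (f(x + e, y) - f(x - e, y)) / (2 * e)

def fy(x, y):       # central-difference approximation of df/dy
    return (f(x, y + e) - f(x, y - e)) / (2 * e)

fxy = (fx(0, h) - fx(0, -h)) / (2 * h)   # d/dy of df/dx at (0,0)
fyx = (fy(h, 0) - fy(-h, 0)) / (2 * h)   # d/dx of df/dy at (0,0)
assert abs(fxy - (-1)) < 1e-6
assert abs(fyx - 1) < 1e-6
```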
13. Define a function $F : \mathbb{R}^2 \to \mathbb{R}^2$ as follows:
$$F(x, y) = (x^3, y^2).$$
Let $x = (0,0)$ and $y = (1,1)$. Show that there is no vector $z$ on the line segment between $x$ and $y$ such that
$$F(y) - F(x) = DF(z)(y - x).$$
This shows that the mean value theorem (Lemma 1.12) does not generalize, at least in the same form.
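The incompatibility is easy to see componentwise (a sketch of the computation, not a replacement for the proof):

```python
# On the segment between (0,0) and (1,1), z = (t, t) with 0 <= t <= 1, and
# DF(z)(y - x) = (3*t**2, 2*t).  Matching F(y) - F(x) = (1, 1) would need
# 3*t**2 = 1 and 2*t = 1 simultaneously.
t1 = (1 / 3) ** 0.5   # solves the first component
t2 = 0.5              # solves the second component
assert abs(3 * t1**2 - 1) < 1e-12 and abs(2 * t2 - 1) < 1e-12
assert abs(t1 - t2) > 0.07   # the two requirements are incompatible
```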
14. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a Gâteaux differentiable map such that the Jacobian $Df$ vanishes identically, that is, $Df(x) = 0$ for all $x \in \mathbb{R}^n$. Use Theorem 1.18 to give a short proof that $f$ must be a constant function. More generally, use the same theorem to prove that if $Df$ is a constant matrix, then $f$ must be an affine transformation.
15. For a given scalar $p \in [1, \infty)$, let
$$f(x) \equiv \|x\|_p \equiv \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}, \qquad x \in \mathbb{R}^n,$$
denote the $l_p$-norm for vectors in $\mathbb{R}^n$. Compute the partial derivatives $\partial f/\partial x_i$, $i = 1, 2, \ldots, n$, for any vector $x$ with no zero component. Does $f$ have a Fréchet or Gâteaux derivative at such a point? At the point $x = 0$? What more can be said for the case $p = 2$?
16. This exercise shows that Gâteaux differentiability may not be enough for the chain rule to hold.
(a) (Fréchet) Define the functions $f : \mathbb{R} \to \mathbb{R}^2$, $f(t) = (t, t^2)$, and $g : \mathbb{R}^2 \to \mathbb{R}$,
$$g(x, y) = \begin{cases} x & \text{if } y = x^2, \\ 0 & \text{otherwise.} \end{cases}$$
Show that $g$ is Gâteaux differentiable at $(0,0)$ with gradient $\nabla g(0,0) = (0,0)$, but $g \circ f$ is the identity function on $\mathbb{R}$, to conclude that the chain rule for $g \circ f$ fails at $t = 0$.
(b) Define the functions $f : \mathbb{R} \to \mathbb{R}^2$, $f(t) = (t\cos t, t\sin t)$, and $g : \mathbb{R}^2 \to \mathbb{R}$ given (in polar coordinates) by
$$g(r, \theta) = \begin{cases} \dfrac{r^2}{\theta^3} & \text{if } 0 < \theta < 2\pi, \\ 0 & \text{if } \theta = 0. \end{cases}$$
Show that $g$ is Gâteaux differentiable at $(0,0)$ with gradient $\nabla g(0,0) = (0,0)$, but $(g \circ f)(t) = 1/t$, so that the chain rule for $g \circ f$ again fails at $t = 0$.
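Part (a) can be illustrated numerically (a sketch; the directions and sample points are arbitrary choices):

```python
# Part (a): g picks off x only on the parabola y = x**2.
def g(x, y):
    return x if y == x * x else 0.0

def f(t):
    return (t, t * t)

# Along every straight direction d from the origin, g(t*d)/t is 0 for small t
# (the ray meets the parabola only at the origin), so the Gateaux
# derivative of g at (0,0) is the zero map ...
for d in ((1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)):
    t = 1e-4
    assert g(t * d[0], t * d[1]) / t == 0.0
# ... yet (g o f)(t) = t, whose derivative at 0 is 1, not
# <grad g(0,0), f'(0)> = 0: the chain rule fails.
for t in (0.5, 1e-3, -2.0):
    assert g(*f(t)) == t
```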
17. Let f : V →R be an infinitely differentiable function on a vector space V. Letf ben-homogeneous, that is,
f(tx) =tnf(x).
Show that
Dkf(x)
x, . . . , x
| {z }
k times
= ( n!
(n−k)!f(x), k= 0, . . . , n,
0, k > n.
The formula for the case k = 1,Df(x)[x] = nf(x), is known as Euler’s formula.
Hint: Write the Taylor series for f(x+tx), and note that f(x+tx) = f((1 +t)x) = (1 +t)nf(x).
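Euler's formula can be checked numerically on a concrete homogeneous function (the polynomial and test point below are assumed examples, not from the text):

```python
# A 3-homogeneous polynomial on R^2 (an assumed example): f(x, y) = x**3 + x*y**2.
def f(x, y):
    return x**3 + x * y**2

x, y, h = 1.3, -0.4, 1e-6
# Central-difference directional derivative Df(x)[x] = d/dt f(x + t*x) at t = 0.
dfx = (f(x + h * x, y + h * y) - f(x - h * x, y - h * y)) / (2 * h)
# Euler's formula: Df(x)[x] = n * f(x) with n = 3.
assert abs(dfx - 3 * f(x, y)) < 1e-6
```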
18. Let $M : \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \times \cdots \times \mathbb{R}^{n_k} \to \mathbb{R}^m$ be a multilinear map, that is, $x_i \mapsto M(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_k)$ is linear when all variables $x_j$ other than $x_i$ are fixed. Show that
$$M'(x; h) = M(h_1, x_2, \ldots, x_k) + M(x_1, h_2, x_3, \ldots, x_k) + \cdots + M(x_1, x_2, \ldots, x_{k-1}, h_k),$$
where we have used the notation $x = (x_1, \ldots, x_k)$, $h = (h_1, \ldots, h_k)$, and $M'(x; h) = M'(x_1, \ldots, x_k; h_1, \ldots, h_k)$. Then, compute $D^2 M(x)[h, h]$ and $D^3 M(x)[h, h, h]$. How do the formulas simplify when $M$ is a symmetric multilinear mapping?
Hint: Compute $M(x_1 + th_1, x_2 + th_2, \ldots, x_k + th_k)$ using multilinearity of $M$.
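The derivative formula can be verified numerically for a bilinear map, $k = 2$ (the map and test points below are assumed examples):

```python
# A bilinear map on R^2 x R^2 (an assumed example):
#   M(u, v) = u1*v1 + 2*u1*v2 - u2*v1.
def M(u, v):
    return u[0] * v[0] + 2 * u[0] * v[1] - u[1] * v[0]

u, v = (1.0, 2.0), (-1.0, 0.5)
p, q = (0.3, -0.7), (0.2, 0.9)
t = 1e-6
# Directional derivative of M at (u, v) along (p, q) by central differences.
num = (M((u[0] + t * p[0], u[1] + t * p[1]), (v[0] + t * q[0], v[1] + t * q[1]))
       - M((u[0] - t * p[0], u[1] - t * p[1]), (v[0] - t * q[0], v[1] - t * q[1]))) / (2 * t)
# The claimed formula with k = 2: M'(x; h) = M(h1, x2) + M(x1, h2).
assert abs(num - (M(p, v) + M(u, q))) < 1e-6
```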
19. Let $F : \mathbb{R}^n \to \mathbb{R}^m$ be a map with Lipschitz derivative, that is, there exists $L \ge 0$ such that
$$\|DF(y) - DF(x)\| \le L\|y - x\| \quad \text{for all } x, y \in \mathbb{R}^n.$$
Show that
$$\|F(y) - F(x) - DF(x)(y - x)\| \le \frac{L}{2}\|y - x\|^2 \quad \text{for all } x, y \in \mathbb{R}^n.$$
Notice that the slightly weaker inequality, with the constant $L/2$ replaced by $L$, follows immediately from Theorem 1.18.
Hint: Define the function $\varphi(t) = F(x + t(y - x)) - tDF(x)(y - x)$. Show that $\varphi'(t) = (DF(x + t(y - x)) - DF(x))(y - x)$, and use the inequality $\left\| \int_0^1 \varphi'(t)\,dt \right\| \le \int_0^1 \|\varphi'(t)\|\,dt$.
20. Let $f : I = (c, d) \to \mathbb{R}$ be such that $0 \in I$.
(a) If $f \in C^1$ (continuously differentiable) on $I$, then show that there exists a continuous function $a$ on $I$ such that
$$f(x) = f(0) + a(x)x.$$
Moreover, show that if $f \in C^2$, then $a \in C^1$.
(b) If $f \in C^2$ (twice continuously differentiable) on $I$, then show that there exists a continuous function $b$ on $I$ such that
$$f(x) = f(0) + f'(0)x + b(x)x^2.$$
Hint: If $x \ne 0$, the above equations define $a(x)$ and $b(x)$,
$$a(x) = \frac{f(x) - f(0)}{x}, \qquad b(x) = \frac{f(x) - f(0) - f'(0)x}{x^2}.$$
Use L'Hospital's rule to show that $a(0)$ and $b(0)$ can be defined in such a way that the functions $a, b$ are continuous at $x = 0$.
(c) Let $f : U \to \mathbb{R}$ be a $C^1$ (continuous partial derivatives) function in a neighborhood $U$ of the origin in $\mathbb{R}^n$. Prove that there exist continuous functions $\{a_i(x)\}_1^n$ on $U$ such that
$$f(x_1, \ldots, x_n) = f(0) + \sum_{i=1}^n x_i a_i(x_1, \ldots, x_n).$$
Moreover, show that if $f \in C^2$, then $a_i \in C^1$.
Hint: Show that (a) guarantees the existence of a continuous function $a_1(x)$ such that $f(x_1, \ldots, x_n) = f(0, x_2, \ldots, x_n) + x_1 a_1(x_1, \ldots, x_n)$. Then, use induction on $n$.
(d) Let $f : U \to \mathbb{R}$ be a $C^2$ (second partial derivatives continuous) function on $U$. Using (b), show that there exists a continuous function $b(x)$ on $U$ such that
$$f(x_1, \ldots, x_n) = f(0, x_2, \ldots, x_n) + x_1 \frac{\partial f(0, x_2, \ldots, x_n)}{\partial x_1} + x_1^2\, b(x_1, \ldots, x_n).$$
(e) Let $f$ be a function as in (d) and assume that $\nabla f(0) = 0$. Prove that there exist continuous functions $\{b_{ij}(x)\}_{i,j=1}^n$ on $U$ such that
$$f(x_1, \ldots, x_n) = f(0) + \sum_{i,j=1}^n x_i x_j\, b_{ij}(x_1, \ldots, x_n).$$
Hint: Use (c) on the $C^1$ function $\partial f(0, x_2, \ldots, x_n)/\partial x_1$ in (d) to obtain a representation
$$f(x_1, \ldots, x_n) = f(0, x_2, \ldots, x_n) + \sum_{j=1}^n x_1 x_j\, b_{1j}(x_1, \ldots, x_n).$$
Complete the proof by induction on $n$.
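The one-variable limits behind the hint in parts (a) and (b) can be observed numerically (a sketch with the assumed choice $f = \cos$, so $f'(0) = 0$ and $f''(0) = -1$):

```python
import math

# For f = cos:
#   a(x) = (f(x) - f(0)) / x              should tend to f'(0) = 0,
#   b(x) = (f(x) - f(0) - f'(0)*x) / x**2 should tend to f''(0)/2 = -1/2.
def a(x):
    return (math.cos(x) - 1.0) / x

def b(x):
    return (math.cos(x) - 1.0) / (x * x)

for x in (0.1, 0.01, 0.001):
    assert abs(a(x)) < 0.06            # a(x) is roughly -x/2, tending to 0
    assert abs(b(x) + 0.5) < 0.01      # b(x) tends to -1/2
```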
21. Compute the first two derivatives of the determinant function $f(X) = \det X$ on $S^n$, the space of $n \times n$ symmetric real matrices:
(a) From scratch, mimicking the derivation in Example 1.27.
(b) Using the chain rule and the results of Example 1.27.
22. Let $\mathbb{R}^{n\times n}$ be the space of $n \times n$ real matrices. Show that if $A(t) \in \mathbb{R}^{n\times n}$ is a differentiable function of $t$, then $d(\det A(t))/dt$ is the sum of the determinants of $n$ matrices, in which the $i$th matrix is $A(t)$ except that the $i$th row is differentiated. Use this result to prove that the directional derivative of the determinant function at the matrix $A \in \mathbb{R}^{n\times n}$ along the direction $B \in \mathbb{R}^{n\times n}$ is given by
$$(\det)'(A; B) = \operatorname{tr}(\operatorname{Adj}(A)B) = \langle \operatorname{Adj}(A)^T, B \rangle,$$
where $\operatorname{Adj}(A)$ is the adjoint of $A$, and where the inner product on $\mathbb{R}^{n\times n}$ is the trace inner product given by $\langle X, Y \rangle = \operatorname{tr}(X^T Y)$. Conclude that
$$D(\det)(A) = \operatorname{Adj}(A)^T.$$
Hint: Use the determinant formula $\det X = \sum_\sigma \operatorname{sgn}(\sigma)\, x_{1\sigma(1)} \cdots x_{n\sigma(n)}$ to compute $d(\det A(t))/dt$, and Laplace's expansion formula for determinants to compute $(\det)'(A; B)$.
23. Let $A \in \mathbb{R}^{n\times n}$. Show that
(a) $(\det)'(I; A) = \operatorname{tr}(A)$, where $I$ is the identity matrix.
Hint: Show that $\det(I + tA) = 1 + t(a_{11} + a_{22} + \cdots + a_{nn}) + \cdots$, using the formula $\det A = \sum_\sigma \operatorname{sgn}(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}$.
(b) Show that if $A$ is a nonsingular matrix, then
$$(\det)'(A; B) = \det(A)\operatorname{tr}(A^{-1}B) = \langle \det(A)A^{-T}, B \rangle.$$
Consequently, show that
$$A^{-1} = \frac{\operatorname{Adj}(A)}{\det A}.$$
Hint: Use $\det(A + tB) = \det(A)\det(I + tA^{-1}B)$ and the previous problem.
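The identity $(\det)'(A; B) = \det(A)\operatorname{tr}(A^{-1}B)$ can be checked numerically in the $2 \times 2$ case (a sketch; the matrices $A$, $B$ are arbitrary assumed examples, and explicit $2 \times 2$ formulas keep the check self-contained):

```python
# 2x2 sanity check (assumed example) of (det)'(A; B) = det(A) * tr(A^{-1} B).
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[2.0, 1.0], [0.5, 3.0]]
B = [[1.0, -1.0], [4.0, 0.5]]

t = 1e-6
ApB = [[A[i][j] + t * B[i][j] for j in range(2)] for i in range(2)]
AmB = [[A[i][j] - t * B[i][j] for j in range(2)] for i in range(2)]
numeric = (det2(ApB) - det2(AmB)) / (2 * t)       # directional derivative

d = det2(A)
# Explicit 2x2 inverse via the adjugate.
Ainv = [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]
trace = sum(sum(Ainv[i][k] * B[k][i] for k in range(2)) for i in range(2))
assert abs(numeric - d * trace) < 1e-6
```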
24. Prove Corollary 1.30.
2
Unconstrained Optimization
In optimization theory, the optimality conditions for interior points are usually much simpler than the optimality conditions for boundary points. In this chapter, we deal with the former, easier case. Boundary points appear more prominently in constrained optimization, when one tries to optimize a function subject to several functional constraints. For this reason, the optimality conditions for boundary points are generally discussed in constrained optimization, whereas the optimality conditions for interior points are discussed in unconstrained optimization, regardless of whether the optimization problem at hand has constraints.
In this chapter, we first establish some basic results on the existence of global minimizers or maximizers of continuous functions on a metric space. These are the famous Weierstrass theorem and its variants, which are essentially the only general tools available for establishing the existence of optimizers.
The rest of the chapter is devoted to obtaining the fundamental first-order and second-order necessary and sufficient optimality conditions for minimizing or maximizing differentiable functions. Since the tools here are based on differentiation, and differentiation is a local theory, the optimality conditions generally apply to local optimizers. The necessary and sufficient conditions play different, usually complementary, roles. A typical necessary condition for a local minimizer, say, states that certain conditions, usually given as equalities or inequalities, must be satisfied at a local minimizer. A typical sufficient condition for a local minimizer, however, states that if certain conditions are satisfied at a given point, then that point must be a local minimizer.

The nature (local minimum, local maximum, or saddle point) of a critical point $x$ of a twice differentiable function $f$ is deduced from the definiteness properties of the quadratic form $q(d) = \langle D^2 f(x)d, d \rangle$ involving the Hessian matrix $D^2 f(x)$. Thus, there is a need for an efficient recognition of the definiteness of a symmetric matrix. Several tools are developed in Section 2.4 for this purpose. A novel feature of this section is that we give an exposition of a simple tool, Descartes's rule of sign, that can be used to count exactly the number of positive and negative eigenvalues of a symmetric matrix, including $D^2 f(x)$.

O. Güler, Foundations of Optimization, Graduate Texts in Mathematics 258, DOI 10.1007/978-0-387-68407-9_2, © Springer Science+Business Media, LLC 2010
The inverse function theorem and the closely related implicit function theorem are important tools in many branches of analysis. Another closely related result, Lyusternik's theorem [191], is an important tool in optimization, where it is used in the derivation of optimality conditions in constrained optimization. We give an elementary proof of the implicit function theorem in finite-dimensional vector spaces in Section 2.5, following Carathéodory [54], and use it to prove the inverse function theorem and Lyusternik's theorem in finite dimensions. The proof of the same theorems in Banach spaces is given in Chapter 3 using Ekeland's variational principle. If one is interested only in finite-dimensional versions of these results, it suffices to read only Section 2.5.
The local behavior of a $C^2$ function $f$ around a nondegenerate critical point $x$ ($D^2 f(x)$ is nonsingular) is determined by the Hessian matrix $D^2 f(x)$. This is the content of Morse's lemma, which is treated in Section 2.6. Morse's lemma is a basic result in Morse theory, which investigates the relationships between various types of critical points of a function $f$; see, for example, Milnor [197] for an introduction to Morse theory.