Proof. We first prove that if $A$ is positive definite, then all leading principal minors of $A$ are positive. We use induction on $n$, the dimension of $A$. The proof is trivial for $n = 1$. Assuming that the result is true for $n$, we will prove it for $n + 1$. Let $A$ be an $(n+1) \times (n+1)$ symmetric, positive definite matrix.
We write
\[
A = \begin{pmatrix} B & b \\ b^T & c \end{pmatrix},
\]
where $B$ is a symmetric $n \times n$ matrix, $b \in \mathbb{R}^n$, and $c \in \mathbb{R}$. Choosing $0 \neq d \in \mathbb{R}^n$, we have
\[
0 < (d^T, 0)\, A \begin{pmatrix} d \\ 0 \end{pmatrix}
  = (d^T, 0) \begin{pmatrix} B & b \\ b^T & c \end{pmatrix} \begin{pmatrix} d \\ 0 \end{pmatrix}
  = d^T B d,
\]
that is, $B$ is positive definite. By the induction hypothesis, we have $\det A_i > 0$, $i = 1, \ldots, n$. Since $A$ is positive definite, its eigenvalues $\{\lambda_i\}_{i=1}^{n+1}$ are all positive. Thus, we also have $\det A_{n+1} = \det A = \lambda_1 \cdots \lambda_{n+1} > 0$.
Conversely, let us prove that if all $\det A_i > 0$, $i = 1, \ldots, n+1$, then $A$ is positive definite. The proof is again by induction on $n$. The proof is trivial for $n = 1$. Suppose the theorem is true for $n$; we will prove it for $n + 1$.
Since $\det A_i > 0$ for $i = 1, \ldots, n$, we see by the induction hypothesis that $B$ is positive definite. Suppose $A$ is not positive definite, and order its eigenvalues so that $\lambda_1 \geq \cdots \geq \lambda_{n+1}$. Then $\lambda_{n+1} < 0$, and since $\det A = \lambda_1 \cdots \lambda_{n+1} > 0$, we must also have $\lambda_n < 0$. Let $u_n$ and $u_{n+1}$ be the eigenvectors of $A$ corresponding to $\lambda_n$ and $\lambda_{n+1}$, respectively. We have $\langle u_n, u_{n+1} \rangle = 0$, so that we can choose scalars $\alpha_n$ and $\alpha_{n+1}$ such that $u = \alpha_n u_n + \alpha_{n+1} u_{n+1}$ is not zero but has the last ($(n+1)$th) component equal to zero, say $u = (v, 0)^T$ where $v \neq 0$. Then $u^T A u = v^T B v > 0$, since $B$ is positive definite. However, we also have
\[
\begin{aligned}
0 < u^T A u &= \langle \alpha_n u_n + \alpha_{n+1} u_{n+1},\ A(\alpha_n u_n + \alpha_{n+1} u_{n+1}) \rangle \\
&= \langle \alpha_n u_n + \alpha_{n+1} u_{n+1},\ \lambda_n \alpha_n u_n + \lambda_{n+1} \alpha_{n+1} u_{n+1} \rangle \\
&= \lambda_n \alpha_n^2 \langle u_n, u_n \rangle + \lambda_{n+1} \alpha_{n+1}^2 \langle u_{n+1}, u_{n+1} \rangle < 0,
\end{aligned}
\]
where the last inequality follows from the facts $\lambda_i < 0$ and $\|u_i\| = 1$, $i = n, n+1$. This contradiction shows that all eigenvalues of $A$ are positive. Corollary 2.20 implies that $A$ is positive definite. ⊓⊔
This simple proof is taken from Carathéodory [54], p. 187.
Another elegant proof of Sylvester's theorem, more in the spirit of optimization techniques, is outlined in Exercise 12 at the end of the chapter.
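Sylvester's criterion is also easy to test numerically. The sketch below (an illustration only; the tridiagonal test matrix and the use of NumPy are choices made here, not part of the theorem) verifies that the leading principal minors and the eigenvalues of a symmetric positive definite matrix are all positive, and that $\det A = \lambda_1 \cdots \lambda_{n+1}$.

```python
# A numerical illustration of Sylvester's criterion: for a symmetric matrix,
# the leading principal minors are all positive exactly when the eigenvalues
# are all positive.
import numpy as np

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])   # an arbitrary symmetric, positive definite test matrix

# Leading principal minors det A_1, ..., det A_{n+1}
minors = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

# Eigenvalues of A (real, since A is symmetric)
eigenvalues = np.linalg.eigvalsh(A)

print("minors:", np.round(minors, 4))            # [2. 3. 4.] -- all positive
print("eigenvalues:", np.round(eigenvalues, 4))  # all positive
print("det A equals the product of eigenvalues:",
      np.isclose(minors[-1], np.prod(eigenvalues)))
```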
2.5 The Inverse Function, Implicit Function, and Lyusternik Theorems
The main result of this section is the implicit function theorem, which is used to prove the inverse function theorem and Lyusternik's theorem.
The implicit function theorem will also be utilized to prove Morse’s lemma in Section 2.6.
Theorem 2.26. (Implicit function theorem) Let $f : U \times V \to \mathbb{R}^m$ be a $C^1$ mapping, where $U \subseteq \mathbb{R}^n$ and $V \subseteq \mathbb{R}^m$ are open sets. Let $(x_0, y_0) \in U \times V$ be a point such that $f(x_0, y_0) = 0$ and $D_yf(x_0, y_0) : \mathbb{R}^m \to \mathbb{R}^m$, the derivative of $f$ with respect to $y$, is nonsingular.
Then there exist neighborhoods $U_1 \ni x_0$ and $V_1 \ni y_0$ and a $C^1$ mapping $y : U_1 \to V_1$ such that a point $(x, y) \in U_1 \times V_1$ satisfies $f(x, y) = 0$ if and only if $y = y(x)$. The derivative of $y$ at $x_0$ is given by
\[
Dy(x_0) = -D_yf(x_0, y_0)^{-1} D_xf(x_0, y_0).
\]
Moreover, if $f$ is $k$-times continuously differentiable, that is, $f \in C^k$, then $y(x) \in C^k$.
The linear case should help one to remember the form of the implicit function theorem: if $f(x, y) = Ax + By$ and $D_yf = B$ is an invertible matrix, then the equation $f(x, y) = \alpha$ gives $Ax + By = \alpha$. This may be solved for $y$ by premultiplying it by $B^{-1}$, giving $y(x) = B^{-1}(\alpha - Ax)$.
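A nonlinear example may also help. The sketch below (an illustrative numerical check; the function $f(x, y) = x^2 + y^2 - 1$ and the point $(0.6, 0.8)$ are chosen here only as an example) compares the formula $Dy(x_0) = -D_yf(x_0, y_0)^{-1} D_xf(x_0, y_0)$ with the derivative of the explicit solution $y(x) = \sqrt{1 - x^2}$.

```python
# Implicit function theorem for f(x, y) = x^2 + y^2 - 1 near (x0, y0) = (0.6, 0.8).
import numpy as np

x0, y0 = 0.6, 0.8                        # a point with f(x0, y0) = 0
Dxf = 2 * x0                             # derivative of f with respect to x
Dyf = 2 * y0                             # derivative of f with respect to y (nonzero)

dy_implicit = -Dxf / Dyf                 # Dy(x0) = -Dyf^{-1} Dxf
dy_explicit = -x0 / np.sqrt(1 - x0**2)   # derivative of y(x) = sqrt(1 - x^2) at x0

print(dy_implicit, dy_explicit)          # both equal -0.75
```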
Proof. Assume without loss of generality that $x_0 = 0$ and $y_0 = 0$, by considering the function $(x, y) \mapsto f(x + x_0, y + y_0) - f(x_0, y_0)$ if necessary. Let $f(x, y) = (f_1(x, y), \ldots, f_m(x, y))$, where $f_i$ is the $i$th coordinate function of $f$. Since $Df$ is continuous, there exist neighborhoods $U_0$ and $V_0$ of the origin in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively, such that the matrix
\[
\begin{pmatrix}
\nabla_y f_1(x, y_1)^T \\
\nabla_y f_2(x, y_2)^T \\
\vdots \\
\nabla_y f_m(x, y_m)^T
\end{pmatrix}
\tag{2.2}
\]
is invertible for all $(x, y_i) \in U_0 \times V_0$.
We claim that for every $x \in U_0$, there exists at most one $y \in V_0$ such that $f(x, y) = 0$. Otherwise, there would exist $y, z \in V_0$, $y \neq z$, such that $f(x, y) = f(x, z) = 0$. The mean value theorem (Lemma 1.12) implies that there exists $y_i \in (y, z)$ such that
\[
f_i(x, z) - f_i(x, y) = \langle \nabla_y f_i(x, y_i), z - y \rangle = 0, \qquad i = 1, \ldots, m.
\]
Since the matrix in (2.2) is nonsingular, we obtain $y = z$, a contradiction that proves our claim.
Let $B_r(0) \subseteq V_0$. Since $f(0, 0) = 0$ and, by the claim above, $y = 0$ is the only solution of $f(0, y) = 0$ in $V_0$, we have $f(0, y) \neq 0$ for $y \in S_r(0) := \{y \in \mathbb{R}^m : \|y\| = r\}$, and since $f$ is continuous on $U_0 \times V_0$ and $S_r(0)$ is compact, there exists $\alpha > 0$ such that $\|f(0, y)\|^2 \geq \alpha$ for all $y \in S_r(0)$. It follows that the function
\[
F(x, y) := \|f(x, y)\|^2 = \sum_{i=1}^{m} f_i(x, y)^2
\]
satisfies the properties
\[
F(0, y) \geq \alpha > 0 \ \text{ for } y \in S_r(0), \qquad \text{and} \qquad F(0, 0) = 0.
\]
Since $F$ is continuous, there exists an open neighborhood $U_1 \subseteq U_0$ of $0 \in \mathbb{R}^n$ such that
\[
F(x, y) \geq \frac{\alpha}{2}, \qquad F(x, 0) \leq \frac{\alpha}{2} \qquad \text{for all } x \in U_1,\ y \in S_r(0).
\]
Thus, for a fixed $x \in U_1$, the function $y \mapsto F(x, y)$ achieves its minimum on $B_r(0)$ at a point $y(x)$ in the interior of $B_r(0)$, and we have
\[
D_yF(x, y(x)) = 2 D_yf(x, y(x))^T f(x, y(x)) = 0,
\]
and since the matrix $D_yf(x, y(x))$ is nonsingular, we conclude that
\[
f(x, y(x)) = 0.
\]
Writing $\Delta y := y(x + \Delta x) - y(x)$, we have by the mean value theorem
\[
0 = D_xf(\tilde{x}, \tilde{y})\Delta x + D_yf(\tilde{x}, \tilde{y})\Delta y
\]
for some point $(\tilde{x}, \tilde{y})$ on the line segment between $(x, y(x))$ and $(x + \Delta x, y(x + \Delta x))$. This implies that as $\|\Delta x\|$ goes to zero, so does $\|\Delta y\|$, proving that $y(x)$ is a continuous function.
The function $y(x)$ is actually $C^1$, since by Taylor's formula
\[
\begin{aligned}
0 &= f(x + \Delta x, y(x + \Delta x)) - f(x, y(x)) \\
  &= D_xf(x, y(x))\Delta x + D_yf(x, y(x))\Delta y + o((\Delta x, \Delta y)),
\end{aligned}
\]
and since $o((\Delta x, \Delta y)) = o(\Delta x)$ by the continuity of $y(x)$, we have
\[
\Delta y = -D_yf(x, y(x))^{-1} D_xf(x, y(x))\Delta x + o(\Delta x).
\]
This proves that $y(x)$ is Fréchet differentiable at $x$ with
\[
Dy(x) = -D_yf(x, y(x))^{-1} D_xf(x, y(x)).
\]
If $f \in C^2$, then $D_yf(x, y(x))^{-1} = \operatorname{Adj} D_yf(x, y(x)) / \det D_yf(x, y(x))$ and $D_xf(x, y(x))$ are $C^1$, and the above formula shows that the function $y(x)$ is $C^2$. In general, if $f \in C^k$, we prove by induction on $k$ that $y(x)$ is $C^k$. ⊓⊔

This elementary proof is taken from Carathéodory [54], pp. 10–13. A similar kind of proof, using penalty functions, will be used in Chapter 9 to obtain optimality conditions for constrained optimization problems.
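The minimization device in the proof is also easy to reproduce numerically. The sketch below (an illustration only, reusing the circle example $f(x, y) = x^2 + y^2 - 1$ and an off-the-shelf SciPy minimizer rather than the constructive argument above) recovers $y(x)$ for a fixed $x$ by minimizing $y \mapsto F(x, y) = f(x, y)^2$ over an interval playing the role of $B_r(0)$.

```python
# Recover y(x) by minimizing F(x, y) = f(x, y)^2 in y, as in the proof above.
import numpy as np
from scipy.optimize import minimize_scalar

def f(x, y):
    return x**2 + y**2 - 1.0       # unit circle; y(x) = sqrt(1 - x^2) near y0 = 1

x = 0.6
result = minimize_scalar(lambda y: f(x, y)**2, bounds=(0.5, 1.5), method="bounded")

print(result.x, np.sqrt(1 - x**2))   # both approximately 0.8
```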
Corollary 2.27. (Inverse function theorem) Let $f$ be a $C^1$ map from a neighborhood of $x_0 \in \mathbb{R}^n$ into $\mathbb{R}^n$.
If $Df(x_0)$ is nonsingular, then there exist neighborhoods $U \ni x_0$ and $V \ni y_0 = f(x_0)$ such that $f : U \to V$ is a $C^1$ diffeomorphism, and
\[
Df^{-1}(y) = Df(x)^{-1} \qquad \text{for all } (x, y) \in U \times V,\ y = f(x).
\]
Moreover, if $f$ is $C^k$, then $f$ is a $C^k$ diffeomorphism on $U$.
Proof. Define the function $F(x, y) = f(x) - y$, and note that $D_xF(x_0, y) = Df(x_0)$ is nonsingular. Apply Theorem 2.26 to $F$, with the roles of $x$ and $y$ interchanged. ⊓⊔

The map $f : \mathbb{R}^2 \to \mathbb{R}^2$ given by $f(x, y) = (e^x \cos y, e^x \sin y)$ has the Jacobian $\det Df(x, y) = e^{2x} \neq 0$; hence $f$ is locally one-to-one around every point $(x, y) \in \mathbb{R}^2$. However, $f$ is clearly not one-to-one globally, since $f(x, y + 2\pi) = f(x, y)$.
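For this map the conclusion $Df^{-1}(y) = Df(x)^{-1}$ can be checked numerically. The sketch below (an illustrative check at an arbitrarily chosen point; the finite-difference Jacobian and the particular branch of the local inverse are choices made for the example) compares a numerical Jacobian of a local inverse of $f$, built from $\log$ and the two-argument arctangent, with the inverse of $Df(x, y)$.

```python
# Check Df^{-1}(y) = Df(x)^{-1} for f(x, y) = (e^x cos y, e^x sin y) at one point.
import numpy as np

def f(p):
    x, y = p
    return np.array([np.exp(x) * np.cos(y), np.exp(x) * np.sin(y)])

def f_inv(q):                       # a local inverse of f, valid near the chosen point
    u, v = q
    return np.array([0.5 * np.log(u**2 + v**2), np.arctan2(v, u)])

def num_jacobian(g, p, h=1e-6):     # central finite-difference Jacobian of g at p
    cols = []
    for i in range(len(p)):
        e = np.zeros(len(p)); e[i] = h
        cols.append((g(p + e) - g(p - e)) / (2 * h))
    return np.column_stack(cols)

p = np.array([0.3, 0.7])            # an arbitrary point (x, y)
q = f(p)                            # y = f(x), the point where the inverse is evaluated

print(np.allclose(num_jacobian(f_inv, q), np.linalg.inv(num_jacobian(f, p)), atol=1e-5))
# True: the Jacobian of the local inverse equals Df(x)^{-1}
```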
Definition 2.28. Let $M$ be a nonempty subset of $\mathbb{R}^n$ and $x \in M$. A vector $d \in \mathbb{R}^n$ is called a tangent direction of $M$ at $x$ if there exist a sequence $x_n \in M$ converging to $x$ and a nonnegative sequence $\alpha_n$ such that
\[
\lim_{n \to \infty} \alpha_n(x_n - x) = d.
\]
The tangent cone of $M$ at $x$, denoted by $T_M(x)$, is the set of all tangent directions of $M$ at $x$.
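As a simple illustration of the definition, consider the unit circle $M = \{x \in \mathbb{R}^2 : \|x\| = 1\}$ and the point $x = (1, 0)$. Taking $x_n = (\cos(1/n), \sin(1/n)) \in M$ and $\alpha_n = n$ gives $\alpha_n(x_n - x) = (n(\cos(1/n) - 1), n \sin(1/n)) \to (0, 1)$, so $d = (0, 1)$ is a tangent direction of $M$ at $x$. Similar choices of sequences and scalings produce every vector $(0, t)$, and Theorem 2.29 below confirms that $T_M(x)$ is exactly the vertical axis $\{0\} \times \mathbb{R}$.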
This definition is sufficient for our purposes. We remark that the same definition is valid in a topological vector space. A detailed study of this and several related concepts is needed in nonsmooth analysis; see [230] and [199, 200].
Theorem 2.29. (Lyusternik) Let $f : U \to \mathbb{R}^m$ be a $C^1$ map, where $U \subset \mathbb{R}^n$ is an open set. Let $M = f^{-1}(f(x_0))$ be the level set of a point $x_0 \in U$.

If the derivative $Df(x_0)$ is a linear map onto $\mathbb{R}^m$, then the tangent cone of $M$ at $x_0$ is the null space of the linear map $Df(x_0)$, that is,
\[
T_M(x_0) = \{d \in \mathbb{R}^n : Df(x_0)d = 0\}.
\]
Remark 2.30. Let $f = (f_1, \ldots, f_m)$, where $\{f_i\}$ are the component functions of $f$. It is easy to verify that
\[
\operatorname{Ker} Df(x_0) = \{d \in \mathbb{R}^n : \langle \nabla f_i(x_0), d \rangle = 0,\ i = 1, \ldots, m\},
\]
and that the surjectivity of $Df(x_0)$ is equivalent to the linear independence of the gradient vectors $\{\nabla f_i(x_0)\}_1^m$.
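Lyusternik's theorem is easy to test numerically on the unit sphere. The sketch below (an illustration only; the point $x_0$, the use of the SVD to obtain a kernel basis, and the particular curves on the sphere are choices made for the example) takes $f(x) = \|x\|^2 - 1$, computes a basis of $\operatorname{Ker} Df(x_0)$, and checks that each kernel direction is a tangent direction of $M = f^{-1}(0)$ at $x_0$ in the sense of Definition 2.28.

```python
# Lyusternik's theorem for f(x) = ||x||^2 - 1: T_M(x0) = Ker Df(x0) on the unit sphere.
import numpy as np

x0 = np.array([1.0, 2.0, 2.0]) / 3.0        # an arbitrary point on the unit sphere
Df = 2 * x0.reshape(1, -1)                  # Df(x0) = 2 x0^T, surjective onto R

# An orthonormal basis of Ker Df(x0), taken from the SVD of Df(x0)
_, _, Vt = np.linalg.svd(Df)
kernel_basis = Vt[1:]                       # the rows orthogonal to x0

# Each kernel direction d is tangent: the curve x(t) = cos(t) x0 + sin(t) d stays
# on the sphere and its difference quotient (x(t) - x0)/t converges to d.
t = 1e-4
for d in kernel_basis:
    x_t = np.cos(t) * x0 + np.sin(t) * d
    print(np.isclose(np.dot(x_t, x_t), 1.0),            # x(t) lies in M
          np.allclose((x_t - x0) / t, d, atol=1e-4))    # difference quotient close to d
```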
Proof. We may assume that $x_0 = 0$ and $f(x_0) = 0$, by considering the function $x \mapsto f(x + x_0) - f(x_0)$ if necessary. Define $A := Df(0)$. The proof of the inclusion $T_M(0) \subseteq \operatorname{Ker} A$ is easy: if $d \in T_M(0)$, then there exist points $x(t) = td + o(t) \in M$, and we have
\[
0 = f(0 + td + o(t)) = f(0) + tDf(0)(d) + o(t) = tDf(0)(d) + o(t).
\]
Dividing both sides by $t$ and letting $t \to 0$, we obtain $Df(0)(d) = 0$.
The proof of the reverse inclusion $\operatorname{Ker} A \subseteq T_M(0)$ is based on the idea that the equation $f(x) = 0$ can be written as $f(y, z) = 0$ in a form that is suitable for applying the implicit function theorem.
Define $K := \operatorname{Ker} A$ and $L := K^{\perp}$. Since $A$ is onto $\mathbb{R}^m$, we can identify $K$ and $L$ with $\mathbb{R}^{n-m}$ and $\mathbb{R}^m$, respectively, by introducing a suitable basis in $\mathbb{R}^n$. We write a point $x \in \mathbb{R}^n$ in the form $x = (y, z) \in K \times L$. We have $A = [D_yf(0), D_zf(0)]$, and
\[
0 = A(K) = \{A(d_1, 0) : d_1 \in \mathbb{R}^{n-m}\} = D_yf(0)(\mathbb{R}^{n-m}),
\]
so that $D_yf(0) = 0$. Since $A$ has rank $m$, it follows that $D_zf(0)$ is nonsingular.
Theorem 2.26 implies that there exist neighborhoods $U_1 \subseteq \mathbb{R}^{n-m}$ and $U_2 \subseteq \mathbb{R}^m$ around the origin and a $C^1$ map $\alpha : U_1 \to U_2$, $\alpha(0) = 0$, such that $x = (y, z) \in U_1 \times U_2$ satisfies $f(x) = 0$ if and only if $z = \alpha(y)$. The equation $f(x) = 0$ can then be written as $f(y, \alpha(y)) = 0$. Differentiating this equation and using the chain rule, we obtain
\[
0 = D_yf(y, \alpha(y)) + D_zf(y, \alpha(y)) D\alpha(y).
\]
At the origin $x = 0$, $D_yf(0) = 0$ and $D_zf(0)$ is nonsingular, so that $D\alpha(0) = 0$.
If $\|y\|$ is small, we have
\[
\alpha(y) = \alpha(0) + D\alpha(0)y + o(y) = o(y).
\]
Let $d = (d_1, 0) \in K$. For small $t$, the point $x(t) := (td_1, \alpha(td_1)) = (td_1, o(t))$ lies in $M$, that is, $f(x(t)) = 0$, and satisfies $(x(t) - td)/t = (0, o(t))/t \to 0$ as $t \to 0$. This implies that $K \subseteq T_M(0)$, and the theorem is proved. ⊓⊔