Proof. We first prove that if $A$ is positive definite, then all leading principal minors of $A$ are positive. We use induction on $n$, the dimension of $A$. The proof is trivial for $n = 1$. Assuming that the result is true for $n$, we will prove it for $n + 1$. Let $A$ be an $(n+1) \times (n+1)$ symmetric, positive definite matrix.
We write
\[
A = \begin{pmatrix} B & b \\ b^T & c \end{pmatrix},
\]
where $B$ is a symmetric $n \times n$ matrix, $b \in \mathbb{R}^n$, and $c \in \mathbb{R}$. Choosing $0 \neq d \in \mathbb{R}^n$, we have
\[
0 < (d^T, 0)\, A \begin{pmatrix} d \\ 0 \end{pmatrix}
  = (d^T, 0) \begin{pmatrix} B & b \\ b^T & c \end{pmatrix} \begin{pmatrix} d \\ 0 \end{pmatrix}
  = d^T B d,
\]
that is, $B$ is positive definite. By the induction hypothesis, we have $\det A_i > 0$, $i = 1, \ldots, n$. Since $A$ is positive definite, its eigenvalues $\{\lambda_i\}_{i=1}^{n+1}$ are all positive. Thus, we also have $\det A_{n+1} = \det A = \lambda_1 \cdots \lambda_{n+1} > 0$.
Conversely, let us prove that if all $\det A_i > 0$, $i = 1, \ldots, n+1$, then $A$ is positive definite. The proof is again by induction on $n$. The proof is trivial for $n = 1$. Suppose the theorem is true for $n$; we will prove it for $n + 1$.
Since $\det A_i > 0$ for $i = 1, \ldots, n$, we see by the induction hypothesis that $B$ is positive definite. Suppose $A$ is not positive definite, and order its eigenvalues so that $\lambda_1 \geq \cdots \geq \lambda_{n+1}$. Then $\lambda_{n+1} < 0$, and since $\det A = \lambda_1 \cdots \lambda_{n+1} > 0$, we must also have $\lambda_n < 0$. Let $u_n$ and $u_{n+1}$ be the eigenvectors of $A$ corresponding to $\lambda_n$ and $\lambda_{n+1}$, respectively. We have $\langle u_n, u_{n+1} \rangle = 0$, so that we can choose scalars $\alpha_n$ and $\alpha_{n+1}$ such that $u = \alpha_n u_n + \alpha_{n+1} u_{n+1}$ is not zero but has the last ($(n+1)$th) component equal to zero, say $u = (v, 0)^T$ where $v \neq 0$. Then $u^T A u = v^T B v > 0$, since $B$ is positive definite. However, we also have
\[
\begin{aligned}
0 < u^T A u &= \langle \alpha_n u_n + \alpha_{n+1} u_{n+1},\ A(\alpha_n u_n + \alpha_{n+1} u_{n+1}) \rangle \\
&= \langle \alpha_n u_n + \alpha_{n+1} u_{n+1},\ \lambda_n \alpha_n u_n + \lambda_{n+1} \alpha_{n+1} u_{n+1} \rangle \\
&= \lambda_n \alpha_n^2 \langle u_n, u_n \rangle + \lambda_{n+1} \alpha_{n+1}^2 \langle u_{n+1}, u_{n+1} \rangle < 0,
\end{aligned}
\]
where the last inequality follows from the facts $\lambda_i < 0$ and $\|u_i\| = 1$, $i = n, n+1$. This contradiction shows that all eigenvalues of $A$ are positive. Corollary 2.20 implies that $A$ is positive definite. ⊓⊔
This simple proof is taken from Carathéodory [54], p. 187.
Another elegant proof of Sylvester's theorem, more in the spirit of optimization techniques, is outlined in Exercise 12 at the end of the chapter.
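Sylvester's criterion is also easy to test numerically. The sketch below (an illustration only; the tridiagonal test matrix and the use of NumPy are choices made here, not part of the theorem) verifies that the leading principal minors and the eigenvalues of a symmetric positive definite matrix are all positive, and that $\det A = \lambda_1 \cdots \lambda_{n+1}$.

```python
# A numerical illustration of Sylvester's criterion: for a symmetric matrix,
# the leading principal minors are all positive exactly when the eigenvalues
# are all positive.
import numpy as np

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])   # an arbitrary symmetric, positive definite test matrix

# Leading principal minors det A_1, ..., det A_{n+1}
minors = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

# Eigenvalues of A (real, since A is symmetric)
eigenvalues = np.linalg.eigvalsh(A)

print("minors:", np.round(minors, 4))            # [2. 3. 4.] -- all positive
print("eigenvalues:", np.round(eigenvalues, 4))  # all positive
print("det A equals the product of eigenvalues:",
      np.isclose(minors[-1], np.prod(eigenvalues)))
```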
2.5 The Inverse Function, Implicit Function, and Lyusternik Theorems
The main result of this section is the implicit function theorem, which is used to prove the inverse function theorem and Lyusternik's theorem.
The implicit function theorem will also be utilized to prove Morse’s lemma in Section 2.6.
Theorem 2.26. (Implicit function theorem) Let $f : U \times V \to \mathbb{R}^m$ be a $C^1$ mapping, where $U \subseteq \mathbb{R}^n$ and $V \subseteq \mathbb{R}^m$ are open sets. Let $(x_0, y_0) \in U \times V$ be a point such that $f(x_0, y_0) = 0$ and $D_yf(x_0, y_0) : \mathbb{R}^m \to \mathbb{R}^m$, the derivative of $f$ with respect to $y$, is nonsingular.
Then there exist neighborhoods $U_1 \ni x_0$ and $V_1 \ni y_0$ and a $C^1$ mapping $y : U_1 \to V_1$ such that a point $(x, y) \in U_1 \times V_1$ satisfies $f(x, y) = 0$ if and only if $y = y(x)$. The derivative of $y$ at $x_0$ is given by
\[
Dy(x_0) = -D_yf(x_0, y_0)^{-1} D_xf(x_0, y_0).
\]
Moreover, if $f$ is $k$-times continuously differentiable, that is, $f \in C^k$, then $y(x) \in C^k$.
The linear case should help one to remember the form of the implicit function theorem: if $f(x, y) = Ax + By$ and $D_yf = B$ is an invertible matrix, then the equation $f(x, y) = \alpha$ gives $Ax + By = \alpha$. This may be solved for $y$ by premultiplying it by $B^{-1}$, giving $y(x) = B^{-1}(\alpha - Ax)$.
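A nonlinear example may also help. The sketch below (an illustrative numerical check; the function $f(x, y) = x^2 + y^2 - 1$ and the point $(0.6, 0.8)$ are chosen here only as an example) compares the formula $Dy(x_0) = -D_yf(x_0, y_0)^{-1} D_xf(x_0, y_0)$ with the derivative of the explicit solution $y(x) = \sqrt{1 - x^2}$.

```python
# Implicit function theorem for f(x, y) = x^2 + y^2 - 1 near (x0, y0) = (0.6, 0.8).
import numpy as np

x0, y0 = 0.6, 0.8                        # a point with f(x0, y0) = 0
Dxf = 2 * x0                             # derivative of f with respect to x
Dyf = 2 * y0                             # derivative of f with respect to y (nonzero)

dy_implicit = -Dxf / Dyf                 # Dy(x0) = -Dyf^{-1} Dxf
dy_explicit = -x0 / np.sqrt(1 - x0**2)   # derivative of y(x) = sqrt(1 - x^2) at x0

print(dy_implicit, dy_explicit)          # both equal -0.75
```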
Proof. Assume without loss of generality that $x_0 = 0$ and $y_0 = 0$, by considering the function $(x, y) \mapsto f(x + x_0, y + y_0) - f(x_0, y_0)$ if necessary. Let $f(x, y) = (f_1(x, y), \ldots, f_m(x, y))$, where $f_i$ is the $i$th coordinate function of $f$. Since $Df$ is continuous, there exist neighborhoods $U_0$ and $V_0$ of the origin in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively, such that the matrix
\[
\begin{pmatrix}
\nabla_y f_1(x, y_1)^T \\
\nabla_y f_2(x, y_2)^T \\
\vdots \\
\nabla_y f_m(x, y_m)^T
\end{pmatrix}
\tag{2.2}
\]
is invertible for all $(x, y_i) \in U_0 \times V_0$.
We claim that for every $x \in U_0$, there exists at most one $y \in V_0$ such that $f(x, y) = 0$. Otherwise, there would exist $y, z \in V_0$, $y \neq z$, such that $f(x, y) = f(x, z) = 0$. The mean value theorem (Lemma 1.12) implies that there exists $y_i \in (y, z)$ such that
\[
f_i(x, z) - f_i(x, y) = \langle \nabla_y f_i(x, y_i), z - y \rangle = 0, \qquad i = 1, \ldots, m.
\]
Since the matrix in (2.2) is nonsingular, we obtain $y = z$, a contradiction that proves our claim.
Let $B_r(0) \subseteq V_0$. Since $f(0, 0) = 0$ and, by the claim above, $y = 0$ is the only solution of $f(0, y) = 0$ in $V_0$, we have $f(0, y) \neq 0$ for $y \in S_r(0) := \{y \in \mathbb{R}^m : \|y\| = r\}$, and since $f$ is continuous on $U_0 \times V_0$ and $S_r(0)$ is compact, there exists $\alpha > 0$ such that $\|f(0, y)\|^2 \geq \alpha$ for all $y \in S_r(0)$. It follows that the function
\[
F(x, y) := \|f(x, y)\|^2 = \sum_{i=1}^{m} f_i(x, y)^2
\]
satisfies the properties
\[
F(0, y) \geq \alpha > 0 \ \text{ for } y \in S_r(0), \qquad \text{and} \qquad F(0, 0) = 0.
\]
Since $F$ is continuous, there exists an open neighborhood $U_1 \subseteq U_0$ of $0 \in \mathbb{R}^n$ such that
\[
F(x, y) \geq \frac{\alpha}{2}, \qquad F(x, 0) \leq \frac{\alpha}{2} \qquad \text{for all } x \in U_1,\ y \in S_r(0).
\]
Thus, for a fixed $x \in U_1$, the function $y \mapsto F(x, y)$ achieves its minimum on $B_r(0)$ at a point $y(x)$ in the interior of $B_r(0)$, and we have
\[
D_yF(x, y(x)) = 2 D_yf(x, y(x))^T f(x, y(x)) = 0,
\]
and since the matrix $D_yf(x, y(x))$ is nonsingular, we conclude that
\[
f(x, y(x)) = 0.
\]
Writing $\Delta y := y(x + \Delta x) - y(x)$, we have by the mean value theorem
\[
0 = D_xf(\tilde{x}, \tilde{y})\Delta x + D_yf(\tilde{x}, \tilde{y})\Delta y
\]
for some point $(\tilde{x}, \tilde{y})$ on the line segment between $(x, y(x))$ and $(x + \Delta x, y(x + \Delta x))$. This implies that as $\|\Delta x\|$ goes to zero, so does $\|\Delta y\|$, proving that $y(x)$ is a continuous function.
The function $y(x)$ is actually $C^1$, since by Taylor's formula
\[
\begin{aligned}
0 &= f(x + \Delta x, y(x + \Delta x)) - f(x, y(x)) \\
  &= D_xf(x, y(x))\Delta x + D_yf(x, y(x))\Delta y + o((\Delta x, \Delta y)),
\end{aligned}
\]
and since $o((\Delta x, \Delta y)) = o(\Delta x)$ by the continuity of $y(x)$, we have
\[
\Delta y = -D_yf(x, y(x))^{-1} D_xf(x, y(x))\Delta x + o(\Delta x).
\]
This proves that $y(x)$ is Fréchet differentiable at $x$ with
\[
Dy(x) = -D_yf(x, y(x))^{-1} D_xf(x, y(x)).
\]
If $f \in C^2$, then $D_yf(x, y(x))^{-1} = \operatorname{Adj} D_yf(x, y(x)) / \det D_yf(x, y(x))$ and $D_xf(x, y(x))$ are $C^1$, and the above formula shows that the function $y(x)$ is $C^2$. In general, if $f \in C^k$, we prove by induction on $k$ that $y(x)$ is $C^k$. ⊓⊔

This elementary proof is taken from Carathéodory [54], pp. 10–13. A similar kind of proof, using penalty functions, will be used in Chapter 9 to obtain optimality conditions for constrained optimization problems.
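The minimization device in the proof is also easy to reproduce numerically. The sketch below (an illustration only, reusing the circle example $f(x, y) = x^2 + y^2 - 1$ and an off-the-shelf SciPy minimizer rather than the constructive argument above) recovers $y(x)$ for a fixed $x$ by minimizing $y \mapsto F(x, y) = f(x, y)^2$ over an interval playing the role of $B_r(0)$.

```python
# Recover y(x) by minimizing F(x, y) = f(x, y)^2 in y, as in the proof above.
import numpy as np
from scipy.optimize import minimize_scalar

def f(x, y):
    return x**2 + y**2 - 1.0       # unit circle; y(x) = sqrt(1 - x^2) near y0 = 1

x = 0.6
result = minimize_scalar(lambda y: f(x, y)**2, bounds=(0.5, 1.5), method="bounded")

print(result.x, np.sqrt(1 - x**2))   # both approximately 0.8
```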
Corollary 2.27. (Inverse function theorem) Let $f$ be a $C^1$ map from a neighborhood of $x_0 \in \mathbb{R}^n$ into $\mathbb{R}^n$.
If $Df(x_0)$ is nonsingular, then there exist neighborhoods $U \ni x_0$ and $V \ni y_0 = f(x_0)$ such that $f : U \to V$ is a $C^1$ diffeomorphism, and
\[
Df^{-1}(y) = Df(x)^{-1} \qquad \text{for all } (x, y) \in U \times V,\ y = f(x).
\]
Moreover, if $f$ is $C^k$, then $f$ is a $C^k$ diffeomorphism on $U$.
Proof. Define the function $F(x, y) = f(x) - y$, and note that $D_xF(x_0, y) = Df(x_0)$ is nonsingular. Apply Theorem 2.26 to $F$, with the roles of $x$ and $y$ interchanged. ⊓⊔

The map $f : \mathbb{R}^2 \to \mathbb{R}^2$ given by $f(x, y) = (e^x \cos y, e^x \sin y)$ has the Jacobian $\det Df(x, y) = e^{2x} \neq 0$; hence $f$ is locally one-to-one around every point $(x, y) \in \mathbb{R}^2$. However, $f$ is clearly not one-to-one globally, since $f(x, y + 2\pi) = f(x, y)$.
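For this map the conclusion $Df^{-1}(y) = Df(x)^{-1}$ can be checked numerically. The sketch below (an illustrative check at an arbitrarily chosen point; the finite-difference Jacobian and the particular branch of the local inverse are choices made for the example) compares a numerical Jacobian of a local inverse of $f$, built from $\log$ and the two-argument arctangent, with the inverse of $Df(x, y)$.

```python
# Check Df^{-1}(y) = Df(x)^{-1} for f(x, y) = (e^x cos y, e^x sin y) at one point.
import numpy as np

def f(p):
    x, y = p
    return np.array([np.exp(x) * np.cos(y), np.exp(x) * np.sin(y)])

def f_inv(q):                       # a local inverse of f, valid near the chosen point
    u, v = q
    return np.array([0.5 * np.log(u**2 + v**2), np.arctan2(v, u)])

def num_jacobian(g, p, h=1e-6):     # central finite-difference Jacobian of g at p
    cols = []
    for i in range(len(p)):
        e = np.zeros(len(p)); e[i] = h
        cols.append((g(p + e) - g(p - e)) / (2 * h))
    return np.column_stack(cols)

p = np.array([0.3, 0.7])            # an arbitrary point (x, y)
q = f(p)                            # y = f(x), the point where the inverse is evaluated

print(np.allclose(num_jacobian(f_inv, q), np.linalg.inv(num_jacobian(f, p)), atol=1e-5))
# True: the Jacobian of the local inverse equals Df(x)^{-1}
```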
Definition 2.28. Let $M$ be a nonempty subset of $\mathbb{R}^n$ and $x \in M$. A vector $d \in \mathbb{R}^n$ is called a tangent direction of $M$ at $x$ if there exist a sequence $x_n \in M$ converging to $x$ and a nonnegative sequence $\alpha_n$ such that
\[
\lim_{n \to \infty} \alpha_n(x_n - x) = d.
\]
The tangent cone of $M$ at $x$, denoted by $T_M(x)$, is the set of all tangent directions of $M$ at $x$.
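As a simple illustration of the definition, consider the unit circle $M = \{x \in \mathbb{R}^2 : \|x\| = 1\}$ and the point $x = (1, 0)$. Taking $x_n = (\cos(1/n), \sin(1/n)) \in M$ and $\alpha_n = n$ gives $\alpha_n(x_n - x) = (n(\cos(1/n) - 1), n \sin(1/n)) \to (0, 1)$, so $d = (0, 1)$ is a tangent direction of $M$ at $x$. Similar choices of sequences and scalings produce every vector $(0, t)$, and Theorem 2.29 below confirms that $T_M(x)$ is exactly the vertical axis $\{0\} \times \mathbb{R}$.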
This definition is sufficient for our purposes. We remark that the same definition is valid in a topological vector space. A detailed study of this and several related concepts is needed in nonsmooth analysis; see [230] and [199, 200].
Theorem 2.29. (Lyusternik) Let $f : U \to \mathbb{R}^m$ be a $C^1$ map, where $U \subset \mathbb{R}^n$ is an open set. Let $M = f^{-1}(f(x_0))$ be the level set of a point $x_0 \in U$.

If the derivative $Df(x_0)$ is a linear map onto $\mathbb{R}^m$, then the tangent cone of $M$ at $x_0$ is the null space of the linear map $Df(x_0)$, that is,
\[
T_M(x_0) = \{d \in \mathbb{R}^n : Df(x_0)d = 0\}.
\]
Remark 2.30. Let $f = (f_1, \ldots, f_m)$, where $\{f_i\}$ are the component functions of $f$. It is easy to verify that
\[
\operatorname{Ker} Df(x_0) = \{d \in \mathbb{R}^n : \langle \nabla f_i(x_0), d \rangle = 0,\ i = 1, \ldots, m\},
\]
and that the surjectivity of $Df(x_0)$ is equivalent to the linear independence of the gradient vectors $\{\nabla f_i(x_0)\}_1^m$.
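Lyusternik's theorem is easy to test numerically on the unit sphere. The sketch below (an illustration only; the point $x_0$, the use of the SVD to obtain a kernel basis, and the particular curves on the sphere are choices made for the example) takes $f(x) = \|x\|^2 - 1$, computes a basis of $\operatorname{Ker} Df(x_0)$, and checks that each kernel direction is a tangent direction of $M = f^{-1}(0)$ at $x_0$ in the sense of Definition 2.28.

```python
# Lyusternik's theorem for f(x) = ||x||^2 - 1: T_M(x0) = Ker Df(x0) on the unit sphere.
import numpy as np

x0 = np.array([1.0, 2.0, 2.0]) / 3.0        # an arbitrary point on the unit sphere
Df = 2 * x0.reshape(1, -1)                  # Df(x0) = 2 x0^T, surjective onto R

# An orthonormal basis of Ker Df(x0), taken from the SVD of Df(x0)
_, _, Vt = np.linalg.svd(Df)
kernel_basis = Vt[1:]                       # the rows orthogonal to x0

# Each kernel direction d is tangent: the curve x(t) = cos(t) x0 + sin(t) d stays
# on the sphere and its difference quotient (x(t) - x0)/t converges to d.
t = 1e-4
for d in kernel_basis:
    x_t = np.cos(t) * x0 + np.sin(t) * d
    print(np.isclose(np.dot(x_t, x_t), 1.0),            # x(t) lies in M
          np.allclose((x_t - x0) / t, d, atol=1e-4))    # difference quotient close to d
```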
Proof. We may assume that $x_0 = 0$ and $f(x_0) = 0$, by considering the function $x \mapsto f(x + x_0) - f(x_0)$ if necessary. Define $A := Df(0)$. The proof of the inclusion $T_M(0) \subseteq \operatorname{Ker} A$ is easy: if $d \in T_M(0)$, then there exist points $x(t) = td + o(t) \in M$, and we have
\[
0 = f(0 + td + o(t)) = f(0) + tDf(0)(d) + o(t) = tDf(0)(d) + o(t).
\]
Dividing both sides by $t$ and letting $t \to 0$, we obtain $Df(0)(d) = 0$.
The proof of the reverse inclusion $\operatorname{Ker} A \subseteq T_M(0)$ is based on the idea that the equation $f(x) = 0$ can be written as $f(y, z) = 0$ in a form that is suitable for applying the implicit function theorem.
Define $K := \operatorname{Ker} A$ and $L := K^{\perp}$. Since $A$ is onto $\mathbb{R}^m$, we can identify $K$ and $L$ with $\mathbb{R}^{n-m}$ and $\mathbb{R}^m$, respectively, by introducing a suitable basis in $\mathbb{R}^n$. We write a point $x \in \mathbb{R}^n$ in the form $x = (y, z) \in K \times L$. We have $A = [D_yf(0), D_zf(0)]$, and
\[
0 = A(K) = \{A(d_1, 0) : d_1 \in \mathbb{R}^{n-m}\} = D_yf(0)(\mathbb{R}^{n-m}),
\]
so that $D_yf(0) = 0$. Since $A$ has rank $m$, it follows that $D_zf(0)$ is nonsingular.
Theorem 2.26 implies that there exist neighborhoods $U_1 \subseteq \mathbb{R}^{n-m}$ and $U_2 \subseteq \mathbb{R}^m$ around the origin and a $C^1$ map $\alpha : U_1 \to U_2$, $\alpha(0) = 0$, such that $x = (y, z) \in U_1 \times U_2$ satisfies $f(x) = 0$ if and only if $z = \alpha(y)$. The equation $f(x) = 0$ can then be written as $f(y, \alpha(y)) = 0$. Differentiating this equation and using the chain rule, we obtain
\[
0 = D_yf(y, \alpha(y)) + D_zf(y, \alpha(y)) D\alpha(y).
\]
At the origin $x = 0$, $D_yf(0) = 0$ and $D_zf(0)$ is nonsingular, so that $D\alpha(0) = 0$.
If $\|y\|$ is small, we have
\[
\alpha(y) = \alpha(0) + D\alpha(0)y + o(y) = o(y).
\]
Let $d = (d_1, 0) \in K$. For small $t$, the point $x(t) := (td_1, \alpha(td_1)) = (td_1, o(t))$ lies in $M$, that is, $f(x(t)) = 0$, and satisfies $(x(t) - td)/t = (0, o(t))/t \to 0$ as $t \to 0$. This implies that $K \subseteq T_M(0)$, and the theorem is proved. ⊓⊔