Extended convergence of Gauss-Newton’s method and uniqueness of the solution

(1)

CARPATHIAN J. MATH.

34(2018), No. 2, 135 - 142

Online version athttp://carpathian.ubm.ro

Print Edition: ISSN 1584 - 2851 Online Edition: ISSN 1843 - 4401

Extended convergence of Gauss-Newton’s method and uniqueness of the solution

IOANNISK. ARGYROS, YEOLJECHOand SANTHOSHGEORGE

ABSTRACT. The aim of this paper is to extend the applicability of the Gauss-Newton’s method for solving nonlinear least squares problems using our new idea of restricted convergence domains. The new technique uses tighter Lipschitz functions than in earlier papers leading to a tighter ball convergence analysis.

1. INTRODUCTION

Let us consider the nonlinear least squares problem:

(1.1) minf(x) :=1

2F(x)^TF(x),

whereF :R^j −→Rⁱis Fr´echet-differentiable,j ≥iandk.kstands for the2−norm unless otherwise stated. Gauss-Newton’s method (GN) defined for eachn= 0,1,2. . .by (1.2) xn+1=xn−[F⁰(xn)^TF⁰(xn)]⁻¹F⁰(xn)^TF(xn)

where x0 is an initial point is undoubtedly the most popular method for generating a sequence{xn} approximating a solutionpof problem (1.1). There is a plethora of ball convergence results for GN based on Lipschitz-type conditions [1–15]. In the present study, we use our new idea of restricted convergence domains leading to tighter Lipschitz functions and to the advantages (A) over earlier work such as [4–8, 13, 15]:

(a1) A larger radius of convergence resulting to a wider choice of initial points.

(a2) Tighter error bounds on the distanceskxn−pkleading to the computation of fewer iterates to obtain a desired error tolerance.

(a3) An at least as precise information on the location of the solution.

The advantages (A) are obtained under the same computational cost as in earlier studies, since in practice the computation of the old Lipschitz functions requires the computation of the new Lipschitz functions as special cases. The study of the ball convergence of iterative methods is also important because it shows the degree of difficulty in obtaining good initial pointsx₀.Our technique can be used in an analogous way to improve results for other iterative methods [1, 4–15].

The rest of the paper is structured as follows. Section 2 contains auxiliary results, whe- reas Section 3 includes the ball convergence of GN.

2. AUXILIARY RESULTS

In order to make the paper as self contained as possible we restate some standard con- cepts and results.

Received: 08.08.2017. In revised form: 13.03.2018. Accepted: 20.03.2018 2010Mathematics Subject Classification. 65G99, 65H10, 47H17, 49M15.

Key words and phrases. Gauss-Newton’s method, least squares problems, Ball convergence, Lipschitz condition.

Corresponding author: Ioannis K. Argyros;[email protected] 135

(2)

LetR^i×j denote the set of alli×jmatrixM₁, M₁^† denote the Moore-penrose inverse of matrixM1,and ifM1has full rank (namely: rank(M1) = min{m, n} =n) thenM₁^† = (M₁^TM1)⁻¹M₁^T.We state the auxilary results [4–8, 12, 15].

Lemma 2.1. Suppose thatM1, M3 ∈R^i×j, M2 =M1+M3,kM₁^†kkM3k <1, rank(M1) = rank(M₂),then

(2.1) kM₂^†k ≤ kM₁^†k

1− kM₁^†kkM3k. Moreover, ifrank(M₂) =rank(M₁) = min{m, n},then, we have

(2.2) kM₂^†−M₁^†k ≤

√2kM₁^†k²kM3k 1− kM₁^†kkM3k.

Lemma 2.2. Suppose thatM₁, M₃ ∈ R^i×j, M₂ = M₁+M₃, kM₃M₁^†k < 1, rank(M₁) = rank(M2) =n,thenrank(M2) =n.

Lemma 2.3. Let

(2.3) ϕ(t) = 1

t^α Z t

0

L(u)u^α−1du, α≥1,0≤t≤ρ,

whereL(u)is a positive integrable function and nondecreasing monotonically in[0, ρ].Thenϕ(t) is nondecreasing with respect tot.

Lemma 2.4. Suppose that

(2.4) ψ(t) = 1

t^α Z t

0

L(u)(t−u)du,

whereL(u)is a positive integrable function in[0, ρ].Thenψ(t)is nondecreasing monotonically increasing with respect tot.

3. LOCAL CONVERGENCE

LetU(w, ξ),U¯(w, ξ)stand, respectively for the open and closed balls inRⁱwith center w∈Rⁱand of radiusξ >0.

We base the local convergence analysis of GN on some scalar Lipschitz functions.

Definition 3.1. LetL₀be a nondecreasing, positive integrable function defined on the interval[0, ρ].We say thatF⁰satisfies the center-radius Lipschitz condition withL₀average, if

kF⁰(x)−F⁰(p)k ≤ Z d(x)

0

L₀(u)dufor eachx∈U(p, ρ).

Define parametersρ0for someβ >0by

(3.5) ρ₀= sup{u∈[0, ρ] :βL₀(u)<1}.

SetU₀=U(p, ρ₀).Notice thatU₀⊆U(p, ρ).

Definition 3.2. LetLbe a nondecreasing, positive integrable function defined on the interval[0, ρ0].We say thatF⁰satisfies the restricted-radius Lipschitz condition withLaverage, if

kF⁰(x)−F⁰(x^θ)k ≤ Z d(x)

θd(x)

L(u)dufor eachx∈U(p, ρ0),0≤θ≤1, wherex^θ=p+θ(x−p)andd(x) =kx−pk.

(3)

Extended convergence of Gauss-Newton’s method 137

Notice: L = L(L₀). That is the construction of functionL depends onL₀ andρ₀.In earlier studies [4–8, 14, 15] the following definition was used:

Definition 3.3. Let L₁ be a nondecreasing positive integrable function defined on the interval[0, ρ].We say thatF⁰satisfies the radius Lipschitz condition withL1average, if

kF⁰(x)−F⁰(x^θ)k ≤ Z d(x)

θd(x)

L1(u)dufor eachx∈U(p, ρ),0≤θ≤1.

Notice that

(3.6) L₀(u)≤L₁(u),

(3.7) L(u)≤L1(u)

hold and ^L_L¹

0 can be arbitrarily large [4–6]. In the earlier studies (with the exception of our works where bothL1 andL0are used) only the functionL1 is used in the convergence analysis of the Gauss-Newton method. However, in view of (3.6) and (3.7) the earlier results can be improved, if the more precise functionL0andLare used instead ofL1(or L1andL0). If one uses the Banach lemma on invertible operators [11] andL1, then (3.8) k[F⁰(x)^TF⁰(x)]⁻¹F⁰(x)^Tk ≤ β

1−βRd(x)

0 L1(u)du

is obtained (forβ >0to be defined later) instead of the more precise estimate usingL0: (3.9) k[F⁰(x)^TF⁰(x)]⁻¹F⁰(x)^Tk ≤ β

1−βRd(x)

0 L0(u)du .

Similarly, at the numerators of the estimates involvedL, L0 can be used instead of the less presiseL0, L1leading to the advantages (A). We can state the main local convergence result for the Gauss-Newton method.

Theorem 3.1. Suppose: vectorpsatisfies problem (1.1);Fhas a continuous derivative inU(p, ρ);

F⁰(p)has full rank:F⁰satisfies the center-radius -Lipschitz condition withL₀average;F⁰satisfies the restricted-radius-Lipschitz condition withLaverage andρsatisfies

(3.10) λ(L0, L, ρ) =λ(ρ) = βRρ

0 L(u)udu ρ(1−βRρ

0 L0(u)du)+

√2cβ²Rρ

0 L0(u)du ρ(1−βRρ

0 L0(u)du)≤1.

Then GN is convergenct for allx₀∈U(p, ρ)and kxn+1−pk ≤ βR

d(x0)L(u)udu d(x0)²(1−βRd(x₀)

0 L0(u)du)

kxn−pk²

+

√2cβ²Rd(x0)

0 L0(u)du d(x0)(1−βRd(x0)

0 L0(u)du)

kx_n−pk, (3.11)

where

(3.12) c=kF⁰(x)k, β=k[F⁰(p)^TF⁰(p)]⁻¹F⁰(p)^Tk and

q0(L0, L) =q0 = βRd(x₀)

0 L(u)udu d(x0)(1−βRd(x0)

0 L0(u)du) +

√

2cβ²Rd(x₀)

0 L0(u)du d(x₀)(1−βRd(x₀)

0 L₀(u)du)

<1., (3.13)

(4)

Moreover, ifc= 0,then

(3.14) kxn−pk ≤q²₀ⁿ⁻¹kx0−pkfor eachn= 1,2, . . . .

Proof. We first show thatq₀∈[0,1).Using monotonicity ofLand Lemma 2.3, we have

q0 = βRd(x₀)

0 L(u)udu d(x₀)²(1−βRd(x₀)

0 L₀(u)du) d(x0)

+

√

2cβ²Rd(x₀)

0 L0(u)du d(x₀)²(1−βRd(x₀)

0 L₀(u)du) d(x0)

< βRρ

0 L(u)udu ρ²(1−βRd(x₀)

0 L₀(u)du) d(x0)

+

√

2cβ²Rd(x₀)

0 L0(u)du ρ²(1−βRd(x₀)

0 L₀(u)du) d(x0)

≤ kx0−pk ρ <1.

and

k[F⁰(p)^TF⁰(p)]⁻¹kkF⁰(x)−F⁰(p)k ≤ β Z d(x)

0

L₀(u)du

≤ β Z ρ

0

L₀(u)du <1, for eachx∈U(p, ρ).

By Lemma 2.1 and 2.2, we know that for eachx∈U(p, ρ), F⁰(x)has full rank and

k[F⁰(x)^TF⁰(x)]⁻¹F⁰(x)^Tk ≤ β 1−βRd(x)

0 L0(u)du , for eachx∈U(p, ρ),

k[F⁰(x)^TF⁰(x)]⁻¹F⁰(x)^T −[F⁰(p)^TF⁰(p)]⁻¹F⁰(p)^Tk ≤

√

2β²Rd(x)

0 L0(u)du d(x₀)(1−βRd(x)

0 L₀(u)du) , for eachx∈U(p, ρ).

Ifxn ∈U(p, ρ),we have by (1.2)

xn+1−p = xn−p−[F⁰(xn)^TF⁰(xn)]⁻¹F⁰(xn)^TF(xn)

= [F⁰(xn)^TF⁰(xn)]⁻¹F⁰(p)^T[F⁰(xn)(xn−p)−F(xn) +F(p)]

+[F⁰(xn)^TF⁰(xn)]⁻¹F⁰(p)^TF⁰(p)−[F⁰(xn)^TF⁰(xn)]⁻¹F⁰(xn)^TF(p).

(5)

That is

kx_n+1−pk ≤ k[F⁰(x_n)^TF⁰(x_n)]⁻¹F⁰(p)^Tkk Z 1

0

[F⁰(x_n)−F⁰(p+t(x_n−p))](p−x_n)dtk +k[F⁰(x_n)^TF⁰(x_n)]⁻¹F⁰(p)^TF⁰(p)−[F⁰(x_n)^TF⁰(x_n)]⁻¹F⁰(x_n)^TkkF(p)k

≤ β

1−βRd(xn)

0 L0(u)du Z 1

0

Z d(xn)

td(x_n)

L(u)dud(x_n)dt

+

√2cβ²Rd(x_n)

0 L₀(u)du 1−βRd(xn)

0 L₀(u)du

≤ βRd(x_n) 0 L(u)du 1−βRd(x_n)

0 L₀(u)du +

√

2cβ²Rd(x_n)

0 L0(u)du 1−βRd(x_n)

0 L₀(u)du .

Settingn = 0above, we getkx₁−pk ≤q₀kx₀−pk.Hence,x₁ ∈U(p, ρ),this shows that (1.1) is well defined and by inductionxn ∈ U(p, ρ)for alln.Furtherd(x) = kxn−pkis decreasing monotonically. Therefore, we have

kxn+1−pk ≤ βRd(x_n)

0 L(u)udu d(x_n)²(1−βRd(x_n)

0 L₀(u)du) d(xn)²

+

√

2cβ²Rd(x₀)

0 L0(u)du d(x_n)(1−βRd(x₀)

0 L₀(u)du) d(xn)

≤ βRd(x0) 0 L(u)du d(x0)²(1−βRd(x₀)

0 L0(u)du)

d(xn)²+

√2cβ²Rd(x0)

0 L0(u)du d(x0)(1−βRd(x₀)

0 L0(u)du) d(xn).

In particular, ifc= 0,we obtain kxn+1−pk ≤ βRd(x0)

0 L(u)du d(x₀)²(1−βRd(x₀)

0 L₀(u)du)

d(xn)²= q0

d(x₀)kxn−pk².

Concerning the uniqueness of the solutionpwe can show:

Proposition 3.1. Supposepsatisfies (1.1),F has a continuous derivative inU(p, ρ), F⁰(p)has full rank andF⁰satisfies the center Lipschitz condition withL0average. Letρ >0satisfy (3.15) µ(L0) = β

ρ Z ρ

0

L0(u)(ρ−u)du+cβ0

ρ Z ρ

0

L0(u)du≤1, wherec, βare given in (3.12) and

(3.16) β₀=k[F⁰(p)^TF⁰(p)]⁻¹k.

Then, problem (1.1) has a unique solutionpinU(p, ρ).

Proof. Letx0∈U(p, ρ), x06=pis also a solution of (1.1). Then we have (3.17) [F⁰(p)^TF⁰(p)]⁻¹F⁰(x0)^TF(x0) = 0.

(6)

Hence,

x0−p = x0−p−[F⁰(p)^TF⁰(p)]⁻¹F⁰(x0)^TF(x0)

= [F⁰(p)^TF⁰(p)]⁻¹F⁰(x0)^T[F⁰(p)(x0−p)−F(x0) +F(p)]

+[F⁰(p)^TF⁰(p)]⁻¹(F⁰(x0)^T −F⁰(x0)^T)F(x0)

= [F⁰(p)^TF⁰(p)]⁻¹F⁰(x0)^T Z t

0

[F⁰(p)−F⁰(p+t(x0−p)](x0−p)dt +[F⁰(p)^TF⁰(p)]⁻¹(F⁰(x₀)^T −F⁰(x₀)^T)F(x₀),

where0≤t≤1.So, we have

kx0−pk ≤ k[F⁰(p)^TF⁰(p)]⁻¹F⁰(x0)^Tk Z t

0

k[F⁰(p)−F⁰(p+t(x0−p)]kk(x0−p)kdt +k[F⁰(p)^TF⁰(p)]⁻¹kkF⁰(x0)^T −F⁰(x0)^TkkF(x0)k

≤ β Z 1

0

Z td(x0)

0

L0(u)dud(x0)dt+cβ0

Z d(x0)

0

L0(u)du

= β

Z d(x0)

0

L₀(u)(d(x₀)−u)du+cβ₀ Z d(x0)

0

L₀(u)du.

FromL(u) >0 and Lemma 2.4, we have ¹_tRt

0L₀(u)duis increasing monotonically with respect tot.Therefore, by (3.15) we obtain

kx0−pk ≤ β Z d(x₀)

0

L0(u)(d(x0)−u)du+cβ0

Z d(x₀)

0

L0(u)du

≤ βd(x0) d(x₀)

Z d(x0)

0

L0(u)(d(x0)−u)du+cβ0d(x0) d(x₀)

Z d(x0)

0

L0(u)du

< βd(x₀) ρ

Z ρ

0

L₀(u)(ρ−u)du+cβ₀d(x₀) ρ

Z ρ

0

L₀(u)du

≤ kx₀−pk.

The optimality of radius

Theorem 3.2. Suppose that the equality sign holds in the inequality (3.10) in the Theorem 3.1.

Then the valueρof the convergence ball is the best possible, provided thatL₀=L.

Proof. The value ofρis specified by equation

(3.18) βRρ

0 L(u)udu ρ(1−βRρ

0 L₀(u)du)+

√2cβ²Rρ

0 L0(u)udu ρ(1−βRρ

0 L₀(u)du) = 1,

there existsF satisfying (3.5) inU(p, ρ)andx0 on the boundary of the closed ball such that GN fails. In fact, the following is an example on the scaled case:

(3.19) F(x) = (

p−x+βRx−p

0 (x−p−u)L(u)du, p≤x < p+ρ⁰ p−x+βRx−p

0 (x−p+u)L(u)du, p−ρ≤x < p, andx0=p+ρ, xn =p+ (−1)ⁿρ.

Theorem 3.3. Suppose that the equality sign holds in the inequality (3.15) in the Proposition 3.1.

Then the value of the convergence ball is the best possible.

(7)

Proof. Note that whenρis given by

(3.20) β

ρ Z ρ

0

L0(u)(ρ−u)du+cβ0

ρ Z ρ

0

L0(u)du= 1

there existF satifying Definition 3.1 inU(p, ρ)andx⁰on the boundary of the closed ball such thatF⁰(x⁰) = min{¹₂F⁰(x)^T, F⁰(x)}.For example consider (3.19) withx⁰=p+ρ.

Remark 3.1. IfL₀=L=L₁,we obtain the results in [7] which in turn generalized earlier results in [11, 13, 14] when these functions are constants or not. Moreover, ifL=L₁,then the results reduce to the ones obtained by us in [4–6]. Otherwise, i.e., ifL₀ < L < L₁(or L < L0 < L1)(see [4–6] for examples), then we obtain a larger radius of convergence, tighter error bounds on the distanceskxn−pkand an at least as precise information on the location of the solutionp,since

λ(L0, L)< λ(L0, L1)< λ(L1, L1) q0(L0, L)< q0(L0, L1)< q0(L1, L1) and

µ(L₀)< µ(L)< µ(L₁).

These advantages are obtained under the same computational cost, since in practice the computation of functionL1requires the computation of functionL0andLas special cases.

REFERENCES

[1] Amat, S., Hern´andez, M. A. and Romero, N.,Semilocal convergence of a sixth order iterative method for quadratic equations, Applied Numerical Mathematics,62(2012), 833–841

[2] Argyros, I. K., Cho, Y. J. and Hilout, S.,Numerical methods for equations and its applications, CRC Press, Taylor and Francis, New York, 2012

[3] Argyros, I. K., Cho, Y. J. and George, S.,Local convergence for some third-order iterative methods under weak conditions, J. Korean Math. Soc.,53(2016), No. 4, 781–793

[4] Argyros, I. K. and Magre ñán, A. A.,Iterative Algorithms I, Nova Science Publishers Inc., New York, 2016 [5] Argyros, I. K. and Magre ñán, A. A.,Iterative methods and their dynamics with applications, CRC Press, New

York, 2017

[6] Argyros, I. K. and Magre ˜n´an, A. A.,Iterative Algorithms II, Nova Science Publishers Inc., New York, 2017 [7] Chen, J. and Li, W.,Convergence of Gauss-Newton’s method and uniqueness of the solution, Appl. Math. Comput.,

170(2005), 687–705

[8] Haubler, W. M.,A Kantorovich convergence analysis for the Gauss-Newton method, Numer. Math.,48(1986), 119–125

[9] Magre ˜n´an, A. A.,Different anomalies in a Jarratt family of iterative root finding methods, Appl. Math. Comput., 233(2014), 29–38

[10] Magre ˜n´an, A. A.,A new tool to study real dynamics: The convergence plane, Appl. Math. Comput.,248(2014), 29–38

[11] Ortega, M. and Rheinboldt, W. C.,Iterative solution of nonlinear equations in several variables, Academic Press, New York, 1970

[12] Stewart, G. W.,On the continuity of the generalized inverse, SIAM J. Math.,17(1960), 33–45

[13] Smale, S., Newton’s method estimates from data at one pointin: R Ewing, K. Gross, C. Martin (Eds). The merging of disciplines: New directions in Pure Applied and Computational Mathematics, Springer, New York, 1886, 185–196

[14] Traub, J. F. and Wozniakowski, H.,Convergence and complexity of Newton iteration, J. Assoc. Comput. Math., 29(1979), 250–258

[15] Wang, X.,Convergence of Newton’s method and uniqueness of the solution of equations in Banach space, IMA J.

Numer. Anal.,20(2000), 123–134

DEPARTMENT OFMATHEMATICALSCIENCES CAMERONUNIVERSITY

LAWTON, OK 73505, USA

E-mail address:[email protected]

(8)

DEPARTMENT OFMATHEMATICSEDUCATION AND THERINS GYEONGSANGNATIONALUNIVERSITY

JINJU660-701, KOREA

DEPARTMENT OFMATHEMATICAL ANDCOMPUTATIONALSCIENCES NIT KARNATAKA, INDIA-575 025