Inexact-Newton methods for semismooth systems of equations
with block-angular structure
Natasa Krejica;∗;1, Jose Mario Martnezb;2
aInstitute of Mathematics, University of Novi Sad, Trg Dositeja Obradovica 4, 21000 Novi Sad, Yugoslavia b
Department of Mathematics, IMECC-UNICAMP, University of Campinas, CP 6065, 13081-970 Campinas SP, Brazil
Received 31 August 1997; received in revised form 3 October 1998
Abstract
Systems of equations with block-angular structure have applications in evolution problems coming from physics, en-gineering and economy. Many times, these systems are time-stage formulations of mathematical models that consist of mathematical programming problems, complementarity, or other equilibrium problems, giving rise to nonlinear and non-smooth equations. The nal versions of these dynamic models are nonnon-smooth systems with block-angular structure. If the number of state variables and equations is large, it is sensible to adopt an inexact-Newton strategy for solving this type of systems. In this paper we dene two inexact-Newton algorithms for semismooth block-angular systems and we prove local and superlinear convergence. c1999 Elsevier Science B.V. All rights reserved.
Keywords:Semismooth equations; Nonlinear systems; Inexact-Newton methods; Decomposition
1. Introduction
Let n be a (large) positive integer and let F be a mapping F:Rn→Rn. In this paper we consider nonlinear system of equations
F(x) = 0;
where the n equations and variables can be grouped into m blocks,
F= (F1; F2; : : : ; Fm); x= (x1; x2; : : : ; xm);
∗Corresponding author. E-mail: [email protected].
1
This author was supported by FAPESP (Grant 96=8163-9).
each of size ni,
xi∈Rni for i= 1;2; : : : ; m;
where of coursen1+n2+· · ·+nm=n and where the variables appear in the following block-angular way:
F1(x1) = 0;
F2(x1; x2) = 0; (1)
.. .
Fm(x1; x2; : : : ; xm) = 0:
In other words, the variables that appear in each block of nonlinear equations are the variables that appear in the preceding block plus one new block of variables,
Fi:Rn1× · · · ×Rni→Rni
for i= 1;2; : : : ; m.
Such systems arise in many industrial applications where decisions are taken on the basis of mathematical models that involve constrained function minimization, complementarity problems or variational inequalities. Usually, these models are dynamic in the sense that their solutions for one period of time provide essential data for solving the model at the next period.
For example, the production side of an economy is described in Ferris and Pang [7] as a nonlinear complementarity problem. Each sector of an economy produces a certain amount of goods and is constrained with technology. In a competitive marketplace, the jth sector makes the prot by solving the optimization problem which involves the price vector and technology constraints.
However, this model does not take into account that decisions in economic systems related to boundary conditions are aected by the decisions at previous stages of time. If such a decision is represented by the constrained maximizer of a functional, the natural tendency is to follow continuous trajectories, even if this represent to change global maximizers by local ones.
Let us represent the original model given in [7] by
H(a; y)¿0; y¿0; hH(a; y); yi= 0
with y∈Rp, a∈Rq, where y represents the decision and a a vector of boundary conditions. Making explicit the dependence in relation to time, this should be written
H(ai; yi)¿0; yi¿0; hH(ai; yi); yii= 0:
Finally, including a penalization that inhibits large changes on yi, the model turns out to be
H(ai; yi) +(yi−yi−1)¿0; yi¿0; hH(ai; yi) +(yi−yi−1); yii= 0 (2)
In (1) we assume that the problem at periodiis a system of equations whose solution is xi. In fact, complementarity problems, variational inequality problems and optimality conditions of constrained minimization can be represented by systems of equations. Unfortunately, these systems are not smooth, but they are semismooth in the following sense.
For a locally Lipschitzian function F; Rademacher’s theorem (see [1]), implies that F is almost everywhere F-dierentiable. Let DF be a set of points where F is F-dierentiable. Then for any
x∈Rn,
@BF(x) ={lim3F(xj) :xj→x; xj∈DF}
is a nonempty compact set. The generalized derivative, @F(x); is the convex hull of @BF(x). We say
that F is semismooth at x if F is locally Lipschitizian and for all h∈Rn with h6= 0
lim
y→hx{Vh: V∈@F(y)}
exists. If F is semismooth at all points in a given set we say that F is semismooth in this set. Furthermore, if all elements in@BF(x) are nonsingular we say thatF is BD-regularatx. Such systems were considered in [9–11, 13] and other authors. Smooth systems of equations with the block-angular structure (1) have been studied in [5, 3, 8]. In [3] Newton-like methods were considered, that require the solutions of m linear systems of dimension ni×ni at each iteration. When the number of state variables ni is large, direct solution of linear systems can be prohibitive and, so, an inexact-Newton approach (using iterative linear solvers) is better. In [8] the inexact-Newton approach was studied related to the smooth case. In the present paper we apply inexact-Newton ideas to the semismooth block-angular structure (1). Inexact-Newton methods for semismooth nonstructured systems were proposed in [9].
This paper is organized as follows. In Section 2 we introduce some notation, we state basic assumptions and a technical lemma. Two generalized block inexact-Newton methods are dened in Section 3 and local convergence results are proved in Section 4. In Section 5 we draw some conclusions.
2. Preliminaries
We assume, as in [9], that F is locally Lipschitzian, semismooth and BD-regular at x∗ and that this point is a solution of the system. This implies that there exists an open convex neighborhood D of x∗ such that to each x= (x1; : : : ; xm)∈D we can associate (@BF11(x); : : : ; @BFmm(x)) such that
(a) For all x∈D; i= 1; : : : ; m; V∈@BFii(x), V is a nonsingular ni×ni matrix. The norms of inverses of all these matrices admit a common bound M. In fact (@BF11(x); : : : ; @BFmm(x)) is a (block-) diagonal part of@BF(x) and each@BFii represents the generalized partial derivative ofFi with respect to xi.
(b) Given an arbitrary norm | · |, for all x∈D, i= 1; : : : ; m we have
lim h→0
|Fi(x1; : : : ; xi−1; xi+h)−Fi(x1; : : : ; xi−1; xi)−Vih| |hi|
= 0; (3)
Properties (a) and (b) are trivial consequences of the BD-regularity and the semismooth-ness of F, respectively. It is important to observe that we will use only these properties in our convergence proofs, so the class of nonsmooth systems which can be solved with the here proposed methods is wider than BD-regular and semismooth systems. In fact, BD-regularity and semismoothness are just sucient conditions for the hypotheses of our convergence theory.
We will use the same notation as in the papers [3, 8]. The methods introduced in this paper will be iterative and the successive approximations to the solutions will be denoted xk; k= 0;1;2; : : : . According to the structure of the systems, it will be useful to divide the vector xk into m (block-) components xk
i, with each xik belonging to Rni; i= 1;2; : : : ; m, so
xk= (x1k; x2k; : : : ; xmk):
Many times we will use the vector whose components are the rst i−1 components of xk+1
followed by the ith component of xk, so we dene xk;1=xk
Also, for x∈Rn, we will need to refer to the vector whose components are the rst i components of x. This vector, which belongs to Rn1× · · · ×Rni, will be denoted x
i. So, if x= (x1; : : : ; xm), we have
xi= (x1; : : : ; xi); i= 1;2; : : : ; m:
With this notation, we can write
xk; i= ( xik−+11; x
We nish this section by stating a lemma on nite dierence recurrences which will be used in convergence proofs.
Proof. The proof was given in [8]. Let us sketch it here for the sake of completeness. Dene the matrix A∈Rn×n
, A= [aij], by
aij=
1; i=j;
−C; i¿j; 0; i¡j:
Clearly (4) is equivalent to
Aek+16ek:
Since the entries of A−1
are nonnegative, this implies that
ek+16A−1
ek (5)
for all k. Since the spectral radius of A−1 is equal to 1, there exists a norm k · k
C on Rn such that, given 0∈(;1), we have
kA−1k
C60:
Moreover, since A−1
is lower triangular, we have kxkC=kDxk2 where D is a diagonal matrix. So,
by (5),
kek+1kC60kekkC
and the thesis follows from this inequality.
3. Semismooth block inexact-Newton methods
In this section we will dene two block inexact-Newton methods based on generalized diagonal derivatives of F. One iteration of each method will consist of m steps. At each step an increment xk+1
i −xik will be computed from the generalized Newtonian equation on the ith block using an inexact stopping criterion. In this way we exploit both the block-angular structure of the system as in the smooth case analyzed in [3, 8] and the semismoothness of the system studied in [9].
The rst method is based on the stopping criterion introduced in [2]. Assume that x0 is an
arbi-trary initial approximation to the solution of the nonlinear system (1). Given the kth approximation xk= (xk
1; : : : ; xmk) and the forcing parameter k, the rst semismooth block inexact-Newton (SBIN1) algorithm obtains xk+1= (xk+1
1 ; : : : ; xmk+1) by means of
Vik(xik+1−xik) =−Fi(xk; i) +rik; (6)
where Vk
i ∈@BFii(xk; i) and |rk
i |6k|Fi(xk; i)| (7) for i= 1; : : : ; m.
The increment xk+1
The second semismooth block-inexact-Newton (SBIN2) method is based on Ypma’s stopping criterion [15] for the iterative linear solver. In this method xk+1 is obtained by means of
xik+1−xik=−(Vik)−1F
i(xk; i) +rik; (8) where Vk
i ∈@BFii(xk; i) and |rik|6k|(Vik)
−1F
i(xk; i)|: (9) SBIN1 is not ane invariant in the range space, while SBIN2 is. This is the reason why, in the one-block case studied in [9], the convergence result is stronger for the method based on Ypma’s criterion. We will see that this is also the case in our block generalization. Namely, for SBIN1 we will obtain local convergence when all the forcing parameters k are smaller than some unknown upper bound ∈(0;1), while for SBIN2 local convergence follows when all the k’s are smaller than any given upper bound in (0;1). On the other hand, Ypma’s criterion is not computable since in order to verify (9) we need an estimation of the exact solution of the newtonian linear system. Sometimes this estimation can be provided by the linear solver (see [4, 9]).
4. Local convergence
In this section we will give local convergence results for SBIN1 and SBIN2. As mentioned before we are going to prove that, if the forcing parameters k are bounded by some (unknown) ¡1 (SBIN1) or any (given) ¡1 (SBIN2) and the initial approximation is close to the solution then
{xk} converges to x∗ with an q-linear rate determined by in a suitable norm. In the case that the forcing parameters tends to 0, q-superlinear convergence will be proved.
Theorem 1. Let F satisfy the basic assumptions stated in Section 2. There exist ¿0 and a
neighborhood ofx∗ such that if06k6 for allk≡0;1;2; : : : ;andx0 belongs to that neighborhood;
then the sequence {xk} generated by SBIN1 converges q-linearly (in a suitable norm) to x∗.
Proof. Since F satises the basic assumptions, properties (a) and (b) of Section 2 guarantee that there exist constants ; M1 and ”1¿0 such that if kxi−x∗i k6”1, Vi∈@BFii( xi) then Vi is nonsingular and
|V−1
i |6M1; |Fi( xi)|6kxi−x∗i k for all i= 1;2; : : : ; m:
Moreover, by (3) we have that for all ¿0 there exists ”2∈(0; ”1) such that whenever
x∈N(x∗)≡ {x: kx−x∗k6”2} and Vi∈@BFii( xi);
then
|Fi( x∗i−1; xi)−Fi( x
∗
i )−Vi(xi−x∗i )|6|xi−x∗i |:
Choose
¡ 1
M1
and ¡ 1 M1
and denote =M1(+). Clearly ¡1. Let us denote eik=|xik−x∗i |, i= 1;2; : : : ; m, k=0;1;2: : :.
and the convergence is q-linear.
Theorem 2. Let F satisfy the basic assumptions of Section 2 and let ∈(0;1) be given. If the
sequence {k} is bounded by then there exists a neighborhood of x∗ such that for any x0 in this
neighborhood the sequence {xk} generated by SBIN2 converges q-linearly (in a suitable norm)
We have seen that the convergence rate is as usually determined by the upper bound for the forcing parameters. Notice that q-linear convergence in the norm described in Lemma 1 implies r-linear convergence in any other norm. For m= 1, q-superlinear convergence when the forcing parameters tend to zero was proved in [9]. Our next theorem states q-superlinear convergence of both block inexact methods.
Theorem 3. LetF satisfy the basic assumptions of Section 2. Assume that the sequence{k}tends
to 0 and that the hypotheses of Theorems 1 and 2 hold. Then the sequences generated by SBIN1
and SBIN2 converge q-superlinearly.
Proof. Let us consider SBIN2 and keep the notation of Theorem 2 (the proof for SBIN1 is analo-gous). Dene
k;i=
|Fi( x∗i−1; xik)−Fi( x∗i )−Vik(xik−x∗i )| |xk
i −x∗i |
; i= 1; : : : ; m:
By Property (b) of Section 2, we have that limk→∞k;i= 0. Denoting k= max16i6mk;i; and using inequalities (10)–(13) we obtain
ek+1
1 6ke1k;
ek+1
i 6keik+C i−1
X
j=1
ek+1
i
with k=M2k+k(1 +M2k) and C=M2(1 +); 6k. Clearly limk→∞k= 0; so
|xk+1
1 −x∗1|= o(|x1k−x∗1|):
Assume, as inductive hypothesis,
|xjk+1−xj∗|= o(kxjk−xj∗k); j= 1; : : : ; i−1:
Then
|xk+1
i −x∗i |6k|xik−x∗i |+C i−1
X
j=1
o(kxjk−xj∗k)
= o(kxik−x∗i k);
so
kxk+1−x∗k= m X
i=1
|xik+1−x∗i |= o(kxk−x∗k)
5. Final remarks
The main objective of this research is the development and analysis of mathematical models applied to social and economic growth and development, regarding not only their adequacy to reality but also their solid fundamentals from the mathematical point of view.
We found that sequences of time-dependent large nonlinear systems of nonsmooth equations are useful models for representing a large variety of practical economic and environmental problems. For this reason, and taking into account previous research in [3, 8] concerning block-angular smooth systems and Martnez and Qi [9] concerning nonstructured nonsmooth systems, we decided to intro-duce the natural inexact-Newton methods for that structure, and to justify these methods from the local convergence point of view. As in the case of nonstructured systems studied in [9], we found that the Dembo–Eisenstat–Steihaug stopping criterion (now applied to each block of equations) does not provide a convergence result of the same quality as the one it gives for smooth systems. On the other hand, the block Ypma’s stopping criterion allows us to prove a stronger result of local convergence.
Proving global convergence using a combination of the techniques of [8] for m-block smooth systems and [9] for one-block systems should not be dicult. However, in the general case, the descent condition for the Newton-like direction required in [9] is not easy to verify. Therefore, it will be probably more useful to develop global algorithms tting with the local theory developed in this paper for specic nonsmooth situations, such as sequences of nonlinear complementarity problems. In this case, the Fischer–Burmeister merit function used in Facchinei and Kanzow [6] provides, very likely, the most useful tool for that purpose.
Acknowledgements
The authors are indebted to an anonymous referee whose comments helped to improve the read-ability of this paper.
References
[1] F.H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.
[2] R.S. Dembo, S.C. Eisenstat, T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal. 19 (1982) 400 – 408. [3] J.E. Dennis, J.M. Martnez, X. Zhang, Triangular decomposition methods for solving reducible nonlinear systems of
equations, SIAM J. Optim. 4 (1994) 358–382.
[4] P. Deuhard, R. Freund, A. Walter, Fast secant methods for the iterative solution of large nonsymmetric linear systems, Impact Comput. Sci. Eng. 2 (1990) 244 –276.
[5] J. Erickson, A note on the decomposition of systems of sparse non-linear equations, BIT 16 (1976) 462– 465. [6] F. Facchinei, S. Kanzow, A nonsmooth inexact Newton method for the solution of large-scale nonlinear
complementarity problems, Math. Programming 76 (1997) 493–512.
[7] M.C. Ferris, J.S. Pang, Engineering and economic applications of complementarity problems, Technical Report 95-07, Computer Science Department, University of Winsconsin, Madison, WI, May 1995.
[8] N. Krejic, J.M. Martnez, A globally convergent inexact-Newton method for solving reducible nonlinear systems of equations, Technical Report 49=97, IMECC-UNICAMP, Campinas, Brazil, 1997.
[10] R. Miin, Semismooth and semiconvex functions in constrained optimization, SIAM J. Control Optim. 15 (1977) 957–972.
[11] J.S. Pang, L. Qi, Nonsmooth equations: Motivation and Algorithms, SIAM J. Optim. 3 (1993) 443– 465.
[12] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Math. Oper. Res. 18 (1993) 227–244.