

5. Generalized Inversion and Nonlinear Optimization

5.1. Mathematical Treatment of Linear Systems

An overdetermined set of equations can be transformed into an even-determined one by the least-squares method. Thus, on the surface, one may think that n equations are sufficient to determine n unknowns. Unfortunately, this is not always true, as illustrated by the following example:

(5.3)   x_1 + x_2 + x_3 = 1
        x_1 - x_2 + x_3 = 3
        2x_1 + 2x_2 + 2x_3 = 2

These three equations are not sufficient to determine the three unknowns because there are actually only two equations, the third equation being a mere repetition of the first. Readers may notice that the determinant for the system of equations in Eq. (5.3) is zero, and hence the system is singular and no inverse exists. However, the solvability of a linear system depends on more than the value of the determinant. For example, a small value for a determinant does not mean that the system is difficult to solve.
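As a quick numerical check, here is a minimal numpy sketch using the system of Eq. (5.3) as written above; it confirms that the matrix is singular and that only two of the three equations are independent:

    import numpy as np

    # Coefficient matrix and right-hand side of Eq. (5.3)
    A = np.array([[1.0,  1.0, 1.0],
                  [1.0, -1.0, 1.0],
                  [2.0,  2.0, 2.0]])
    b = np.array([1.0, 3.0, 2.0])

    print(np.linalg.det(A))          # ~0: the system is singular
    print(np.linalg.matrix_rank(A))  # 2: only two independent equations
    try:
        np.linalg.solve(A, b)
    except np.linalg.LinAlgError as err:
        print("solve failed:", err)  # LAPACK reports a singular matrix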

Let us consider the following system, in which the off-diagonal elements of the matrix are zero:

(5.4)   0.1 x_i = 1,  i = 1, 2, . . . , 100

The determinant is 10^-100, which is very small indeed. But the solution is simple, i.e., x_1 = x_2 = . . . = x_100 = 10. On the other hand, a reasonable determinant does not mean that the solution is stable. For example,

(5.5)   10 x_1 + 7 x_2 = 17
         7 x_1 + 5 x_2 = 12

leads to a simple solution of x = (1, 1)^T, and the determinant is 1. However, if we perturb the right-hand side to b = (17, 12.6)^T, the solution becomes x = (-3.2, 7)^T. In other words, if b_2 is changed by 5%, the solution is entirely different.
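The instability can be seen in a few lines of numpy (using the 2 × 2 system as reconstructed in Eq. (5.5) above); note that it is the condition number, not the determinant, that flags the trouble:

    import numpy as np

    A = np.array([[10.0, 7.0],
                  [ 7.0, 5.0]])
    print(np.linalg.det(A))                 # 1.0: a perfectly "reasonable" determinant

    b = np.array([17.0, 12.0])
    print(np.linalg.solve(A, b))            # [1. 1.]

    b_perturbed = np.array([17.0, 12.6])    # b_2 changed by 5%
    print(np.linalg.solve(A, b_perturbed))  # [-3.2  7. ]: entirely different

    print(np.linalg.cond(A))                # ~223: a large condition number means an unstable solution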

5.1.1. Analysis of an Even-Determined System

The preceding discussion clearly demonstrates that solving n linear equations for n unknowns is not straightforward, because we need to know some critical properties of the matrix A. We now present an analysis of the properties of A, and our treatment here closely follows Lanczos (1956, pp. 52-79). We first write Eq. (5.1) in reversed sequence as

(5.6)   b = Ax

Geometrically, we may consider both b and x as n-dimensional vectors in an even-determined system. Equation (5.6) says that multiplication of a vector x by the matrix A generates a new vector b, which can be thought of as a transformation of the original vector x. Let us investigate the case where the new vector b happens to have the same direction as the original vector x. In this case, b is simply proportional to x and we have the condition

(5.7)   Ax = λx

Equation (5.7) is really a set of n homogeneous linear equations, i.e., (A - λI)x = 0, where I is an identity matrix. It will have a nontrivial solution only if the determinant of the system is zero, i.e.,

(5.8)   det(A - λI) = 0

This determinant is a polynomial of order n in λ, and thus Eq. (5.8) leads to the characteristic equation

(5.9)   λ^n + c_(n-1) λ^(n-1) + c_(n-2) λ^(n-2) + . . . + c_0 = 0

In order for λ to satisfy this algebraic equation, λ must be one of its roots. Since an nth-order algebraic equation has exactly n roots, there are exactly n values of λ, called the eigenvalues of the matrix A, for which Eq. (5.7) is solvable. We assume that these eigenvalues are distinct (see Wilkinson, 1965, for the general case), and write them as

(5.10)  λ = λ_1, λ_2, . . . , λ_n
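A small numpy sketch makes Eq. (5.9) concrete (the 2 × 2 matrix is an arbitrary illustrative choice): the eigenvalues returned by a library routine are exactly the roots of the characteristic polynomial:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    coeffs = np.poly(A)          # coefficients of det(A - λI): here [1, -5, 5]
    print(np.roots(coeffs))      # roots of the characteristic equation (5.9)
    print(np.linalg.eigvals(A))  # eigenvalues of A: the same two numbers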

To every possible λ = λ_j, a solution of Eq. (5.7) can be found. We may tabulate these solutions as follows:

(5.11)  λ = λ_1:  x = (x_1^(1), x_2^(1), . . . , x_n^(1))^T
        λ = λ_2:  x = (x_1^(2), x_2^(2), . . . , x_n^(2))^T
        . . .
        λ = λ_n:  x = (x_1^(n), x_2^(n), . . . , x_n^(n))^T

where the superscript (j) denotes the solution corresponding to λ_j, and the superscript T denotes the transpose of a vector or a matrix. These solutions represent n distinct vectors of the n-dimensional space, and they are called the eigenvectors of the matrix A. We may denote them by u_1, u_2, . . . , u_n, where

(5.12)  u_1 = (x_1^(1), x_2^(1), . . . , x_n^(1))^T
        u_2 = (x_1^(2), x_2^(2), . . . , x_n^(2))^T
        . . .
        u_n = (x_1^(n), x_2^(n), . . . , x_n^(n))^T

Because Eq. (5.7) is valid for each λ = λ_j, j = 1, . . . , n, we have

(5.13)  A u_j = λ_j u_j,  j = 1, . . . , n

In order to write these n equations in matrix notation, we introduce the following definitions:

(5.14)  Λ = diag(λ_1, λ_2, . . . , λ_n)

(5.15)  U = (u_1, u_2, . . . , u_n)

In other words, Λ is a diagonal matrix with the eigenvalues of matrix A as diagonal elements, and U is an n × n matrix with the eigenvectors of matrix A as columns. Equation (5.13) now becomes

(5.16)  AU = UΛ
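In numpy, a single call returns the diagonal of Λ and the matrix U, so Eq. (5.16) can be verified directly (the matrix below is again an arbitrary example):

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])
    lam, U = np.linalg.eig(A)  # eigenvalues and eigenvectors (columns of U)
    Lam = np.diag(lam)         # the diagonal matrix Λ of Eq. (5.14)
    print(np.allclose(A @ U, U @ Lam))  # True: AU = UΛ, Eq. (5.16)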

Let us carry out a similar analysis for the transposed matrix A^T, whose elements are defined by

(5.17)  (A^T)_ij = A_ji

Because any square matrix and its transpose have the same determinant, both matrix A and its transpose A^T satisfy the same characteristic equation (5.9). Consequently, the eigenvalues of A^T are identical with the eigenvalues of A. If we let v_1, v_2, . . . , v_n denote the eigenvectors of A^T and define

(5.18)  V = (v_1, v_2, . . . , v_n)

then the eigenvalue problem for A^T becomes

(5.19)  A^T V = VΛ

Let us take the transpose on both sides of Eq. (5.19)

(5.20)  V^T A = Λ V^T

and let us postmultiply this equation by U

(5.21)  V^T A U = Λ V^T U

On the other hand, let us premultiply both sides of Eq. (5.16) by V^T

(5.22)  V^T A U = V^T U Λ

Because the left-hand sides of Eq. (5.21) and Eq. (5.22) are equal, we obtain

(5.23)  Λ V^T U = V^T U Λ

Let us denote the product V^T U by W, whose elements are W_ij, and write out Eq. (5.23) element by element:

(5.24)  (λ_i - λ_j) W_ij = 0

If it is assumed that the eigenvalues are distinct, we obtain W_ij = 0 for i ≠ j. This proves that the eigenvectors of A^T are mutually orthogonal to those of A (for i ≠ j). Since the length of eigenvectors is arbitrary, we may choose V^T U = I, where I is the identity matrix. Thus, we have

(5.25)  V^T = U^-1,  V = (U^T)^-1;  U = (V^T)^-1,  U^T = V^-1

We are now in a position to derive the fundamental decomposition theorem. If we postmultiply Eq. (5.16) by V^T, we have

(5.26)  A U V^T = U Λ V^T

In view of Eq. (5.25), U V^T = I, so that

(5.27)  A = U Λ V^T

Equation (5.27) shows that any n × n matrix A with distinct eigenvalues is obtainable by multiplying three matrices together, namely, the matrix U with the eigenvectors of A as columns, the diagonal matrix Λ with the eigenvalues of A as diagonal elements, and the matrix V^T. The columns of matrix V are the eigenvectors of A^T.
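The theorem can be checked numerically; the sketch below chooses V^T = U^-1, which Eq. (5.25) permits, instead of computing the eigenvectors of A^T separately:

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])  # nonsymmetric, with distinct eigenvalues 5 and 2
    lam, U = np.linalg.eig(A)
    VT = np.linalg.inv(U)       # V^T = U^-1, consistent with Eq. (5.25)
    print(np.allclose(A, U @ np.diag(lam) @ VT))  # True: A = UΛV^T, Eq. (5.27)

    # The columns of V are indeed eigenvectors of A^T, as stated above:
    V = VT.T
    for j in range(2):
        print(np.allclose(A.T @ V[:, j], lam[j] * V[:, j]))  # True, True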

The solution of the eigenvalue problem Ax = λx also solves the matrix inversion problem when A is nonsingular. If we premultiply Eq. (5.7) by A^-1, the inverse of A, we obtain

(5.28)  x = λ A^-1 x

or

(5.29)  A^-1 x = λ^-1 x

Equation (5.29) is just the eigenvalue problem of A^-1. This means that both matrix A and its inverse A^-1 have the same eigenvectors, but their eigenvalues are reciprocal to each other. Applying the fundamental decomposition theorem [Eq. (5.27)], we obtain

(5.30)  A^-1 = U Λ^-1 V^T

Hence if we decompose A, its inverse can be easily found. We also see that if one of the eigenvalues of A, say λ_i, is zero, we cannot compute λ_i^-1 and consequently A^-1 does not exist.
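Continuing the sketch above, Eq. (5.30) yields the inverse by simply taking reciprocals of the eigenvalues:

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])
    lam, U = np.linalg.eig(A)
    VT = np.linalg.inv(U)
    A_inv = U @ np.diag(1.0 / lam) @ VT          # Eq. (5.30): A^-1 = UΛ^-1V^T
    print(np.allclose(A_inv, np.linalg.inv(A)))  # True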

The preceding analysis offers some insight into the solvability of an even-determined system of linear equations. But have we really solved our problem? Calculation of eigenvalues requires finding the roots of an nth-order algebraic equation. These roots, or eigenvalues, are in general complex and difficult to determine. Fortunately, the eigenvalues of a symmetric matrix are real, and this fact can be exploited not only for solving a general nonsymmetric matrix, but also for solving the general nonsquare matrix.

5.1.2. Analysis of an Underdetermined System

There are usually an infinity of solutions for an underdetermined system, that is, one in which the number of unknowns exceeds the number of equations. However, we must be aware that there may be no solution at all for some underdetermined systems. For instance, the following system of two equations for three unknowns:

(5.31)  x_1 - x_2 + x_3 = 1
        2x_1 - 2x_2 + 2x_3 = 3

has no solution because the second equation contradicts the first one.

The high degree of nonuniqueness of solutions in an underdetermined system has been exploited in geophysical applications in a series of papers by Backus and Gilbert (1967, 1968, 1970). Since then it has been popular to model any property of the earth as an infinite continuum, i.e., the


number of unknowns is infinite. But because geophysical data are limited, the number of observations is finite. In other words, we write a finite number of equations corresponding to our observations, but our unknown vector x is of infinite dimension. In order to obtain a unique answer, the solution chosen is the one which maximizes or minimizes a subsidiary integral. In actual practice, we minimize a sum of squares to produce a smooth solution. A typical mathematical formulation may look like

(5.32)  ( A ) x = ( b )
        ( F )     ( 0 )

where the top block A represents the underdetermined constraint equations with the observed data vector b, the vector x contains the unknowns, and the bottom block F is a band matrix which specifies that some filtered version of x should vanish. As pointed out by Claerbout (1976, p. 120), the choice of a filter is highly subjective, and the solution is often very sensitive to the filter chosen. The Backus-Gilbert inversion is intended for analysis of systems in which the unknowns are functions, i.e., they are infinite-dimensional, as opposed to, say, hypocenter parameters in the earthquake location problem, which are four-dimensional. For an application of the Backus-Gilbert inversion to travel time data, readers may refer, for example, to Chou and Booker (1979).
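As a deliberately simplified numpy sketch of the formulation in Eq. (5.32) (not of the Backus-Gilbert method itself), the fragment below appends second-difference filter rows F to a small underdetermined system and solves the stacked system by least squares; the matrix sizes and the choice of filter are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 3, 10                  # 3 observations, 10 unknowns: underdetermined
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)

    # Band matrix F of second differences: asks that a "roughened" x vanish
    F = np.zeros((n - 2, n))
    for i in range(n - 2):
        F[i, i:i + 3] = [1.0, -2.0, 1.0]

    # Stacked system in the spirit of Eq. (5.32)
    G = np.vstack([A, F])
    d = np.concatenate([b, np.zeros(n - 2)])
    x, *_ = np.linalg.lstsq(G, d, rcond=None)

    print(np.linalg.norm(A @ x - b))  # data misfit
    print(np.linalg.norm(F @ x))      # roughness; weighting F trades this off against the misfit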

5.1.3. Analysis of an Overdetermined System

Most scientists are modest in their data modeling and would have more observations than unknowns in their model. On the surface it may seem very safe, and one should not expect difficulties in obtaining a reasonable solution for an overdetermined system. However, an overdetermined system may in fact be underdetermined because some of the equations may be superfluous and do not add anything new to the system. For example, if we use only first P-arrivals to locate an earthquake which is considerably outside a microearthquake network, the problem is underdetermined no matter how many observations we have. In other words, in many physical situations we do not have sufficient information to solve our problem uniquely. Unfortunately, many scientists overlook this difficulty. Therefore, it is instructive to quote the general principle given by Lanczos (1961, p. 132) that "a lack of information cannot be remedied by any mathematical trickery."

Usually a given overdetermined system is mathematically incompatible, i.e., some equations are contradictory because of errors in the observations. This means that we cannot make all the components of the residual vector r = Ax - b equal to zero. In the least squares method we seek a solution in which ||r||^2 is minimized, where ||r|| denotes the Euclidean length of r. In matrix notation, we write ||r||^2 as (see Draper and Smith, 1966, p. 58)

(5.33)  ||r||^2 = (Ax - b)^T (Ax - b) = x^T A^T A x - 2 x^T A^T b + b^T b

To minimize ||r||^2, we differentiate Eq. (5.33) partially with respect to each component of x and equate each result to zero. The resulting set of n equations may be rearranged into matrix form as

(5.34)  2 A^T A x - 2 A^T b = 0

or

(5.35)  A^T A x = A^T b

Equation (5.35) is called the system of normal equations and is an even-determined system. Furthermore, the matrix A^T A is always symmetric, and its eigenvalues are not only real but nonnegative. By applying the least squares method, we not only get rid of the incompatibility of the original equations, but also have a much nicer and smaller set of equations to solve. For these reasons, scientists tend to use the least squares approach exclusively for their problems. Unfortunately, there are two serious drawbacks. In solving the normal equations on a computer, one needs twice the computational precision of the original equations. By forming A^T A and A^T b, one also destroys certain information in the original system.

We shall discuss these drawbacks later.
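The precision drawback can be demonstrated with a small numpy experiment (the Vandermonde system below is an assumed test case; exact error sizes vary by machine). Because cond(A^T A) = cond(A)^2, the normal equations lose roughly twice as many digits as an orthogonalization-based solver working on A directly:

    import numpy as np

    t = np.linspace(0.0, 1.0, 50)
    A = np.vander(t, 10)              # 50 x 10 monomial basis: badly conditioned
    x_true = np.ones(10)
    b = A @ x_true

    x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # normal equations, Eq. (5.35)
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # QR/SVD-based solver

    print(np.linalg.cond(A))                  # large; cond(A^T A) is its square
    print(np.linalg.norm(x_normal - x_true))  # error amplified by cond(A)^2
    print(np.linalg.norm(x_lstsq - x_true))   # typically several digits more accurate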

5.1.4. Analysis of an Arbitrary System

Since the determinant is defined only for a square matrix, we cannot carry out an eigenvalue analysis for a general nonsquare matrix. However, Lanczos (1961) has shown an interesting approach to analyze an arbitrary system. The fundamental problem to solve is

(5.36)  Ax = b

where the matrix A is m × n, i.e., A has m rows and n columns. Equation (5.36) says that A transforms a vector x of n components into a vector b of m components. Therefore matrix A is associated with two spaces: one of m dimensions and the other of n dimensions. Let us enlarge Eq. (5.36) by considering also the adjoint system

(5.37)  A^T y = c

where the matrix A^T is n × m, y is an m-dimensional vector, and c is an n-dimensional vector. We now combine Eq. (5.36) and Eq. (5.37) as follows:

(5.38)  ( 0    A ) ( y )   ( b )
        ( A^T  0 ) ( x ) = ( c )

Now, this combined system has a symmetric matrix, and one can proceed to perform an eigenvalue analysis like that described before (for details, see Lanczos, 1961, pp. 115-123). Finally, one arrives at a decomposition theorem similar to Eq. (5.27) for a real m × n matrix A with m ≥ n:

(5.39)  A = U S V^T

where

(5.40)  U^T U = I_m,  V^T V = I_n

and

(5.41)  S = ( diag(σ_1, σ_2, . . . , σ_n) )
            (              0             )

The m × m matrix U consists of m orthonormalized eigenvectors of A A^T, and the n × n matrix V consists of n orthonormalized eigenvectors of A^T A. Matrices I_m and I_n are m × m and n × n identity matrices, respectively. The matrix S is an m × n diagonal matrix with off-diagonal elements S_ij = 0 for i ≠ j and diagonal elements S_ii = σ_i, where σ_i, i = 1, 2, . . . , n, are the nonnegative square roots of the eigenvalues of A^T A. These diagonal elements are called singular values and are arranged in Eq. (5.41) such that

(5.42)  σ_1 ≥ σ_2 ≥ . . . ≥ σ_n ≥ 0

The above decomposition is known as singular value decomposition (SVD). It was proved by J. J. Sylvester in 1889 for square real matrices and by Eckart and Young (1939) for general matrices. Most modern texts on matrices (e.g., Ben-Israel and Greville, 1974, pp. 242-251; Forsythe and Moler, 1967, pp. 5-11; G. W. Stewart, 1973, pp. 317-326) give a derivation of the singular value decomposition. The form we give here follows that given by Forsythe et al. (1977, p. 203). Lanczos (1961, pp. 120-123) did not introduce the singular values explicitly, but used the term eigenvalues rather loosely. Consequently, some works of geophysicists who use Lanczos' notation may be confusing to readers. Strictly speaking, eigenvalues and eigenvectors are not defined for an m × n rectangular matrix because eigenvalue analysis can be performed only for a square matrix.
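A closing numpy sketch (with an arbitrary 3 × 2 example) verifies Eq. (5.39), the relation of the singular values to the eigenvalues of A^T A, and the fact that the symmetric combined system of Eq. (5.38) has eigenvalues ±σ_i:

    import numpy as np

    A = np.array([[2.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 3.0]])   # m = 3, n = 2
    U, s, VT = np.linalg.svd(A)  # U is 3 x 3, s holds σ_1 ≥ σ_2, VT is 2 x 2
    S = np.zeros((3, 2))
    np.fill_diagonal(S, s)
    print(np.allclose(A, U @ S @ VT))  # True: Eq. (5.39), A = USV^T

    # σ_i^2 are the eigenvalues of A^T A, in the ordering of Eq. (5.42)
    print(np.allclose(s**2, np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]))  # True

    # The symmetric combined matrix of Eq. (5.38) has eigenvalues ±σ_i (plus m - n zeros)
    B = np.block([[np.zeros((3, 3)), A],
                  [A.T, np.zeros((2, 2))]])
    print(np.sort(np.abs(np.linalg.eigvalsh(B)))[::-1])  # [σ_1, σ_1, σ_2, σ_2, 0]
    print(s)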