
5. Generalized Inversion and Nonlinear Optimization

5.3. Computational Aspects of Solving Inverse Problems

There are many ways to solve the problem Ax = b. For example, one learns Cramer’s rule and Gaussian elimination in high school. We also know that by applying the least squares method, we can reduce any over- or underdetermined system of equations to an even-determined one. With digital computers generally available, one wonders why the machines cannot simply grind out the answers and let the scientists write up the results. Unfortunately, this point of view has led many scientists astray.

As pointed out by Hamming (1962, front page), "the purpose of computing is insight, not numbers."

Using computers blindly would not lead us anywhere. Suppose we wish to solve a system of 20 linear equations. Although it is possible to program Cramer's rule to do the job on a modern computer, it would take at least 300 million years to grind out the solution (Forsythe et al., 1977, p. 30). On the other hand, using Gaussian elimination would take less than 1 sec on a modern computer. However, if the matrix were ill-conditioned, Gaussian elimination would not give us a meaningful answer. Worse yet, we might not be aware of this.
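To make the contrast concrete, here is a back-of-the-envelope sketch in Python. The machine speed (10^6 multiplications per second) and the operation counts are illustrative assumptions; the exact figures depend on how one counts, but the gulf between the two methods does not.

```python
import math

n = 20
flops_per_second = 1e6   # assumed speed of an early machine (illustrative)

# Cramer's rule via cofactor expansion: n + 1 determinants of order n,
# each costing roughly (n - 1) * n! multiplications.
cramer_ops = (n + 1) * (n - 1) * math.factorial(n)

# Gaussian elimination: roughly n**3 / 3 multiplications.
gauss_ops = n**3 / 3

seconds_per_year = 3.156e7
print(f"Cramer's rule: ~{cramer_ops / flops_per_second / seconds_per_year:.1e} years")
print(f"Gaussian elimination: ~{gauss_ops / flops_per_second:.1e} seconds")
```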

In this section, we first review briefly the nature of computer computations. We then summarize some computational aspects of generalized inversion.

5.3.1. Nature of Computer Computations

Although we are accustomed to real numbers in mathematical analysis, it is impossible to represent all real numbers in computers of finite word length. Thus, arithmetic operations on real numbers can only be approximated in computer computations. Nearly all computers use floating-point numbers to approximate real numbers. A set of floating-point numbers is characterized by four parameters: the number base β, the precision t, and the exponent range [e₁, e₂] (see, e.g., Forsythe et al., 1977, p. 10). Each floating-point number x has the value

(5.51) x = ±(d₁/β + d₂/β² + ⋯ + d_t/β^t)β^e

where the integers d₁, ..., d_t satisfy 0 ≤ d_i ≤ β − 1 (i = 1, ..., t), and the integer exponent e has the range e₁ ≤ e ≤ e₂.
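As an illustration of Eq. (5.51), the following sketch (Python, with hypothetical digit values) evaluates a floating-point number directly from its base, digits, and exponent:

```python
def float_value(digits, base, exponent, sign=+1):
    """Evaluate Eq. (5.51): x = ±(d1/β + d2/β² + ... + dt/β^t)·β^e."""
    fraction = sum(d / base**(i + 1) for i, d in enumerate(digits))
    return sign * fraction * base**exponent

# Hypothetical example: base β = 16, precision t = 6, exponent e = 1.
digits = [1, 8, 0, 0, 0, 0]          # each digit satisfies 0 <= d_i <= β - 1
print(float_value(digits, base=16, exponent=1))   # (1/16 + 8/256) * 16 = 1.5
```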

In using a computer, it is important to know the relative accuracy of the arithmetic (which is estimated by β^(1−t)) and the upper and lower bounds of floating-point numbers (which are given approximately by β^(e₂) and β^(e₁), respectively). For example, the single precision floating-point numbers for IBM computers are specified by β = 16, t = 6, e₁ = −65, and e₂ = 63. Thus, the relative accuracy is about 10⁻⁶, the lower bound is about 10⁻⁷⁸, and the upper bound is about 10⁷⁵. The IBM double precision floating-point numbers have t = 14, so that the relative accuracy is about 2 × 10⁻¹⁶, but they have the same exponent range as their single precision counterparts.

The set of floating-point numbers in any computer is not a continuum, or even an infinite set. Thus, it is not possible to represent the continuum of real numbers in any detail. Consequently, arithmetic operations (such as addition, subtraction, multiplication, and division) on floating-point

numbers in computers rarely correspond to those on real numbers. Since floating-point numbers have a finite number of digits, roundoff error occurs in representing any number that has more significant digits than the computer can hold. An arithmetic operation on floating-point numbers may introduce roundoff error and can result in underflow or overflow error. For example, if we multiply two floating-point numbers x and y, each of t significant digits, the product has either 2t or (2t − 1) significant digits, and the computer must round off the product to t significant digits. If x and y have exponents e_x and e_y, the multiplication may cause an overflow if (e_x + e_y) exceeds the upper exponent bound, and an underflow if (e_x + e_y) is smaller than the lower exponent bound.
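The same effects are easy to reproduce in any modern floating-point format; the following sketch uses IEEE double precision (not the IBM format discussed above) purely for illustration:

```python
# Roundoff: products and sums are rounded back to t digits, so floating-point
# arithmetic does not obey the exact identities of real arithmetic.
a, b, c = 0.1, 0.2, 0.3
print(a + b == c)             # False: 0.1 + 0.2 rounds to 0.30000000000000004

# Overflow: exponents add under multiplication, and the sum can leave the range.
print(1e200 * 1e200)          # inf (exponent sum exceeds the upper bound)

# Underflow: the exponent sum can also fall below the lower bound.
print(1e-200 * 1e-200)        # 0.0 (the true product is too small to represent)
```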

An effective way to minimize roundoff errors is to use double precision floating-point numbers and operations whenever in doubt. Unfortunately, the exponent range for single precision and double precision floating-point numbers is the same on most computers, so one must still be careful in handling overflow and underflow errors. In addition, there are many other pitfalls in computer computations. Readers are referred to textbooks on computer science, such as Dahlquist and Björck (1974) and Forsythe et al. (1977), for details.
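As a concrete illustration of this advice, the following sketch (Python with NumPy, assuming its float32 and float64 types as stand-ins for single and double precision) accumulates the same sum in both precisions:

```python
import numpy as np

# Accumulate 0.1 one hundred thousand times in single and double precision.
single = np.float32(0.0)
double = np.float64(0.0)
for _ in range(100_000):
    single += np.float32(0.1)
    double += np.float64(0.1)

print(single)   # off from the exact 10000.0 by an easily visible amount
print(double)   # correct to roughly full double precision
```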

5.3.2. Computational Aspects of Generalized Inversion

The purpose of generalized inversion is to help us understand the nature of the problem Ax = b better. We are interested in getting not only a solution to the problem Ax = b, but also answers to the following questions: (1) Do we have sufficient information to solve the problem? (2) Is the solution unique? (3) Will the solution change a lot for small errors in b and/or A? (4) How important is each individual observation to our solution?

In Section 5.2 we solved our problem Ax = b by seeking a matrix H which serves as an inverse and satisfies certain criteria. The singular value decomposition described in Section 5.1 permits us to construct this inverse quite easily. Let A be a real m × n matrix. The n × m matrix H is said to be the Moore-Penrose generalized inverse of A if H satisfies the following conditions (Ben-Israel and Greville, 1974, p. 7):


(5.52) AHA = A,  HAH = H,  (AH)^T = AH,  (HA)^T = HA

This generalized inverse of A is unique and is commonly denoted by A†. It can be verified that if we perform singular value decomposition on A [see Eqs. (5.39)-(5.41)], i.e.,

(5.53) A = USV^T

then by the properties of orthogonal matrices,

(5.54) A† = VS†U^T

where S† is an n × m matrix whose elements are zero except on the diagonal,

(5.55) (S†)_ii = σ_i†,  (S†)_ij = 0 for i ≠ j

and for i = 1, ..., n,

(5.56) σ_i† = 1/σ_i for σ_i > 0,  σ_i† = 0 for σ_i = 0
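The construction of Eqs. (5.53)-(5.56) takes only a few lines in a modern SVD-equipped environment. The following sketch (Python with NumPy, offered as an illustration rather than the text's original computation, with an assumed tolerance for "zero" singular values) builds A† and checks the four conditions of Eq. (5.52):

```python
import numpy as np

def pinv_via_svd(A, tol=1e-10):
    """Moore-Penrose inverse built from Eqs. (5.53)-(5.56)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=True)      # A = U S V^T
    s_dag = np.array([1.0 / si if si > tol else 0.0 for si in s])
    S_dag = np.zeros((A.shape[1], A.shape[0]))           # n x m, Eq. (5.55)
    S_dag[:len(s), :len(s)] = np.diag(s_dag)             # Eq. (5.56)
    return Vt.T @ S_dag @ U.T                            # Eq. (5.54)

A = np.array([[1.0, 2.0], [2.0, 4.0], [0.0, 1.0]])       # hypothetical 3 x 2 example
H = pinv_via_svd(A)

# Verify the four Moore-Penrose conditions of Eq. (5.52).
print(np.allclose(A @ H @ A, A), np.allclose(H @ A @ H, H))
print(np.allclose((A @ H).T, A @ H), np.allclose((H @ A).T, H @ A))
print(np.allclose(H, np.linalg.pinv(A)))                 # matches NumPy's A†
```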

It is straightforward to find the resolution matrix R and the information density matrix D. Since R = HA = A†A, we have

(5.57a) R = VS†U^T USV^T = VS†SV^T

because U^T U = I by Eq. (5.40). Let us consider the overdetermined case where m > n. If the rank of matrix A is r (r ≤ n), then there are r nonzero singular values. If we denote the r × r identity matrix by I_r, then by Eqs. (5.41) and (5.53),

(5.57b) S†S = [ I_r  0 ]
              [  0   0 ]

where I_r occupies the first r rows and columns and the zero blocks fill the remaining n − r. Hence,

(5.57c) R = V_r V_r^T

where V_r is the first r columns of V corresponding to the r nonzero singular values. Similarly, from D = AH = AA†, we have

(5.57d) D = USV^T VS†U^T = U_r U_r^T

where U_r is the m × r submatrix of U corresponding to the r nonzero singular values.
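Continuing the sketch above, R and D of Eqs. (5.57c) and (5.57d) follow from the first r columns of V and U (Python with NumPy, assumed tolerance 1e-10 for "nonzero"):

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0], [0.0, 1.0]])   # m = 3, n = 2, full column rank
U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = int(np.sum(s > 1e-10))                           # number of nonzero singular values

Ur, Vr = U[:, :r], Vt.T[:, :r]                       # first r columns of U and V
R = Vr @ Vr.T                                         # Eq. (5.57c): resolution matrix
D = Ur @ Ur.T                                         # Eq. (5.57d): information density

print(np.allclose(R, np.eye(2)))   # r = n, so R = I_n
print(np.round(D, 3))              # r < m here, so D is not I_m
```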

The resulting estimate of the solution

(5.58) x̂ = A†b

is always unique. If matrix A is of full rank (i.e., r = n), then R = I_n (and D = I_m if r = m as well). Thus, x̂ corresponds to the usual least squares solution. If matrix A is rank deficient (i.e., r < n), then there is no unique solution to the least squares problem. In this case, we must choose a solution by some criterion. A simple criterion is that the solution x̂ has the least vector length, and Eq. (5.58) satisfies this requirement. Finally, we introduce the unscaled covariance matrix C (Lawson and Hanson, 1974, pp. 67-68)

(5.59) C = V(S†)²V^T
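A small rank-deficient example (Python with NumPy; the particular system is hypothetical) exhibits both the minimum-length property of Eq. (5.58) and the covariance of Eq. (5.59):

```python
import numpy as np

# Rank-deficient system: both equations constrain only x1 + 2*x2.
A = np.array([[1.0, 2.0], [2.0, 4.0]])     # rank r = 1 < n = 2
b = np.array([1.0, 2.0])

x_hat = np.linalg.pinv(A) @ b              # Eq. (5.58): x_hat = A† b
print(x_hat)                               # [0.2 0.4], the shortest solution

# Any x with x1 + 2*x2 = 1 also solves Ax = b, but is longer than x_hat.
x_other = np.array([1.0, 0.0])
print(np.allclose(A @ x_other, b), np.linalg.norm(x_other) > np.linalg.norm(x_hat))

# Eq. (5.59): unscaled covariance C = V (S†)^2 V^T.
U, s, Vt = np.linalg.svd(A)
s_dag = np.array([1.0 / si if si > 1e-10 else 0.0 for si in s])
C = Vt.T @ np.diag(s_dag**2) @ Vt
print(C)
```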

The Moore-Penrose generalized inverse is one of many generalized inverses (Ben-Israel and Greville, 1974). In other words, there are other choices for the matrix H that can serve as an inverse, and thus one can obtain different approximate solutions to the problem Ax = b. For example, in the method of damped least squares (or ridge regression), the matrix H is chosen to be

(5.60a) H = VFS†U^T

In this equation, F is an n × n diagonal filter matrix whose components are

(5.60b) F_ii = σ_i²/(σ_i² + θ²),  i = 1, 2, ..., n

where θ is an adjustable parameter usually much less than the largest singular value σ₁.

The effect of this H as the inverse is to produce an estimate x̂ whose components along the singular vectors corresponding to small singular values are damped in comparison with their values obtained from the Moore-Penrose generalized inverse solution. This x̂ can be shown (Lawson and Hanson, 1974) to solve the problem

(5.60c) minimize ‖Ax − b‖² + θ²‖x‖²

and thus represents a compromise between fitting the data and limiting the size of the solution. Such an idea is also used in the context of nonlinear least squares, where it is known as the Levenberg-Marquardt method (see Section 5.4.3).
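The filter of Eq. (5.60b) is a one-line modification of the SVD solution. The following sketch (Python with NumPy; θ is an assumed tuning parameter, and the example is full rank so that no σ_i is zero) shows how components along small singular values are damped:

```python
import numpy as np

def damped_solution(A, b, theta):
    """Damped least squares via Eqs. (5.60a)-(5.60b): x_hat = V F S† U^T b."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    f = s**2 / (s**2 + theta**2)           # Eq. (5.60b) filter factors
    return Vt.T @ (f / s * (U.T @ b))      # apply F S† to U^T b (assumes s > 0)

# Mildly ill-conditioned system: one singular value much smaller than the other.
A = np.array([[1.0, 1.0], [1.0, 1.0001], [1.0, 0.9999]])
b = np.array([2.0, 2.1, 1.9])

print(damped_solution(A, b, theta=0.0))    # theta = 0 reproduces A† b
print(damped_solution(A, b, theta=0.01))   # small-sigma components damped toward zero
```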

Unlike the ordinary inverse, the Moore-Penrose generalized inverse always exists, even when the matrix is singular. In constructing this generalized inverse, we are careful to handle the zero singular values: in Eq. (5.56), we define σ_i† = 1/σ_i only for σ_i > 0, and set σ_i† = 0 for σ_i = 0. This avoids the difficulty of the ordinary inverse, where σ_i⁻¹ = 1/σ_i for every i. Furthermore, singular values give insight into the rank and condition of a matrix.

The usual definition of the rank of a matrix is the maximum number of linearly independent rows (or columns).