14 Matrices

In this chapter, we discuss basic definitions and results concerning matrices. We shall start out with a very general point of view, discussing matrices whose entries lie in an arbitrary ring R. Then we shall specialize to the case where the entries lie in a field F, where much more can be said.

One of the main goals of this chapter is to discuss “Gaussian elimination,” which is an algorithm that allows us to efficiently compute bases for the image and kernel of an F-linear map.

In discussing the complexity of algorithms for matrices over a ring R, we shall treat a ring R as an “abstract data type,” so that the running times of algorithms will be stated in terms of the number of arithmetic operations in R. If R is a finite ring, such as Z_m, we can immediately translate this into a running time on a RAM (in later chapters, we will discuss other finite rings and efficient algorithms for doing arithmetic in them).

If R is, say, the field of rational numbers, a complete running time analysis would require an additional analysis of the sizes of the numbers that appear in the execution of the algorithm. We shall not attempt such an analysis here; however, we note that all the algorithms discussed in this chapter do in fact run in polynomial time when R = Q, assuming we represent rational numbers as fractions in lowest terms. Another possible approach for dealing with rational numbers is to use floating point approximations. While this approach eliminates the size problem, it creates many new problems because of round-off errors. We shall not address any of these issues here.

14.1 Basic definitions and properties

Throughout this section, R denotes a ring.

For positive integers m and n, an m×n matrix A over a ring R is a rectangular array

A = \begin{pmatrix}
      a_{11} & a_{12} & \cdots & a_{1n} \\
      a_{21} & a_{22} & \cdots & a_{2n} \\
      \vdots & \vdots &        & \vdots \\
      a_{m1} & a_{m2} & \cdots & a_{mn}
    \end{pmatrix},

where each entry a_{ij} in the array is an element of R; the element a_{ij} is called the (i,j) entry of A, which we denote by A(i,j). For i = 1, ..., m, the ith row of A is

(a_{i1}, ..., a_{in}),

which we denote by Row_i(A), and for j = 1, ..., n, the jth column of A is

\begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix},

which we denote by Col_j(A). We regard a row of A as a 1×n matrix, and a column of A as an m×1 matrix.

The set of all m×n matrices over R is denoted by R^{m×n}. Elements of R^{1×n} are called row vectors (of dimension n) and elements of R^{m×1} are called column vectors (of dimension m). Elements of R^{n×n} are called square matrices (of dimension n). We do not make a distinction between R^n and R^{1×n}; that is, we view standard n-tuples as row vectors.

We can define the familiar operations of matrix addition and scalar multiplication:

• If A, B ∈ R^{m×n}, then A + B is the m×n matrix whose (i,j) entry is A(i,j) + B(i,j).

• If c ∈ R and A ∈ R^{m×n}, then cA is the m×n matrix whose (i,j) entry is cA(i,j).

The m×n zero matrix is the m×n matrix, all of whose entries are 0_R; we denote this matrix by 0_R^{m×n} (or just 0, when the context is clear).

Theorem 14.1. With addition and scalar multiplication as defined above, R^{m×n} is an R-module. The matrix 0_R^{m×n} is the additive identity, and the additive inverse of a matrix A ∈ R^{m×n} is the m×n matrix whose (i,j) entry is −A(i,j).

Proof. To prove this, one first verifies that matrix addition is associative and commutative, which follows from the associativity and commutativity of addition in R. The claims made about the additive identity and additive inverses are also easily verified. These observations establish that R^{m×n} is an abelian group. One also has to check that all of the properties in Definition 13.1 hold. We leave this to the reader. □

We can also define the familiar operation of matrix multiplication:

• If A ∈ R^{m×n} and B ∈ R^{n×p}, then AB is the m×p matrix whose (i,k) entry is

Σ_{j=1}^{n} A(i,j) B(j,k).

The n×n identity matrix is the matrix I ∈ R^{n×n}, where I(i,i) := 1_R and I(i,j) := 0_R for i ≠ j. That is, I has 1_R's on the diagonal that runs from the upper left corner to the lower right corner, and 0_R's everywhere else.

Theorem 14.2.

(i) Matrix multiplication is associative; that is, A(BC) = (AB)C for all A ∈ R^{m×n}, B ∈ R^{n×p}, and C ∈ R^{p×q}.

(ii) Matrix multiplication distributes over matrix addition; that is, A(C + D) = AC + AD and (A + B)C = AC + BC for all A, B ∈ R^{m×n} and C, D ∈ R^{n×p}.

(iii) The n×n identity matrix I ∈ R^{n×n} acts as a multiplicative identity; that is, AI = A and IB = B for all A ∈ R^{m×n} and B ∈ R^{n×m}; in particular, CI = C = IC for all C ∈ R^{n×n}.

(iv) Scalar multiplication and matrix multiplication associate; that is, c(AB) = (cA)B = A(cB) for all c ∈ R, A ∈ R^{m×n}, and B ∈ R^{n×p}.

Proof. All of these are trivial, except for (i), which requires just a bit of computation to show that the (i,ℓ) entry of both A(BC) and (AB)C is equal to (as the reader may verify)

Σ_{1≤j≤n, 1≤k≤p} A(i,j) B(j,k) C(k,ℓ). □

Note that while matrix addition is commutative, matrix multiplication in general is not. Indeed, Theorems 14.1 and 14.2 imply that R^{n×n} satisfies all the properties of a ring except for commutativity of multiplication.

Some simple but useful facts to keep in mind are the following:

• If A ∈ R^{m×n} and B ∈ R^{n×p}, then the ith row of AB is equal to vB, where v = Row_i(A); also, the kth column of AB is equal to Aw, where w = Col_k(B).

• If A ∈ R^{m×n} and v = (c_1, ..., c_m) ∈ R^m, then

vA = Σ_{i=1}^{m} c_i Row_i(A).

In words: vA is a linear combination of the rows of A, with coefficients taken from the corresponding entries of v.

• If A ∈ R^{m×n} and

w = \begin{pmatrix} d_1 \\ \vdots \\ d_n \end{pmatrix} ∈ R^{n×1},

then

Aw = Σ_{j=1}^{n} d_j Col_j(A).

In words: Aw is a linear combination of the columns of A, with coefficients taken from the corresponding entries of w.

If A ∈ R^{m×n}, the transpose of A, denoted by A^⊤, is defined to be the n×m matrix whose (j,i) entry is A(i,j).

Theorem 14.3. If A, B ∈ R^{m×n}, C ∈ R^{n×p}, and c ∈ R, then:

(i) (A + B)^⊤ = A^⊤ + B^⊤;

(ii) (cA)^⊤ = cA^⊤;

(iii) (A^⊤)^⊤ = A;

(iv) (AC)^⊤ = C^⊤ A^⊤.

Proof. Exercise. □

If A_i is an n_i × n_{i+1} matrix, for i = 1, ..., k, then by associativity of matrix multiplication, we may write the product matrix A_1 ⋯ A_k, which is an n_1 × n_{k+1} matrix, without any ambiguity.

For an n×n matrix A, and a positive integer k, we write A^k to denote the product A ⋯ A, where there are k terms in the product. Note that A^1 = A. We may extend this notation to k = 0, defining A^0 to be the n×n identity matrix. One may readily verify the usual rules of exponent arithmetic: for all non-negative integers k, ℓ, we have

(A^ℓ)^k = A^{kℓ} = (A^k)^ℓ and A^k A^ℓ = A^{k+ℓ}.

It is easy also to see that part (iv) of Theorem 14.3 implies that for all non-negative integers k, we have

(A^k)^⊤ = (A^⊤)^k.

Algorithmic issues

For computational purposes, matrices are represented in the obvious way as arrays of elements of R. As remarked at the beginning of this chapter, we shall treat R as an “abstract data type,” and not worry about how elements of R are actually represented; in discussing the complexity of algorithms, we shall simply count “operations in R,” by which we mean additions, subtractions, and multiplications; we shall sometimes also include equality testing and computing multiplicative inverses as “operations in R.” In any real implementation, there will be other costs, such as incrementing counters, and so on, which we may safely ignore, as long as their number is at most proportional to the number of operations in R.

The following statements are easy to verify:

• We can multiply an m×n matrix by a scalar using mn operations in R.

• We can add two m×n matrices using mn operations in R.

• We can compute the product of an m×n matrix and an n×p matrix using O(mnp) operations in R.

It is also easy to see that given an n×n matrix A, and a non-negative integer e, we can adapt the repeated squaring algorithm discussed in §3.4 so as to compute A^e using O(len(e)) multiplications of n×n matrices, and hence O(len(e) n^3) operations in R.
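To make these operation counts concrete, here is a minimal Python sketch of matrix multiplication and of matrix powering by repeated squaring for matrices over Z_m, represented as lists of rows. The helper names mat_mul and mat_pow, and the particular example, are our own choices, not part of the text.

```python
# Minimal sketch (helper names are ours): matrices over Z_m as lists of rows.

def mat_mul(A, B, m):
    """Product of an r x n matrix A and an n x p matrix B over Z_m,
    using O(r*n*p) operations in Z_m."""
    r, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n
    return [[sum(A[i][j] * B[j][k] for j in range(n)) % m for k in range(p)]
            for i in range(r)]

def mat_pow(A, e, m):
    """A^e for an n x n matrix A over Z_m, by repeated squaring:
    O(len(e)) matrix multiplications, hence O(len(e) * n^3) operations in Z_m."""
    n = len(A)
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # identity, A^0
    while e > 0:
        if e & 1:
            R = mat_mul(R, A, m)
        A = mat_mul(A, A, m)
        e >>= 1
    return R

A = [[0, 1, 1], [2, 1, 2], [2, 2, 0]]
print(mat_pow(A, 5, 3))   # [[2, 0, 1], [0, 2, 2], [2, 2, 0]]
```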

EXERCISE 14.1. Let A ∈ R^{m×n}. Show that if vA = 0_R^{1×n} for all v ∈ R^m, then A = 0_R^{m×n}.

14.2 Matrices and linear maps

Let R be a ring.

For positive integers m and n, consider the R-modules R^m and R^n. If A is an m×n matrix over R, then the map

λ_A : R^m → R^n, v ↦ vA

is easily seen to be an R-linear map; this follows immediately from parts (ii) and (iv) of Theorem 14.2. We call λ_A the linear map corresponding to A.


If v = (c_1, ..., c_m) ∈ R^m, then

λ_A(v) = vA = Σ_{i=1}^{m} c_i Row_i(A).

From this, it is clear that

• the image of λ_A is the submodule of R^n spanned by {Row_i(A)}_{i=1}^{m}; in particular, λ_A is surjective if and only if {Row_i(A)}_{i=1}^{m} spans R^n;

• λ_A is injective if and only if {Row_i(A)}_{i=1}^{m} is linearly independent.

There is a close connection between matrix multiplication and composition of corresponding linear maps. Specifically, let A ∈ R^{m×n} and B ∈ R^{n×p}, and consider the corresponding linear maps λ_A : R^m → R^n and λ_B : R^n → R^p. Then we have

λ_B ∘ λ_A = λ_{AB}.   (14.1)

This follows immediately from the associativity of matrix multiplication.

We have seen how vector/matrix multiplication defines a linear map. Conversely, we shall now see that the action of any R-linear map can be viewed as a vector/matrix multiplication, provided the R-modules involved have bases (which will always be the case for finite dimensional vector spaces).

Let M be an R-module, and suppose that S = {α_i}_{i=1}^{m} is a basis for M, where m > 0. As we know (see Theorem 13.14), every element α ∈ M can be written uniquely as c_1 α_1 + ⋯ + c_m α_m, where the c_i's are in R. Let us define

Vec_S(α) := (c_1, ..., c_m) ∈ R^m.

We call Vec_S(α) the coordinate vector of α relative to S. The function

Vec_S : M → R^m

is an R-module isomorphism (it is the inverse of the isomorphism ε in Theorem 13.14).

Let N be another R-module, and suppose that T = {β_j}_{j=1}^{n} is a basis for N, where n > 0. Just as in the previous paragraph, every element β ∈ N has a unique coordinate vector Vec_T(β) ∈ R^n relative to T.

Now let ρ : M → N be an arbitrary R-linear map. Our goal is to define a matrix A ∈ R^{m×n} with the following property:

Vec_T(ρ(α)) = Vec_S(α) A for all α ∈ M.   (14.2)

In words: if we multiply the coordinate vector of α on the right by A, we get the coordinate vector of ρ(α).


Constructing such a matrix A is easy: we define A to be the matrix whose ith row, for i = 1, ..., m, is the coordinate vector of ρ(α_i) relative to T. That is,

Row_i(A) = Vec_T(ρ(α_i)) for i = 1, ..., m.

Then for an arbitrary α ∈ M, if (c_1, ..., c_m) is the coordinate vector of α relative to S, we have

ρ(α) = ρ(Σ_i c_i α_i) = Σ_i c_i ρ(α_i),

and so

Vec_T(ρ(α)) = Σ_i c_i Vec_T(ρ(α_i)) = Σ_i c_i Row_i(A) = Vec_S(α) A.

Furthermore, A is the only matrix satisfying (14.2). Indeed, if A′ also satisfies (14.2), then subtracting, we obtain

Vec_S(α)(A − A′) = 0_R^{1×n}

for all α ∈ M. Since the map Vec_S : M → R^m is surjective, this means that v(A − A′) is zero for all v ∈ R^m, and from this, it is clear (see Exercise 14.1) that A − A′ is the zero matrix, and so A = A′.

We call the unique matrix A satisfying (14.2) the matrix of ρ relative to S and T, and denote it by Mat_{S,T}(ρ).

Recall that Hom_R(M,N) is the R-module consisting of all R-linear maps from M to N (see Theorem 13.12). We can view Mat_{S,T} as a function mapping elements of Hom_R(M,N) to elements of R^{m×n}.

Theorem 14.4. The function Mat_{S,T} : Hom_R(M,N) → R^{m×n} is an R-module isomorphism. In particular, for every A ∈ R^{m×n}, the pre-image of A under Mat_{S,T} is Vec_T^{−1} ∘ λ_A ∘ Vec_S, where λ_A : R^m → R^n is the linear map corresponding to A.

Proof. To show that Mat_{S,T} is an R-linear map, let ρ, ρ′ ∈ Hom_R(M,N), and let c ∈ R. Also, let A := Mat_{S,T}(ρ) and A′ := Mat_{S,T}(ρ′). Then for all α ∈ M, we have

Vec_T((ρ + ρ′)(α)) = Vec_T(ρ(α) + ρ′(α)) = Vec_T(ρ(α)) + Vec_T(ρ′(α))
                   = Vec_S(α) A + Vec_S(α) A′ = Vec_S(α)(A + A′).

As this holds for all α ∈ M, and since the matrix of a linear map is uniquely determined, we must have Mat_{S,T}(ρ + ρ′) = A + A′. A similar argument shows that Mat_{S,T}(cρ) = cA. This shows that Mat_{S,T} is an R-linear map.

To show that the map Mat_{S,T} is injective, it suffices to show that its kernel is trivial. If ρ is in the kernel of this map, then setting A := 0_R^{m×n} in (14.2), we see that Vec_T(ρ(α)) is zero for all α ∈ M. But since the map Vec_T : N → R^n is injective, this implies ρ(α) is zero for all α ∈ M. Thus, ρ must be the zero map.

To show surjectivity, we show that every A ∈ R^{m×n} has a pre-image under Mat_{S,T} as described in the statement of the theorem. So let A be an m×n matrix, and let ρ := Vec_T^{−1} ∘ λ_A ∘ Vec_S. Again, since the matrix of a linear map is uniquely determined, it suffices to show that (14.2) holds for this particular A and ρ. For every α ∈ M, we have

Vec_T(ρ(α)) = Vec_T(Vec_T^{−1}(λ_A(Vec_S(α)))) = λ_A(Vec_S(α)) = Vec_S(α) A.

That proves the theorem. □

As a special case of the above, suppose that M = R^m and N = R^n, and S and T are the standard bases for M and N (see Example 13.27). In this case, the functions Vec_S and Vec_T are the identity maps, and the previous theorem implies that the function

Λ : R^{m×n} → Hom_R(R^m, R^n), A ↦ λ_A

is the inverse of the function Mat_{S,T} : Hom_R(R^m, R^n) → R^{m×n}. Thus, the function Λ is also an R-module isomorphism.

To summarize, we see that an R-linear map ρ from M to N, together with particular bases for M and N, uniquely determine a matrix A such that the action of multiplication on the right by A implements the action of ρ with respect to the given bases. There may be many bases for M and N to choose from, and different choices will in general lead to different matrices. Also, note that in general, a basis may be indexed by an arbitrary finite set; however, in defining coordinate vectors and matrices of linear maps, the index set must be ordered in some way. In any case, from a computational perspective, the matrix A gives us an efficient way to compute the map ρ, assuming elements of M and N are represented as coordinate vectors with respect to the given bases.

We have taken a “row-centric” point of view. Of course, if one prefers, by simply transposing everything, one can equally well take a “column-centric” point of view, where the action of ρ corresponds to multiplication of a column vector on the left by a matrix.

Example 14.1. Consider the quotient ring E = R[X]/(f), where f ∈ R[X] with deg(f) = ℓ > 0 and lc(f) ∈ R^*. Let ξ := [X]_f ∈ E. As an R-module, E has a basis S := {ξ^{i−1}}_{i=1}^{ℓ} (see Example 13.30). Let ρ : E → E be the ξ-multiplication map, which sends α ∈ E to ξα ∈ E. This is an R-linear map. If f = c_0 + c_1 X + ⋯ + c_{ℓ−1} X^{ℓ−1} + c_ℓ X^ℓ, then the matrix of ρ relative to S is the ℓ×ℓ matrix

A = \begin{pmatrix}
      0 & 1 & 0 & \cdots & 0 \\
      0 & 0 & 1 & \cdots & 0 \\
        &   &   & \ddots &   \\
      0 & 0 & 0 & \cdots & 1 \\
      -c_0/c_\ell & -c_1/c_\ell & -c_2/c_\ell & \cdots & -c_{\ell-1}/c_\ell
    \end{pmatrix},

where for i = 1, ..., ℓ−1, the ith row of A contains a 1 in position i+1, and is zero everywhere else. The matrix A is called the companion matrix of f. □

Example 14.2. Let x_1, ..., x_k ∈ R. Let R[X]_{<k} be the set of polynomials g ∈ R[X] with deg(g) < k, which is an R-module with a basis S := {X^{i−1}}_{i=1}^{k} (see Example 13.29). The multi-point evaluation map

ρ : R[X]_{<k} → R^k, g ↦ (g(x_1), ..., g(x_k))

is an R-linear map. Let T be the standard basis for R^k. Then the matrix of ρ relative to S and T is the k×k matrix

A = \begin{pmatrix}
      1 & 1 & \cdots & 1 \\
      x_1 & x_2 & \cdots & x_k \\
      x_1^2 & x_2^2 & \cdots & x_k^2 \\
      \vdots & \vdots & & \vdots \\
      x_1^{k-1} & x_2^{k-1} & \cdots & x_k^{k-1}
    \end{pmatrix}.

The matrix A is called a Vandermonde matrix. □
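To make Examples 14.1 and 14.2 concrete, the following Python sketch builds a companion matrix and a Vandermonde matrix over the field Z_p, and checks property (14.2) for the Vandermonde case by multiplying the coefficient vector of a polynomial on the right by the matrix. The helper names and the particular choices of p, f, and evaluation points are ours, purely for illustration.

```python
# Minimal sketch (helper names and parameters are ours): the matrices of
# Examples 14.1 and 14.2, with entries in the field Z_p.

p = 7

def companion_matrix(c, p):
    """Companion matrix of f = c[0] + c[1]*X + ... + c[l]*X^l over Z_p;
    the leading coefficient c[l] must be non-zero mod p (a unit)."""
    l = len(c) - 1
    inv_lead = pow(c[l], -1, p)
    A = [[0] * l for _ in range(l)]
    for i in range(l - 1):
        A[i][i + 1] = 1                       # row i+1 has a 1 in position i+2
    A[l - 1] = [(-c[j] * inv_lead) % p for j in range(l)]
    return A

def vandermonde_matrix(xs, p):
    """k x k Vandermonde matrix: row i holds x_1^(i-1), ..., x_k^(i-1)."""
    return [[pow(x, i, p) for x in xs] for i in range(len(xs))]

def vec_times_mat(v, A, p):
    """Row vector times matrix over Z_p."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) % p
            for j in range(len(A[0]))]

# Property (14.2) for the evaluation map: coefficient vector of g times A
# equals (g(x_1), ..., g(x_k)).
xs = [1, 2, 3]
A = vandermonde_matrix(xs, p)
g = [5, 0, 2]                                 # g = 5 + 2*X^2
assert vec_times_mat(g, A, p) == [(5 + 2 * x * x) % p for x in xs]

print(companion_matrix([3, 1, 0, 1], p))      # f = 3 + X + X^3 -> [[0,1,0],[0,0,1],[4,6,0]]
```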

EXERCISE 14.2. Let σ : M → N and τ : N → P be R-linear maps, and suppose that M, N, and P have bases S, T, and U, respectively. Show that

Mat_{S,U}(τ ∘ σ) = Mat_{S,T}(σ) · Mat_{T,U}(τ).

EXERCISE 14.3. Let V be a vector space over a field F with basis S = {α_i}_{i=1}^{m}. Suppose that U is a subspace of V of dimension ℓ < m. Show that there exists a matrix A ∈ F^{m×(m−ℓ)} such that for all α ∈ V, we have α ∈ U if and only if Vec_S(α) A is zero. Such a matrix A is called a parity check matrix for U.

EXERCISE 14.4. Let F be a finite field, and let A be a non-zero m×n matrix over F. Suppose one chooses a vector v ∈ F^m at random. Show that the probability that vA is the zero vector is at most 1/|F|.

EXERCISE 14.5. Design and analyze a probabilistic algorithm that takes as input matrices A, B, C ∈ Z_p^{m×m}, where p is a prime. The algorithm should run in time O(m^2 len(p)^2) and should output either “yes” or “no” so that the following holds:

• if C = AB, then the algorithm should always output “yes”;

• if C ≠ AB, then the algorithm should output “no” with probability at least 0.999.

14.3 The inverse of a matrix

Let R be a ring.

For a square matrix A ∈ R^{n×n}, we call a matrix B ∈ R^{n×n} an inverse of A if BA = AB = I, where I is the n×n identity matrix. It is easy to see that if A has an inverse, then the inverse is unique: if B and C are inverses of A, then

B = BI = B(AC) = (BA)C = IC = C.

Because the inverse of A is uniquely determined, we denote it by A^{−1}. If A has an inverse, we say that A is invertible, or non-singular. If A is not invertible, it is sometimes called singular. We will use the terms “invertible” and “not invertible.”

Observe that A is the inverse of A^{−1}; that is, (A^{−1})^{−1} = A.

If A and B are invertible n×n matrices, then so is their product: in fact, it is easily verified that (AB)^{−1} = B^{−1} A^{−1}. It follows that if A is an invertible matrix, and k is a non-negative integer, then A^k is invertible with inverse (A^{−1})^k, which we also denote by A^{−k}.

It is also easy to see that A is invertible if and only if the transposed matrix A^⊤ is invertible, in which case (A^⊤)^{−1} = (A^{−1})^⊤. Indeed, AB = I = BA holds if and only if B^⊤ A^⊤ = I = A^⊤ B^⊤.

We now develop a connection between invertible matrices and R-module isomorphisms. Recall from the previous section the R-module isomorphism

Λ : R^{n×n} → Hom_R(R^n, R^n), A ↦ λ_A,

where for each A ∈ R^{n×n}, λ_A is the corresponding R-linear map

λ_A : R^n → R^n, v ↦ vA.

Evidently, λ_I is the identity map.

Theorem 14.5. Let A ∈ R^{n×n}, and let λ_A : R^n → R^n be the corresponding R-linear map. Then A is invertible if and only if λ_A is bijective, in which case λ_{A^{−1}} = λ_A^{−1}.

Proof. Suppose A is invertible, and that B is its inverse. We have AB = BA = I, and hence λ_{AB} = λ_{BA} = λ_I, from which it follows (see (14.1)) that λ_B ∘ λ_A = λ_A ∘ λ_B = λ_I. Since λ_I is the identity map, this implies λ_A is bijective.

Suppose λ_A is bijective. We know that the inverse map λ_A^{−1} is also an R-linear map, and since the mapping Λ above is surjective, we have λ_A^{−1} = λ_B for some B ∈ R^{n×n}. Therefore, we have λ_B ∘ λ_A = λ_A ∘ λ_B = λ_I, and hence (again, see (14.1)) λ_{AB} = λ_{BA} = λ_I. Since the mapping Λ is injective, it follows that AB = BA = I. This implies A is invertible, with A^{−1} = B. □

We also have:

Theorem 14.6. Let A ∈ R^{n×n}. The following are equivalent:

(i) A is invertible;

(ii) {Row_i(A)}_{i=1}^{n} is a basis for R^n;

(iii) {Col_j(A)}_{j=1}^{n} is a basis for R^{n×1}.

Proof. We first prove the equivalence of (i) and (ii). By the previous theorem, A is invertible if and only if λ_A is bijective. Also, in the previous section, we observed that λ_A is surjective if and only if {Row_i(A)}_{i=1}^{n} spans R^n, and that λ_A is injective if and only if {Row_i(A)}_{i=1}^{n} is linearly independent.

The equivalence of (i) and (iii) follows by considering the transpose of A. □

EXERCISE 14.6. Let R be a ring, and let A be a square matrix over R. Let us call B a left inverse of A if BA = I, and let us call C a right inverse of A if AC = I.

(a) Show that if A has both a left inverse B and a right inverse C, then B = C and hence A is invertible.

(b) Assume that R is a field. Show that if A has either a left inverse or a right inverse, then A is invertible.

Note that part (b) of the previous exercise holds for arbitrary rings, but the proof of this is non-trivial, and requires the development of the theory of determinants, which we do not cover in this text.

EXERCISE 14.7. Show that if A and B are two square matrices over a field such that their product AB is invertible, then both A and B themselves must be invertible.

EXERCISE 14.8. Show that if A is a square matrix over an arbitrary ring, and A^k is invertible for some k > 0, then A is invertible.

EXERCISE 14.9. With notation as in Example 14.1, show that the matrix A is invertible if and only if c_0 ∈ R^*.

EXERCISE 14.10. With notation as in Example 14.2, show that the matrix A is invertible if and only if x_i − x_j ∈ R^* for all i ≠ j.

14.4 Gaussian elimination

Throughout this section, F denotes a field.

A matrix B ∈ F^{m×n} is said to be in reduced row echelon form if there exists a sequence of integers (p_1, ..., p_r), with 0 ≤ r ≤ m and 1 ≤ p_1 < p_2 < ⋯ < p_r ≤ n, such that the following holds:

• for i = 1, ..., r, all of the entries in row i of B to the left of entry (i, p_i) are zero; that is, B(i,j) = 0_F for j = 1, ..., p_i − 1;

• for i = 1, ..., r, all of the entries in column p_i of B above entry (i, p_i) are zero; that is, B(i′, p_i) = 0_F for i′ = 1, ..., i − 1;

• for i = 1, ..., r, we have B(i, p_i) = 1_F;

• all entries in rows r+1, ..., m of B are zero; that is, B(i,j) = 0_F for i = r+1, ..., m and j = 1, ..., n.

It is easy to see that if B is in reduced row echelon form, then the sequence (p_1, ..., p_r) above is uniquely determined, and we call it the pivot sequence of B.

Several further remarks are in order:

• All of the entries of B are completely determined by the pivot sequence, except for the entries (i,j) with 1 ≤ i ≤ r and j > p_i with j ∉ {p_{i+1}, ..., p_r}, which may be arbitrary.

• If B is an n×n matrix in reduced row echelon form whose pivot sequence is of length n, then B must be the n×n identity matrix.

• We allow for an empty pivot sequence (i.e., r = 0), which will be the case precisely when B = 0_F^{m×n}.

Example 14.3. The following 4×6 matrix B over the rational numbers is in reduced row echelon form:

B = \begin{pmatrix}
      0 & 1 & -2 & 0 & 0 & 3 \\
      0 & 0 & 0 & 1 & 0 & 2 \\
      0 & 0 & 0 & 0 & 1 & -4 \\
      0 & 0 & 0 & 0 & 0 & 0
    \end{pmatrix}.

The pivot sequence of B is (2, 4, 5). Notice that the first three rows of B form a linearly independent family of vectors, that columns 2, 4, and 5 form a linearly independent family of vectors, and that all of the other columns of B are linear combinations of columns 2, 4, and 5. Indeed, if we truncate the pivot columns to their first three rows, we get the 3×3 identity matrix. □
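As a quick illustration (and in the spirit of Exercise 14.12 below), here is a small Python sketch, with a helper name of our own choosing, that reads off the pivot sequence of a matrix assumed to be in reduced row echelon form.

```python
# Minimal sketch (helper name is ours): the pivot sequence of a matrix B
# assumed to be in reduced row echelon form.

def pivot_sequence(B):
    """Return (p_1, ..., p_r), 1-based as in the text. Each column index is
    examined O(1) times overall, so this takes O(n) equality tests."""
    m, n = len(B), len(B[0])
    pivots, j = [], 0
    for i in range(m):
        while j < n and B[i][j] == 0:
            j += 1
        if j == n:
            break                 # this row, and hence all rows below, is zero
        pivots.append(j + 1)
        j += 1
    return pivots

# The matrix of Example 14.3:
B = [[0, 1, -2, 0, 0, 3],
     [0, 0,  0, 1, 0, 2],
     [0, 0,  0, 0, 1, -4],
     [0, 0,  0, 0, 0, 0]]
assert pivot_sequence(B) == [2, 4, 5]
```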


Generalizing the previous example, if a matrix is in reduced row echelon form, it is easy to deduce the following properties, which turn out to be quite useful:

Theorem 14.7. If B is a matrix in reduced row echelon form with pivot sequence (p_1, ..., p_r), then:

(i) rows 1, 2, ..., r of B form a linearly independent family of vectors;

(ii) columns p_1, ..., p_r of B form a linearly independent family of vectors, and all other columns of B can be expressed as linear combinations of columns p_1, ..., p_r.

Proof. Exercise: just look at the matrix! □

Gaussian elimination is an algorithm that transforms a given matrix A ∈ F^{m×n} into a matrix B ∈ F^{m×n}, where B is in reduced row echelon form, and is obtained from A by a sequence of elementary row operations. There are three types of elementary row operations:

Type I: swap two rows;

Type II: multiply a row by a non-zero scalar;

Type III: add a scalar multiple of one row to a different row.

The application of any specific elementary row operation to an m×n matrix C can be effected by multiplying C on the left by a suitable m×m matrix X. Indeed, the matrix X corresponding to a particular elementary row operation is simply the matrix obtained by applying the same elementary row operation to the m×m identity matrix. It is easy to see that for every elementary row operation, the corresponding matrix X is invertible.
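This correspondence is easy to check mechanically. In the following Python sketch (helper names and the sample matrix are ours), applying a Type III operation directly to a matrix C over Z_3 gives the same result as multiplying C on the left by the matrix X obtained by applying that operation to the identity matrix; Type I and Type II operations behave in the same way.

```python
# Minimal sketch (helper names are ours): an elementary row operation equals
# left multiplication by the matrix obtained by applying it to the identity.

p = 3

def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def swap_rows(C, r, s):                       # Type I
    C[r], C[s] = C[s], C[r]

def scale_row(C, r, c, p):                    # Type II (c must be non-zero)
    C[r] = [(c * x) % p for x in C[r]]

def add_multiple(C, r, s, c, p):              # Type III: Row_r <- Row_r + c*Row_s
    C[r] = [(x + c * y) % p for x, y in zip(C[r], C[s])]

def mat_mul(X, C, p):
    return [[sum(X[i][k] * C[k][j] for k in range(len(C))) % p
             for j in range(len(C[0]))] for i in range(len(X))]

C = [[1, 2, 1], [0, 1, 1], [2, 2, 0]]
X = identity(3)
add_multiple(X, 2, 0, -2, p)                  # apply Row_3 <- Row_3 - 2*Row_1 to I ...
C_direct = [row[:] for row in C]
add_multiple(C_direct, 2, 0, -2, p)           # ... and directly to C
assert mat_mul(X, C, p) == C_direct           # left multiplication gives the same result
```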

We now describe the basic version of Gaussian elimination. The input is an m×n matrix A, and the algorithm is described in Fig. 14.1.

The algorithm works as follows. First, it makes a copy B of A (this is not necessary if the original matrix A is not needed afterwards). The algorithm proceeds column by column, starting with the left-most column, so that after processing column j, the first j columns of B are in reduced row echelon form, and the current value of r represents the length of the pivot sequence. To process column j, in steps 3–6 the algorithm first searches for a non-zero element among B(r+1, j), ..., B(m, j); if none is found, then the first j columns of B are already in reduced row echelon form. Otherwise, one of these non-zero elements is selected as the pivot element (the choice is arbitrary), which is then used in steps 8–13 to bring column j into the required form. After incrementing r, the pivot element is brought into position (r, j), using a Type I operation in step 9. Then the entry (r, j) is set to 1_F, using a Type II operation in step 10. Finally, all the entries above and below entry (r, j) are set to 0_F, using Type III operations in steps 11–13.


1.  B ← A, r ← 0
2.  for j ← 1 to n do
3.      ℓ ← 0, i ← r
4.      while ℓ = 0 and i < m do
5.          i ← i + 1
6.          if B(i, j) ≠ 0_F then ℓ ← i
7.      if ℓ ≠ 0 then
8.          r ← r + 1
9.          swap rows r and ℓ of B
10.         Row_r(B) ← B(r, j)^{−1} Row_r(B)
11.         for i ← 1 to m do
12.             if i ≠ r then
13.                 Row_i(B) ← Row_i(B) − B(i, j) Row_r(B)
14. output B

Fig. 14.1. Gaussian elimination

Note that because columns 1, ..., j−1 of B were already in reduced row echelon form, none of these operations changes any values in these columns.

As for the complexity of the algorithm, it is easy to see that it performs O(mn) elementary row operations, each of which takes O(n) operations in F, so a total of O(mn^2) operations in F.

Example 14.4. Consider the execution of the Gaussian elimination algorithm on input

A = \begin{pmatrix} [0] & [1] & [1] \\ [2] & [1] & [2] \\ [2] & [2] & [0] \end{pmatrix} ∈ Z_3^{3×3}.

After copying A into B, the algorithm transforms B as follows (each step shows the elementary row operation applied and the resulting value of B):

Row_1 ↔ Row_2:
\begin{pmatrix} [2] & [1] & [2] \\ [0] & [1] & [1] \\ [2] & [2] & [0] \end{pmatrix}

Row_1 ← [2] Row_1:
\begin{pmatrix} [1] & [2] & [1] \\ [0] & [1] & [1] \\ [2] & [2] & [0] \end{pmatrix}

Row_3 ← Row_3 − [2] Row_1:
\begin{pmatrix} [1] & [2] & [1] \\ [0] & [1] & [1] \\ [0] & [1] & [1] \end{pmatrix}

Row_1 ← Row_1 − [2] Row_2:
\begin{pmatrix} [1] & [0] & [2] \\ [0] & [1] & [1] \\ [0] & [1] & [1] \end{pmatrix}

Row_3 ← Row_3 − Row_2:
\begin{pmatrix} [1] & [0] & [2] \\ [0] & [1] & [1] \\ [0] & [0] & [0] \end{pmatrix}. □

Suppose the Gaussian elimination algorithm performs a total of t elementary row operations. Then as discussed above, the application of the eth elementary row operation, for e = 1, ..., t, amounts to multiplying the current value of the matrix B on the left by a particular invertible m×m matrix X_e. Therefore, the final output value of B satisfies the equation

B = XA, where X = X_t X_{t−1} ⋯ X_1.

Since the product of invertible matrices is also invertible, we see that X itself is invertible.

Although the algorithm as presented does not compute the matrix X, it can be easily modified to do so. The resulting algorithm, which we call extended Gaussian elimination, is the same as plain Gaussian elimination, except that we initialize the matrix X to be the m×m identity matrix, and we add the following steps:

• just before step 9: swap rows r and ℓ of X;

• just before step 10: Row_r(X) ← B(r, j)^{−1} Row_r(X);

• just before step 13: Row_i(X) ← Row_i(X) − B(i, j) Row_r(X).

At the end of the algorithm we output X in addition to B.

So we simply perform the same elementary row operations on X that we perform on B. The reader may verify that the above algorithm is correct, and that it uses O(mn(m+n)) operations in F.

Example 14.5. Continuing with Example 14.4, the execution of the extended Gaussian elimination algorithm initializes X to the identity matrix

\begin{pmatrix} [1] & [0] & [0] \\ [0] & [1] & [0] \\ [0] & [0] & [1] \end{pmatrix}

and then transforms X as follows (the same row operations as in Example 14.4, now applied to X):

Row_1 ↔ Row_2:
\begin{pmatrix} [0] & [1] & [0] \\ [1] & [0] & [0] \\ [0] & [0] & [1] \end{pmatrix}

Row_1 ← [2] Row_1:
\begin{pmatrix} [0] & [2] & [0] \\ [1] & [0] & [0] \\ [0] & [0] & [1] \end{pmatrix}

Row_3 ← Row_3 − [2] Row_1:
\begin{pmatrix} [0] & [2] & [0] \\ [1] & [0] & [0] \\ [0] & [2] & [1] \end{pmatrix}

Row_1 ← Row_1 − [2] Row_2:
\begin{pmatrix} [1] & [2] & [0] \\ [1] & [0] & [0] \\ [0] & [2] & [1] \end{pmatrix}

Row_3 ← Row_3 − Row_2:
\begin{pmatrix} [1] & [2] & [0] \\ [1] & [0] & [0] \\ [2] & [2] & [1] \end{pmatrix}. □

EXERCISE 14.11. For each type of elementary row operation, describe the matrix X which corresponds to it, as well as X^{−1}.

EXERCISE 14.12. Given a matrix B ∈ F^{m×n} in reduced row echelon form, show how to compute its pivot sequence using O(n) operations in F.

EXERCISE 14.13. In §4.4, we saw how to speed up matrix multiplication over Z using the Chinese remainder theorem. In this exercise, you are to do the same, but for performing Gaussian elimination over Z_p, where p is a large prime. Suppose you are given an m×m matrix A over Z_p, where len(p) = Θ(m). Straightforward application of Gaussian elimination would require O(m^3) operations in Z_p, each of which takes time O(m^2), leading to a total running time of O(m^5). Show how to use the techniques of §4.4 to reduce the running time of Gaussian elimination to O(m^4).

14.5 Applications of Gaussian elimination

Throughout this section, A is an arbitrary m×n matrix over a field F, and XA = B, where X is an invertible m×m matrix, and B is an m×n matrix in reduced row echelon form with pivot sequence (p_1, ..., p_r). This is precisely the information produced by the extended Gaussian elimination algorithm, given A as input (the pivot sequence can easily be “read” directly from B; see Exercise 14.12). Also, let

λ_A : F^m → F^n, v ↦ vA

be the linear map corresponding to A.

Computing the image and kernel

Consider first the row space of A, by which we mean the subspace of F^n spanned by {Row_i(A)}_{i=1}^{m}, or equivalently, the image of λ_A.

We claim that the row space of A is the same as the row space of B. To see this, note that since B = XA, for every v ∈ F^m, we have vB = v(XA) = (vX)A, and so the row space of B is contained in the row space of A. For the other containment, note that since X is invertible, we can write A = X^{−1}B, and apply the same argument.

Further, note that the row space of B, and hence that of A, clearly has dimension r. Indeed, as stated in Theorem 14.7, rows 1, ..., r of B form a basis for the row space of B.

Consider next the kernel K of λ_A, or what we might call the row null space of A. We claim that {Row_i(X)}_{i=r+1}^{m} is a basis for K. Clearly, just from the fact that XA = B and the fact that rows r+1, ..., m of B are zero, it follows that rows r+1, ..., m of X are contained in K. Furthermore, as X is invertible, {Row_i(X)}_{i=1}^{m} is a basis for F^m (see Theorem 14.6). Thus, the family of vectors {Row_i(X)}_{i=r+1}^{m} is linearly independent and spans a subspace K′ of K. It suffices to show that K′ = K. Suppose to the contrary that K′ ⊊ K, and let v ∈ K \ K′. As {Row_i(X)}_{i=1}^{m} spans F^m, we may write v = Σ_{i=1}^{m} c_i Row_i(X); moreover, as v ∉ K′, we must have c_i ≠ 0_F for some i = 1, ..., r. Setting ṽ := (c_1, ..., c_m), we see that v = ṽX, and so

λ_A(v) = vA = (ṽX)A = ṽ(XA) = ṽB.

Furthermore, since {Row_i(B)}_{i=1}^{r} is linearly independent, rows r+1, ..., m of B are zero, and ṽ has a non-zero entry in one of its first r positions, we see that ṽB is not the zero vector. We have derived a contradiction, and hence may conclude that K′ = K.

Finally, note that if m = n, then A is invertible if and only if its row space has dimension m, which holds if and only if r = m, and in the latter case, B is the identity matrix, and hence X is the inverse of A.

Let us summarize the above discussion:

• The first r rows of B form a basis for the row space of A (i.e., the image of λ_A).

• The last m − r rows of X form a basis for the row null space of A (i.e., the kernel of λ_A).

• If m = n, then A is invertible (i.e., λ_A is an isomorphism) if and only if r = m, in which case X is the inverse of A (i.e., the matrix of λ_A^{−1} relative to the standard basis).

So we see that from the output of the extended Gaussian elimination algorithm, we can simply “read off” bases for both the image and the kernel, as well as the inverse (if it exists), of a linear map represented as a matrix with respect to given bases. Also note that this procedure provides a “constructive” version of Theorem 13.28.

Example 14.6. Continuing with Examples 14.4 and 14.5, we see that the vectors ([1], [0], [2]) and ([0], [1], [1]) form a basis for the row space of A, while the vector ([2], [2], [1]) is a basis for the row null space of A. □
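In code, this amounts to slicing the output of the extended elimination; the following Python sketch reuses the hypothetical extended_gauss helper sketched in §14.4 (the function name image_kernel_inverse is ours) and reproduces Example 14.6.

```python
# Minimal sketch (builds on the extended_gauss helper sketched in Section 14.4):
# read off a basis of the row space, a basis of the kernel of lambda_A, and the
# inverse of A (when it exists) from B and X with X*A = B.

def image_kernel_inverse(A, p):
    m, n = len(A), len(A[0])
    B, X = extended_gauss(A, p)
    r = sum(1 for row in B if any(x % p != 0 for x in row))   # number of non-zero rows
    image_basis = B[:r]           # first r rows of B: basis of the row space of A
    kernel_basis = X[r:]          # last m - r rows of X: basis of the row null space
    inverse = X if (m == n and r == m) else None              # then B = I and X = A^(-1)
    return image_basis, kernel_basis, inverse

A = [[0, 1, 1], [2, 1, 2], [2, 2, 0]]                          # Examples 14.4-14.6
img, ker, inv = image_kernel_inverse(A, 3)
print(img)   # [[1, 0, 2], [0, 1, 1]]
print(ker)   # [[2, 2, 1]]
print(inv)   # None, since r = 2 < 3 (A is not invertible)
```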


Solving systems of linear equations

Suppose that in addition to the matrix A, we are given w ∈ F^n, and want to find a solution v ∈ F^m (or perhaps describe all solutions) to the equation

vA = w.   (14.3)

Equivalently, we can phrase the problem as finding an element (or describing all elements) of the set λ_A^{−1}({w}).

Now, if there exists a solution at all, say v ∈ F^m, then λ_A(v) = λ_A(v′) if and only if v ≡ v′ (mod K), where K is the kernel of λ_A. It follows that the set of all solutions to (14.3) is v + K = {v + v′ : v′ ∈ K}. Thus, given a basis for K and any solution v to (14.3), we have a complete and concise description of the set of solutions to (14.3).

As we have discussed above, the last m − r rows of X form a basis for K, so it suffices to determine if w ∈ Im λ_A, and if so, determine a single pre-image v of w.

Also as we discussed, Im λ_A, that is, the row space of A, is equal to the row space of B, and because of the special form of B, we can quickly and easily determine if the given w is in the row space of B, as follows. By definition, w is in the row space of B if and only if there exists a vector v̄ ∈ F^m such that v̄B = w. We may as well assume that all but the first r entries of v̄ are zero. Moreover, v̄B = w implies that for i = 1, ..., r, the ith entry of v̄ is equal to the p_i-th entry of w. Thus, the vector v̄, if it exists, is completely determined by the entries of w at positions p_1, ..., p_r. We can construct v̄ satisfying these conditions, and then test if v̄B = w. If not, then we may conclude that (14.3) has no solutions; otherwise, setting v := v̄X, we see that vA = (v̄X)A = v̄(XA) = v̄B = w, and so v is a solution to (14.3).

One easily verifies that if we implement the above procedure as an algorithm, the work done in addition to running the extended Gaussian elimination algorithm amounts to O(m(n+m)) operations in F.

A special case of the above procedure is when m = n and A is invertible, in which case (14.3) has a unique solution, namely, v := wX, since in this case, X = A^{−1}.
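The procedure translates directly into code; the following Python sketch uses the hypothetical extended_gauss and pivot_sequence helpers from the earlier sketches, and the function name solve_right is ours.

```python
# Minimal sketch (builds on the extended_gauss and pivot_sequence helpers
# sketched earlier): find one solution v of v*A = w over Z_p, or report none.

def solve_right(A, w, p):
    m, n = len(A), len(A[0])
    B, X = extended_gauss(A, p)
    pivots = pivot_sequence(B)                      # (p_1, ..., p_r), 1-based
    # v-bar has the p_i-th entry of w in position i, and zeros elsewhere
    vbar = [w[pj - 1] % p for pj in pivots] + [0] * (m - len(pivots))
    # if v-bar * B != w, then w is not in the row space of A: no solution
    check = [sum(vbar[i] * B[i][j] for i in range(m)) % p for j in range(n)]
    if check != [x % p for x in w]:
        return None
    # otherwise v := v-bar * X satisfies v*A = (v-bar*X)*A = v-bar*B = w
    return [sum(vbar[i] * X[i][j] for i in range(m)) % p for j in range(m)]

A = [[0, 1, 1], [2, 1, 2], [2, 2, 0]]
w = [1, 1, 0]
v = solve_right(A, w, 3)
assert v is not None
assert [sum(v[i] * A[i][j] for i in range(3)) % 3 for j in range(3)] == w
print(v)   # [2, 2, 0]
```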

The rank of a matrix

We define the row rank of A to be the dimension of its row space, which is equal to dim_F(Im λ_A). The column space of A is defined as the subspace of F^{m×1} spanned by {Col_j(A)}_{j=1}^{n}; that is, the column space of A is {Az : z ∈ F^{n×1}}. The column rank of A is the dimension of its column space.

Now, the column space of A need not be the same as the column space of B, but from the identity B = XA, and the fact that X is invertible, it easily follows that these two subspaces are isomorphic (via the map that sends y ∈ F^{m×1} to Xy), and hence have the same dimension. Moreover, by Theorem 14.7, the column rank of B is r, which is the same as the row rank of A.

So we may conclude: the column rank and row rank of A are the same.

Because of this, we may define the rank of a matrix to be the common value of its row and column rank.

The orthogonal complement of a subspace

So as to give equal treatment to rows and columns, one can also define the column null space of A to be the kernel of the linear map defined by multiplication on the left by A; that is, the column null space of A is {z ∈ F^{n×1} : Az = 0_F^{m×1}}. By applying the results above to the transpose of A, we see that the column null space of A has dimension n − r, where r is the rank of A.

Let U ⊆ F^n be the row space of A, and let U^⊥ ⊆ F^n denote the set of all vectors u ∈ F^n whose transpose u^⊤ belongs to the column null space of A.

Now, U is a subspace of F^n of dimension r and U^⊥ is a subspace of F^n of dimension n − r. The space U^⊥ consists precisely of all vectors u′ ∈ F^n that are “orthogonal” to all vectors u ∈ U, in the sense that the “inner product” u (u′)^⊤ is zero. For this reason, U^⊥ is sometimes called the “orthogonal complement of U.”

Clearly, U^⊥ is determined by the subspace U itself, and does not depend on the particular choice of matrix A. It is also easy to see that the orthogonal complement of U^⊥ is U; that is, (U^⊥)^⊥ = U. This follows immediately from the fact that U ⊆ (U^⊥)^⊥ and dim_F((U^⊥)^⊥) = n − dim_F(U^⊥) = dim_F(U).

Now suppose that U ∩ U^⊥ = {0}. Then by Theorem 13.11, we have an isomorphism of U × U^⊥ with U + U^⊥, and since U × U^⊥ has dimension n, it must be the case that U + U^⊥ = F^n. It follows that every element of F^n can be expressed uniquely as u + u′, where u ∈ U and u′ ∈ U^⊥.

We emphasize that the observations in the previous paragraph hinged on the assumption that U ∩ U^⊥ = {0}, which itself holds provided U contains no non-zero “self-orthogonal vectors” u such that u u^⊤ is zero. If F is the field of real numbers, then of course there are no non-zero self-orthogonal vectors, since u u^⊤ is the sum of the squares of the entries of u. However, for other fields, there may very well be non-zero self-orthogonal vectors. As an example, if F = Z_2, then any vector u with an even number of 1-entries is self-orthogonal.

So we see that while much of the theory of vector spaces and matrices carries over without change from familiar ground fields, like the real numbers, to arbitrary ground fields F, not everything does. In particular, the usual decomposition of a vector space into a subspace and its orthogonal complement breaks down, as does any other procedure that relies on properties specific to “inner product spaces.”


For the following three exercises, as above, A is an arbitrary m×n matrix over a field F, and XA = B, where X is an invertible m×m matrix, and B is in reduced row echelon form.

EXERCISE 14.14. Show that the column null space of A is the same as the column null space of B.

EXERCISE 14.15. Show how to compute a basis for the column null space of A using O(r(n−r)) operations in F, given A and B.

EXERCISE 14.16. Show that the matrix B is uniquely determined by A; more precisely, show that if X′A = B′, where X′ is an invertible m×m matrix, and B′ is in reduced row echelon form, then B′ = B.

In the following two exercises, the theory of determinants could be used; however, they can all be solved directly, without too much difficulty, using just the ideas developed so far in the text.

EXERCISE 14.17. Let p be a prime. A matrix A ∈ Z^{m×m} is called invertible modulo p if there exists a matrix B ∈ Z^{m×m} such that AB ≡ BA ≡ I (mod p), where I is the m×m integer identity matrix. Here, two matrices are considered congruent with respect to a given modulus if their corresponding entries are congruent. Show that A is invertible modulo p if and only if A is invertible over Q, and the entries of A^{−1} lie in Q^{(p)} (see Example 7.26).

EXERCISE 14.18. You are given a matrix A ∈ Z^{m×m} and a prime p such that A is invertible modulo p (see previous exercise). Suppose that you are also given w ∈ Z^m.

(a) Show how to efficiently compute a vector v ∈ Z^m such that vA ≡ w (mod p), and that v is uniquely determined modulo p.

(b) Given a vector v as in part (a), along with an integer e ≥ 1, show how to efficiently compute v̂ ∈ Z^m such that v̂A ≡ w (mod p^e), and that v̂ is uniquely determined modulo p^e. Hint: mimic the “lifting” procedure discussed in §12.5.2.

(c) Using parts (a) and (b), design and analyze an efficient algorithm that takes the matrix A and the prime p as input, together with a bound H on the absolute value of the numerator and denominator of the entries of the vector v′ that is the unique (rational) solution to the equation v′A = w. Your algorithm should run in time polynomial in the length of H, the length of p, and the sum of the lengths of the entries of A and w. Hint: use rational reconstruction, but be sure to fully justify its application.
