14 Matrices

In this chapter, we discuss basic definitions and results concerning matrices. We shall start out with a very general point of view, discussing matrices whose entries lie in an arbitrary ring R. Then we shall specialize to the case where the entries lie in a field F, where much more can be said.

One of the main goals of this chapter is to discuss “Gaussian elimination,” which is an algorithm that allows us to efficiently compute bases for the image and kernel of an F-linear map.

In discussing the complexity of algorithms for matrices over a ring R, we shall treat a ring R as an “abstract data type,” so that the running times of algorithms will be stated in terms of the number of arithmetic operations in R. If R is a finite ring, such as Z_m, we can immediately translate this into a running time on a RAM (in later chapters, we will discuss other finite rings and efficient algorithms for doing arithmetic in them).

If R is, say, the field of rational numbers, a complete running time analysis would require an additional analysis of the sizes of the numbers that appear in the execution of the algorithm. We shall not attempt such an analysis here; however, we note that all the algorithms discussed in this chapter do in fact run in polynomial time when R = Q, assuming we represent rational numbers as fractions in lowest terms. Another possible approach for dealing with rational numbers is to use floating point approximations. While this approach eliminates the size problem, it creates many new problems because of round-off errors. We shall not address any of these issues here.

14.1 Basic definitions and properties

Throughout this section, R denotes a ring.

For positive integers m and n, an m×n matrix A over a ring R is a rectangular array

A = \begin{pmatrix}
      a_{11} & a_{12} & \cdots & a_{1n} \\
      a_{21} & a_{22} & \cdots & a_{2n} \\
      \vdots & \vdots &        & \vdots \\
      a_{m1} & a_{m2} & \cdots & a_{mn}
    \end{pmatrix},

where each entry a_{ij} in the array is an element of R; the element a_{ij} is called the (i,j) entry of A, which we denote by A(i,j). For i = 1, ..., m, the ith row of A is

(a_{i1}, ..., a_{in}),

which we denote by Row_i(A), and for j = 1, ..., n, the jth column of A is

\begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix},

which we denote by Col_j(A). We regard a row of A as a 1×n matrix, and a column of A as an m×1 matrix.

The set of all m×n matrices over R is denoted by R^{m×n}. Elements of R^{1×n} are called row vectors (of dimension n) and elements of R^{m×1} are called column vectors (of dimension m). Elements of R^{n×n} are called square matrices (of dimension n). We do not make a distinction between R^n and R^{1×n}; that is, we view standard n-tuples as row vectors.

We can define the familiar operations of matrix addition and scalar multiplication:

• If A, B ∈ R^{m×n}, then A + B is the m×n matrix whose (i,j) entry is A(i,j) + B(i,j).

• If c ∈ R and A ∈ R^{m×n}, then cA is the m×n matrix whose (i,j) entry is cA(i,j).

The m×n zero matrix is the m×n matrix, all of whose entries are 0_R; we denote this matrix by 0_R^{m×n} (or just 0, when the context is clear).

Theorem 14.1. With addition and scalar multiplication as defined above, R^{m×n} is an R-module. The matrix 0_R^{m×n} is the additive identity, and the additive inverse of a matrix A ∈ R^{m×n} is the m×n matrix whose (i,j) entry is −A(i,j).

Proof. To prove this, one first verifies that matrix addition is associative and commutative, which follows from the associativity and commutativity of addition in R. The claims made about the additive identity and additive inverses are also easily verified. These observations establish that R^{m×n} is an abelian group. One also has to check that all of the properties in Definition 13.1 hold. We leave this to the reader. □

We can also define the familiar operation of matrix multiplication:

• If A ∈ R^{m×n} and B ∈ R^{n×p}, then AB is the m×p matrix whose (i,k) entry is

Σ_{j=1}^{n} A(i,j) B(j,k).

The n×n identity matrix is the matrix I ∈ R^{n×n}, where I(i,i) := 1_R and I(i,j) := 0_R for i ≠ j. That is, I has 1_R's on the diagonal that runs from the upper left corner to the lower right corner, and 0_R's everywhere else.

Theorem 14.2.

(i) Matrix multiplication is associative; that is, A(BC) = (AB)C for all A ∈ R^{m×n}, B ∈ R^{n×p}, and C ∈ R^{p×q}.

(ii) Matrix multiplication distributes over matrix addition; that is, A(C + D) = AC + AD and (A + B)C = AC + BC for all A, B ∈ R^{m×n} and C, D ∈ R^{n×p}.

(iii) The n×n identity matrix I ∈ R^{n×n} acts as a multiplicative identity; that is, AI = A and IB = B for all A ∈ R^{m×n} and B ∈ R^{n×m}; in particular, CI = C = IC for all C ∈ R^{n×n}.

(iv) Scalar multiplication and matrix multiplication associate; that is, c(AB) = (cA)B = A(cB) for all c ∈ R, A ∈ R^{m×n}, and B ∈ R^{n×p}.

Proof. All of these are trivial, except for (i), which requires just a bit of computation to show that the (i,ℓ) entry of both A(BC) and (AB)C is equal to (as the reader may verify)

Σ_{1≤j≤n, 1≤k≤p} A(i,j) B(j,k) C(k,ℓ). □

Note that while matrix addition is commutative, matrix multiplication in general is not. Indeed, Theorems 14.1 and 14.2 imply that R^{n×n} satisfies all the properties of a ring except for commutativity of multiplication.

Some simple but useful facts to keep in mind are the following:

• If A ∈ R^{m×n} and B ∈ R^{n×p}, then the ith row of AB is equal to vB, where v = Row_i(A); also, the kth column of AB is equal to Aw, where w = Col_k(B).

• If A ∈ R^{m×n} and v = (c_1, ..., c_m) ∈ R^m, then

vA = Σ_{i=1}^{m} c_i Row_i(A).

In words: vA is a linear combination of the rows of A, with coefficients taken from the corresponding entries of v.

• If A ∈ R^{m×n} and

w = \begin{pmatrix} d_1 \\ \vdots \\ d_n \end{pmatrix} ∈ R^{n×1},

then

Aw = Σ_{j=1}^{n} d_j Col_j(A).

In words: Aw is a linear combination of the columns of A, with coefficients taken from the corresponding entries of w.

If A ∈ R^{m×n}, the transpose of A, denoted by A^⊤, is defined to be the n×m matrix whose (j,i) entry is A(i,j).

Theorem 14.3. If A, B ∈ R^{m×n}, C ∈ R^{n×p}, and c ∈ R, then:

(i) (A + B)^⊤ = A^⊤ + B^⊤;

(ii) (cA)^⊤ = cA^⊤;

(iii) (A^⊤)^⊤ = A;

(iv) (AC)^⊤ = C^⊤ A^⊤.

Proof. Exercise. □

If A_i is an n_i × n_{i+1} matrix, for i = 1, ..., k, then by associativity of matrix multiplication, we may write the product matrix A_1 ⋯ A_k, which is an n_1 × n_{k+1} matrix, without any ambiguity.

For an n×n matrix A, and a positive integer k, we write A^k to denote the product A ⋯ A, where there are k terms in the product. Note that A^1 = A. We may extend this notation to k = 0, defining A^0 to be the n×n identity matrix. One may readily verify the usual rules of exponent arithmetic: for all non-negative integers k, ℓ, we have

(A^ℓ)^k = A^{kℓ} = (A^k)^ℓ and A^k A^ℓ = A^{k+ℓ}.

It is easy also to see that part (iv) of Theorem 14.3 implies that for all non-negative integers k, we have

(A^k)^⊤ = (A^⊤)^k.

Algorithmic issues

For computational purposes, matrices are represented in the obvious way as arrays of elements of R. As remarked at the beginning of this chapter, we shall treat R as an “abstract data type,” and not worry about how elements of R are actually represented; in discussing the complexity of algorithms, we shall simply count “operations in R,” by which we mean additions, subtractions, and multiplications; we shall sometimes also include equality testing and computing multiplicative inverses as “operations in R.” In any real implementation, there will be other costs, such as incrementing counters, and so on, which we may safely ignore, as long as their number is at most proportional to the number of operations in R.

The following statements are easy to verify:

• We can multiply an m×n matrix by a scalar using mn operations in R.

• We can add two m×n matrices using mn operations in R.

• We can compute the product of an m×n matrix and an n×p matrix using O(mnp) operations in R.

It is also easy to see that given an n×n matrix A, and a non-negative integer e, we can adapt the repeated squaring algorithm discussed in §3.4 so as to compute A^e using O(len(e)) multiplications of n×n matrices, and hence O(len(e) n^3) operations in R.
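To make these operation counts concrete, here is a minimal Python sketch of matrix multiplication and of matrix powering by repeated squaring for matrices over Z_m, represented as lists of rows. The helper names mat_mul and mat_pow, and the particular example, are our own choices, not part of the text.

```python
# Minimal sketch (helper names are ours): matrices over Z_m as lists of rows.

def mat_mul(A, B, m):
    """Product of an r x n matrix A and an n x p matrix B over Z_m,
    using O(r*n*p) operations in Z_m."""
    r, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n
    return [[sum(A[i][j] * B[j][k] for j in range(n)) % m for k in range(p)]
            for i in range(r)]

def mat_pow(A, e, m):
    """A^e for an n x n matrix A over Z_m, by repeated squaring:
    O(len(e)) matrix multiplications, hence O(len(e) * n^3) operations in Z_m."""
    n = len(A)
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # identity, A^0
    while e > 0:
        if e & 1:
            R = mat_mul(R, A, m)
        A = mat_mul(A, A, m)
        e >>= 1
    return R

A = [[0, 1, 1], [2, 1, 2], [2, 2, 0]]
print(mat_pow(A, 5, 3))   # [[2, 0, 1], [0, 2, 2], [2, 2, 0]]
```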

EXERCISE 14.1. Let A ∈ R^{m×n}. Show that if vA = 0_R^{1×n} for all v ∈ R^m, then A = 0_R^{m×n}.

14.2 Matrices and linear maps

Let R be a ring.

For positive integers m and n, consider the R-modules R^m and R^n. If A is an m×n matrix over R, then the map

λ_A : R^m → R^n, v ↦ vA

is easily seen to be an R-linear map; this follows immediately from parts (ii) and (iv) of Theorem 14.2. We call λ_A the linear map corresponding to A.


If v = (c_1, ..., c_m) ∈ R^m, then

λ_A(v) = vA = Σ_{i=1}^{m} c_i Row_i(A).

From this, it is clear that

• the image of λ_A is the submodule of R^n spanned by {Row_i(A)}_{i=1}^{m}; in particular, λ_A is surjective if and only if {Row_i(A)}_{i=1}^{m} spans R^n;

• λ_A is injective if and only if {Row_i(A)}_{i=1}^{m} is linearly independent.

There is a close connection between matrix multiplication and composition of corresponding linear maps. Specifically, let A ∈ R^{m×n} and B ∈ R^{n×p}, and consider the corresponding linear maps λ_A : R^m → R^n and λ_B : R^n → R^p. Then we have

λ_B ∘ λ_A = λ_{AB}.   (14.1)

This follows immediately from the associativity of matrix multiplication.

We have seen how vector/matrix multiplication defines a linear map. Conversely, we shall now see that the action of any R-linear map can be viewed as a vector/matrix multiplication, provided the R-modules involved have bases (which will always be the case for finite dimensional vector spaces).

Let M be an R-module, and suppose that S = {α_i}_{i=1}^{m} is a basis for M, where m > 0. As we know (see Theorem 13.14), every element α ∈ M can be written uniquely as c_1 α_1 + ⋯ + c_m α_m, where the c_i's are in R. Let us define

Vec_S(α) := (c_1, ..., c_m) ∈ R^m.

We call Vec_S(α) the coordinate vector of α relative to S. The function

Vec_S : M → R^m

is an R-module isomorphism (it is the inverse of the isomorphism ε in Theorem 13.14).

Let N be another R-module, and suppose that T = {β_j}_{j=1}^{n} is a basis for N, where n > 0. Just as in the previous paragraph, every element β ∈ N has a unique coordinate vector Vec_T(β) ∈ R^n relative to T.

Now let ρ : M → N be an arbitrary R-linear map. Our goal is to define a matrix A ∈ R^{m×n} with the following property:

Vec_T(ρ(α)) = Vec_S(α) A for all α ∈ M.   (14.2)

In words: if we multiply the coordinate vector of α on the right by A, we get the coordinate vector of ρ(α).


Constructing such a matrix A is easy: we define A to be the matrix whose ith row, for i = 1, ..., m, is the coordinate vector of ρ(α_i) relative to T. That is,

Row_i(A) = Vec_T(ρ(α_i)) for i = 1, ..., m.

Then for an arbitrary α ∈ M, if (c_1, ..., c_m) is the coordinate vector of α relative to S, we have

ρ(α) = ρ(Σ_i c_i α_i) = Σ_i c_i ρ(α_i),

and so

Vec_T(ρ(α)) = Σ_i c_i Vec_T(ρ(α_i)) = Σ_i c_i Row_i(A) = Vec_S(α) A.

Furthermore, A is the only matrix satisfying (14.2). Indeed, if A′ also satisfies (14.2), then subtracting, we obtain

Vec_S(α)(A − A′) = 0_R^{1×n}

for all α ∈ M. Since the map Vec_S : M → R^m is surjective, this means that v(A − A′) is zero for all v ∈ R^m, and from this, it is clear (see Exercise 14.1) that A − A′ is the zero matrix, and so A = A′.

We call the unique matrix A satisfying (14.2) the matrix of ρ relative to S and T, and denote it by Mat_{S,T}(ρ).

Recall that Hom_R(M,N) is the R-module consisting of all R-linear maps from M to N (see Theorem 13.12). We can view Mat_{S,T} as a function mapping elements of Hom_R(M,N) to elements of R^{m×n}.

Theorem 14.4. The function Mat_{S,T} : Hom_R(M,N) → R^{m×n} is an R-module isomorphism. In particular, for every A ∈ R^{m×n}, the pre-image of A under Mat_{S,T} is Vec_T^{−1} ∘ λ_A ∘ Vec_S, where λ_A : R^m → R^n is the linear map corresponding to A.

Proof. To show that Mat_{S,T} is an R-linear map, let ρ, ρ′ ∈ Hom_R(M,N), and let c ∈ R. Also, let A := Mat_{S,T}(ρ) and A′ := Mat_{S,T}(ρ′). Then for all α ∈ M, we have

Vec_T((ρ + ρ′)(α)) = Vec_T(ρ(α) + ρ′(α)) = Vec_T(ρ(α)) + Vec_T(ρ′(α))
                   = Vec_S(α) A + Vec_S(α) A′ = Vec_S(α)(A + A′).

As this holds for all α ∈ M, and since the matrix of a linear map is uniquely determined, we must have Mat_{S,T}(ρ + ρ′) = A + A′. A similar argument shows that Mat_{S,T}(cρ) = cA. This shows that Mat_{S,T} is an R-linear map.

To show that the map Mat_{S,T} is injective, it suffices to show that its kernel is trivial. If ρ is in the kernel of this map, then setting A := 0_R^{m×n} in (14.2), we see that Vec_T(ρ(α)) is zero for all α ∈ M. But since the map Vec_T : N → R^n is injective, this implies ρ(α) is zero for all α ∈ M. Thus, ρ must be the zero map.

To show surjectivity, we show that every A ∈ R^{m×n} has a pre-image under Mat_{S,T} as described in the statement of the theorem. So let A be an m×n matrix, and let ρ := Vec_T^{−1} ∘ λ_A ∘ Vec_S. Again, since the matrix of a linear map is uniquely determined, it suffices to show that (14.2) holds for this particular A and ρ. For every α ∈ M, we have

Vec_T(ρ(α)) = Vec_T(Vec_T^{−1}(λ_A(Vec_S(α)))) = λ_A(Vec_S(α)) = Vec_S(α) A.

That proves the theorem. □

As a special case of the above, suppose that M = R^m and N = R^n, and S and T are the standard bases for M and N (see Example 13.27). In this case, the functions Vec_S and Vec_T are the identity maps, and the previous theorem implies that the function

Λ : R^{m×n} → Hom_R(R^m, R^n), A ↦ λ_A

is the inverse of the function Mat_{S,T} : Hom_R(R^m, R^n) → R^{m×n}. Thus, the function Λ is also an R-module isomorphism.

To summarize, we see that an R-linear map ρ from M to N, together with particular bases for M and N, uniquely determine a matrix A such that the action of multiplication on the right by A implements the action of ρ with respect to the given bases. There may be many bases for M and N to choose from, and different choices will in general lead to different matrices. Also, note that in general, a basis may be indexed by an arbitrary finite set; however, in defining coordinate vectors and matrices of linear maps, the index set must be ordered in some way. In any case, from a computational perspective, the matrix A gives us an efficient way to compute the map ρ, assuming elements of M and N are represented as coordinate vectors with respect to the given bases.

We have taken a “row-centric” point of view. Of course, if one prefers, by simply transposing everything, one can equally well take a “column-centric” point of view, where the action of ρ corresponds to multiplication of a column vector on the left by a matrix.

Example 14.1. Consider the quotient ring E = R[X]/(f), where f ∈ R[X] with deg(f) = ℓ > 0 and lc(f) ∈ R^*. Let ξ := [X]_f ∈ E. As an R-module, E has a basis S := {ξ^{i−1}}_{i=1}^{ℓ} (see Example 13.30). Let ρ : E → E be the ξ-multiplication map, which sends α ∈ E to ξα ∈ E. This is an R-linear map. If f = c_0 + c_1 X + ⋯ + c_{ℓ−1} X^{ℓ−1} + c_ℓ X^ℓ, then the matrix of ρ relative to S is the ℓ×ℓ matrix

A = \begin{pmatrix}
      0 & 1 & 0 & \cdots & 0 \\
      0 & 0 & 1 & \cdots & 0 \\
        &   &   & \ddots &   \\
      0 & 0 & 0 & \cdots & 1 \\
      -c_0/c_\ell & -c_1/c_\ell & -c_2/c_\ell & \cdots & -c_{\ell-1}/c_\ell
    \end{pmatrix},

where for i = 1, ..., ℓ−1, the ith row of A contains a 1 in position i+1, and is zero everywhere else. The matrix A is called the companion matrix of f. □

Example 14.2. Let x_1, ..., x_k ∈ R. Let R[X]_{<k} be the set of polynomials g ∈ R[X] with deg(g) < k, which is an R-module with a basis S := {X^{i−1}}_{i=1}^{k} (see Example 13.29). The multi-point evaluation map

ρ : R[X]_{<k} → R^k, g ↦ (g(x_1), ..., g(x_k))

is an R-linear map. Let T be the standard basis for R^k. Then the matrix of ρ relative to S and T is the k×k matrix

A = \begin{pmatrix}
      1 & 1 & \cdots & 1 \\
      x_1 & x_2 & \cdots & x_k \\
      x_1^2 & x_2^2 & \cdots & x_k^2 \\
      \vdots & \vdots & & \vdots \\
      x_1^{k-1} & x_2^{k-1} & \cdots & x_k^{k-1}
    \end{pmatrix}.

The matrix A is called a Vandermonde matrix. □
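To make Examples 14.1 and 14.2 concrete, the following Python sketch builds a companion matrix and a Vandermonde matrix over the field Z_p, and checks property (14.2) for the Vandermonde case by multiplying the coefficient vector of a polynomial on the right by the matrix. The helper names and the particular choices of p, f, and evaluation points are ours, purely for illustration.

```python
# Minimal sketch (helper names and parameters are ours): the matrices of
# Examples 14.1 and 14.2, with entries in the field Z_p.

p = 7

def companion_matrix(c, p):
    """Companion matrix of f = c[0] + c[1]*X + ... + c[l]*X^l over Z_p;
    the leading coefficient c[l] must be non-zero mod p (a unit)."""
    l = len(c) - 1
    inv_lead = pow(c[l], -1, p)
    A = [[0] * l for _ in range(l)]
    for i in range(l - 1):
        A[i][i + 1] = 1                       # row i+1 has a 1 in position i+2
    A[l - 1] = [(-c[j] * inv_lead) % p for j in range(l)]
    return A

def vandermonde_matrix(xs, p):
    """k x k Vandermonde matrix: row i holds x_1^(i-1), ..., x_k^(i-1)."""
    return [[pow(x, i, p) for x in xs] for i in range(len(xs))]

def vec_times_mat(v, A, p):
    """Row vector times matrix over Z_p."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) % p
            for j in range(len(A[0]))]

# Property (14.2) for the evaluation map: coefficient vector of g times A
# equals (g(x_1), ..., g(x_k)).
xs = [1, 2, 3]
A = vandermonde_matrix(xs, p)
g = [5, 0, 2]                                 # g = 5 + 2*X^2
assert vec_times_mat(g, A, p) == [(5 + 2 * x * x) % p for x in xs]

print(companion_matrix([3, 1, 0, 1], p))      # f = 3 + X + X^3 -> [[0,1,0],[0,0,1],[4,6,0]]
```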

EXERCISE 14.2. Let σ : M → N and τ : N → P be R-linear maps, and suppose that M, N, and P have bases S, T, and U, respectively. Show that

Mat_{S,U}(τ ∘ σ) = Mat_{S,T}(σ) · Mat_{T,U}(τ).

EXERCISE 14.3. Let V be a vector space over a field F with basis S = {α_i}_{i=1}^{m}. Suppose that U is a subspace of V of dimension ℓ < m. Show that there exists a matrix A ∈ F^{m×(m−ℓ)} such that for all α ∈ V, we have α ∈ U if and only if Vec_S(α) A is zero. Such a matrix A is called a parity check matrix for U.

EXERCISE 14.4. Let F be a finite field, and let A be a non-zero m×n matrix over F. Suppose one chooses a vector v ∈ F^m at random. Show that the probability that vA is the zero vector is at most 1/|F|.

EXERCISE 14.5. Design and analyze a probabilistic algorithm that takes as input matrices A, B, C ∈ Z_p^{m×m}, where p is a prime. The algorithm should run in time O(m^2 len(p)^2) and should output either “yes” or “no” so that the following holds:

• if C = AB, then the algorithm should always output “yes”;

• if C ≠ AB, then the algorithm should output “no” with probability at least 0.999.

14.3 The inverse of a matrix

Let R be a ring.

For a square matrix A ∈ R^{n×n}, we call a matrix B ∈ R^{n×n} an inverse of A if BA = AB = I, where I is the n×n identity matrix. It is easy to see that if A has an inverse, then the inverse is unique: if B and C are inverses of A, then

B = BI = B(AC) = (BA)C = IC = C.

Because the inverse of A is uniquely determined, we denote it by A^{−1}. If A has an inverse, we say that A is invertible, or non-singular. If A is not invertible, it is sometimes called singular. We will use the terms “invertible” and “not invertible.”

Observe that A is the inverse of A^{−1}; that is, (A^{−1})^{−1} = A.

If A and B are invertible n×n matrices, then so is their product: in fact, it is easily verified that (AB)^{−1} = B^{−1} A^{−1}. It follows that if A is an invertible matrix, and k is a non-negative integer, then A^k is invertible with inverse (A^{−1})^k, which we also denote by A^{−k}.

It is also easy to see that A is invertible if and only if the transposed matrix A^⊤ is invertible, in which case (A^⊤)^{−1} = (A^{−1})^⊤. Indeed, AB = I = BA holds if and only if B^⊤ A^⊤ = I = A^⊤ B^⊤.

We now develop a connection between invertible matrices and R-module isomorphisms. Recall from the previous section the R-module isomorphism

Λ : R^{n×n} → Hom_R(R^n, R^n), A ↦ λ_A,

where for each A ∈ R^{n×n}, λ_A is the corresponding R-linear map

λ_A : R^n → R^n, v ↦ vA.

Evidently, λ_I is the identity map.

Theorem 14.5. Let A ∈ R^{n×n}, and let λ_A : R^n → R^n be the corresponding R-linear map. Then A is invertible if and only if λ_A is bijective, in which case λ_{A^{−1}} = λ_A^{−1}.

Proof. Suppose A is invertible, and that B is its inverse. We have AB = BA = I, and hence λ_{AB} = λ_{BA} = λ_I, from which it follows (see (14.1)) that λ_B ∘ λ_A = λ_A ∘ λ_B = λ_I. Since λ_I is the identity map, this implies λ_A is bijective.

Suppose λ_A is bijective. We know that the inverse map λ_A^{−1} is also an R-linear map, and since the mapping Λ above is surjective, we have λ_A^{−1} = λ_B for some B ∈ R^{n×n}. Therefore, we have λ_B ∘ λ_A = λ_A ∘ λ_B = λ_I, and hence (again, see (14.1)) λ_{AB} = λ_{BA} = λ_I. Since the mapping Λ is injective, it follows that AB = BA = I. This implies A is invertible, with A^{−1} = B. □

We also have:

Theorem 14.6. Let A ∈ R^{n×n}. The following are equivalent:

(i) A is invertible;

(ii) {Row_i(A)}_{i=1}^{n} is a basis for R^n;

(iii) {Col_j(A)}_{j=1}^{n} is a basis for R^{n×1}.

Proof. We first prove the equivalence of (i) and (ii). By the previous theorem, A is invertible if and only if λ_A is bijective. Also, in the previous section, we observed that λ_A is surjective if and only if {Row_i(A)}_{i=1}^{n} spans R^n, and that λ_A is injective if and only if {Row_i(A)}_{i=1}^{n} is linearly independent.

The equivalence of (i) and (iii) follows by considering the transpose of A. □

EXERCISE 14.6. Let R be a ring, and let A be a square matrix over R. Let us call B a left inverse of A if BA = I, and let us call C a right inverse of A if AC = I.

(a) Show that if A has both a left inverse B and a right inverse C, then B = C and hence A is invertible.

(b) Assume that R is a field. Show that if A has either a left inverse or a right inverse, then A is invertible.

Note that part (b) of the previous exercise holds for arbitrary rings, but the proof of this is non-trivial, and requires the development of the theory of determinants, which we do not cover in this text.

EXERCISE 14.7. Show that if A and B are two square matrices over a field such that their product AB is invertible, then both A and B themselves must be invertible.

EXERCISE 14.8. Show that if A is a square matrix over an arbitrary ring, and A^k is invertible for some k > 0, then A is invertible.

EXERCISE 14.9. With notation as in Example 14.1, show that the matrix A is invertible if and only if c_0 ∈ R^*.

EXERCISE 14.10. With notation as in Example 14.2, show that the matrix A is invertible if and only if x_i − x_j ∈ R^* for all i ≠ j.

14.4 Gaussian elimination

Throughout this section, F denotes a field.

A matrix B ∈ F^{m×n} is said to be in reduced row echelon form if there exists a sequence of integers (p_1, ..., p_r), with 0 ≤ r ≤ m and 1 ≤ p_1 < p_2 < ⋯ < p_r ≤ n, such that the following holds:

• for i = 1, ..., r, all of the entries in row i of B to the left of entry (i, p_i) are zero; that is, B(i,j) = 0_F for j = 1, ..., p_i − 1;

• for i = 1, ..., r, all of the entries in column p_i of B above entry (i, p_i) are zero; that is, B(i′, p_i) = 0_F for i′ = 1, ..., i − 1;

• for i = 1, ..., r, we have B(i, p_i) = 1_F;

• all entries in rows r+1, ..., m of B are zero; that is, B(i,j) = 0_F for i = r+1, ..., m and j = 1, ..., n.

It is easy to see that if B is in reduced row echelon form, then the sequence (p_1, ..., p_r) above is uniquely determined, and we call it the pivot sequence of B.

Several further remarks are in order:

• All of the entries of B are completely determined by the pivot sequence, except for the entries (i,j) with 1 ≤ i ≤ r and j > p_i with j ∉ {p_{i+1}, ..., p_r}, which may be arbitrary.

• If B is an n×n matrix in reduced row echelon form whose pivot sequence is of length n, then B must be the n×n identity matrix.

• We allow for an empty pivot sequence (i.e., r = 0), which will be the case precisely when B = 0_F^{m×n}.

Example 14.3. The following 4×6 matrix B over the rational numbers is in reduced row echelon form:

B = \begin{pmatrix}
      0 & 1 & -2 & 0 & 0 & 3 \\
      0 & 0 & 0 & 1 & 0 & 2 \\
      0 & 0 & 0 & 0 & 1 & -4 \\
      0 & 0 & 0 & 0 & 0 & 0
    \end{pmatrix}.

The pivot sequence of B is (2, 4, 5). Notice that the first three rows of B form a linearly independent family of vectors, that columns 2, 4, and 5 form a linearly independent family of vectors, and that all of the other columns of B are linear combinations of columns 2, 4, and 5. Indeed, if we truncate the pivot columns to their first three rows, we get the 3×3 identity matrix. □
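As a quick illustration (and in the spirit of Exercise 14.12 below), here is a small Python sketch, with a helper name of our own choosing, that reads off the pivot sequence of a matrix assumed to be in reduced row echelon form.

```python
# Minimal sketch (helper name is ours): the pivot sequence of a matrix B
# assumed to be in reduced row echelon form.

def pivot_sequence(B):
    """Return (p_1, ..., p_r), 1-based as in the text. Each column index is
    examined O(1) times overall, so this takes O(n) equality tests."""
    m, n = len(B), len(B[0])
    pivots, j = [], 0
    for i in range(m):
        while j < n and B[i][j] == 0:
            j += 1
        if j == n:
            break                 # this row, and hence all rows below, is zero
        pivots.append(j + 1)
        j += 1
    return pivots

# The matrix of Example 14.3:
B = [[0, 1, -2, 0, 0, 3],
     [0, 0,  0, 1, 0, 2],
     [0, 0,  0, 0, 1, -4],
     [0, 0,  0, 0, 0, 0]]
assert pivot_sequence(B) == [2, 4, 5]
```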


Generalizing the previous example, if a matrix is in reduced row echelon form, it is easy to deduce the following properties, which turn out to be quite useful:

Theorem 14.7. If B is a matrix in reduced row echelon form with pivot sequence (p_1, ..., p_r), then:

(i) rows 1, 2, ..., r of B form a linearly independent family of vectors;

(ii) columns p_1, ..., p_r of B form a linearly independent family of vectors, and all other columns of B can be expressed as linear combinations of columns p_1, ..., p_r.

Proof. Exercise: just look at the matrix! □

Gaussian elimination is an algorithm that transforms a given matrix A ∈ F^{m×n} into a matrix B ∈ F^{m×n}, where B is in reduced row echelon form, and is obtained from A by a sequence of elementary row operations. There are three types of elementary row operations:

Type I: swap two rows;

Type II: multiply a row by a non-zero scalar;

Type III: add a scalar multiple of one row to a different row.

The application of any specific elementary row operation to an m×n matrix C can be effected by multiplying C on the left by a suitable m×m matrix X. Indeed, the matrix X corresponding to a particular elementary row operation is simply the matrix obtained by applying the same elementary row operation to the m×m identity matrix. It is easy to see that for every elementary row operation, the corresponding matrix X is invertible.
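This correspondence is easy to check mechanically. In the following Python sketch (helper names and the sample matrix are ours), applying a Type III operation directly to a matrix C over Z_3 gives the same result as multiplying C on the left by the matrix X obtained by applying that operation to the identity matrix; Type I and Type II operations behave in the same way.

```python
# Minimal sketch (helper names are ours): an elementary row operation equals
# left multiplication by the matrix obtained by applying it to the identity.

p = 3

def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def swap_rows(C, r, s):                       # Type I
    C[r], C[s] = C[s], C[r]

def scale_row(C, r, c, p):                    # Type II (c must be non-zero)
    C[r] = [(c * x) % p for x in C[r]]

def add_multiple(C, r, s, c, p):              # Type III: Row_r <- Row_r + c*Row_s
    C[r] = [(x + c * y) % p for x, y in zip(C[r], C[s])]

def mat_mul(X, C, p):
    return [[sum(X[i][k] * C[k][j] for k in range(len(C))) % p
             for j in range(len(C[0]))] for i in range(len(X))]

C = [[1, 2, 1], [0, 1, 1], [2, 2, 0]]
X = identity(3)
add_multiple(X, 2, 0, -2, p)                  # apply Row_3 <- Row_3 - 2*Row_1 to I ...
C_direct = [row[:] for row in C]
add_multiple(C_direct, 2, 0, -2, p)           # ... and directly to C
assert mat_mul(X, C, p) == C_direct           # left multiplication gives the same result
```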

We now describe the basic version of Gaussian elimination. The input is an m×n matrix A, and the algorithm is described in Fig. 14.1.

The algorithm works as follows. First, it makes a copy B of A (this is not necessary if the original matrix A is not needed afterwards). The algorithm proceeds column by column, starting with the left-most column, so that after processing column j, the first j columns of B are in reduced row echelon form, and the current value of r represents the length of the pivot sequence. To process column j, in steps 3–6 the algorithm first searches for a non-zero element among B(r+1, j), ..., B(m, j); if none is found, then the first j columns of B are already in reduced row echelon form. Otherwise, one of these non-zero elements is selected as the pivot element (the choice is arbitrary), which is then used in steps 8–13 to bring column j into the required form. After incrementing r, the pivot element is brought into position (r, j), using a Type I operation in step 9. Then the entry (r, j) is set to 1_F, using a Type II operation in step 10. Finally, all the entries above and below entry (r, j) are set to 0_F, using Type III operations in steps 11–13.


1.  B ← A, r ← 0
2.  for j ← 1 to n do
3.      ℓ ← 0, i ← r
4.      while ℓ = 0 and i < m do
5.          i ← i + 1
6.          if B(i, j) ≠ 0_F then ℓ ← i
7.      if ℓ ≠ 0 then
8.          r ← r + 1
9.          swap rows r and ℓ of B
10.         Row_r(B) ← B(r, j)^{−1} Row_r(B)
11.         for i ← 1 to m do
12.             if i ≠ r then
13.                 Row_i(B) ← Row_i(B) − B(i, j) Row_r(B)
14. output B

Fig. 14.1. Gaussian elimination

Note that because columns 1, ..., j−1 of B were already in reduced row echelon form, none of these operations changes any values in these columns.

As for the complexity of the algorithm, it is easy to see that it performs O(mn) elementary row operations, each of which takes O(n) operations in F, so a total of O(mn^2) operations in F.

Example 14.4. Consider the execution of the Gaussian elimination algorithm on input

A = \begin{pmatrix} [0] & [1] & [1] \\ [2] & [1] & [2] \\ [2] & [2] & [0] \end{pmatrix} ∈ Z_3^{3×3}.

After copying A into B, the algorithm transforms B as follows (each step shows the elementary row operation applied and the resulting value of B):

Row_1 ↔ Row_2:
\begin{pmatrix} [2] & [1] & [2] \\ [0] & [1] & [1] \\ [2] & [2] & [0] \end{pmatrix}

Row_1 ← [2] Row_1:
\begin{pmatrix} [1] & [2] & [1] \\ [0] & [1] & [1] \\ [2] & [2] & [0] \end{pmatrix}

Row_3 ← Row_3 − [2] Row_1:
\begin{pmatrix} [1] & [2] & [1] \\ [0] & [1] & [1] \\ [0] & [1] & [1] \end{pmatrix}

Row_1 ← Row_1 − [2] Row_2:
\begin{pmatrix} [1] & [0] & [2] \\ [0] & [1] & [1] \\ [0] & [1] & [1] \end{pmatrix}

Row_3 ← Row_3 − Row_2:
\begin{pmatrix} [1] & [0] & [2] \\ [0] & [1] & [1] \\ [0] & [0] & [0] \end{pmatrix}. □

Suppose the Gaussian elimination algorithm performs a total of t elementary row operations. Then as discussed above, the application of the eth elementary row operation, for e = 1, ..., t, amounts to multiplying the current value of the matrix B on the left by a particular invertible m×m matrix X_e. Therefore, the final output value of B satisfies the equation

B = XA, where X = X_t X_{t−1} ⋯ X_1.

Since the product of invertible matrices is also invertible, we see that X itself is invertible.

Although the algorithm as presented does not compute the matrix X, it can be easily modified to do so. The resulting algorithm, which we call extended Gaussian elimination, is the same as plain Gaussian elimination, except that we initialize the matrix X to be the m×m identity matrix, and we add the following steps:

• just before step 9: swap rows r and ℓ of X;

• just before step 10: Row_r(X) ← B(r, j)^{−1} Row_r(X);

• just before step 13: Row_i(X) ← Row_i(X) − B(i, j) Row_r(X).

At the end of the algorithm we output X in addition to B.

So we simply perform the same elementary row operations on X that we perform on B. The reader may verify that the above algorithm is correct, and that it uses O(mn(m+n)) operations in F.

Example 14.5. Continuing with Example 14.4, the execution of the extended Gaussian elimination algorithm initializes X to the identity matrix

\begin{pmatrix} [1] & [0] & [0] \\ [0] & [1] & [0] \\ [0] & [0] & [1] \end{pmatrix}

and then transforms X as follows (the same row operations as in Example 14.4, now applied to X):

Row_1 ↔ Row_2:
\begin{pmatrix} [0] & [1] & [0] \\ [1] & [0] & [0] \\ [0] & [0] & [1] \end{pmatrix}

Row_1 ← [2] Row_1:
\begin{pmatrix} [0] & [2] & [0] \\ [1] & [0] & [0] \\ [0] & [0] & [1] \end{pmatrix}

Row_3 ← Row_3 − [2] Row_1:
\begin{pmatrix} [0] & [2] & [0] \\ [1] & [0] & [0] \\ [0] & [2] & [1] \end{pmatrix}

Row_1 ← Row_1 − [2] Row_2:
\begin{pmatrix} [1] & [2] & [0] \\ [1] & [0] & [0] \\ [0] & [2] & [1] \end{pmatrix}

Row_3 ← Row_3 − Row_2:
\begin{pmatrix} [1] & [2] & [0] \\ [1] & [0] & [0] \\ [2] & [2] & [1] \end{pmatrix}. □

EXERCISE 14.11. For each type of elementary row operation, describe the matrix X which corresponds to it, as well as X^{−1}.

EXERCISE 14.12. Given a matrix B ∈ F^{m×n} in reduced row echelon form, show how to compute its pivot sequence using O(n) operations in F.

EXERCISE 14.13. In §4.4, we saw how to speed up matrix multiplication over Z using the Chinese remainder theorem. In this exercise, you are to do the same, but for performing Gaussian elimination over Z_p, where p is a large prime. Suppose you are given an m×m matrix A over Z_p, where len(p) = Θ(m). Straightforward application of Gaussian elimination would require O(m^3) operations in Z_p, each of which takes time O(m^2), leading to a total running time of O(m^5). Show how to use the techniques of §4.4 to reduce the running time of Gaussian elimination to O(m^4).

14.5 Applications of Gaussian elimination

Throughout this section, A is an arbitrary m×n matrix over a field F, and XA = B, where X is an invertible m×m matrix, and B is an m×n matrix in reduced row echelon form with pivot sequence (p_1, ..., p_r). This is precisely the information produced by the extended Gaussian elimination algorithm, given A as input (the pivot sequence can easily be “read” directly from B; see Exercise 14.12). Also, let

λ_A : F^m → F^n, v ↦ vA

be the linear map corresponding to A.

Computing the image and kernel

Consider first the row space of A, by which we mean the subspace of F^n spanned by {Row_i(A)}_{i=1}^{m}, or equivalently, the image of λ_A.

We claim that the row space of A is the same as the row space of B. To see this, note that since B = XA, for every v ∈ F^m, we have vB = v(XA) = (vX)A, and so the row space of B is contained in the row space of A. For the other containment, note that since X is invertible, we can write A = X^{−1}B, and apply the same argument.

Further, note that the row space of B, and hence that of A, clearly has dimension r. Indeed, as stated in Theorem 14.7, rows 1, ..., r of B form a basis for the row space of B.

Consider next the kernel K of λ_A, or what we might call the row null space of A. We claim that {Row_i(X)}_{i=r+1}^{m} is a basis for K. Clearly, just from the fact that XA = B and the fact that rows r+1, ..., m of B are zero, it follows that rows r+1, ..., m of X are contained in K. Furthermore, as X is invertible, {Row_i(X)}_{i=1}^{m} is a basis for F^m (see Theorem 14.6). Thus, the family of vectors {Row_i(X)}_{i=r+1}^{m} is linearly independent and spans a subspace K′ of K. It suffices to show that K′ = K. Suppose to the contrary that K′ ⊊ K, and let v ∈ K \ K′. As {Row_i(X)}_{i=1}^{m} spans F^m, we may write v = Σ_{i=1}^{m} c_i Row_i(X); moreover, as v ∉ K′, we must have c_i ≠ 0_F for some i = 1, ..., r. Setting ṽ := (c_1, ..., c_m), we see that v = ṽX, and so

λ_A(v) = vA = (ṽX)A = ṽ(XA) = ṽB.

Furthermore, since {Row_i(B)}_{i=1}^{r} is linearly independent, rows r+1, ..., m of B are zero, and ṽ has a non-zero entry in one of its first r positions, we see that ṽB is not the zero vector. We have derived a contradiction, and hence may conclude that K′ = K.

Finally, note that if m = n, then A is invertible if and only if its row space has dimension m, which holds if and only if r = m, and in the latter case, B is the identity matrix, and hence X is the inverse of A.

Let us summarize the above discussion:

• The first r rows of B form a basis for the row space of A (i.e., the image of λ_A).

• The last m − r rows of X form a basis for the row null space of A (i.e., the kernel of λ_A).

• If m = n, then A is invertible (i.e., λ_A is an isomorphism) if and only if r = m, in which case X is the inverse of A (i.e., the matrix of λ_A^{−1} relative to the standard basis).

So we see that from the output of the extended Gaussian elimination algorithm, we can simply “read off” bases for both the image and the kernel, as well as the inverse (if it exists), of a linear map represented as a matrix with respect to given bases. Also note that this procedure provides a “constructive” version of Theorem 13.28.

Example 14.6. Continuing with Examples 14.4 and 14.5, we see that the vectors ([1], [0], [2]) and ([0], [1], [1]) form a basis for the row space of A, while the vector ([2], [2], [1]) is a basis for the row null space of A. □
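In code, this amounts to slicing the output of the extended elimination; the following Python sketch reuses the hypothetical extended_gauss helper sketched in §14.4 (the function name image_kernel_inverse is ours) and reproduces Example 14.6.

```python
# Minimal sketch (builds on the extended_gauss helper sketched in Section 14.4):
# read off a basis of the row space, a basis of the kernel of lambda_A, and the
# inverse of A (when it exists) from B and X with X*A = B.

def image_kernel_inverse(A, p):
    m, n = len(A), len(A[0])
    B, X = extended_gauss(A, p)
    r = sum(1 for row in B if any(x % p != 0 for x in row))   # number of non-zero rows
    image_basis = B[:r]           # first r rows of B: basis of the row space of A
    kernel_basis = X[r:]          # last m - r rows of X: basis of the row null space
    inverse = X if (m == n and r == m) else None              # then B = I and X = A^(-1)
    return image_basis, kernel_basis, inverse

A = [[0, 1, 1], [2, 1, 2], [2, 2, 0]]                          # Examples 14.4-14.6
img, ker, inv = image_kernel_inverse(A, 3)
print(img)   # [[1, 0, 2], [0, 1, 1]]
print(ker)   # [[2, 2, 1]]
print(inv)   # None, since r = 2 < 3 (A is not invertible)
```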


Solving systems of linear equations

Suppose that in addition to the matrix A, we are given w ∈ F^n, and want to find a solution v ∈ F^m (or perhaps describe all solutions) to the equation

vA = w.   (14.3)

Equivalently, we can phrase the problem as finding an element (or describing all elements) of the set λ_A^{−1}({w}).

Now, if there exists a solution at all, say v ∈ F^m, then λ_A(v) = λ_A(v′) if and only if v ≡ v′ (mod K), where K is the kernel of λ_A. It follows that the set of all solutions to (14.3) is v + K = {v + v′ : v′ ∈ K}. Thus, given a basis for K and any solution v to (14.3), we have a complete and concise description of the set of solutions to (14.3).

As we have discussed above, the last m − r rows of X form a basis for K, so it suffices to determine if w ∈ Im λ_A, and if so, determine a single pre-image v of w.

Also as we discussed, Im λ_A, that is, the row space of A, is equal to the row space of B, and because of the special form of B, we can quickly and easily determine if the given w is in the row space of B, as follows. By definition, w is in the row space of B if and only if there exists a vector v̄ ∈ F^m such that v̄B = w. We may as well assume that all but the first r entries of v̄ are zero. Moreover, v̄B = w implies that for i = 1, ..., r, the ith entry of v̄ is equal to the p_i-th entry of w. Thus, the vector v̄, if it exists, is completely determined by the entries of w at positions p_1, ..., p_r. We can construct v̄ satisfying these conditions, and then test if v̄B = w. If not, then we may conclude that (14.3) has no solutions; otherwise, setting v := v̄X, we see that vA = (v̄X)A = v̄(XA) = v̄B = w, and so v is a solution to (14.3).

One easily verifies that if we implement the above procedure as an algorithm, the work done in addition to running the extended Gaussian elimination algorithm amounts to O(m(n+m)) operations in F.

A special case of the above procedure is when m = n and A is invertible, in which case (14.3) has a unique solution, namely, v := wX, since in this case, X = A^{−1}.
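The procedure translates directly into code; the following Python sketch uses the hypothetical extended_gauss and pivot_sequence helpers from the earlier sketches, and the function name solve_right is ours.

```python
# Minimal sketch (builds on the extended_gauss and pivot_sequence helpers
# sketched earlier): find one solution v of v*A = w over Z_p, or report none.

def solve_right(A, w, p):
    m, n = len(A), len(A[0])
    B, X = extended_gauss(A, p)
    pivots = pivot_sequence(B)                      # (p_1, ..., p_r), 1-based
    # v-bar has the p_i-th entry of w in position i, and zeros elsewhere
    vbar = [w[pj - 1] % p for pj in pivots] + [0] * (m - len(pivots))
    # if v-bar * B != w, then w is not in the row space of A: no solution
    check = [sum(vbar[i] * B[i][j] for i in range(m)) % p for j in range(n)]
    if check != [x % p for x in w]:
        return None
    # otherwise v := v-bar * X satisfies v*A = (v-bar*X)*A = v-bar*B = w
    return [sum(vbar[i] * X[i][j] for i in range(m)) % p for j in range(m)]

A = [[0, 1, 1], [2, 1, 2], [2, 2, 0]]
w = [1, 1, 0]
v = solve_right(A, w, 3)
assert v is not None
assert [sum(v[i] * A[i][j] for i in range(3)) % 3 for j in range(3)] == w
print(v)   # [2, 2, 0]
```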

The rank of a matrix

We define the row rank of A to be the dimension of its row space, which is equal to dim_F(Im λ_A). The column space of A is defined as the subspace of F^{m×1} spanned by {Col_j(A)}_{j=1}^{n}; that is, the column space of A is {Az : z ∈ F^{n×1}}. The column rank of A is the dimension of its column space.

Now, the column space of A need not be the same as the column space of B, but from the identity B = XA, and the fact that X is invertible, it easily follows that these two subspaces are isomorphic (via the map that sends y ∈ F^{m×1} to Xy), and hence have the same dimension. Moreover, by Theorem 14.7, the column rank of B is r, which is the same as the row rank of A.

So we may conclude: the column rank and row rank of A are the same.

Because of this, we may define the rank of a matrix to be the common value of its row and column rank.

The orthogonal complement of a subspace

So as to give equal treatment to rows and columns, one can also define the column null space of A to be the kernel of the linear map defined by multiplication on the left by A; that is, the column null space of A is {z ∈ F^{n×1} : Az = 0_F^{m×1}}. By applying the results above to the transpose of A, we see that the column null space of A has dimension n − r, where r is the rank of A.

Let U ⊆ F^n be the row space of A, and let U^⊥ ⊆ F^n denote the set of all vectors u ∈ F^n whose transpose u^⊤ belongs to the column null space of A.

Now, U is a subspace of F^n of dimension r and U^⊥ is a subspace of F^n of dimension n − r. The space U^⊥ consists precisely of all vectors u′ ∈ F^n that are “orthogonal” to all vectors u ∈ U, in the sense that the “inner product” u (u′)^⊤ is zero. For this reason, U^⊥ is sometimes called the “orthogonal complement of U.”

Clearly, U^⊥ is determined by the subspace U itself, and does not depend on the particular choice of matrix A. It is also easy to see that the orthogonal complement of U^⊥ is U; that is, (U^⊥)^⊥ = U. This follows immediately from the fact that U ⊆ (U^⊥)^⊥ and dim_F((U^⊥)^⊥) = n − dim_F(U^⊥) = dim_F(U).

Now suppose that U ∩ U^⊥ = {0}. Then by Theorem 13.11, we have an isomorphism of U × U^⊥ with U + U^⊥, and since U × U^⊥ has dimension n, it must be the case that U + U^⊥ = F^n. It follows that every element of F^n can be expressed uniquely as u + u′, where u ∈ U and u′ ∈ U^⊥.

We emphasize that the observations in the previous paragraph hinged on the assumption that U ∩ U^⊥ = {0}, which itself holds provided U contains no non-zero “self-orthogonal vectors” u such that u u^⊤ is zero. If F is the field of real numbers, then of course there are no non-zero self-orthogonal vectors, since u u^⊤ is the sum of the squares of the entries of u. However, for other fields, there may very well be non-zero self-orthogonal vectors. As an example, if F = Z_2, then any vector u with an even number of 1-entries is self-orthogonal.

So we see that while much of the theory of vector spaces and matrices carries over without change from familiar ground fields, like the real numbers, to arbitrary ground fields F, not everything does. In particular, the usual decomposition of a vector space into a subspace and its orthogonal complement breaks down, as does any other procedure that relies on properties specific to “inner product spaces.”


For the following three exercises, as above, A is an arbitrary m×n matrix over a field F, and XA = B, where X is an invertible m×m matrix, and B is in reduced row echelon form.

EXERCISE 14.14. Show that the column null space of A is the same as the column null space of B.

EXERCISE 14.15. Show how to compute a basis for the column null space of A using O(r(n−r)) operations in F, given A and B.

EXERCISE 14.16. Show that the matrix B is uniquely determined by A; more precisely, show that if X′A = B′, where X′ is an invertible m×m matrix, and B′ is in reduced row echelon form, then B′ = B.

In the following two exercises, the theory of determinants could be used; however, they can all be solved directly, without too much difficulty, using just the ideas developed so far in the text.

EXERCISE 14.17. Let p be a prime. A matrix A ∈ Z^{m×m} is called invertible modulo p if there exists a matrix B ∈ Z^{m×m} such that AB ≡ BA ≡ I (mod p), where I is the m×m integer identity matrix. Here, two matrices are considered congruent with respect to a given modulus if their corresponding entries are congruent. Show that A is invertible modulo p if and only if A is invertible over Q, and the entries of A^{−1} lie in Q^{(p)} (see Example 7.26).

EXERCISE 14.18. You are given a matrix A ∈ Z^{m×m} and a prime p such that A is invertible modulo p (see previous exercise). Suppose that you are also given w ∈ Z^m.

(a) Show how to efficiently compute a vector v ∈ Z^m such that vA ≡ w (mod p), and that v is uniquely determined modulo p.

(b) Given a vector v as in part (a), along with an integer e ≥ 1, show how to efficiently compute v̂ ∈ Z^m such that v̂A ≡ w (mod p^e), and that v̂ is uniquely determined modulo p^e. Hint: mimic the “lifting” procedure discussed in §12.5.2.

(c) Using parts (a) and (b), design and analyze an efficient algorithm that takes the matrix A and the prime p as input, together with a bound H on the absolute value of the numerator and denominator of the entries of the vector v′ that is the unique (rational) solution to the equation v′A = w. Your algorithm should run in time polynomial in the length of H, the length of p, and the sum of the lengths of the entries of A and w. Hint: use rational reconstruction, but be sure to fully justify its application.
