Electronic Journal of Probability
Vol. 8 (2003) Paper no. 7, pages 1–29.
Journal URL: http://www.math.washington.edu/~ejpecp/
Paper URL: http://www.math.washington.edu/~ejpecp/EjpVol8/paper7.abs.html
THE NORM OF THE PRODUCT OF A LARGE MATRIX AND A RANDOM VECTOR

Albrecht Böttcher
Fakultät für Mathematik, TU Chemnitz, 09107 Chemnitz, Germany

Sergei Grudsky
Departamento de Matemáticas, CINVESTAV del I.P.N., Apartado Postal 14-740, 07000 México, D.F., México

[email protected], [email protected]
Abstract. Given a real or complex $n\times n$ matrix $A_n$, we compute the expected value and the variance of the random variable $\|A_nx\|^2/\|A_n\|^2$, where $x$ is uniformly distributed on the unit sphere of $\mathbb{R}^n$ or $\mathbb{C}^n$. The result is applied to several classes of structured matrices. It is in particular shown that if $A_n$ is a Toeplitz matrix $T_n(b)$, then for large $n$ the values of $\|A_nx\|/\|A_n\|$ cluster fairly sharply around $\|b\|_2/\|b\|_\infty$ if $b$ is bounded and around zero in case $b$ is unbounded.
Keywords: condition number, matrix norm, random vector, Toeplitz matrix

AMS subject classification (2000): Primary 60H30; Secondary 15A60, 47B35, 65F35

Submitted to EJP on September 9, 2002. Final version accepted on May 14, 2003.
1 Introduction

Let $\|\cdot\|$ be the Euclidean norm in $\mathbb{R}^n$. For a real $n\times n$ matrix $A_n$, the spectral norm $\|A_n\|$ is defined by
$$\|A_n\|=\max_{\|x\|=1}\|A_nx\|=\max_{0<\|x\|\le1}\frac{\|A_nx\|}{\|x\|}.$$
Let $s_1\le s_2\le\dots\le s_n$ be the singular values of $A_n$, that is, the eigenvalues of $(A_n^\top A_n)^{1/2}$. The set $\{\|A_nx\|/\|A_n\|:\|x\|=1\}$ coincides with the segment $[s_1/s_n,1]$. We show that for a randomly chosen unit vector $x$ the value of $\|A_nx\|^2/\|A_n\|^2$ typically lies near
$$\frac{1}{s_n^2}\,\frac{s_1^2+\dots+s_n^2}{n}. \qquad (1)$$
Notice that $s_n=\|A_n\|$ and that $s_1^2+\dots+s_n^2=\|A_n\|_F^2$, where $\|A_n\|_F$ is the Frobenius (or Hilbert-Schmidt) norm. Thus, if $\|A_n\|=1$, then for a typical unit vector $x$ the value of $\|A_nx\|^2$ is close to $\|A_n\|_F^2/n$. The purpose of this paper is to use this observation in order to examine the most probable values of $\|A_nx\|/(\|A_n\|\,\|x\|)$ for several classes of large structured matrices $A_n$.
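The observation is easy to probe numerically. The sketch below (pure Python; the diagonal test matrix $\mathrm{diag}(1,\dots,n)$ and all sample sizes are our illustrative choices, not the paper's) samples uniform unit vectors as normalized Gaussian vectors and compares the empirical mean of $\|A_nx\|^2/\|A_n\|^2$ with the value (1), namely $\|A_n\|_F^2/(n\|A_n\|^2)$:

```python
import math
import random

random.seed(1)

def unit_vector(n):
    # A normalized Gaussian vector is uniformly distributed on S^{n-1}.
    g = [random.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in g))
    return [t / norm for t in g]

n = 50
s = [float(j) for j in range(1, n + 1)]   # singular values of A_n = diag(1, ..., n)
spec2 = s[-1] ** 2                        # ||A_n||^2 = s_n^2
frob2 = sum(t * t for t in s)             # ||A_n||_F^2 = s_1^2 + ... + s_n^2

trials = 5000
acc = 0.0
for _ in range(trials):
    x = unit_vector(n)
    acc += sum((s[j] * x[j]) ** 2 for j in range(n)) / spec2
empirical = acc / trials
predicted = frob2 / (n * spec2)           # the value (1)

print(empirical, predicted)  # the two numbers should agree to about 1%
```

For a diagonal matrix, $\|A_nx\|^2$ is just the weighted sum $\sum s_j^2x_j^2$, which keeps the sketch free of any linear-algebra library.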
Our interest in the problem considered here arose from a talk by Siegfried Rump at a conference in Marrakesh in 2001. Let $M_n(\mathbb{R})$ denote the real $n\times n$ matrices and let $\mathrm{Circ}_n(\mathbb{R})$ stand for the circulant matrices in $M_n(\mathbb{R})$. For an invertible matrix $A_n\in\mathrm{Circ}_n(\mathbb{R})$, define the unstructured condition number $\kappa(A_n,x)$ of $A_n$ at a vector $x\in\mathbb{R}^n$ as $\lim_{\varepsilon\to0}\sup\|\delta x\|/(\varepsilon\|x\|)$, the supremum over all $\delta x$ such that $(A_n+\delta A_n)(x+\delta x)=A_nx$ for some $\delta A_n\in M_n(\mathbb{R})$ with $\|\delta A_n\|\le\varepsilon\|A_n\|$, and define the structured condition number $\kappa_{\mathrm{circ}}(A_n,x)$ as $\lim_{\varepsilon\to0}\sup\|\delta x\|/(\varepsilon\|x\|)$, this time the supremum over all $\delta x$ such that $(A_n+\delta A_n)(x+\delta x)=A_nx$ for some $\delta A_n\in\mathrm{Circ}_n(\mathbb{R})$ with $\|\delta A_n\|\le\varepsilon\|A_n\|$. A well known result by Skeel says that $\kappa(A_n,x)=\|A_n\|\,\|A_n^{-1}\|$ (for every $A_n\in M_n(\mathbb{R})$), and in his talk Rump proved that
$$\kappa_{\mathrm{circ}}(A_n,x)=\frac{\|A_n\|\,\|A_n^{-1}x\|}{\|x\|}$$
(see also [9], [14]). Thus,
$$\frac{\kappa_{\mathrm{circ}}(A_n,x)}{\kappa(A_n,x)}=\frac{\|A_n^{-1}x\|}{\|A_n^{-1}\|\,\|x\|}, \qquad (2)$$
which naturally leads to the question of the value taken by (2) at a typical $x$.
2 General Matrices

Let $B^n=\{x\in\mathbb{R}^n:\|x\|\le1\}$ and $S^{n-1}=\{x\in\mathbb{R}^n:\|x\|=1\}$. For a given matrix $A_n\in M_n(\mathbb{R})$, we consider the random variable
$$X_n(x)=\frac{\|A_nx\|}{\|A_n\|},$$
where $x$ is uniformly distributed on $S^{n-1}$.
As the following lemma shows, there is no difference between taking x uniformly on a sphere or in a ball.
Lemma 2.1 For every natural number $k$,
The following result is undoubtedly known. As we have not found an explicit reference, we cite it with a full proof.

Theorem 2.2 Let $A_n\in M_n(\mathbb{R})$ have the singular values $s_1\le s_2\le\dots\le s_n$, $s_n>0$, and let $x$ be uniformly distributed on $S^{n-1}$. Then
$$EX_n^2=\frac{s_1^2+\dots+s_n^2}{n\,s_n^2} \qquad (3)$$
and
$$\sigma^2X_n^2=\frac{2}{n+2}\left(\frac{s_1^4+\dots+s_n^4}{n\,s_n^4}-\Big(\frac{s_1^2+\dots+s_n^2}{n\,s_n^2}\Big)^2\right).$$

Proof. Write $A_n=W_nD_nV_n$ with orthogonal matrices $W_n,V_n$ and $D_n=\operatorname{diag}(s_1,\dots,s_n)$ (singular value decomposition). Then $\|A_nx\|=\|D_nV_nx\|$, and since $V_n$ leaves the uniform distribution on $S^{n-1}$ invariant, we may first make the substitution $V_nx=y$ and then change the notation $y$ back to $x$; thus
$$EX_n^2=\frac{1}{s_n^2}\,E\big(s_1^2x_1^2+\dots+s_n^2x_n^2\big).$$
By symmetry, the integrals $Ex_j^2$ are independent of $j$ and hence they are all equal to $1/n$. This proves (3). The variance is obtained in analogy, using Lemma 2.1 to evaluate the moments $Ex_j^4$ and $Ex_j^2x_k^2$; this yields the symmetric form of $\sigma^2X_n^2$ displayed above.
Obvious modifications of the proof of Theorem 2.2 show that Theorem 2.2 remains true for complex matrices on $\mathbb{C}^n$ with the $\ell^2$ norm.
Example 2.3 Let $A_n$ be the $n\times n$ matrix all of whose entries are $1$. \qquad (10)

The singular values of $A_n$ are $0,\dots,0,n$, so that, by Theorem 2.2,
$$EX_n^2=\frac1n,\qquad \sigma^2X_n^2=\frac{2(n-1)}{n^2(n+2)}. \qquad (11)$$
For $EX_n^2=1/n\le\varepsilon/2$ we therefore obtain from Chebyshev's inequality that
$$P(X_n^2\ge\varepsilon)\le\frac{\sigma^2X_n^2}{(\varepsilon-1/n)^2}\le\frac{8}{n^2\varepsilon^2}.$$
Consequently, the inequality
$$(x_1+\dots+x_n)^2\le\frac{n}{2}\,(x_1^2+\dots+x_n^2)$$
is true with probability at least 90% for $n\ge18$ and with probability at least 99% for $n\ge57$, and the inequality
$$(x_1+\dots+x_n)^2\le\frac{n}{100}\,(x_1^2+\dots+x_n^2)$$
is true with probability at least 90% whenever $n\ge895$ and with probability at least 99% provided $n\ge2829$. We will return to the present example in Example 7.5.
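These probability claims can be probed by simulation. In the sketch below (pure Python; the factor $n/2$ for the first inequality is a reconstruction consistent with the stated thresholds $n\ge18$ and $n\ge57$, and the sample size is arbitrary), we estimate how often $(x_1+\dots+x_n)^2\le c\,n\,(x_1^2+\dots+x_n^2)$ holds on the unit sphere:

```python
import math
import random

random.seed(2)

def unit_vector(n):
    g = [random.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in g))
    return [t / norm for t in g]

def success_rate(n, factor, trials=1000):
    # Fraction of uniform unit vectors x with (x_1+...+x_n)^2 <= factor*n*(x_1^2+...+x_n^2).
    hits = 0
    for _ in range(trials):
        x = unit_vector(n)
        if sum(x) ** 2 <= factor * n:   # the sum of squares equals 1 on the unit sphere
            hits += 1
    return hits / trials

rate_half = success_rate(18, 0.5)     # factor 1/2 at the threshold dimension n = 18
rate_cent = success_rate(895, 0.01)   # factor 1/100 at the threshold dimension n = 895
print(rate_half, rate_cent)           # both rates should exceed 90%
```

The Chebyshev bound is quite conservative here; the observed success rates are typically much closer to 1 than to 90%.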
The following lemma will prove useful when studying concrete classes of matrices. We denote by $\|\cdot\|_{\mathrm{tr}}$ the trace norm, that is, the sum of the singular values.
Lemma 2.4 Let $\{A_n\}_{n=1}^\infty$ be a sequence of matrices $A_n\in M_n(\mathbb{K})$, where $\mathbb{K}=\mathbb{R}$ or $\mathbb{K}=\mathbb{C}$. If
$$\frac{\|A_n\|_{\mathrm{tr}}}{n}=O(1)\quad\text{and}\quad\|A_n\|\to\infty,$$
then $EX_n^2\to0$ as $n\to\infty$.
Proof. Let $s_1(A_n)\le s_2(A_n)\le\dots\le s_n(A_n)$ be the singular values of $A_n$ and note that $s_n(A_n)=\|A_n\|$. By assumption, there is a finite constant $M$ such that
$$\frac{1}{n}\sum_{j=1}^n s_j(A_n)\le M$$
for all $n$. Fix $\varepsilon\in(0,1)$, for instance, $\varepsilon=1/2$. Let $N_n$ denote the number of all $j$ for which $s_j(A_n)\ge M\|A_n\|^{1-\varepsilon}$. Then
$$M\ge\frac{1}{n}\sum_{j=1}^n s_j(A_n)\ge\frac{1}{n}\,N_nM\|A_n\|^{1-\varepsilon},$$
whence $N_n\le n/\|A_n\|^{1-\varepsilon}$ and thus, by Theorem 2.2,
$$EX_n^2=\frac{1}{n\,s_n^2(A_n)}\sum_{j=1}^n s_j^2(A_n)\le\frac{(n-N_n)M^2\|A_n\|^{2-2\varepsilon}}{n\|A_n\|^2}+\frac{N_n\|A_n\|^2}{n\|A_n\|^2}\le\frac{M^2}{\|A_n\|^{2\varepsilon}}+\frac{1}{\|A_n\|^{1-\varepsilon}}=o(1)$$
because $\|A_n\|\to\infty$.
We remark that if $EX_n^2\to0$, then $P(X_n\ge\varepsilon)=O(1/n)$ for each $\varepsilon>0$: we have $EX_n^2\le\varepsilon^2/2$ for all $n\ge n_0$ and hence
$$P(X_n\ge\varepsilon)=P(X_n^2\ge\varepsilon^2)\le P\Big(|X_n^2-EX_n^2|\ge\frac{\varepsilon^2}{2}\Big)\le\frac{4\sigma^2X_n^2}{\varepsilon^4}=O\Big(\frac1n\Big),$$
the last estimate because $\sigma^2X_n^2=O(1/n)$ by Theorem 2.2.
3 Toeplitz Matrices with Bounded Symbols

We need one more simple auxiliary result.
Lemma 3.1 Let $EX_n^2=\mu_n^2$ and suppose $\mu_n\to\mu$ as $n\to\infty$. If $\varepsilon>0$ and $|\mu_n-\mu|<\varepsilon$, then
$$P(|X_n-\mu|\ge\varepsilon)\le\frac{\sigma^2X_n^2}{\mu_n^2(\varepsilon-|\mu_n-\mu|)^2}.$$

Proof. We have
$$P(|X_n-\mu|\ge\varepsilon)\le P\big(|X_n-\mu_n|\ge\varepsilon-|\mu_n-\mu|\big)\le P\big(|X_n-\mu_n|(X_n+\mu_n)\ge\mu_n(\varepsilon-|\mu_n-\mu|)\big)=P\big(|X_n^2-\mu_n^2|\ge\mu_n(\varepsilon-|\mu_n-\mu|)\big),$$
and the assertion is now immediate from Chebyshev's inequality.
Now let $A_n$ be a Toeplitz matrix, that is, $A_n=T_n(b):=(b_{j-k})_{j,k=1}^n$, where
$$b_k=\int_0^{2\pi}b(e^{i\theta})\,e^{-ik\theta}\,\frac{d\theta}{2\pi}\qquad(k\in\mathbb{Z}). \qquad (12)$$
Clearly, (12) makes sense for every $b\in L^1$ on the complex unit circle $\mathbb{T}$. Throughout this section we assume that $b$ is a function in $L^\infty$. The Avram-Parter theorem says that in this case
$$\lim_{n\to\infty}\frac{s_1^k+\dots+s_n^k}{n}=\|b\|_k^k:=\int_0^{2\pi}|b(e^{i\theta})|^k\,\frac{d\theta}{2\pi} \qquad (13)$$
for every natural number $k$ (see [1], [2], [4], [11]). It is also well known that $s_n=\|T_n(b)\|\to\|b\|_\infty$ as $n\to\infty$ (see [2] or [4], for example). In what follows we always assume that $b$ does not vanish identically. In Theorems 3.2 to 3.5, the constants hidden in the $O$'s depend of course on $\varepsilon$ and $\delta$, respectively.
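The relation (13) can be sanity-checked without any SVD routine by choosing a symbol whose truncations have explicitly known singular values. For $b(e^{i\theta})=2\cos\theta$ (our illustrative choice, not the paper's), $T_n(b)$ is tridiagonal with zero diagonal and ones next to it, its singular values are $|2\cos(\pi j/(n+1))|$, and the moments $\|b\|_1=4/\pi$ and $\|b\|_2^2=2$ are elementary. A pure-Python sketch:

```python
import math

def moment(n, k):
    # (s_1^k + ... + s_n^k)/n for the symbol b(e^{i*theta}) = 2*cos(theta);
    # the singular values of the tridiagonal Toeplitz matrix T_n(b) are
    # |2*cos(pi*j/(n+1))|, j = 1, ..., n.
    sv = (abs(2.0 * math.cos(math.pi * j / (n + 1))) for j in range(1, n + 1))
    return sum(t ** k for t in sv) / n

n = 4000
m1 = moment(n, 1)   # should approach ||b||_1 = 4/pi
m2 = moment(n, 2)   # should approach ||b||_2^2 = 2
print(m1, 4.0 / math.pi, m2, 2.0)
```

For $k=2$ one even has the exact value $2-2/n$, so the convergence rate is visible directly.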
Theorem 3.2 Let $b\in L^\infty$ and suppose $|b|$ is not constant almost everywhere. Then for each $\varepsilon>0$, there is an $n_0=n_0(\varepsilon)$ such that
$$P\left(\left|\frac{\|T_n(b)x\|}{\|T_n(b)\|\,\|x\|}-\frac{\|b\|_2}{\|b\|_\infty}\right|\ge\varepsilon\right)\le\frac{3}{n+2}\,\frac{1}{\varepsilon^2}\,\frac{\|b\|_4^4-\|b\|_2^4}{\|b\|_2^2\,\|b\|_\infty^2} \qquad (14)$$
for all $n\ge n_0$. If, in addition, $b$ is a rational function, then for each $\delta>0$,
$$P\left(\left|\frac{\|T_n(b)x\|}{\|T_n(b)\|\,\|x\|}-\frac{\|b\|_2}{\|b\|_\infty}\right|\ge\frac{1}{n^{1/2-\delta}}\right)=O\left(\frac{1}{n^{2\delta}}\right).$$
Proof. Since $|b|$ is not constant, it follows that $\|b\|_4>\|b\|_2$. Put $\mu=\|b\|_2/\|b\|_\infty$ and $\mu_n^2=EX_n^2$. By (13) and Theorem 2.2, $\mu_n\to\mu$ and
$$(n+2)\,\sigma^2X_n^2\to2\,\frac{\|b\|_4^4-\|b\|_2^4}{\|b\|_\infty^4}.$$
Thus, Lemma 3.1 shows that
$$P(|X_n-\mu|\ge\varepsilon)\le\frac{\sigma^2X_n^2}{\mu_n^2(\varepsilon-|\mu_n-\mu|)^2}\le\frac{3}{n+2}\,\frac{1}{\varepsilon^2}\,\frac{\|b\|_4^4-\|b\|_2^4}{\|b\|_2^2\,\|b\|_\infty^2}$$
for all sufficiently large $n$, which is (14). If $b$ is a rational function, we even have
$$\frac{s_1^k+\dots+s_n^k}{n}=\|b\|_k^k+O\Big(\frac1n\Big) \qquad (16)$$
for every natural number $k$ and
$$s_n=\|b\|_\infty+O\Big(\frac1n\Big). \qquad (17)$$
Theorem 3.3 Let $b\in L^\infty$ and suppose $|b|$ is constant almost everywhere. Then
P Lemma 3.1 therefore gives
If $b$ is rational, we have (16) and (17). Thus,
Consequently, by Lemma 3.1,
P
We now consider the case where $A_n$ is the inverse of a Toeplitz matrix. Suppose $b$ is a continuous function on $\mathbb{T}$ and $b$ has no zeros on $\mathbb{T}$. Let $\operatorname{wind}b$ denote the winding number of $b$ about the origin.
If $\operatorname{wind}b=0$, then $T_n(b)$ is invertible for all sufficiently large $n$ and
Since $b$ has no zeros on $\mathbb{T}$, the Avram-Parter formula (13) also holds for negative integers $k$.
In the case where $b$ is rational, one can sharpen (18) and (13). We therefore consider the Moore-Penrose inverse $T_n^+(b)$, which coincides with $T_n^{-1}(b)$ in the
Proof. The so-called splitting phenomenon, discovered by Roch and Silbermann [13] (see also [2]), tells us that if $|\operatorname{wind}b|=k\ge1$, then $k$ singular values of $T_n(b)$ converge to zero with exponential speed,
$$s_\ell\le Ce^{-\gamma n}\quad(\gamma>0)\quad\text{for }\ell\le k,$$
while the remaining singular values stay away from zero,
for all sufficiently large n. Also by Theorem 2.2,
for all sufficiently large n, and thus,
P(Xn≥ε)≤P
Similarly, for large n,
P
4 Circulant Matrices

Now suppose $b$ is a trigonometric polynomial. Then $T_n(b)$ is a banded matrix for all sufficiently large $n$. For these $n$, we change $T_n(b)$ to a circulant matrix by adding appropriate entries in the upper-right and lower-left corner blocks. For example, if
We have
$$C_n(b)=U_n^*\operatorname{diag}\big(b(1),b(\omega_n),\dots,b(\omega_n^{n-1})\big)U_n, \qquad (19)$$
where $U_n$ is a unitary matrix and $\omega_n=e^{2\pi i/n}$. Thus, the singular values of $C_n(b)$ are $|b(\omega_n^j)|$ $(j=0,\dots,n-1)$. The only trigonometric polynomials $b$ of constant modulus are $b(t)=\alpha t^k$ $(t\in\mathbb{T})$ with $\alpha\in\mathbb{C}$, and in this case $\|C_n(b)x\|=|\alpha|\,\|x\|$ for all $x$.
Theorem 4.1 Assume that $|b|$ is not constant. Then for each $\varepsilon>0$ there exists an $n_0=n_0(\varepsilon)$ such that
$$P\left(\left|\frac{\|C_n(b)x\|}{\|C_n(b)\|\,\|x\|}-\frac{\|b\|_2}{\|b\|_\infty}\right|\ge\varepsilon\right)\le\frac{3}{n+2}\,\frac{1}{\varepsilon^2}\,\frac{\|b\|_4^4-\|b\|_2^4}{\|b\|_2^2\,\|b\|_\infty^2}$$
for all $n\ge n_0$.
Proof. The proof is analogous to the proof of (14). Note that now (13) amounts to the fact that the integral sum
$$\frac{s_1^k+\dots+s_n^k}{n}=\sum_{j=0}^{n-1}|b(e^{2\pi ij/n})|^k\,\frac1n$$
converges to the Riemann integral
$$\int_0^1|b(e^{2\pi i\theta})|^k\,d\theta=\int_0^{2\pi}|b(e^{i\theta})|^k\,\frac{d\theta}{2\pi}=:\|b\|_k^k.$$
Furthermore, it is obvious that $s_n=\max_j|b(\omega_n^j)|\to\|b\|_\infty$.
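The diagonalization (19) is also easy to test directly: $C_n(b)$ maps the $j$th Fourier vector $(\omega_n^{jp})_{p=0}^{n-1}$ to $b(\omega_n^j)$ times itself (for the symmetric symbol used below, $b(\omega_n^j)=b(\omega_n^{-j})$, so orientation conventions do not matter). A pure-Python sketch with the illustrative symbol $b(t)=2+t+t^{-1}$, which is our choice:

```python
import cmath

n = 16
# Fourier coefficients of the symbol b(t) = 2 + t + 1/t, indexed modulo n:
b = [0.0] * n
b[0], b[1], b[n - 1] = 2.0, 1.0, 1.0

def symbol(t):
    return 2.0 + t + 1.0 / t

omega = cmath.exp(2j * cmath.pi / n)

err = 0.0
for j in range(n):
    v = [omega ** (j * p) for p in range(n)]            # j-th Fourier vector
    cv = [sum(b[(p - q) % n] * v[q] for q in range(n))  # C_n(b) applied to v
          for p in range(n)]
    lam = symbol(omega ** j)                            # eigenvalue predicted by (19)
    err = max(err, max(abs(cv[p] - lam * v[p]) for p in range(n)))

norm = max(abs(symbol(omega ** j)) for j in range(n))   # spectral norm of C_n(b)
print(err, norm)  # err is at rounding level; norm equals ||b||_inf = 4 here
```

Since $j=0$ lies on the grid, $\max_j|b(\omega_n^j)|$ attains $\|b\|_\infty=4$ exactly for this symbol.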
If $b$ has no zeros on $\mathbb{T}$, then (19) shows that
$$C_n^{-1}(b)=U_n^*\operatorname{diag}\big(b^{-1}(1),b^{-1}(\omega_n),\dots,b^{-1}(\omega_n^{n-1})\big)U_n,$$
and hence the argument of the proof of Theorem 4.1 delivers
$$P\left(\left|\frac{\|C_n^{-1}(b)x\|}{\|C_n^{-1}(b)\|\,\|x\|}-\frac{\|b^{-1}\|_2}{\|b^{-1}\|_\infty}\right|\ge\varepsilon\right)\le\frac{3}{n+2}\,\frac{1}{\varepsilon^2}\,\frac{\|b^{-1}\|_4^4-\|b^{-1}\|_2^4}{\|b^{-1}\|_2^2\,\|b^{-1}\|_\infty^2} \qquad (20)$$
for all sufficiently large $n$.
Example 4.2 Put $b(t)=2+\alpha+t+t^{-1}$, where $\alpha>0$ is small. Thus,
$$C_5(b)=\begin{pmatrix}
2+\alpha & 1 & 0 & 0 & 1\\
1 & 2+\alpha & 1 & 0 & 0\\
0 & 1 & 2+\alpha & 1 & 0\\
0 & 0 & 1 & 2+\alpha & 1\\
1 & 0 & 0 & 1 & 2+\alpha
\end{pmatrix}.$$
If $n$ is large, then $\|C_n(b)\|\approx4$ and $\|C_n^{-1}(b)\|\approx1/\alpha$. Consequently, for the condition numbers defined in the introduction we have
$$\kappa(C_n(b),x)=\|C_n(b)\|\,\|C_n^{-1}(b)\|\approx\frac{4}{\alpha}$$
and
$$\kappa_{\mathrm{circ}}(C_n(b),x)=\frac{\|C_n^{-1}(b)x\|}{\|C_n^{-1}(b)\|\,\|x\|}\,\kappa(C_n(b),x)\approx\frac{4}{\alpha}\,\frac{\|C_n^{-1}(b)x\|}{\|C_n^{-1}(b)\|\,\|x\|}.$$
From (20) we therefore obtain that if $n$ is sufficiently large, then with probability near 1,
$$\kappa_{\mathrm{circ}}(C_n(b),x)\approx\frac{4}{\alpha}\,\frac{\|b^{-1}\|_2}{\|b^{-1}\|_\infty}=\frac{4}{\alpha}\,\frac{\alpha^{1/4}(2+\alpha)^{1/2}}{(4+\alpha)^{3/4}}\approx\frac{\alpha^{1/4}}{2}\,\frac{4}{\alpha}=\frac{2}{\alpha^{3/4}}.$$
For $\alpha=0.01$ this gives
$$\kappa(C_n(b),x)\approx400,\qquad \kappa_{\mathrm{circ}}(C_n(b),x)\approx63$$
with probability near 1, and for $\alpha=0.0001$ we get
$$\kappa(C_n(b),x)\approx40000,\qquad \kappa_{\mathrm{circ}}(C_n(b),x)\approx2000$$
with probability near 1.
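The closed form for $\|b^{-1}\|_2/\|b^{-1}\|_\infty$ and the resulting value $\kappa_{\mathrm{circ}}\approx63$ for $\alpha=0.01$ can be reproduced by a Riemann sum (a pure-Python sketch; the grid size is an arbitrary choice, and we use that $\min|b|=\alpha$ so $\|b^{-1}\|_\infty=1/\alpha$):

```python
import math

def ratio(alpha, grid=100000):
    # ||b^{-1}||_2 / ||b^{-1}||_inf for b(e^{i*theta}) = 2 + alpha + 2*cos(theta):
    # a midpoint Riemann sum for ||b^{-1}||_2^2, divided by ||b^{-1}||_inf = 1/alpha.
    s = 0.0
    for j in range(grid):
        theta = 2.0 * math.pi * (j + 0.5) / grid
        s += 1.0 / (2.0 + alpha + 2.0 * math.cos(theta)) ** 2
    return math.sqrt(s / grid) * alpha

alpha = 0.01
r = ratio(alpha)
closed = alpha ** 0.25 * (2 + alpha) ** 0.5 / (4 + alpha) ** 0.75
kappa_circ = (4.0 / alpha) * r
print(r, closed, kappa_circ)   # kappa_circ comes out near 63
```

Because the integrand is smooth and periodic, the equispaced sum converges extremely fast, so the numerical and closed-form values agree to many digits.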
5 Hankel Matrices

We begin with a general result.

Theorem 5.1 Let $A=(a_{jk})_{j,k=1}^\infty$ be an infinite matrix and put $A_n=(a_{jk})_{j,k=1}^n$. If $A$ induces a compact operator on $\ell^2$, then $EX_n^2\to0$ as $n\to\infty$.

Proof. We denote by $\mathcal{F}_j^\infty$ and $\mathcal{F}_j^n$ the operators of rank at most $j$ on $\ell^2$ and $\mathbb{C}^n$, respectively. The approximation numbers $\sigma_j$ of $A$ and $A_n$ are defined by
$$\sigma_j(A)=\operatorname{dist}(A,\mathcal{F}_j^\infty)=\inf\{\|A-F_j\|:F_j\in\mathcal{F}_j^\infty\}\qquad(j=0,1,2,\dots),$$
$$\sigma_j(A_n)=\operatorname{dist}(A_n,\mathcal{F}_j^n)=\inf\{\|A_n-F_j\|:F_j\in\mathcal{F}_j^n\}\qquad(j=0,1,\dots,n-1).$$
Note that the approximation numbers of $A_n$ are just the singular values in reverse order, $s_{n-j}(A_n)=\sigma_j(A_n)$ for $j=0,\dots,n-1$. Let $P_n$ stand for the orthogonal projection onto the first $n$ coordinates. If $F_j\in\mathcal{F}_j^\infty$, then $P_nF_jP_n$ may be identified with a matrix in $\mathcal{F}_j^n$. Furthermore, a matrix $F_j\in\mathcal{F}_j^n$ may be thought of as a matrix of the form $P_nF_jP_n$ with $F_j\in\mathcal{F}_j^\infty$. Thus, $\mathcal{F}_j^n=P_n\mathcal{F}_j^\infty P_n$ and it follows that
$$s_{n-j}(A_n)=\sigma_j(A_n)=\inf\{\|P_nAP_n-F_j\|:F_j\in\mathcal{F}_j^n\}=\inf\{\|P_nAP_n-P_nF_jP_n\|:F_j\in\mathcal{F}_j^\infty\}\le\inf\{\|A-F_j\|:F_j\in\mathcal{F}_j^\infty\}=\sigma_j(A).$$
From Theorem 2.2 we therefore obtain
$$EX_n^2=\frac{1}{n\,s_n^2(A_n)}\sum_{k=1}^n s_k^2(A_n)=\frac{1}{n\,s_n^2(A_n)}\sum_{j=0}^{n-1}s_{n-j}^2(A_n)\le\frac{1}{n\,s_n^2(A_n)}\sum_{j=0}^{n-1}\sigma_j^2(A). \qquad (21)$$
But if $A$ is compact, then $\sigma_j(A)\to0$ as $j\to\infty$. This implies that the right-hand side of (21) goes to zero as $n\to\infty$ (note that $s_n(A_n)=\|A_n\|$ stays bounded away from zero for large $n$).
An interesting concrete situation is the case where $A=H(b)$ is the Hankel matrix $(b_{j+k-1})_{j,k=1}^\infty$ generated by the Fourier coefficients of a function $b\in L^1$. If $b$ is continuous, then the Hankel matrix $H(b)$ induces a compact operator and hence, by Theorem 5.1, $EX_n^2\to0$. The following result shows that, surprisingly, $EX_n^2\to0$ for all Hankel matrices with $L^1$ symbols (notice that such matrices need not even generate bounded operators).

Theorem 5.2 Let $b\in L^1$ and let $A_n$ be the $n\times n$ principal section of $H(b)$. Then $EX_n^2\to0$.
Proof. As shown by Fasino and Tilli [6], [15],
$$\lim_{n\to\infty}\frac{F(s_1)+\dots+F(s_n)}{n}=F(0)$$
for every uniformly continuous and bounded function $F$ on $\mathbb{R}$. Suppose first that $H(b)$ induces a bounded operator. Then $\|A_n\|\le\|H(b)\|=:d<\infty$, which implies that all singular values of $A_n$ lie in the segment $[0,d]$. Thus, letting $F$ be a smooth and bounded function such that $F(x)=x^2$ on $[0,d]$, we deduce that $\sum s_j^2/n\to0$. Since $b$ is not identically zero, there is an $N$ such that $\|A_N\|>0$. It follows that $0<\|A_N\|\le\|A_n\|=s_n$ for all $n\ge N$. Thus, $\sum s_j^2/(n\,s_n^2)\to0$, and Theorem 2.2 gives the assertion.
Now suppose that $H(b)$ is not bounded. We claim that then $\|A_n\|\to\infty$. Indeed, the sequence $\{\|A_n\|\}$ is monotonically increasing: $\|A_n\|\le\|A_{n+1}\|$ for all $n$. If there exists a finite constant $M$ such that $\|A_n\|\le M$ for all $n$, then $\{A_nx\}$ is a convergent sequence for each $x\in\ell^2$. The Banach-Steinhaus theorem (= uniform boundedness principle) therefore implies that the operator $A$ defined by $Ax:=\lim A_nx$ is bounded on $\ell^2$. But $A$ is clearly given by the matrix $H(b)$. This contradiction proves that $\|A_n\|\to\infty$. Finally, Fasino and Tilli [6], [15] proved that always
$$\frac1n\|A_n\|_{\mathrm{tr}}\le2\|b\|_1.$$
Lemma 2.4 now shows that $EX_n^2\to0$.
6 Toeplitz Matrices with Unbounded Symbols

Following Tyrtyshnikov and Zamarashkin [18], we consider Toeplitz matrices generated by so-called Radon measures. Thus, given a function $\beta:[-\pi,\pi]\to\mathbb{C}$ of bounded variation, we define
$$(d\beta)_k=\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{-ik\theta}\,d\beta(\theta), \qquad (22)$$
the integral understood in the Riemann-Stieltjes sense, and we put $T_n(d\beta):=((d\beta)_{j-k})_{j,k=1}^n$.
If $\beta$ is absolutely continuous, then $\beta'\in L^1[-\pi,\pi]$ and $(d\beta)_k$ is nothing but the $k$th Fourier coefficient of $\beta'$, defined in accordance with (12). Consequently, in this case $T_n(d\beta)$ is just what we denoted by $T_n(\beta')$ in Section 3.
For general $\beta$ we have $\beta=\beta_a+\beta_j+\beta_s$, where $\beta_a$ is absolutely continuous with $\beta_a'\in L^1[-\pi,\pi]$, $\beta_j$ is the "jump part", that is, a function of the form
$$\beta_j(\theta)=\sum_{\theta_\ell<\theta}h_\ell,\qquad \sum_\ell|h_\ell|<\infty,$$
with an at most countable set $\{\theta_1,\theta_2,\dots\}\subset[-\pi,\pi)$, and $\beta_s$ is the "singular part", that is, a continuous function of bounded variation whose derivative vanishes almost everywhere. This decomposition is unique up to constant additive terms. After partial integration (see, e.g., [7, No. 577]), formula (22) can be written more explicitly as
$$(d\beta)_k=\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{-ik\theta}\beta_a'(\theta)\,d\theta+\frac{1}{2\pi}\sum_\ell h_\ell e^{-i\theta_\ell k}+\frac{(-1)^k}{2\pi}\big(\beta_s(\pi)-\beta_s(-\pi)\big)+\frac{ik}{2\pi}\int_{-\pi}^{\pi}e^{-ik\theta}\beta_s(\theta)\,d\theta.$$
In particular, if $\beta(\theta)=0$ for $\theta\in[-\pi,0]$ and $\beta(\theta)=2\pi$ for $\theta\in(0,\pi]$, then $(d\beta)_k=1$ for all $k$, that is, $T_n(d\beta)$ is the matrix (10).
Theorem 6.1 Let $\beta=\beta_a+\beta_j+\beta_s$ be a nonconstant function of bounded variation and put $A_n=T_n(d\beta)$. Then $EX_n^2$ converges to a limit as $n\to\infty$. This limit is positive if and only if $\beta_a'\in L^\infty[-\pi,\pi]$ and $\beta_j=\beta_s=0$.
Proof. If $\beta=\beta_a$ with $\beta_a'\in L^\infty[-\pi,\pi]$, then $EX_n^2$ converges to $\|\beta_a'\|_2^2/\|\beta_a'\|_\infty^2\ne0$ due to Theorems 3.2 and 3.3.
So assume $\beta=\beta_a+\beta_j+\beta_s$. Write $\beta=\beta_1-\beta_2+i(\beta_3-\beta_4)$ with nonnegative functions $\beta_k$ of bounded variation. We have
$$\frac1n\|T_n(d\beta)\|_{\mathrm{tr}}\le\frac1n\sum_{k=1}^4\|T_n(d\beta_k)\|_{\mathrm{tr}}.$$
The singular values of the positive semi-definite matrices $T_n(d\beta_k)$ coincide with the eigenvalues. Hence
$$\frac1n\|T_n(d\beta_k)\|_{\mathrm{tr}}=\frac1n\operatorname{tr}T_n(d\beta_k)=(d\beta_k)_0=\frac{1}{2\pi}\int_{-\pi}^{\pi}d\beta_k\le\frac{1}{2\pi}\operatorname{Var}\beta_k<\infty$$
(this argument is standard; see, e.g., [18]). Consequently, there is a finite constant $M$ such that
$$\frac1n\|T_n(d\beta)\|_{\mathrm{tr}}\le M$$
for all $n$. We will show that the norms $\|T_n(d\beta)\|$ remain bounded only if $\beta=\beta_a$ with $\beta_a'\in L^\infty[-\pi,\pi]$. By virtue of Lemma 2.4, this implies that $EX_n^2\to0$ whenever $\beta_a'\notin L^\infty[-\pi,\pi]$ or $\beta_j\ne0$ or $\beta_s\ne0$.
Thus, suppose there is a finite constant $C$ such that $\|T_n(d\beta)\|\le C$ for all $n$. Let $e_j\in\mathbb{C}^n$ be the $j$th vector of the standard basis. Then
$$\|T_n(d\beta)e_1\|_2^2=|(d\beta)_0|^2+\dots+|(d\beta)_{n-1}|^2\le C^2,$$
$$\|T_n(d\beta)e_n\|_2^2=|(d\beta)_0|^2+\dots+|(d\beta)_{-(n-1)}|^2\le C^2$$
for all $n$, which tells us that there is a function $b\in L^2[-\pi,\pi]$ such that $(d\beta)_k=b_k$ for all $k$. Since the decomposition of $\beta$ into the absolutely continuous part, the jump part, and the singular part is unique (up to additive constants), it follows that $\beta=\beta_a+\beta_j+\beta_s$ with $\beta_a'=b$ and $\beta_j=\beta_s=0$. We are left to show that $b$ is in $L^\infty[-\pi,\pi]$. Using the Banach-Steinhaus theorem as in the proof of Theorem 5.2 we arrive at the conclusion that the Toeplitz matrix $T(b):=(b_{j-k})_{j,k=1}^\infty$ induces a bounded operator on $\ell^2$. By a classical theorem of Toeplitz [16] (full proofs are also in [3] and [8]), this happens if and only if $b$ is in $L^\infty[-\pi,\pi]$.
Theorem 6.1 reveals in particular that $EX_n^2\to0$ if $A_n=T_n(b)$ with $b\in L^1\setminus L^\infty$. The following theorem concerns a class of Toeplitz matrices with increasing entries. The notation $c_j\simeq d_j$ means that $c_j/d_j$ remains bounded and bounded away from zero.

Theorem 6.2 Let $A_n=(b_{j-k})_{j,k=1}^n$ where $|b_j|\simeq e^{\gamma j}$ and $|b_{-j}|\simeq e^{\delta j}$ as $j\to+\infty$. If one of the numbers $\gamma$ and $\delta$ is positive, then $EX_n^2=O(1/n)$ as $n\to\infty$.

Proof. For the sake of definiteness, suppose $\gamma>0$. We have
$$\|A_n\|_F^2=\sum_{j=0}^{n-1}(n-j)|b_j|^2+\sum_{j=1}^{n-1}(n-j)|b_{-j}|^2\le C_1\sum_{j=0}^{n-1}(n-j)e^{2\gamma j}+C_1\sum_{j=1}^{n-1}(n-j)e^{2\delta j}$$
with some finite constant $C_1$. In the cases $\delta>0$ and $\delta\le0$, this gives
$$\|A_n\|_F^2\le C_2\big(e^{2\gamma n}+e^{2\delta n}\big)\quad\text{and}\quad\|A_n\|_F^2\le C_2\,e^{2\gamma n}$$
with some finite constant $C_2$, respectively; note that, for example,
$$\sum_{j=0}^{n-1}(n-j)e^{2\gamma j}=\frac{e^{2\gamma n}}{(e^{2\gamma}-1)^2}+O(n).$$
On the other hand, considering $\|A_ne_1\|_2^2$ and $\|A_ne_n\|_2^2$, we see that
$$\|A_n\|^2\ge\frac12\sum_{j=0}^{n-1}|b_j|^2+\frac12\sum_{j=1}^{n-1}|b_{-j}|^2,$$
which shows that there is a finite constant $C_3>0$ such that
$$\|A_n\|^2\ge C_3\big(e^{2\gamma n}+e^{2\delta n}\big)\quad\text{and}\quad\|A_n\|^2\ge C_3\,e^{2\gamma n}$$
for $\delta>0$ and $\delta\le0$, respectively. Thus, in either case,
$$EX_n^2=\frac{\|A_n\|_F^2}{n\|A_n\|^2}=O\Big(\frac1n\Big).$$
Example 6.3 Let
$$A_n=\begin{pmatrix}
a & b\varrho & b\varrho^2 & \dots & b\varrho^{n-1}\\
c\sigma & a & b\varrho & \dots & b\varrho^{n-2}\\
c\sigma^2 & c\sigma & a & \dots & b\varrho^{n-3}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
c\sigma^{n-1} & c\sigma^{n-2} & c\sigma^{n-3} & \dots & a
\end{pmatrix}.$$
Similar matrices are studied in [17]. In the case $a=b=c$ and $\sigma=\varrho$, such matrices are called Kac-Murdock-Szegő matrices [10]. Suppose that $a,b,c$ are nonzero. If $|\sigma|<1$ and $|\varrho|<1$, then $EX_n^2$ converges to a nonzero limit by Theorem 2.2 and (13). If $|\sigma|>1$ or $|\varrho|>1$, we can invoke Theorem 6.2 to deduce that $EX_n^2\to0$. Finally, in the two cases where $|\sigma|\le1=|\varrho|$ or $|\varrho|\le1=|\sigma|$, Theorem 6.1 implies that $EX_n^2\to0$.
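For $|\sigma|=|\varrho|>1$, Theorem 6.2 predicts $EX_n^2=O(1/n)$, and by Theorem 2.2 this quantity is the deterministic ratio $\|A_n\|_F^2/(n\|A_n\|^2)$, so no sampling is needed. The sketch below (pure Python; power iteration stands in for a library SVD, and the parameters $a=b=c=1$, $\sigma=\varrho=1.2$ are illustrative choices, giving the entries $\varrho^{|j-k|}$):

```python
import math
import random

random.seed(3)

def kms_like(n, rho):
    # Example 6.3 with a = b = c = 1 and sigma = varrho = rho: entry (j,k) = rho^{|j-k|}.
    return [[rho ** abs(j - k) for k in range(n)] for j in range(n)]

def spectral_norm(A, iters=200):
    # Power iteration on A^T A; adequate here because the top singular value
    # is well separated when the entries grow exponentially.
    n = len(A)
    x = [random.random() + 0.1 for _ in range(n)]
    for _ in range(iters):
        y = [sum(A[j][k] * x[k] for k in range(n)) for j in range(n)]  # y = A x
        z = [sum(A[j][k] * y[j] for j in range(n)) for k in range(n)]  # z = A^T y
        nz = math.sqrt(sum(t * t for t in z))
        x = [t / nz for t in z]
    y = [sum(A[j][k] * x[k] for k in range(n)) for j in range(n)]
    return math.sqrt(sum(t * t for t in y))

def ex2(n, rho=1.2):
    A = kms_like(n, rho)
    frob2 = sum(t * t for row in A for t in row)
    return frob2 / (n * spectral_norm(A) ** 2)   # Theorem 2.2: E X_n^2

vals = [ex2(n) for n in (10, 20, 40)]
print(vals, [n * v for n, v in zip((10, 20, 40), vals)])
```

The products $n\cdot EX_n^2$ should stay bounded while $EX_n^2$ itself decreases, in line with the $O(1/n)$ statement.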
7 Appendix: Distribution Functions

The referee suggested that it would be interesting to compute the distribution of $X_n^2$ in some cases and noted that this can probably be done easily for small $n$ and for the matrix of Example 2.3. The purpose of this section is to address this problem. It will turn out that the referee is right in all respects.
Let $A_n\in M_n(\mathbb{R})$ and let $0\le s_1\le\dots\le s_n$ be the singular values of $A_n$. Suppose $s_n>0$. The random variable $X_n^2=\|A_nx\|^2/\|A_n\|^2$ assumes its values in $[0,1]$. With notation as in the proof of Theorem 2.2,
$$E_\xi:=\left\{x\in S^{n-1}:\frac{\|A_nx\|^2}{\|A_n\|^2}<\xi\right\}=\left\{x\in S^{n-1}:\frac{\|D_nV_nx\|^2}{s_n^2}<\xi\right\}.$$
Put $G_\xi=\{x\in S^{n-1}:\|D_nx\|^2/s_n^2<\xi\}$. Clearly, $G_\xi=V_n(E_\xi)$. Since $V_n$ is an orthogonal matrix, it leaves the surface measure on $S^{n-1}$ invariant. It follows that $|G_\xi|=|V_n(E_\xi)|=|E_\xi|$ and hence
$$F_n(\xi):=P(X_n^2<\xi)=P\left(\frac{\|D_nx\|^2}{s_n^2}<\xi\right)=P\left(\frac{s_1^2x_1^2+\dots+s_n^2x_n^2}{s_n^2}<\xi\right). \qquad (24)$$
This reveals first of all that the distribution function $F_n(\xi)$ depends only on the singular values of $A_n$.
A real-valued random variable $X$ is said to be $B(\alpha,\beta)$ distributed on $(a,b)$ if
$$P(c\le X<d)=\int_c^d f(\xi)\,d\xi,$$
where the density function $f(\xi)$ is zero on $(-\infty,a]$ and $[b,\infty)$ and equals
$$\frac{(b-a)^{1-\alpha-\beta}}{B(\alpha,\beta)}\,(\xi-a)^{\alpha-1}(b-\xi)^{\beta-1}$$
on $(a,b)$. Here $B(\alpha,\beta)=\Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$ is the common beta function and it is assumed that $\alpha>0$ and $\beta>0$. The emergence of the beta distribution, of the $\chi^2$ distribution, of elliptic integrals and Bessel functions in connection with uniform distribution on the unit sphere is no surprise and can be found throughout the literature. Thus, the following results are not at all new. However, they tell a nice story and uncover the astonishing simplicity of Theorem 2.2.
We first consider $2\times2$ matrices, that is, we let $n=2$. From (24) we infer that
$$F_2(\xi)=P\left(\frac{s_1^2}{s_2^2}\,x_1^2+x_2^2<\xi\right). \qquad (25)$$
The constellation $s_1=s_2$ is uninteresting, because $F_2(\xi)=0$ for $\xi<1$ and $F_2(\xi)=1$ for $\xi\ge1$ in this case.
Theorem 7.1 If $s_1<s_2$, then the random variable $X_2^2$ is subject to the $B\big(\tfrac12,\tfrac12\big)$ distribution on $(s_1^2/s_2^2,1)$.
Proof. Put $\tau=s_1/s_2$. By (25), $F_2(\xi)$ is $\frac{1}{2\pi}$ times the length of the piece of the unit circle $x_1^2+x_2^2=1$ that is contained in the interior of the ellipse $\tau^2x_1^2+x_2^2=\xi$. This gives $F_2(\xi)=0$ for $\xi\le\tau^2$ and $F_2(\xi)=1$ for $\xi\ge1$. Thus, let $\xi\in(\tau^2,1)$. Then the circle and the ellipse intersect at the four points
$$\left(\pm\sqrt{\frac{1-\xi}{1-\tau^2}},\ \pm\sqrt{\frac{\xi-\tau^2}{1-\tau^2}}\right),$$
and consequently,
$$F_2(\xi)=\frac{2}{\pi}\arctan\sqrt{\frac{\xi-\tau^2}{1-\xi}},$$
which implies that $F_2'(\xi)$ equals
$$\frac{1}{\pi}\,(\xi-\tau^2)^{-1/2}(1-\xi)^{-1/2}=\frac{1}{B(1/2,1/2)}\,(\xi-\tau^2)^{-1/2}(1-\xi)^{-1/2}$$
and proves that $X_2^2$ has the $B\big(\tfrac12,\tfrac12\big)$ distribution on $(\tau^2,1)$.
In the general case, things are more involved. An idea of the variety of possible distribution functions is provided by the class of matrices whose singular values satisfy
$$0=s_1=\dots=s_{n-m}<s_{n-m+1}\le\dots\le s_n \qquad (26)$$
with small $m$. Notice that $m$ is just the rank of the matrix. We put
$$\mu_{n-m+1}=\frac{s_n}{s_{n-m+1}},\quad \mu_{n-m+2}=\frac{s_n}{s_{n-m+2}},\quad\dots,\quad \mu_n=\frac{s_n}{s_n}=1.$$
Our problem is to find
$$F_n(\xi)=P\left(\frac{x_{n-m+1}^2}{\mu_{n-m+1}^2}+\dots+\frac{x_n^2}{\mu_n^2}<\xi\right); \qquad (27)$$
the dependence of $F_n$ on $m$ and $\mu_{n-m+1},\dots,\mu_n$ will be suppressed.
Example 7.2 In order to illustrate what will follow by a transparent special case, we take $n=3$ and suppose that the singular values of $A_3$ satisfy $0=s_1<s_2<s_3$. We put $\mu=s_3/s_2$. Clearly, (27) becomes
$$F_3(\xi)=P\left(\frac{x_2^2}{\mu^2}+x_3^2<\xi\right).$$
We rename $x_1,x_2,x_3$ to $x,y,z$. The preceding equality tells us that $F_3(\xi)$ is $\frac{1}{4\pi}$ times the area of the piece $\widetilde\Sigma$ of the sphere $x^2+y^2+z^2=1$ that is cut out by the elliptic cylinder $y^2/\mu^2+z^2<\xi$. Let us assume that $\xi\in(0,1/\mu^2)$. Then the ellipse $y^2/\mu^2+z^2<\xi$ is completely contained in the disk $y^2+z^2<1$. By symmetry, it suffices to consider the part $\Sigma$ of $\widetilde\Sigma$ that lies in the octant $x\ge0$, $y\ge0$, $z\ge0$. We have
$$F_3(\xi)=\frac{8}{4\pi}\int_\Sigma d\sigma,$$
and it is easily verified (see the proof of Theorem 7.3) that
$$\frac{8}{4\pi}\int_\Sigma d\sigma=\frac{8}{4\pi/3}\int_{\mathrm{co}(0,\Sigma)}dx\,dy\,dz,$$
where $\mathrm{co}(0,\Sigma)$ is the cone $\bigcup_{s\in\Sigma}[0,s]$ (we prefer volume integrals to surface integrals). A parametrization of $\Sigma$ is
$$y=\mu r\cos\varphi,\qquad z=r\sin\varphi,\qquad x=\sqrt{1-r^2(\mu^2\cos^2\varphi+\sin^2\varphi)},$$
where $r\in[0,\sqrt\xi)$ and $\varphi\in[0,\pi/2]$. Consequently, a parametrization of $\mathrm{co}(0,\Sigma)$ is given by
$$y=t\mu r\cos\varphi,\qquad z=tr\sin\varphi,\qquad x=t\sqrt{1-r^2(\mu^2\cos^2\varphi+\sin^2\varphi)}$$
with $t\in[0,1]$ and $r$ and $\varphi$ as before. We set $v(\varphi)=\mu^2\cos^2\varphi+\sin^2\varphi$ and denote the Jacobian $\partial(y,z,x)/\partial(t,r,\varphi)$ by $J$. By what was said above,
$$F_3(\xi)=\frac{6}{\pi}\int_{\mathrm{co}(0,\Sigma)}dx\,dy\,dz=\frac{6}{\pi}\int_0^{\pi/2}\int_0^{\sqrt\xi}\int_0^1|J|\,dt\,dr\,d\varphi.$$
Writing down $J$ and subtracting $r/t$ times the second column from the first we get
We now return to the situation given by (26). Let $Q=[0,\pi/2]$. For $\varphi_1,\dots,\varphi_{k-1}$ in $Q$ we introduce the spherical coordinates $\omega_1^{(k)},\dots,\omega_k^{(k)}$ by
$$\omega_1^{(k)}=\cos\varphi_1,\qquad \omega_2^{(k)}=\sin\varphi_1\cos\varphi_2,\qquad \omega_3^{(k)}=\sin\varphi_1\sin\varphi_2\cos\varphi_3,\qquad\dots,$$
$$\omega_{k-1}^{(k)}=\sin\varphi_1\sin\varphi_2\dots\sin\varphi_{k-2}\cos\varphi_{k-1},\qquad \omega_k^{(k)}=\sin\varphi_1\sin\varphi_2\dots\sin\varphi_{k-2}\sin\varphi_{k-1}.$$
Notice that
$$\frac{\partial(r\omega_1^{(k)},\dots,r\omega_k^{(k)})}{\partial(r,\varphi_1,\dots,\varphi_{k-1})}=r^{k-1}\sin^{k-2}\varphi_1\sin^{k-3}\varphi_2\dots\sin\varphi_{k-2}, \qquad (29)$$
$$\int_{Q^{k-1}}\sin^{k-2}\varphi_1\sin^{k-3}\varphi_2\dots\sin\varphi_{k-2}\,d\varphi_1\dots d\varphi_{k-1}=\frac{1}{2^{k-1}}\,\frac{\pi^{k/2}}{\Gamma(k/2)}. \qquad (30)$$
We define $v=v(\varphi_1,\dots,\varphi_{m-1})$ by
$$v=\mu_{n-m+1}^2\big[\omega_1^{(m)}\big]^2+\mu_{n-m+2}^2\big[\omega_2^{(m)}\big]^2+\dots+\mu_n^2\big[\omega_m^{(m)}\big]^2.$$
Theorem 7.3 Let $n\ge3$ and suppose the singular values of $A_n$ satisfy (26). Then for $\xi\in(0,1/\mu_{n-m+1}^2)$, the density function of $X_n^2$ is
$$f_n(\xi)=c_n\,\xi^{(m-2)/2}\int_{Q^{m-1}}\big(1-\xi v(\varphi)\big)^{(n-m-2)/2}\sin^{m-2}\varphi_1\sin^{m-3}\varphi_2\dots\sin\varphi_{m-2}\,d\varphi$$
with
$$c_n=\frac{2^{m-1}}{\pi^{m/2}}\,\frac{\Gamma\big(\frac n2\big)}{\Gamma\big(\frac{n-m}2\big)}\,\mu_{n-m+1}\dots\mu_n.$$
Proof. We proceed as in Example 7.2. Let $\Sigma$ denote the set of all points $(x_1,\dots,x_n)$ for which $x_1^2+\dots+x_n^2=1$, $x_1\ge0,\dots,x_n\ge0$, and
$$\frac{x_{n-m+1}^2}{\mu_{n-m+1}^2}+\dots+\frac{x_n^2}{\mu_n^2}<\xi. \qquad (31)$$
We have
$$F_n(\xi)=\frac{2^n}{|S^{n-1}|}\int_\Sigma d\sigma.$$
We prefer to switch from the surface integral to a volume integral. Let $\mathrm{co}(0,\Sigma)$ denote the cone formed by all segments $[0,s]$ with $s\in\Sigma$. Because $|S^{n-1}|=n|B^n|$, we get
$$\frac{1}{|S^{n-1}|}\int_\Sigma d\sigma=\frac{1}{n|B^n|}\int_\Sigma d\sigma=\frac{1}{|B^n|}\int_0^1\int_\Sigma t^{n-1}\,d\sigma\,dt=\frac{1}{|B^n|}\int_{\mathrm{co}(0,\Sigma)}dx.$$
Since $\xi\mu_{n-m+j}^2<1$ for $j=1,\dots,m$, the ellipsoid (31) is completely contained in the ball $x_{n-m+1}^2+\dots+x_n^2<1$.
We start with the parametrization
$$x_{n-m+1}=\mu_{n-m+1}\,r\,\omega_1^{(m)}(\varphi_1,\dots,\varphi_{m-1}),\qquad\dots,\qquad x_n=\mu_n\,r\,\omega_m^{(m)}(\varphi_1,\dots,\varphi_{m-1}),$$
where $r\in[0,\sqrt\xi)$ and $(\varphi_1,\dots,\varphi_{m-1})\in Q^{m-1}$. By the definition of $v$,
$$x_1^2+\dots+x_{n-m}^2=1-x_{n-m+1}^2-\dots-x_n^2=1-r^2v.$$
Hence, after letting $(\theta_1,\dots,\theta_{n-m-1})\in Q^{n-m-1}$ and
$$x_j=\sqrt{1-r^2v(\varphi_1,\dots,\varphi_{m-1})}\;\omega_j^{(n-m)}(\theta_1,\dots,\theta_{n-m-1})\qquad(j=1,\dots,n-m),$$
we have accomplished the parametrization of $\Sigma$. Finally, on multiplying the right-hand sides of the above expressions for $x_1,\dots,x_n$ by $t\in[0,1]$, we obtain a parametrization of $\mathrm{co}(0,\Sigma)$. The Jacobian
$$J=\frac{\partial(x_1,\dots,x_n)}{\partial(t,r,\varphi_1,\dots,\varphi_{m-1},\theta_1,\dots,\theta_{n-m-1})}$$
can be evaluated as in Example 7.2: after subtracting $r/t$ times the second column from the first and taking into account that
$$\sqrt{1-r^2v}-r\,\frac{\partial}{\partial r}\sqrt{1-r^2v}=\frac{1}{\sqrt{1-r^2v}},$$
one arrives at a determinant that is the product of an $m\times m$ determinant and an $(n-m)\times(n-m)$ determinant; these two determinants can in turn be computed using (29). What results is
$$|J|=t^{n-1}r^{m-1}\mu_{n-m+1}\dots\mu_n\,(1-r^2v)^{(n-m-2)/2}\,\sin^{m-2}\varphi_1\sin^{m-3}\varphi_2\dots\sin\varphi_{m-2}\times\sin^{n-m-2}\theta_1\sin^{n-m-3}\theta_2\dots\sin\theta_{n-m-2}.$$
In summary, integrating $|J|$ over $t$, $r$, and the $\theta$'s by means of (30), we obtain
$$F_n'(\xi)=c_n\,\xi^{(m-2)/2}\int_{Q^{m-1}}\big(1-\xi v(\varphi)\big)^{(n-m-2)/2}\sin^{m-2}\varphi_1\dots\sin\varphi_{m-2}\,d\varphi$$
with $c_n$ as in the theorem.
Corollary 7.4 Let $n\ge3$ and suppose the singular values of $A_n$ satisfy $0=s_1=\dots=s_{n-m}<s_{n-m+1}=\dots=s_n$. Then $X_n^2$ has the $B\big(\frac m2,\frac{n-m}2\big)$ distribution on $(0,1)$.

Indeed, in this case $\mu_{n-m+1}=\dots=\mu_n=1$ and $v(\varphi)\equiv1$, so the integral in Theorem 7.3 can be evaluated by (30). From Theorem 7.3 we therefore deduce that the density function $f_n(\xi)$ is a constant times $\xi^{(m-2)/2}(1-\xi)^{(n-m-2)/2}$. The constant is necessarily $1/B\big(\frac m2,\frac{n-m}2\big)$.

Under the hypothesis of Corollary 7.4, the density function of $X_n^2$ is
$$f_n(\xi)=\frac{1}{B\big(\frac m2,\frac{n-m}2\big)}\,\xi^{(m-2)/2}(1-\xi)^{(n-m-2)/2},\qquad\xi\in(0,1).$$
It follows that the random variable $nX_n^2$ is $B\big(\frac m2,\frac{n-m}2\big)$ distributed on $(0,n)$. If $m$ remains fixed and $n$ goes to infinity, then its density has the limit
$$\frac{1}{2^{m/2}\Gamma(m/2)}\,\xi^{(m-2)/2}e^{-\xi/2},\qquad\xi\in(0,\infty),$$
which is the density of the $\chi_m^2$ distribution.
Example 7.5 Let us consider Example 2.3 again. Thus, suppose $A_n$ is the matrix (10). The singular values of $A_n$ are $0,\dots,0,n$ and hence we can apply Corollary 7.4 with $m=1$ to the situation at hand. It follows that $X_n^2$ is $B\big(\tfrac12,\tfrac{n-1}2\big)$ distributed on the interval $(0,1)$. If $X$ has the $B(\alpha,\beta)$ distribution on $(0,1)$, then
$$EX=\frac{\alpha}{\alpha+\beta},\qquad \sigma^2X=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$
This yields
$$EX_n^2=\frac1n,\qquad \sigma^2X_n^2=\frac{2(n-1)}{n^2(n+2)},$$
which is in perfect accordance with (11). In Example 2.3 we were able to conclude that $P(X_n^2\ge\varepsilon)\le8/(n^2\varepsilon^2)$. Since we know the density, we can now write
$$P(X_n^2\ge\varepsilon)=\frac{1}{B\big(\frac12,\frac{n-1}2\big)}\int_\varepsilon^1\xi^{-1/2}(1-\xi)^{(n-3)/2}\,d\xi.$$
Integrating by parts once and using Stirling's formula we obtain
$$P(X_n^2\ge\varepsilon)=\sqrt{\frac{2}{\pi}}\,\frac{1}{\sqrt{n\varepsilon}}\,(1-\varepsilon)^{(n-1)/2}\left(1+O\Big(\frac1n\Big)\right),$$
the $O$ depending on $\varepsilon$. Thus, for each $\varepsilon\in(0,1)$, the probability $P(X_n^2\ge\varepsilon)$ actually decays exponentially to zero as $n\to\infty$.
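The gap between the Chebyshev bound $8/(n^2\varepsilon^2)$ and the exact tail can be made concrete by quadrature (a pure-Python sketch; the values $n=200$ and $\varepsilon=0.1$ and the grid size are arbitrary choices):

```python
import math

def beta_tail(n, eps, grid=100000):
    # P(X_n^2 >= eps) for the B(1/2, (n-1)/2) distribution on (0,1),
    # computed by midpoint-rule quadrature of the density.
    log_b = math.lgamma(0.5) + math.lgamma((n - 1) / 2.0) - math.lgamma(n / 2.0)
    h = (1.0 - eps) / grid
    s = 0.0
    for j in range(grid):
        xi = eps + (j + 0.5) * h
        s += xi ** -0.5 * (1.0 - xi) ** ((n - 3) / 2.0)
    return s * h / math.exp(log_b)

n, eps = 200, 0.1
exact = beta_tail(n, eps)
stirling = math.sqrt(2.0 / math.pi) / math.sqrt(n * eps) * (1.0 - eps) ** ((n - 1) / 2.0)
chebyshev = 8.0 / (n ** 2 * eps ** 2)
print(exact, stirling, chebyshev)
```

For these parameters the exact tail is several orders of magnitude below the Chebyshev bound, while the Stirling-based asymptotic formula is already accurate to within a few percent.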
Example 7.6 Orthogonal projections have just the singular value pattern of Corollary 7.4. This leads to some pretty nice conclusions.

Let $E$ be an $N$-dimensional Euclidean space and let $U$ be an $m$-dimensional linear subspace of $E$. We denote by $P_U$ the orthogonal projection of $E$ onto $U$. Then for $y\in E$, the element $P_Uy$ is the best approximation of $y$ in $U$ and we have $\|y\|^2=\|P_Uy\|^2+\|y-P_Uy\|^2$. The singular values of $P_U$ are $N-m$ zeros and $m$ units. Thus, Corollary 7.4 implies that if $y$ is uniformly distributed on the unit sphere of $E$, then $\|P_Uy\|^2$ has the $B\big(\frac m2,\frac{N-m}2\big)$ distribution on $(0,1)$. In particular, if $N$ is large, then $P_Uy$ lies with high probability close to the sphere of radius $\sqrt{m/N}$ and the squared distance $\|y-P_Uy\|^2$ clusters sharply around $1-\frac mN$.
Now identify $E$ with $M_n(\mathbb{R})$ equipped with the Frobenius norm (so that $N=n^2$) and let $\mathrm{Struct}_n(\mathbb{R})$ be an $m$-dimensional linear subspace of $M_n(\mathbb{R})$. Examples include
the Toeplitz matrices, $\mathrm{Toep}_n(\mathbb{R})$,
the Hankel matrices, $\mathrm{Hank}_n(\mathbb{R})$,
the tridiagonal matrices, $\mathrm{Tridiag}_n(\mathbb{R})$,
the tridiagonal Toeplitz matrices, $\mathrm{TridiagToep}_n(\mathbb{R})$,
the symmetric matrices, $\mathrm{Symm}_n(\mathbb{R})$,
the lower-triangular matrices, $\mathrm{Lowtriang}_n(\mathbb{R})$,
the matrices with zero main diagonal, $\mathrm{Zerodiag}_n(\mathbb{R})$,
the matrices with zero trace, $\mathrm{Zerotrace}_n(\mathbb{R})$.
The dimensions of these linear spaces are
$$\dim\mathrm{Toep}_n(\mathbb{R})=2n-1,\qquad \dim\mathrm{Hank}_n(\mathbb{R})=2n-1,$$
$$\dim\mathrm{Tridiag}_n(\mathbb{R})=3n-2,\qquad \dim\mathrm{TridiagToep}_n(\mathbb{R})=3,$$
$$\dim\mathrm{Symm}_n(\mathbb{R})=\frac{n^2+n}{2},\qquad \dim\mathrm{Lowtriang}_n(\mathbb{R})=\frac{n^2+n}{2},$$
$$\dim\mathrm{Zerodiag}_n(\mathbb{R})=n^2-n,\qquad \dim\mathrm{Zerotrace}_n(\mathbb{R})=n^2-1.$$
Suppose $n$ is large and $Y_n\in M_n(\mathbb{R})$ is uniformly distributed on the unit sphere of $M_n(\mathbb{R})$, $\|Y_n\|_F^2=1$. Let $P_{\mathrm{Struct}}Y_n$ be the best approximation of $Y_n$ by a matrix in $\mathrm{Struct}_n(\mathbb{R})$. Notice that the determination of $P_{\mathrm{Struct}}Y_n$ is a least squares problem that can be easily solved. For instance, $P_{\mathrm{Toep}}Y_n$ is the Toeplitz matrix whose $k$th diagonal, $k=-(n-1),\dots,n-1$, is formed by the arithmetic mean of the numbers in the $k$th diagonal of $Y_n$. Recall that $\dim\mathrm{Struct}_n(\mathbb{R})=m$. From what was said in the preceding paragraph, we conclude that $\|P_{\mathrm{Struct}}Y_n\|_F^2$ is $B\big(\frac m2,\frac{n^2}2-\frac m2\big)$ distributed on $(0,1)$. For example, $\|P_{\mathrm{Toep}}Y_n\|_F^2$ has the $B\big(\frac{2n-1}2,\frac{n^2-2n+1}2\big)$ distribution on $(0,1)$. The expected value of the variable $\|Y_n-P_{\mathrm{Toep}}Y_n\|_F^2$ is $1-\frac2n+\frac1{n^2}$ and the variance does not exceed $\frac4{n^3}$. Hence, Chebyshev's inequality gives
$$P\left(1-\frac2n+\frac1{n^2}-\frac\varepsilon n<\|Y_n-P_{\mathrm{Toep}}Y_n\|_F^2<1-\frac2n+\frac1{n^2}+\frac\varepsilon n\right)\ge1-\frac{4}{n\varepsilon^2}. \qquad (32)$$
Consequently, $P_{\mathrm{Toep}}Y_n$ is with high probability found near the sphere with the radius $\sqrt{\frac2n-\frac1{n^2}}$ and $\|Y_n-P_{\mathrm{Toep}}Y_n\|_F^2$ is tightly concentrated around $1-\frac2n+\frac1{n^2}$.

We arrive at the conclusion that nearly all $n\times n$ matrices of Frobenius norm 1 are at nearly the same distance to the set of all $n\times n$ Toeplitz matrices!
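The concentration of $\|Y_n-P_{\mathrm{Toep}}Y_n\|_F^2$ around $1-\frac2n+\frac1{n^2}$ is simple to observe by simulation (a pure-Python sketch; $Y_n$ is sampled as a normalized Gaussian matrix, which is uniform on the Frobenius unit sphere, and the dimensions and sample size are arbitrary):

```python
import random

random.seed(5)

def toeplitz_residual(n):
    # Sample Y uniformly on the Frobenius unit sphere of M_n(R), project onto the
    # Toeplitz matrices by averaging each diagonal, and return
    # ||Y - P_Toep Y||_F^2 = 1 - ||P_Toep Y||_F^2 (with Y kept unnormalized,
    # we divide by ||Y||_F^2 instead).
    y = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    frob2 = sum(t * t for row in y for t in row)
    proj2 = 0.0
    for k in range(-(n - 1), n):
        diag = [y[j][j + k] for j in range(max(0, -k), min(n, n - k))]
        mean = sum(diag) / len(diag)
        proj2 += len(diag) * mean * mean   # Frobenius mass kept by averaging this diagonal
    return 1.0 - proj2 / frob2

n, trials = 30, 1000
avg = sum(toeplitz_residual(n) for _ in range(trials)) / trials
expected = 1.0 - 2.0 / n + 1.0 / n ** 2
print(avg, expected)   # the sample mean should sit very close to the expected value
```

Projecting onto the diagonal-averaging Toeplitz structure needs no matrix factorization at all, which is what makes this particular example so easy to test.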
This does not imply that the Toeplitz matrices are at the center of the universe. In fact, the conclusion is true for each of the classes $\mathrm{Struct}_n(\mathbb{R})$ listed above. For instance, from Chebyshev's inequality we obtain
$$P\left(\frac12-\frac1{2n}-\varepsilon<\|Y_n-P_{\mathrm{Symm}}Y_n\|_F^2<\frac12-\frac1{2n}+\varepsilon\right)\ge1-\frac{1}{2n^2\varepsilon^2}$$
and much sharper estimates. Namely, Lemma 2.2 of [5] in conjunction with Corollary 7.4 implies that if $X_n^2$ has the $B\big(\frac m2,\frac{N-m}2\big)$ distribution on $(0,1)$, then
P³Xn2 ≤σ m be small and choose τ such that
τ
the $O$ depending on $\varepsilon$, and hence (34) amounts to
P
which is worse than the Chebyshev estimate (32).
Proof. We have $v(\varphi)=\mu^2\cos^2\varphi+\sin^2\varphi$ and hence (28) yields (38). Combining (38) and Theorem 7.3 we arrive at (36). Formula 2.2.6.1 of [12] gives that (38) equals
This in conjunction with Theorem 7.3 proves (37).
Fory∈(0,1), the complete elliptic integrals K(y) and E(y) are defined by
For small $n$'s, Corollary 7.7 delivers the following densities on $(0,1/\mu^2)$:
f3(ξ) =
Formula 2.3.6.2 of [12] tells us that the last expression equals
$$\frac{\mu}{2\pi}\,e^{-\xi/2}\,\pi\,e^{-\xi(\mu^2-1)/4}\,I_0\Big(\frac{\xi(\mu^2-1)}{4}\Big)=\frac{\mu}{2}\,e^{-\xi(\mu^2+1)/4}\,I_0\Big(\frac{\xi(\mu^2-1)}{4}\Big),$$
where $I_0$ is the modified Bessel function,
$$I_0(y):=1+\sum_{k=1}^\infty\Big(\frac{1}{k!}\Big)^2\Big(\frac y2\Big)^{2k}.$$
Conclusion. It is clear that estimates based on knowledge of the distribution function are in general better than estimates that are obtained from Chebyshev's inequality. Examples 2.3 and 7.5 convincingly demonstrate the superiority of the distribution function over Chebyshev estimates. However, the message of this paper is that, for large $n$, the ratio $\|A_nx\|^2/(\|A_n\|^2\|x\|^2)$ clusters sharply around a certain number and that this number can be completely identified for important classes of structured matrices. The zoo of distribution functions we encountered above makes us appreciate the beauty of the simple and general Theorem 2.2 and the ease with which we were able to deduce the results of Sections 3 to 6 from this theorem.
Acknowledgment. We thank Florin Avram, Nick Trefethen, and Man-Chung Yeung for encouraging remarks on a baby version of this paper. We are also greatly indebted to David Aldous, Ted Cox, and the referee for giving our attempt at probability a warm reception.
References
[1] F. Avram: On bilinear forms in Gaussian random variables and Toeplitz matrices. Probab. Theory Related Fields 79 (1988), 37–45.
[2] A. Böttcher and S. Grudsky: Toeplitz Matrices, Asymptotic Linear Algebra, and Functional Analysis. Hindustan Book Agency, New Delhi 2000 and Birkhäuser Verlag, Basel 2000.
[3] A. Böttcher and B. Silbermann: Analysis of Toeplitz Operators. Akademie-Verlag, Berlin 1989 and Springer-Verlag, Berlin 1990.
[4] A. Böttcher and B. Silbermann: Introduction to Large Truncated Toeplitz Matrices. Springer-Verlag, New York 1999.
[5] S. Dasgupta and A. Gupta: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms 22 (2003), 60–65.
[6] D. Fasino and P. Tilli: Spectral clustering properties of block multilevel Hankel matrices. Linear Algebra Appl. 306 (2000), 155–163.
[8] P. Halmos: A Hilbert Space Problem Book. D. van Nostrand, Princeton 1967.
[9] N. J. Higham: Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, PA 1996.
[10] M. Kac, W. L. Murdock, and G. Szegő: On the eigenvalues of certain Hermitian forms. J. Rat. Mech. Anal. 2 (1953), 787–800.
[11] S. V. Parter: On the distribution of the singular values of Toeplitz matrices. Linear Algebra Appl. 80 (1986), 115–130.
[12] A. P. Prudnikov, Yu. A. Brychkov, and O. I. Marichev: Integrals and Series. Vol. 1: Elementary Functions. Gordon and Breach, New York 1986.
[13] S. Roch and B. Silbermann: Index calculus for approximation methods and singular value decomposition. J. Math. Analysis Appl. 225 (1998), 401–426.
[14] S. M. Rump: Structured perturbations. Part I: Normwise distances. Preprint, April 2002.
[15] P. Tilli: A note on the spectral distribution of Toeplitz matrices. Linear and Multilinear Algebra 45 (1998), 147–159.
[16] O. Toeplitz: Zur Theorie der quadratischen und bilinearen Formen von unendlichvielen Veränderlichen. Math. Ann. 70 (1911), 351–376.
[17] W. F. Trench: Asymptotic distribution of the spectra of a class of generalized Kac-Murdock-Szegő matrices. Linear Algebra Appl. 294 (1999), 181–192.
[18] E. E. Tyrtyshnikov and N. L. Zamarashkin: Toeplitz eigenvalues for Radon measures.