ELECTRONIC COMMUNICATIONS in PROBABILITY

CONCENTRATION INEQUALITIES FOR THE SPECTRAL MEASURE OF RANDOM MATRICES

BERNARD DELYON

IRMAR, Université Rennes 1, Campus de Beaulieu, F-35042 Rennes, France
email: bernard.delyon@univ-rennes1.fr

Submitted September 3, 2010, accepted in final form October 24, 2010
AMS 2000 Subject classification: Primary 15A52; Secondary 60F10.
Keywords: Spectral measure, random matrices

Abstract

We give new exponential inequalities for the spectral measure of random Wishart matrices. In particular, these results give useful bounds when the matrices have the form M = Y Y^T, where Y is a p × n random matrix with independent entries (weaker conditions are also proposed) and p and n are large.

1 Introduction

Recent literature has focused much interest on Wishart random matrices, that is, matrices of the form

M = (1/n) Y Y^T

where Y ∈ R^{p×n} is a rectangular p × n matrix with random centered entries, and both n and p tend to infinity: typically p = p(n), and p(n)/n tends to some limit.

M can be seen as the empirical covariance matrix of a random vector of dimension p sampled n times, each sample being a column of Y. It is common in applications to have a number of variables comparable in order of magnitude to the number of observation samples; the analysis of the asymptotic spectral distribution of such matrices now appears to be an important issue, in particular for implementing significance tests for eigenvalues (e.g. for principal component analysis with p variables and n observations).
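As an illustration (not part of the paper), the objects above can be set up numerically; the dimensions and the Gaussian entries below are arbitrary choices:

```python
import numpy as np

# A sketch: build the empirical covariance matrix M = (1/n) Y Y^T for a
# p x n matrix Y with independent centered entries, and inspect its
# spectrum.  p/n is held at a fixed ratio, as in the asymptotic regime
# discussed above.
rng = np.random.default_rng(0)
p, n = 50, 200                      # p/n = 0.25 (illustrative choice)
Y = rng.standard_normal((p, n))     # independent N(0,1) entries
M = Y @ Y.T / n                     # p x p Wishart-type matrix

eigvals = np.linalg.eigvalsh(M)     # eigenvalues (lambda_1, ..., lambda_p)

# For N(0,1) entries the spectrum concentrates, for large p and n, on the
# Marchenko-Pastur support [(1 - sqrt(p/n))^2, (1 + sqrt(p/n))^2].
y = p / n
print(eigvals.min(), eigvals.max())
print((1 - y**0.5) ** 2, (1 + y**0.5) ** 2)
```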

Our objective is to give new exponential inequalities concerning symmetric functions of the eigenvalues, i.e. variables of the form

Z = g(λ_1, ..., λ_p)

where (λ_i)_{1≤i≤p} is the set of eigenvalues of M. These inequalities will be upper bounds on E[e^{Z−E[Z]}] and lead naturally to concentration bounds, i.e. bounds on P(|Z − E[Z]| ≥ x) for large values of x.

Our contribution is to improve on the existing literature on the following points: (i) the function g is a general once or twice differentiable function (i.e. not necessarily of the form (2) below), (ii) the entries of Y are possibly dependent, (iii) they are not necessarily identically distributed, and (iv) the bound is instrumental when both p and n are large.

The columnwise dependence is usually taken into account in essentially two different ways: either one assumes that the columns of Y are i.i.d. but each column is a dependent vector, or that M can be written, for some mixture matrix Q ∈ R^{p×p},

M = (1/n) Q X X^T Q^T    (1)

where X ∈ R^{p×n}, p < n, has independent entries; this means that the j-th observation vector Y_{·j} has the specific form Q X_{·j}. This second way is thus more restrictive, but it is natural from a theoretical point of view since one obtains essentially the same results as in the case Q = Id (cf. [1], Chapter 8). We shall consider both cases (Theorem 3 and Theorem 2); the first one is much easier, but leads to much larger bounds when p is large.

We shall now give a brief evaluation of results in this context. From now on in this section we assume for simplicity that Q = Id. In order to be more specific, and to be able to compare with existing results, we shall stick in this introduction to the case where

g(λ_1, ..., λ_p) = Σ_{k=1}^p ϕ(λ_k)    (2)

although any V-statistic of (λ_1, ..., λ_p) can be considered. According to distributional limit theorems (e.g. Theorem 9.10, p. 247 of [1]), the asymptotic variance of Z, as n tends to infinity and p/n converges to some limit y > 0, is a rather complicated function of y; this suggests that the best one could hope for in the independent case, when p and n tend to infinity, would be something like

P(|Z − E[Z]| ≥ x) ≤ exp(−f(p/n) x^2)    (3)

for some function f.

Guionnet and Zeitouni obtained in [3], for the independent case (Corollary 1.8(a), with Y = X, N = p, M = n),

P(|Z − E[Z]| ≥ x) ≤ 4 exp(−x^2/g^2)    (4)

for some reasonable constant g with order of magnitude g ≈ ξ^2 ‖∂_x ϕ(x^2)‖_∞, where

ξ = sup_{i,j} ‖X_{ij}‖_∞,

and under the assumption of convexity of the function x ↦ ϕ(x^2). A similar result is also given without this convexity assumption (Corollary 1.8(b)), but a logarithmic Sobolev inequality is required for the densities of the X_{ij}'s.


We shall obtain here, in the independent case, the following particular consequence of Theorem 2 below (still with Q = Id):

E[exp(Z − E[Z])] ≤ exp( (64p/n) ξ^4 (‖ϕ′‖_∞ + (√p/2) ‖ϕ′′‖_∞)^2 ).    (5)

This leads to

P(|Z − E[Z]| ≥ x) ≤ 2 exp(−(n/p) x^2/s^2),  s = 16 ξ^2 (‖ϕ′‖_∞ + (√p/2) ‖ϕ′′‖_∞).    (6)

In the case where n tends to infinity and p = cn, (4) gives, when it applies, a better bound than (6) because it does not involve second derivatives of ϕ; when p ≪ n, Equation (6) is always sharper. The amazing point is that this inequality is essentially a consequence of an extension by Boucheron, Lugosi and Massart of the McDiarmid inequality, Theorem 1 below.
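A numerical sketch of the concentration phenomenon that (5)–(6) quantify (the Rademacher entries and the choice ϕ = tanh are my illustrative assumptions, picked so that ξ = 1, ‖ϕ′‖_∞ ≤ 1 and ‖ϕ′′‖_∞ ≤ 1):

```python
import numpy as np

# Simulate Z = sum_k phi(lambda_k) for M = (1/n) X X^T with independent
# +-1 (Rademacher) entries, so xi = sup_ij ||X_ij||_inf = 1, and observe
# how tightly Z concentrates around its mean.
rng = np.random.default_rng(1)
p, n, reps = 30, 120, 200

def Z(Y):
    lam = np.linalg.eigvalsh(Y @ Y.T / n)
    return np.tanh(lam).sum()       # phi = tanh: smooth and bounded

samples = np.array(
    [Z(rng.choice([-1.0, 1.0], size=(p, n))) for _ in range(reps)]
)
# The empirical standard deviation is small compared with the mean,
# in line with the exponential bounds above.
print(samples.mean(), samples.std())
```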

Equation (5) will be applied in a forthcoming article with Jian-Feng Yao and Jia-Qi Chen, where ϕ is an estimate of the limit spectral density of E[M]; it will be a key step in proving the validity of a cross-validation procedure.

In the case of independent columns only, Guntuboyina and Leeb recently obtained the following concentration inequality (Theorem 1(ii) of [4]):

P(|Z − E[Z]| ≥ x) ≤ 2 exp(−2x^2/(nV^2))    (7)

where V is the total variation of the function ϕ on R and it is assumed that ‖Y_{ij}‖_∞ ≤ 1 for all i, j. This bound is never close to (3), but is essentially optimal in the context of independent columns if p and n have the same order of magnitude (Example 3 in [4]). We provide here an estimate which can be written as

E[exp(Z − E[Z])] ≤ exp(16 p^2 ξ^4 ‖ϕ′‖_∞^2 / n).    (8)

This is a simple consequence of Theorem 3 below with a^2 = pξ^2 (since the entries are independent). In the independent case, this improves on Corollary 1.8 of [3] only if p^2 is significantly smaller than n; it also improves on (7) if p ≪ n, since in this case p^2/n ≪ n. But it is hardly useful for applications where p and n have the same order of magnitude.

2 Results

The following theorem is a bounded difference inequality due to Boucheron, Lugosi and Massart [2] (Theorem 2 with λ = 1, θ → 0); it looks like the McDiarmid inequality, but the important difference here is that the infinity norm in (9) is outside the sum:

Theorem 1. Let Y = (Y_1, ..., Y_n) be a sequence of independent variables with values in some measurable space E. Let f be a measurable function on E^n with real values. Let Y′ be an independent copy of Y and set Y^k = (Y_1, ..., Y_{k−1}, Y′_k, Y_{k+1}, ..., Y_n). Then, with the notation x_+^2 = x^2 1_{x>0},

E[exp(f(Y) − E[f(Y)])] ≤ exp( ‖ E[ Σ_k (f(Y) − f(Y^k))_+^2 | Y ] ‖_∞ ).

We shall actually use the weaker inequality

E[exp(f(Y) − E[f(Y)])] ≤ exp( ‖ Σ_k (f(Y) − f(Y^k))_+^2 ‖_∞ ).    (9)
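To make the quantity in the exponent of (9) concrete, here is a toy computation (my example, not the paper's): for f(Y) = max_k Y_k with independent uniform entries, only the coordinate achieving the maximum can produce a positive difference when resampled, so the sum stays bounded by 1.

```python
import numpy as np

# Compute sum_k (f(Y) - f(Y^k))_+^2 for f = max, where Y^k is Y with its
# k-th coordinate replaced by that of an independent copy Y'.
rng = np.random.default_rng(2)
n = 10
Y = rng.uniform(size=n)
Yprime = rng.uniform(size=n)        # independent copy of Y

f = np.max
total = 0.0
for k in range(n):
    Yk = Y.copy()
    Yk[k] = Yprime[k]               # Y^k: k-th coordinate resampled
    d = f(Y) - f(Yk)
    total += max(d, 0.0) ** 2       # (f(Y) - f(Y^k))_+^2

# Resampling a non-argmax coordinate cannot lower the max, so at most one
# term of the sum is nonzero, and it is bounded by 1 for uniform entries.
print(total)
```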

Notations. In what follows ‖x‖ will stand for the Euclidean norm of the vector x, and ‖M‖ will stand for the matrix norm of M ∈ R^{d×d}:

‖M‖ = sup_{‖x‖=1} ‖Mx‖.

‖X‖_p is the usual L^p-norm of the real random variable X. e_i will stand for the i-th vector of the canonical basis, and we set

M_{·j} = M e_j = (M_{ij})_{1≤i≤d} ∈ R^d

and similarly, M_{i·} is the i-th row of M, a row vector, i.e. a 1×d matrix. If u, v ∈ R^d, they will also be understood as d×1 matrices; in particular u^T v = ⟨u, v⟩, and u v^T is a d×d matrix.

Theorem 2. Let Q be a p×p deterministic matrix, let X = (X_{ij}), 1 ≤ i ≤ p, 1 ≤ j ≤ n, be a matrix of random independent entries, and set

M = (1/n) Q X X^T Q^T.

Let λ ↦ g(λ) be a twice differentiable symmetric function on R^p and define the random variable

Z = g(λ_1, ..., λ_p)

where (λ_1, ..., λ_p) is the vector of the eigenvalues of M. Then

E[e^{Z−E[Z]}] ≤ exp( (64p/n) ξ^4 (γ_1 + (√p/2) γ_2)^2 )    (10)

where

ξ = ‖Q‖ sup_{i,j} ‖X_{ij}‖_∞    (11)

γ_1 = sup_{k,λ} |∂g/∂λ_k (λ)|    (12)

γ_2 = sup_λ ‖∇^2 g(λ)‖ (matrix norm).    (13)

In particular, for any x > 0,

P(|Z − E[Z]| > x) ≤ 2 exp( −(n/(256p)) x^2 ξ^{−4} (γ_1 + (√p/2) γ_2)^{−2} ).    (14)


The next theorem deals with matrices X with independent columns; since QX again has independent columns, we assume here that Q = Id.

Theorem 3. Let X = (X_{ij}), 1 ≤ i ≤ p, 1 ≤ j ≤ n, be a matrix with random independent columns, and set

M = (1/n) X X^T.

Let λ ↦ g(λ) be a differentiable symmetric function on R^p and define the random variable

Z = g(λ_1, ..., λ_p)    (15)

where (λ_1, ..., λ_p) is the vector of the eigenvalues of M. Then

E[e^{Z−E[Z]}] ≤ exp(16 a^4 γ_1^2 / n)    (16)

where

γ_1 = sup_λ |∂g/∂λ_1 (λ)|

a = sup_j ‖ Σ_i X_{ij}^2 ‖_∞^{1/2}.

In particular, for any x > 0,

P(|Z − E[Z]| > x) ≤ 2 exp( −n x^2 / (64 a^4 γ_1^2) ).    (17)

3 Proof of Theorem 2

Define the function on R^{p×n}

f(X) = g(λ_1, ..., λ_p).

We have to estimate

Σ_{ij} (f(X) − f(X^{ij}))_+^2

with the notations of Theorem 1. If we set

α = sup_{i,j} ‖X_{ij}‖_∞,

the Taylor formula implies

|f(X) − f(X^{ij})| ≤ 2α |∂f/∂X_{ij}(X)| + 2α^2 sup_{k,l,Y} |∂^2 f/∂X_{kl}^2 (Y)|.


We shall estimate these quantities with the help of Lemma 4 below, specifically Equations (24) and (25), which give bounds for the derivatives of a function of the eigenvalues; the variable t is here X_{ij}. Equation (24) can be rewritten as an explicit expression for ∂f(X)/∂X_{ij}, where P_{λ_k} is the orthogonal projection on the eigenspace of M corresponding to λ_k, and d_k is the dimension of this space. It happens that we can get a good bound for a partial Euclidean norm of the gradient of f: set (j is fixed)

g_k = ∂g(λ)/∂λ_k

ξ_k = P_{λ_k} Q X_{·j} = (P_{λ_k} Q X)_{·j}

where A_{·j} stands for the j-th column of the matrix A. Notice that in case of eigenvalues of higher multiplicity some ξ_k's are repeated, but if ξ_j = ξ_k and λ_j = λ_k, one also has g_j = g_k (by the symmetry of g); hence, denoting by K a set of indices such that Σ_{k∈K} P_{λ_k} = Id, the sum Σ_i (∂f(X)/∂X_{ij})^2 is controlled by Σ_{k∈K} ‖ξ_k‖^2 (orthogonality).


For the estimation of the second derivatives we shall use again Lemma 4, this time Equation (25); bounding the resulting quantity for each pair (i, j) and summing, Equation (10) follows.

Equation (14) is easily deduced from (10): let us recall that a random variable X such that, for any t > 0 and some a > 0, E[e^{tX}] ≤ e^{at^2}, satisfies P(X ≥ x) ≤ e^{−x^2/(4a)} for any x > 0; application of (10) with the function t g(x) instead of g(x) leads to the required bound on the Laplace transform of ±(Z − E[Z]).
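The deduction alluded to here is the standard Chernoff argument; a sketch under the stated sub-Gaussian assumption:

```latex
% Chernoff step: from E[e^{tX}] <= e^{a t^2} (t > 0) to a tail bound.
\[
  \mathbb{P}(X \ge x)
    \le e^{-tx}\,\mathbb{E}\!\left[e^{tX}\right]
    \le e^{a t^2 - tx}
    \qquad \text{for every } t > 0,
\]
\[
  \text{and the optimal choice } t = \frac{x}{2a}
  \quad\text{gives}\quad
  \mathbb{P}(X \ge x) \le \exp\!\left(-\frac{x^2}{4a}\right).
\]
```

Applying (10) to t g instead of g supplies the hypothesis E[e^{tX}] ≤ e^{at^2} for X = ±(Z − E[Z]), since γ_1 and γ_2 scale linearly in t; the two one-sided tails account for the factor 2 in the final bound.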

4 Proof of Theorem 3

Define the function on R^{p×n}

f(X) = g(λ) = g(λ_1, ..., λ_p).    (20)

We have to estimate

Σ_j (f(X) − f(X^j))_+^2

with the notations of Theorem 1, the independent variables being now the columns of X. The Taylor formula implies that |f(X) − f(X^j)| is bounded by the supremum of the derivative of f along the segment joining X and X^j. We shall estimate this quantity with the help of Lemma 4 of the appendix, with M(t) = (1/n)(X + t(X^j − X))(X + t(X^j − X))^T; hence, for any j,

|f(X) − f(X^j)|^2 ≤ (16/n^2) γ_1^2 a^4,

so that the sum over the n columns is at most 16 γ_1^2 a^4 / n, and the result follows.


A Smoothness of eigenvalues of a parameter dependent symmetric matrix

The following lemma has been a key instrument for proving Theorems 2 and 3:

Lemma 4. Let t ↦ M(t) be a function of class C^{3p} from some open interval I with values in the set of real p×p symmetric matrices; then there exists a twice differentiable parametrization of the eigenvalues t ↦ Λ(t) = (λ_1(t), ..., λ_p(t)), and the following equations are satisfied for 1 ≤ k ≤ p and t ∈ I:

Σ_{j: λ_j=λ_k} λ̇_j = Tr(Ṁ P_{λ_k})    (21)

Σ_{j: λ_j=λ_k} λ̈_j = Tr(M̈ P_{λ_k}) + 2 Σ_{j: λ_j≠λ_k} d_j^{−1} (λ_k − λ_j)^{−1} Tr(Ṁ P_{λ_j} Ṁ P_{λ_k})    (22)

Σ_{j: λ_j=λ_k} λ̇_j^2 = Tr(Ṁ P_{λ_k} Ṁ P_{λ_k})    (23)

where both sides are functions of the implicit variable t, P_λ is the orthogonal projection on the eigenspace E_λ of the eigenvalue λ, and d_k is the dimension of E_{λ_k}.

Let g be a twice differentiable symmetric function defined on some cube Q = (a, b)^p containing the eigenvalues of the matrices M(t), t ∈ I; then the function ϕ(t) = g(Λ(t)) is twice differentiable on I, and for any t ∈ I,

ϕ̇ = Σ_k d_k^{−1} (∂g/∂λ_k)(Λ) Tr(Ṁ P_{λ_k})    (24)

|ϕ̈(t)| ≤ γ Tr(Ṁ^2(t)) + sup_k |∂g/∂λ_k (Λ(t))| Σ_i |ν_i(t)|    (25)

where γ = sup_{Λ∈Q} ‖∇^2 g(Λ)‖ (matrix norm), and ν_i(t) is the i-th eigenvalue of M̈(t).
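Identity (24) can be checked numerically in the generic case of simple eigenvalues, where d_k = 1 and P_{λ_k} is the rank-one projector v_k v_k^T; the matrix curve and the choice g(λ) = Σ_k λ_k^2 below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Check phi_dot = sum_k (dg/dlambda_k)(Lambda) Tr(M_dot P_{lambda_k})
# against a central finite difference, for M(t) = A + tB symmetric and
# g(lam) = sum(lam**2), so dg/dlambda_k = 2*lambda_k.
rng = np.random.default_rng(3)
p = 4
A = rng.standard_normal((p, p)); A = (A + A.T) / 2
B = rng.standard_normal((p, p)); B = (B + B.T) / 2

M = lambda t: A + t * B             # smooth symmetric curve, M_dot = B

def g(lam):                         # a symmetric function of the eigenvalues
    return np.sum(lam ** 2)

t0, h = 0.1, 1e-6
lam, V = np.linalg.eigh(M(t0))      # eigenvalues are a.s. simple here

# right-hand side of (24): sum_k g_k(Lambda) * Tr(B v_k v_k^T)
rhs = sum(2 * lam[k] * (V[:, k] @ B @ V[:, k]) for k in range(p))

# finite-difference approximation of phi_dot at t0
lhs = (g(np.linalg.eigvalsh(M(t0 + h)))
       - g(np.linalg.eigvalsh(M(t0 - h)))) / (2 * h)
print(lhs, rhs)                     # the two values should agree closely
```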

Proof. The smoothness comes from the possibility of finding a twice differentiable parametrization of the eigenvalues t ↦ (λ_1(t), ..., λ_p(t)), which is simply a consequence of the theorem of [5]; let us recall the third statement of that theorem:

Consider a continuous curve of polynomials

P(t)(x) = x^p + a_1(t) x^{p−1} + ... + a_p(t),  t ∈ R.

If all a_i are of class C^{3p} then there is a twice differentiable parametrization x = (x_1, ..., x_p): R → R^p of the roots of P.

It remains to prove Equations (21) to (25). We shall use below the notation Σ_λ to denote a sum over distinct eigenvalues; hence

Σ_λ P_λ = Id

and, for any function f,

f(M) = Σ_λ f(λ) P_λ.    (26)

Hence for any polynomial f one has

(d/dt) Tr(f(M)) = Tr(Ṁ f′(M)) = Σ_λ Tr(Ṁ P_λ) f′(λ)    (27)

and on the other hand

(d/dt) Tr(f(M)) = Σ_j λ̇_j f′(λ_j).    (28)

By choosing a polynomial f such that f′(µ) is zero for any eigenvalue µ different from a specific eigenvalue λ_k, identification of (27) and (28) gives

Σ_{j: λ_j=λ_k} λ̇_j = Tr(Ṁ P_{λ_k}).

This is (21).

This is (21). For proving Equation (22), we shall first prove the following formula valid for any polynomial x7→ f(x):

(we recall that in these sums, eigenvalues of higher multiplicities of M =M(t)are counted only once). If indeed f(x) =xn, Equation (26) implies

Now (29) is just a consequence of the following formula forλ6=µ n

(11)

On the other hand, one may differentiate Tr(Ṁ f′(M)) directly, obtaining (30). The first term is

Tr(M̈ f′(M)) = Σ_λ Tr(M̈ P_λ) f′(λ)    (31)

and the second term, Tr(Ṁ (d/dt) f′(M)), may be computed via Equation (29), giving (32). Identifying the r.h.s. of (30) as the sum of the r.h.s. of (31) and (32), with a polynomial f such that f′′(λ) is zero for all eigenvalues λ of M(t) and f′(λ) is zero for any eigenvalue λ with the exception of a specific one λ_k, for which f′(λ_k) = 1, Equation (22) is proved. Equation (23) is proved similarly by considering a polynomial f such that f′(λ) is zero for all eigenvalues λ of M(t) and f′′(λ) is zero for any eigenvalue λ with the exception of a specific one λ_k, for which f′′(λ_k) = 1.

We now prove Equation (24). For any smooth symmetric function f(x, y) of two real variables we have, denoting by f_1 and f_2 the partial derivatives,

f_1(x, y) = f_2(y, x)    (33)

f_11(x, y) = f_22(y, x)    (34)

f_12(x, y) = f_12(y, x)    (35)

which implies that at any point such that x = y one has f_1 = f_2 and f_11 = f_22. Hence, using (21), the symmetry of g, and (33), one obtains (24).


For the estimation of the first term, we shall use the identity M̈ = Σ_i ν_i w_i w_i^T, where (w_1, ..., w_p) is an orthonormal basis of eigenvectors associated to (ν_1, ..., ν_p); this gives

|T_1| ≤ sup_k |∂g/∂λ_k (Λ)| Σ_i |ν_i|.

The second term is, by symmetry,

T_2 = Σ_{j,k: λ_j≠λ_k} d_k^{−1} d_j^{−1} (g_k(Λ) − g_j(Λ))/(λ_k − λ_j) Tr(Ṁ P_{λ_j} Ṁ P_{λ_k}).

We need to notice the following: the relation (33) implies that the symmetric function f satisfies

f_1(x, y) − f_2(x, y) = (1/2)(f_1(x, y) − f_2(x, y) − f_1(y, x) + f_2(y, x))    (36)

and one may then consider the function

u(t) = f_1(x + tδ, y) − f_2(x + tδ, y),  δ = y − x.

For T_3, notice that using (23) it may be bounded in terms of γ Tr(Ṁ^2(t)), which yields (25).


References

[1] Zhidong Bai and Jack W. Silverstein. Spectral analysis of large dimensional random matrices. Springer Series in Statistics. Springer, New York, first edition, 2006. MR2567175

[2] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities using the entropy method. Ann. Probab., 31(3):1583–1614, 2003. MR1989444

[3] A. Guionnet and O. Zeitouni. Concentration of the spectral measure for large matrices. Electron. Comm. Probab., 5:119–136 (electronic), 2000. MR1781846

[4] Adityanand Guntuboyina and Hannes Leeb. Concentration of the spectral measure of large Wishart matrices with dependent entries. Electron. Commun. Probab., 14:334–342, 2009. MR2535081

[5] Andreas Kriegl, Mark Losik, and Peter W. Michor. Choosing roots of polynomials smoothly. II. Israel J. Math., 139:183–188, 2004. MR2041790
