

ELECTRONIC COMMUNICATIONS in PROBABILITY

SUMS OF RANDOM HERMITIAN MATRICES AND AN INEQUALITY BY RUDELSON

ROBERTO IMBUZEIRO OLIVEIRA1

IMPA, Rio de Janeiro, RJ, Brazil, 22460-320.

email: [email protected]

Submitted January 28, 2010, accepted in final form May 17, 2010.
AMS 2000 Subject classification: 60B20

Keywords: Random Hermitian matrices, concentration inequalities, Khintchine Inequalities

Abstract

We give a new, elementary proof of a key inequality used by Rudelson in the derivation of his well-known bound for random sums of rank-one operators. Our approach is based on Ahlswede and Winter’s technique for proving operator Chernoff bounds. We also prove a concentration inequality for sums of random matrices of rank one with explicit constants.

1

Introduction

This note mainly deals with estimates for the operator norm $\|Z_n\|$ of random sums

$$Z_n \equiv \sum_{i=1}^{n} \varepsilon_i A_i \tag{1}$$

of deterministic Hermitian matrices $A_1,\dots,A_n$ multiplied by random coefficients. Recall that a Rademacher sequence is a sequence $\{\varepsilon_i\}_{i=1}^{n}$ of i.i.d. random variables with $\varepsilon_1$ uniform over $\{-1,+1\}$. A standard Gaussian sequence is a sequence of i.i.d. standard Gaussian random variables. Our main goal is to prove the following result.

Theorem 1 (proven in Section 3). Given positive integers $d, n \in \mathbb{N}$, let $A_1,\dots,A_n$ be deterministic $d\times d$ Hermitian matrices and $\{\varepsilon_i\}_{i=1}^{n}$ be either a Rademacher sequence or a standard Gaussian sequence. Define $Z_n$ as in (1). Then for all $p\in[1,+\infty)$,

$$\left(\mathbb{E}\,\|Z_n\|^p\right)^{1/p} \le \left(\sqrt{2\ln(2d)} + C_p\right)\left\|\sum_{i=1}^{n} A_i^2\right\|^{1/2}$$

where

$$C_p \equiv \left(p\int_0^{+\infty} t^{p-1}\,e^{-t^2/2}\,dt\right)^{1/p} \quad (\le c\sqrt{p} \text{ for some universal } c>0).$$

1 RESEARCH SUPPORTED BY A “BOLSA DE PRODUTIVIDADE EM PESQUISA” (CNPQ) AND RESEARCH GRANTS FROM CNPQ AND FAPERJ.
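Although the paper's argument is analytic, the statement of Theorem 1 is easy to probe numerically. The NumPy sketch below is our own illustration (not part of the paper); the matrices, dimensions, and number of trials are arbitrary choices. It compares the empirical mean of $\|Z_n\|$ over Rademacher draws with the right-hand side for $p=1$, where $C_1 = \int_0^\infty e^{-t^2/2}\,dt = \sqrt{\pi/2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 8, 30, 500

def random_hermitian(rng, d):
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

# Fixed deterministic Hermitian matrices A_1, ..., A_n.
A = [random_hermitian(rng, d) for _ in range(n)]

# sigma = || sum_i A_i^2 ||^{1/2}; the operator norm of a Hermitian
# matrix is its largest eigenvalue in absolute value.
S = sum(Ai @ Ai for Ai in A)
sigma = np.abs(np.linalg.eigvalsh(S)).max() ** 0.5

# Empirical E||Z_n|| over independent Rademacher draws.
norms = []
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    Z = sum(e * Ai for e, Ai in zip(eps, A))
    norms.append(np.abs(np.linalg.eigvalsh(Z)).max())
mean_norm = np.mean(norms)

# Theorem 1 with p = 1: C_1 = sqrt(pi/2).
bound = (np.sqrt(2 * np.log(2 * d)) + np.sqrt(np.pi / 2)) * sigma
print(mean_norm <= bound)  # True
```

In practice the empirical mean sits well below the bound, consistent with the $\sqrt{\ln d}$ slack discussed at the end of the introduction.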


For $d=1$, this result corresponds to the classical Khintchine inequalities, which give sub-Gaussian bounds for the moments of $\sum_{i=1}^{n}\varepsilon_i a_i$ ($a_1,\dots,a_n\in\mathbb{R}$). Theorem 1 is implicit in Section 3 of Rudelson's paper [12], albeit with non-explicit constants. The main theorem in that paper is the following inequality, which is a simple corollary of Theorem 1: if $Y_1,\dots,Y_n$ are i.i.d. random (column) vectors in $\mathbb{C}^d$ which are isotropic (i.e. $\mathbb{E}[Y_1Y_1^*]=I_{d\times d}$), then

$$\mathbb{E}\left\|\frac{1}{n}\sum_{i=1}^{n}Y_iY_i^* - I\right\| \le C\,\sqrt{\frac{\ln n}{n}}\,\left(\mathbb{E}\,|Y_1|^{\ln n}\right)^{1/\ln n} \tag{2}$$

for some universal $C>0$, whenever the RHS of the above inequality is at most 1. This important result has been applied to several different problems, such as bringing a convex body to near-isotropic position [12]; the analysis of low-rank approximations of matrices [13, 7] and graph sparsification [14]; estimating singular values of matrices with independent rows [11]; analysing compressive sensing [4]; and related problems in Harmonic Analysis [17, 16].

The key ingredient of the original proof of Theorem 1 is a non-commutative Khintchine inequality by Lust-Piquard and Pisier [10]. This states that there exists a universal $c>0$ such that, for all $Z_n$ as in the Theorem, all $p\ge 1$ and all $d\times d$ matrices $\{B_i,D_i\}_{i=1}^{n}$, a square-function bound holds for the Schatten-norm moments of the corresponding random sums. An optimal value of $c$, and thus for the constant in Rudelson's bound, can be obtained from the work of Buchholz [3]. Unfortunately, the proofs of the Lust-Piquard/Pisier inequality employ language and tools from non-commutative probability that are rather foreign to most potential users of (2), and Buchholz's bound additionally relies on delicate combinatorics.

This note presents a more direct proof of Theorem 1. Our argument is based on an improvement of the methodology created by Ahlswede and Winter [2] in order to prove their operator Chernoff bound, which also has many applications, e.g. [8] (the improvement is discussed in Section 3.1). This approach only requires elementary facts from Linear Algebra and Matrix Analysis. The most complicated result that we use is the Golden-Thompson inequality [6, 15]:

$$\forall d\in\mathbb{N},\ \forall\, d\times d \text{ Hermitian matrices } A, B:\quad \operatorname{Tr}\left(e^{A+B}\right) \le \operatorname{Tr}\left(e^{A}e^{B}\right). \tag{3}$$

The elementary proof of this classical inequality is sketched in Section 5 below.
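The Golden-Thompson inequality is easy to check numerically. In the sketch below (our illustration, not part of the paper; the matrices are arbitrary) the exponential of a Hermitian matrix is computed exactly through its spectral decomposition:

```python
import numpy as np

def expm_herm(A):
    """exp(A) for Hermitian A via the spectral decomposition A = V diag(w) V*."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.conj().T

rng = np.random.default_rng(1)
d = 6
M1 = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
M2 = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
A, B = (M1 + M1.conj().T) / 2, (M2 + M2.conj().T) / 2

lhs = np.trace(expm_herm(A + B)).real             # Tr(e^{A+B})
rhs = np.trace(expm_herm(A) @ expm_herm(B)).real  # Tr(e^A e^B)
print(lhs <= rhs + 1e-9)  # True
```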

We have already noted that Rudelson's bound (2) follows simply from Theorem 1; see [12, Section 3] for details. Here we prove a concentration lemma corresponding to that result under the stronger assumption that $|Y_1|$ is a.s. bounded. While similar results have appeared in other papers [11, 13, 17], our proof is simpler and gives explicit constants.


In particular, a calculation shows that, for any $n, d\in\mathbb{N}$, $M>0$ and $\delta\in(0,1)$ such that:

$$4M\sqrt{\frac{2\ln(\min\{d,n\}) + 2\ln 2 + \ln(1/\delta)}{n}} \le 2,$$

we have:

$$\mathbb{P}\left(\left\|\frac{1}{n}\sum_{i=1}^{n} Y_iY_i^* - \mathbb{E}\left[Y_1Y_1^*\right]\right\| < 4M\sqrt{\frac{2\ln(\min\{d,n\}) + 2\ln 2 + \ln(1/\delta)}{n}}\right) \ge 1-\delta.$$

A key feature of this Lemma is that it gives meaningful results even when the ambient dimension $d$ is arbitrarily large. In fact, the same result holds (with $d=\infty$) for $Y_i$ taking values in a separable Hilbert space, and this form of the result may be used to simplify the proofs in [11] (especially in the last section of that paper).
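As an illustration (ours, not the paper's) one can simulate the concentration statement displayed above for a simple bounded isotropic distribution: $Y$ uniform over the scaled basis vectors $\sqrt{d}\,e_k$, an arbitrary choice for which $\mathbb{E}[Y_1Y_1^*]=I$ and $|Y_1|=\sqrt{d}=M$ almost surely:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, delta = 5, 20000, 0.01
M = np.sqrt(d)  # |Y_1| = sqrt(d) almost surely for this distribution

# Y_i uniform over {sqrt(d) e_1, ..., sqrt(d) e_d}: isotropic, E[Y Y*] = I.
idx = rng.integers(0, d, size=n)
Y = np.sqrt(d) * np.eye(d)[idx]          # rows are the n sample vectors

emp_cov = Y.T @ Y / n                    # (1/n) sum_i Y_i Y_i^*
dev = np.abs(np.linalg.eigvalsh(emp_cov - np.eye(d))).max()

rhs = 4 * M * np.sqrt((2 * np.log(min(d, n)) + 2 * np.log(2)
                       + np.log(1 / delta)) / n)
print(rhs <= 2)    # the side condition above holds for these parameters
print(dev < rhs)   # True for this draw (guaranteed with prob. >= 1 - delta)
```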

To conclude the introduction, we present an open problem: is it possible to improve upon Rudelson's bound under further assumptions? There is some evidence that the dependence on $\ln(d)$ in the Theorem, while necessary in general [13, Remark 3.4], can sometimes be removed. For instance, Adamczak et al. [1] have improved upon Rudelson's original application of Theorem 1 to convex bodies, obtaining exactly what one would expect in the absence of the $\sqrt{\ln(2d)}$ term. Another setting where our bound is a $\Theta(\sqrt{\ln d})$ factor away from optimality is that of more classical random matrices (cf. the end of Section 3.1 below). It would be interesting if one could sharpen the proof of Theorem 1 in order to reobtain these results. [Related issues are raised by Vershynin [18].]

Acknowledgement: we thank the anonymous referees for providing extra references, pointing out several typos and suggesting a number of small improvements to this note.

2

Preliminaries

We let $\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ denote the set of $d\times d$ Hermitian matrices, which is a subset of the set $\mathbb{C}^{d\times d}$ of all $d\times d$ matrices with complex entries. The spectral theorem states that all $A\in\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ have $d$ real eigenvalues (possibly with repetitions) that correspond to an orthonormal set of eigenvectors. $\lambda_{\max}(A)$ is the largest eigenvalue of $A$. The spectrum of $A$, denoted by $\operatorname{spec}(A)$, is the multiset of all eigenvalues, where each eigenvalue appears a number of times equal to its multiplicity. We let

$$\|C\| \equiv \max_{v\in\mathbb{C}^d,\ |v|=1} |Cv|$$

denote the operator norm of $C\in\mathbb{C}^{d\times d}$ ($|\cdot|$ is the Euclidean norm). By the spectral theorem,

$$\forall A\in\mathbb{C}^{d\times d}_{\mathrm{Herm}},\quad \|A\| = \max\{\lambda_{\max}(A),\,\lambda_{\max}(-A)\}.$$

Moreover, $\operatorname{Tr}(A)$ (the trace of $A$) is the sum of the eigenvalues of $A$.

2.1

Spectral mapping

Let $f:\mathbb{C}\to\mathbb{C}$ be an entire analytic function with a power-series representation $f(z)\equiv\sum_{n\ge 0} c_n z^n$ ($z\in\mathbb{C}$). If all $c_n$ are real, the expression:

$$f(A) \equiv \sum_{n\ge 0} c_n A^n$$

corresponds to a map from $\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ to itself. We will sometimes use the so-called spectral mapping property:

$$\operatorname{spec} f(A) = f(\operatorname{spec}(A)). \tag{4}$$

By this we mean that the eigenvalues of $f(A)$ are the numbers $f(\lambda)$ with $\lambda\in\operatorname{spec}(A)$. Moreover, the multiplicity of $\xi\in\operatorname{spec} f(A)$ is the sum of the multiplicities of all preimages of $\xi$ under $f$ that lie in $\operatorname{spec}(A)$.
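A quick numerical illustration of the spectral mapping property (ours, not the paper's; the polynomial and matrix are arbitrary): applying a real polynomial to $A$ either directly or through the eigendecomposition gives the same matrix, whose eigenvalues are the images of $\operatorname{spec}(A)$:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
A = (M + M.conj().T) / 2

# f(z) = 1 + 2z + 3z^2 applied to A as a matrix polynomial ...
fA_poly = np.eye(d) + 2 * A + 3 * (A @ A)

# ... and through the spectrum: A = V diag(w) V*  =>  f(A) = V diag(f(w)) V*.
w, V = np.linalg.eigh(A)
f_w = 1 + 2 * w + 3 * w**2
fA_spec = (V * f_w) @ V.conj().T

print(np.allclose(fA_poly, fA_spec))                           # True
print(np.allclose(np.linalg.eigvalsh(fA_poly), np.sort(f_w)))  # True
```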

2.2

The positive-semidefinite order

We will use the notation $A\succeq 0$ to say that $A$ is positive-semidefinite, i.e. $A\in\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ and its eigenvalues are non-negative. This is equivalent to saying that $(v, Av)\ge 0$ for all $v\in\mathbb{C}^d$, where $(\cdot,\cdot)$ is the standard Euclidean inner product.

If $A, B\in\mathbb{C}^{d\times d}_{\mathrm{Herm}}$, we write $A\preceq B$ or $B\succeq A$ to say that $B-A\succeq 0$. Notice that "$\preceq$" is a partial order and that:

$$\forall A, B, A', B'\in\mathbb{C}^{d\times d}_{\mathrm{Herm}},\quad (A\preceq A')\wedge(B\preceq B') \Rightarrow A+B \preceq A'+B'. \tag{5}$$

Moreover, spectral mapping (4) implies that:

$$\forall A\in\mathbb{C}^{d\times d}_{\mathrm{Herm}},\quad A^2\succeq 0. \tag{6}$$

We will also need the following simple fact.

We will also need the following simple fact.

Proposition 1. For all $A, B, C\in\mathbb{C}^{d\times d}_{\mathrm{Herm}}$:

$$(C\succeq 0)\wedge(A\preceq B) \Rightarrow \operatorname{Tr}(CA)\le\operatorname{Tr}(CB). \tag{7}$$

Proof: To prove this, assume the LHS and observe that the RHS is equivalent to $\operatorname{Tr}(C\Delta)\ge 0$ where $\Delta\equiv B-A$. By assumption, $\Delta\succeq 0$, hence it has a Hermitian square root $\Delta^{1/2}$. The cyclic property of the trace implies:

$$\operatorname{Tr}(C\Delta) = \operatorname{Tr}\left(\Delta^{1/2} C \Delta^{1/2}\right).$$

Since the trace is the sum of the eigenvalues, we will be done once we show that $\Delta^{1/2} C \Delta^{1/2}\succeq 0$. But, since $\Delta^{1/2}$ is Hermitian and $C\succeq 0$,

$$\forall v\in\mathbb{C}^d,\quad \left(v, \Delta^{1/2} C \Delta^{1/2} v\right) = \left((\Delta^{1/2}v), C(\Delta^{1/2}v)\right) = (w, Cw) \ge 0$$

(with $w=\Delta^{1/2}v$), which shows that $\Delta^{1/2} C \Delta^{1/2}\succeq 0$, as desired. □

2.3

Probability with matrices

Assume $(\Omega,\mathcal{F},\mathbb{P})$ is a probability space and $Z:\Omega\to\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ is measurable with respect to $\mathcal{F}$ and the Borel $\sigma$-field on $\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ (this is equivalent to requiring that all entries of $Z$ be complex-valued random variables). $\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ is a metrically complete vector space and one can naturally define an expected value $\mathbb{E}[Z]\in\mathbb{C}^{d\times d}_{\mathrm{Herm}}$. This turns out to be the matrix $\mathbb{E}[Z]\in\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ whose $(i,j)$-entry is the expected value of the $(i,j)$-th entry of $Z$. [Of course, $\mathbb{E}[Z]$ is only defined if all entries of $Z$ are integrable, but this will always be the case in this paper.]

The definition of expectations implies that traces and expectations commute:

$$\mathbb{E}\left[\operatorname{Tr}(Z)\right] = \operatorname{Tr}\left(\mathbb{E}[Z]\right). \tag{8}$$

Moreover, one can check that the usual product rule is satisfied:

If $Z, W:\Omega\to\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ are measurable and independent, $\mathbb{E}[ZW]=\mathbb{E}[Z]\,\mathbb{E}[W]$. (9)

Finally, the inequality:

If $Z:\Omega\to\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ satisfies $Z\succeq 0$ a.s., then $\mathbb{E}[Z]\succeq 0$ (10)

is an easy consequence of another easily checked fact: $(v,\mathbb{E}[Z]v)=\mathbb{E}[(v,Zv)]$, $\forall v\in\mathbb{C}^d$.

3

Proof of Theorem 1

Proof: [of Theorem 1] The usual Bernstein trick implies that

$$\forall t\ge 0,\quad \mathbb{P}\left(\|Z_n\|\ge t\right) \le \inf_{s>0} e^{-st}\,\mathbb{E}\left[e^{s\|Z_n\|}\right].$$

Notice that

$$\mathbb{E}\left[e^{s\|Z_n\|}\right] \le \mathbb{E}\left[e^{s\lambda_{\max}(Z_n)}\right] + \mathbb{E}\left[e^{s\lambda_{\max}(-Z_n)}\right] = 2\,\mathbb{E}\left[e^{s\lambda_{\max}(Z_n)}\right] \tag{11}$$

since $\|Z_n\| = \max\{\lambda_{\max}(Z_n),\lambda_{\max}(-Z_n)\}$ and $-Z_n$ has the same law as $Z_n$.

The function $x\mapsto e^{sx}$ is monotone non-decreasing and positive for all $s\ge 0$. It follows from the spectral mapping property (4) that for all $s\ge 0$, the largest eigenvalue of $e^{sZ_n}$ is $e^{s\lambda_{\max}(Z_n)}$ and all eigenvalues of $e^{sZ_n}$ are non-negative. Using the equality "trace = sum of eigenvalues" implies that for all $s\ge 0$,

$$\mathbb{E}\left[e^{s\lambda_{\max}(Z_n)}\right] = \mathbb{E}\left[\lambda_{\max}\left(e^{sZ_n}\right)\right] \le \mathbb{E}\left[\operatorname{Tr}\left(e^{sZ_n}\right)\right].$$

As a result, we have the inequality:

$$\forall t\ge 0,\quad \mathbb{P}\left(\|Z_n\|\ge t\right) \le 2\inf_{s\ge 0} e^{-st}\,\mathbb{E}\left[\operatorname{Tr}\left(e^{sZ_n}\right)\right]. \tag{12}$$

Up to now, our proof has followed Ahlswede and Winter’s argument. The next lemma, however, will require new ideas.

Lemma 2. For all $s\in\mathbb{R}$,

$$\mathbb{E}\left[\operatorname{Tr}\left(e^{sZ_n}\right)\right] \le \operatorname{Tr}\left(e^{\frac{s^2}{2}\sum_{i=1}^{n} A_i^2}\right).$$

This lemma is proven below. We will now show how it implies Rudelson's bound. Let

$$\sigma^2 \equiv \left\|\sum_{i=1}^{n} A_i^2\right\| = \lambda_{\max}\left(\sum_{i=1}^{n} A_i^2\right).$$

[The second equality follows from $\sum_{i=1}^{n} A_i^2\succeq 0$, which holds because of (5) and (6).] We note that:

$$\operatorname{Tr}\left(e^{\frac{s^2}{2}\sum_{i=1}^{n} A_i^2}\right) \le d\,\lambda_{\max}\left(e^{\frac{s^2}{2}\sum_{i=1}^{n} A_i^2}\right) = d\,e^{\frac{s^2\sigma^2}{2}}$$

where the equality is yet another application of spectral mapping (4) and the fact that $x\mapsto e^{s^2x/2}$ is monotone non-decreasing. We deduce from the Lemma and (12) that:

$$\forall t\ge 0,\quad \mathbb{P}\left(\|Z_n\|\ge t\right) \le 2d\inf_{s\ge 0} e^{-st+\frac{s^2\sigma^2}{2}} = 2d\,e^{-\frac{t^2}{2\sigma^2}}. \tag{13}$$


This implies that for any $p\ge 1$, writing $t_0\equiv\sigma\sqrt{2\ln(2d)}$, using $\|Z_n\|\le t_0+(\|Z_n\|-t_0)_+$ and integrating the tail bound above, one obtains $(\mathbb{E}\,\|Z_n\|^p)^{1/p} \le t_0 + \sigma C_p$, which is the bound of Theorem 1.

To finish, we now prove Lemma 2.

Proof: [of Lemma 2] Define $D_0\equiv\frac{s^2}{2}\sum_{i=1}^{n} A_i^2$ and, for $1\le j\le n$,

$$D_j \equiv s\sum_{i=1}^{j}\varepsilon_i A_i + \frac{s^2}{2}\sum_{i=j+1}^{n} A_i^2,$$

so that $D_n = sZ_n$. We will show that:

$$\forall\,1\le j\le n,\quad \mathbb{E}\left[\operatorname{Tr}\left(e^{D_j}\right)\right] \le \mathbb{E}\left[\operatorname{Tr}\left(e^{D_{j-1}}\right)\right]. \tag{14}$$

Notice that this implies $\mathbb{E}\left[\operatorname{Tr}(e^{D_n})\right]\le\mathbb{E}\left[\operatorname{Tr}(e^{D_0})\right]$, which is precisely the Lemma. To prove (14), fix $1\le j\le n$. Notice that $D_{j-1}$ is independent from $s\varepsilon_j A_j - s^2A_j^2/2$, since the $\{\varepsilon_i\}_{i=1}^{n}$ are independent. This implies that:

$$\mathbb{E}\left[\operatorname{Tr}\left(e^{D_j}\right)\right] = \mathbb{E}\left[\operatorname{Tr}\left(\exp\left(D_{j-1} + s\varepsilon_j A_j - \frac{s^2A_j^2}{2}\right)\right)\right] \le \mathbb{E}\left[\operatorname{Tr}\left(e^{D_{j-1}}\,e^{s\varepsilon_j A_j - \frac{s^2A_j^2}{2}}\right)\right]$$

(use Golden-Thompson (3)). The matrices $s\varepsilon_j A_j$ and $-s^2A_j^2/2$ always commute, hence the exponential of the sum is the product of the exponentials. Applying (9) and noting that $e^{-s^2A_j^2/2}$ is constant, we see that:

$$\mathbb{E}\left[\operatorname{Tr}\left(e^{D_j}\right)\right] \le \operatorname{Tr}\left(\mathbb{E}\left[e^{D_{j-1}}\right]\,\mathbb{E}\left[e^{s\varepsilon_j A_j}\right]\,e^{-\frac{s^2A_j^2}{2}}\right).$$

By Proposition 1 (applied with $C=\mathbb{E}[e^{D_{j-1}}]\succeq 0$, cf. (10)), the bound (14) will follow once we prove:

$$\mathbb{E}\left[e^{s\varepsilon_j A_j}\right]e^{-\frac{s^2A_j^2}{2}} \preceq I. \tag{15}$$


In the Gaussian case, an explicit calculation shows that $\mathbb{E}\left[\exp\left(s\varepsilon_j A_j\right)\right] = e^{s^2A_j^2/2}$, hence (15) holds. In the Rademacher case, we have:

$$\mathbb{E}\left[\exp\left(s\varepsilon_j A_j\right)\right]e^{-\frac{s^2A_j^2}{2}} = f(A_j)$$

where $f(z)=\cosh(sz)\,e^{-s^2z^2/2}$. It is a classical fact that $0\le\cosh(x)\le e^{x^2/2}$ for all $x\in\mathbb{R}$ (just compare the Taylor expansions); this implies that $0\le f(\lambda)\le 1$ for all eigenvalues $\lambda$ of $A_j$. Using spectral mapping (4), we see that:

$$\operatorname{spec} f(A_j) = f(\operatorname{spec}(A_j)) \subset [0,1],$$

which implies that $f(A_j)\preceq I$. This proves (15) in this case and finishes the proof of (14) and of the Lemma. □

3.1

Remarks on the original AW approach

A direct adaptation of the original argument of Ahlswede and Winter [2] would lead to an inequality of the form:

$$\mathbb{E}\left[\operatorname{Tr}\left(e^{sZ_n}\right)\right] \le \operatorname{Tr}\left(\mathbb{E}\left[e^{s\varepsilon_n A_n}\right]\mathbb{E}\left[e^{sZ_{n-1}}\right]\right).$$

One sees that:

$$\mathbb{E}\left[e^{s\varepsilon_n A_n}\right] \preceq e^{\frac{s^2A_n^2}{2}} \preceq e^{\frac{s^2\|A_n^2\|}{2}}\,I.$$

However, only the second inequality seems to be useful, as there is no obvious relationship between

$$\operatorname{Tr}\left(e^{\frac{s^2A_n^2}{2}}\,\mathbb{E}\left[e^{sZ_{n-1}}\right]\right) \quad\text{and}\quad \operatorname{Tr}\left(\mathbb{E}\left[e^{s\varepsilon_{n-1}A_{n-1}}\right]\mathbb{E}\left[e^{sZ_{n-2}+\frac{s^2A_n^2}{2}}\right]\right),$$

which is what we would need to proceed with induction. [Note that Golden-Thompson (3) cannot be undone and fails for three summands, [15].] The best one can do with the second inequality is:

$$\mathbb{E}\left[\operatorname{Tr}\left(e^{sZ_n}\right)\right] \le d\,e^{\frac{s^2\sum_{i=1}^{n}\|A_i\|^2}{2}}.$$

This would give a version of Theorem 1 with $\sum_{i=1}^{n}\|A_i\|^2$ replacing $\left\|\sum_{i=1}^{n} A_i^2\right\|$. This modified result is always worse than the actual Theorem, and can be dramatically so. For instance, consider the case of a Wigner matrix where:

$$Z_n \equiv \sum_{1\le i<j\le m} \varepsilon_{ij} A_{ij}$$

with the $\varepsilon_{ij}$ i.i.d. standard Gaussian and each $A_{ij}$ has ones at positions $(i,j)$ and $(j,i)$ and zeros elsewhere (we take $d=m$ and $n=\binom{m}{2}$ in this case). Direct calculation reveals:

$$\left\|\sum_{i<j} A_{ij}^2\right\| = \|(m-1)I\| = m-1 \ll \binom{m}{2} = \sum_{i<j}\|A_{ij}\|^2.$$
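This gap can be reproduced numerically (our illustration) by forming the matrices $A_{ij}$ explicitly:

```python
import numpy as np

m = 10                                   # d = m, n = m*(m-1)//2

def A_ij(i, j, m):
    A = np.zeros((m, m))
    A[i, j] = A[j, i] = 1.0              # ones at (i,j) and (j,i)
    return A

pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
S = sum(A_ij(i, j, m) @ A_ij(i, j, m) for i, j in pairs)

op_norm = np.abs(np.linalg.eigvalsh(S)).max()   # || sum A_ij^2 || = m - 1
sum_sq_norms = float(len(pairs))                # each ||A_ij|| = 1, total C(m,2)

print(round(op_norm, 6))   # 9.0
print(sum_sq_norms)        # 45.0
```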


4

Concentration for rank-one operators

In this section we prove Lemma 1.

Proof: [of Lemma 1]Let

We will show below that:

$$\forall s\ge 0,\quad \phi(s) \le 2\min\{d,n\}\,e^{2M^2s^2/n}\,\phi\!\left(2M^2s^2/n\right). \tag{16}$$

The Lemma then follows from the choice

$$s \equiv \frac{n\,\min\{2,t\}}{8M^2}$$

and a few simple calculations. [Notice that $2M^2s\le n/2$ with this choice, hence $1/(1-2M^2s/n)\le 2$ and $2M^2s^2/(n-2M^2s)\le 4M^2s^2/n$.]

To prove (16), we begin with symmetrization (see e.g. Lemma 6.3 in Chapter 6 of[9]):

φ(s)E

Using the same argument as in (11), we notice that:

E


Therefore:

[…$\|\cdot\|\le 1$ in the last inequality.] Plugging this into the conditional expectation above and integrating, we obtain (16):

$$\phi(s) \le 2\min\{d,n\}\,\mathbb{E}\left[\cdots\right]$$

5

Proof sketch for Golden-Thompson inequality

As promised in the Introduction, we sketch an elementary proof of inequality (3). We will need theTrotter-Lie formula, a simple consequence of the Taylor formula foreX:

A,B∈Cd×d

Herm, limn+(e

A/neB/n)n=eA+B. (17)

The second ingredient is the inequality:

kN,X,Y Cd×d

Herm :X,Y0⇒Tr((X Y) 2k+1

)Tr((X2Y2)2k). (18) This is proven in[6]via an argument using the existence of positive-semidefinite square-roots for positive-semidefinite matrices, and the Cauchy-Schwartz inequality for the standard inner product overCd×d. Iterating (18) implies:

Inequality (3) follows from lettingk+, using (17) and noticing that Tr(·)is continuous.
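A numerical illustration of the Trotter-Lie formula (17) (ours, not the paper's; matrices and values of $n$ are arbitrary): the error $\left\|(e^{A/n}e^{B/n})^n - e^{A+B}\right\|$ shrinks as $n$ grows, at rate $O(1/n)$:

```python
import numpy as np

def expm_herm(A):
    """exp(A) for real symmetric A via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

rng = np.random.default_rng(6)
d = 4
M1, M2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
A, B = (M1 + M1.T) / 2, (M2 + M2.T) / 2

target = expm_herm(A + B)
errs = []
for n in (1, 10, 100, 1000):
    step = expm_herm(A / n) @ expm_herm(B / n)
    errs.append(np.linalg.norm(np.linalg.matrix_power(step, n) - target))

print(errs[-1] < errs[0])  # True: the Trotter error decays as n grows
```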

References

[1] Radoslaw Adamczak, Alexander E. Litvak, Alain Pajor and Nicole Tomczak-Jaegermann. "Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles." Journal of the American Mathematical Society 23: 535–561, 2010.

[2] Rudolf Ahlswede and Andreas Winter. "Strong converse for identification via quantum channels." IEEE Transactions on Information Theory 48(3): 569–579, 2002. MR1889969


[4] Emmanuel Candès and Justin Romberg. "Sparsity and incoherence in compressive sampling." Inverse Problems 23: 969–985, 2007.

[5] Zoltan Füredi and János Komlós. "The eigenvalues of random symmetric matrices." Combinatorica 1(3): 233–241, 1981. MR0637828

[6] Sidney Golden. "Lower Bounds for the Helmholtz Function." Physical Review 137: B1127–B1128, 1965.

[7] Nathan Halko, Per-Gunnar Martinsson and Joel A. Tropp. "Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions." Preprint arXiv:0909.4061.

[8] Zeph Landau and Alexander Russell. "Random Cayley graphs are expanders: a simplified proof of the Alon-Roichman theorem." Electronic Journal of Combinatorics Research Paper 62, 6 pp. (electronic), 2004. MR2097328

[9] Michel Ledoux and Michel Talagrand. Probability in Banach Spaces. Springer (1991).

MR1102015

[10] Françoise Lust-Piquard and Gilles Pisier. "Non commutative Khintchine and Paley inequalities." Arkiv för Matematik, 29(2): 241–260, 1991. MR1150376

[11] Shahar Mendelson and Alain Pajor. "On singular values of matrices with independent rows." Bernoulli 12(5): 761–773, 2006. MR2265341

[12] Mark Rudelson. "Random vectors in the isotropic position." Journal of Functional Analysis, 164(1): 60–72, 1999. MR1694526

[13] Mark Rudelson and Roman Vershynin. "Sampling from large matrices: an approach through geometric functional analysis." Journal of the ACM 54(4): Article 21, 2007. MR2351844

[14] Daniel Spielman and Nikhil Srivastava. “Graph sparsification by effective resistances." In Proceedings of the 40th annual ACM symposium on Theory of Computing (STOC 2008).

[15] Colin J. Thompson. "Inequality with applications in statistical mechanics." Journal of Mathematical Physics, 6: 1812–1813, 1965. MR0189688

[16] Joel A. Tropp. "On the conditioning of random subdictionaries." Applied and Computational Harmonic Analysis 25(1): 1–24, 2008. MR2419702

[17] Roman Vershynin. "Frame expansions with erasures: an approach through the non-commutative operator theory." Applied and Computational Harmonic Analysis 18: 167–176, 2005. MR2121508
