SUMS OF RANDOM HERMITIAN MATRICES AND AN INEQUALITY BY RUDELSON
ROBERTO IMBUZEIRO OLIVEIRA1
IMPA, Rio de Janeiro, RJ, Brazil, 22460-320.
email: [email protected]
Submitted January 28, 2010, accepted in final form May 17, 2010. AMS 2000 Subject classification: 60B20
Keywords: Random Hermitian matrices, concentration inequalities, Khintchine Inequalities
Abstract
We give a new, elementary proof of a key inequality used by Rudelson in the derivation of his well-known bound for random sums of rank-one operators. Our approach is based on Ahlswede and Winter’s technique for proving operator Chernoff bounds. We also prove a concentration inequality for sums of random matrices of rank one with explicit constants.
1 Introduction
This note mainly deals with estimates for the operator norm $\|Z_n\|$ of random sums
$$Z_n \equiv \sum_{i=1}^n \varepsilon_i A_i \qquad (1)$$
of deterministic Hermitian matrices $A_1,\dots,A_n$ multiplied by random coefficients. Recall that a Rademacher sequence is a sequence $\{\varepsilon_i\}_{i=1}^n$ of i.i.d. random variables with $\varepsilon_1$ uniform over $\{-1,+1\}$. A standard Gaussian sequence is a sequence of i.i.d. standard Gaussian random variables. Our main goal is to prove the following result.
Theorem 1 (proven in Section 3). Given positive integers $d, n \in \mathbb{N}$, let $A_1,\dots,A_n$ be deterministic $d\times d$ Hermitian matrices and $\{\varepsilon_i\}_{i=1}^n$ be either a Rademacher sequence or a standard Gaussian sequence. Define $Z_n$ as in (1). Then for all $p \in [1,+\infty)$,
$$\left(\mathbb{E}\,\|Z_n\|^p\right)^{1/p} \le \left(\sqrt{2\ln(2d)} + C_p\right)\left\|\sum_{i=1}^n A_i^2\right\|^{1/2}$$
where
$$C_p \equiv \left(p\int_0^{+\infty} t^{p-1}\,e^{-\frac{t^2}{2}}\,dt\right)^{1/p} \quad (\le c\sqrt{p} \text{ for some universal } c > 0).$$
1 Research supported by a "Bolsa de Produtividade em Pesquisa" (CNPq) and research grants from CNPq and FAPERJ.
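As a quick sanity check of Theorem 1 (not part of the original paper; the matrices, dimensions and trial count below are arbitrary choices), the following minimal numpy sketch estimates $(\mathbb{E}\,\|Z_n\|^p)^{1/p}$ by Monte Carlo for $p = 2$, where $C_2 = \sqrt{2}$, and compares it with the stated bound.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 8, 30, 2000

# Arbitrary deterministic Hermitian matrices A_1, ..., A_n.
G = rng.standard_normal((n, d, d)) + 1j * rng.standard_normal((n, d, d))
A = (G + G.conj().transpose(0, 2, 1)) / 2

# sigma = || sum_i A_i^2 ||^{1/2}, the matrix "variance" in Theorem 1.
sigma = np.linalg.norm(np.einsum("nij,njk->ik", A, A), 2) ** 0.5

# Monte Carlo estimate of (E ||Z_n||^2)^{1/2} with Rademacher signs.
norms = np.empty(trials)
for t in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    Zn = np.einsum("n,nij->ij", eps, A)
    norms[t] = np.linalg.norm(Zn, 2)
lhs = np.mean(norms ** 2) ** 0.5

# For p = 2, C_p = (p * int_0^inf t^{p-1} e^{-t^2/2} dt)^{1/p} = sqrt(2).
rhs = (np.sqrt(2 * np.log(2 * d)) + np.sqrt(2.0)) * sigma
print(f"(E||Z_n||^2)^(1/2) ~ {lhs:.3f}  <=  bound {rhs:.3f}")
```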
For $d = 1$, this result corresponds to the classical Khintchine inequalities, which give sub-Gaussian bounds for the moments of $\sum_{i=1}^n \varepsilon_i a_i$ ($a_1,\dots,a_n \in \mathbb{R}$). Theorem 1 is implicit in Section 3 of
Rudelson's paper [12], albeit with non-explicit constants. The main Theorem in that paper is the following inequality, which is a simple corollary of Theorem 1: if $Y_1,\dots,Y_n$ are i.i.d. random (column) vectors in $\mathbb{C}^d$ which are isotropic (i.e. $\mathbb{E}\,Y_1Y_1^* = I_{d\times d}$, the $d\times d$ identity matrix), then:
$$\mathbb{E}\left\|\frac{1}{n}\sum_{i=1}^n Y_iY_i^* - I_{d\times d}\right\| \le C\,\sqrt{\frac{\ln n}{n}}\,\left(\mathbb{E}\,|Y_1|^{\ln n}\right)^{1/\ln n} \qquad (2)$$
for some universal $C > 0$, whenever the RHS of the above inequality is at most 1. This important result has been applied to several different problems, such as bringing a convex body to near-isotropic position [12]; the analysis of low-rank approximations of matrices [13, 7] and graph sparsification [14]; estimating singular values of matrices with independent rows [11]; analysing compressive sensing [4]; and related problems in Harmonic Analysis [17, 16].
The key ingredient of the original proof of Theorem 1 is a non-commutative Khintchine inequality by Lust-Piquard and Pisier [10]. This states that there exists a universal $c > 0$ such that for all $Z_n$ as in the Theorem, all $p \ge 1$ and all $d\times d$ matrices $\{B_i, D_i\}_{i=1}^n$, the Schatten-norm moments of sums such as $\sum_{i=1}^n \varepsilon_i B_iD_i$ satisfy a sub-Gaussian bound (see [10] for the precise statement). A good value for $c$, and thus for the constant in Rudelson's bound, can be obtained from the work of Buchholz [3]. Unfortunately, the proof of the Lust-Piquard/Pisier inequality employs language and tools from non-commutative probability that are rather foreign to most potential users of (2), and Buchholz's bound additionally relies on delicate combinatorics.
This note presents a more direct proof of Theorem 1. Our argument is based on an improvement of the methodology created by Ahlswede and Winter [2] in order to prove their operator Chernoff bound, which also has many applications (e.g. [8]); the improvement is discussed in Section 3.1. This approach only requires elementary facts from Linear Algebra and Matrix Analysis. The most complicated result that we use is the Golden-Thompson inequality [6, 15]:
$$\forall d \in \mathbb{N},\ \forall\, d\times d \text{ Hermitian matrices } A, B:\quad \mathrm{Tr}(e^{A+B}) \le \mathrm{Tr}(e^A e^B). \qquad (3)$$
The elementary proof of this classical inequality is sketched in Section 5 below.
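For readers who want to experiment, here is a short numeric check of (3), a sketch (ours, not the paper's; dimension and sample count are arbitrary) that computes the matrix exponential of a Hermitian matrix through its eigendecomposition:

```python
import numpy as np

def expm_h(A):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return (U * np.exp(w)) @ U.conj().T

rng = np.random.default_rng(1)
d = 6
for _ in range(100):
    # Random Hermitian A, B.
    X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    Y = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    A, B = (X + X.conj().T) / 2, (Y + Y.conj().T) / 2
    lhs = np.trace(expm_h(A + B)).real
    rhs = np.trace(expm_h(A) @ expm_h(B)).real
    assert lhs <= rhs + 1e-9  # Golden-Thompson: Tr(e^{A+B}) <= Tr(e^A e^B)
print("Golden-Thompson held on all samples")
```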
We have already noted that Rudelson's bound (2) follows simply from Theorem 1; see [12, Section 3] for details. Here we prove a concentration lemma corresponding to that result under the stronger assumption that $|Y_1|$ is a.s. bounded. While similar results have appeared in other papers [11, 13, 17], our proof is simpler and gives explicit constants.
Lemma 1 (proven in Section 4). Let $Y_1,\dots,Y_n$ be i.i.d. random column vectors in $\mathbb{C}^d$ with $|Y_1| \le M$ almost surely and $\|\mathbb{E}\,Y_1Y_1^*\| \le 1$. Then for all $0 < t \le 2$:
$$\mathbb{P}\left(\left\|\frac{1}{n}\sum_{i=1}^n Y_iY_i^* - \mathbb{E}\,Y_1Y_1^*\right\| \ge t\right) \le (2\min\{d,n\})^2\, e^{-\frac{nt^2}{16M^2}}.$$
In particular, a calculation shows that, for any $n, d \in \mathbb{N}$, $M > 0$ and $\delta \in (0,1)$ such that:
$$4M\sqrt{\frac{2\ln(\min\{d,n\}) + 2\ln 2 + \ln(1/\delta)}{n}} \le 2,$$
we have:
$$\mathbb{P}\left(\left\|\frac{1}{n}\sum_{i=1}^n Y_iY_i^* - \mathbb{E}\,Y_1Y_1^*\right\| < 4M\sqrt{\frac{2\ln(\min\{d,n\}) + 2\ln 2 + \ln(1/\delta)}{n}}\right) \ge 1-\delta.$$
A key feature of this Lemma is that it gives meaningful results even when the ambient dimension $d$ is arbitrarily large. In fact, the same result holds (with $d = \infty$) for $Y_i$ taking values in a separable Hilbert space, and this form of the result may be used to simplify the proofs in [11] (especially in the last section of that paper).
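To illustrate the concentration bound displayed above, the following numpy experiment (not from the paper; all parameter values are arbitrary) samples $Y$ uniformly from $\{\sqrt{d}\,e_1,\dots,\sqrt{d}\,e_d\}$, an isotropic distribution with $|Y| = \sqrt{d} = M$ almost surely, and checks the empirical coverage of the stated radius:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, trials = 20, 5000, 200

# Y uniform over {sqrt(d) e_1, ..., sqrt(d) e_d}: isotropic (E Y Y^* = I)
# with |Y| = sqrt(d) =: M almost surely.
M = np.sqrt(d)
devs = np.empty(trials)
for t in range(trials):
    counts = rng.multinomial(n, np.full(d, 1.0 / d))
    # Here (1/n) sum_i Y_i Y_i^* is diagonal, with entries d * counts / n.
    emp = np.diag(d * counts / n)
    devs[t] = np.linalg.norm(emp - np.eye(d), 2)

delta = 0.1
radius = 4 * M * np.sqrt((2 * np.log(min(d, n)) + 2 * np.log(2) + np.log(1 / delta)) / n)
print(f"empirical P(dev < radius) = {np.mean(devs < radius):.3f} (want >= {1 - delta})")
```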
To conclude the introduction, we present an open problem: is it possible to improve upon Rudelson's bound under further assumptions? There is some evidence that the dependence on $\ln(d)$ in the Theorem, while necessary in general [13, Remark 3.4], can sometimes be removed. For instance, Adamczak et al. [1] have improved upon Rudelson's original application of Theorem 1 to convex bodies, obtaining exactly what one would expect in the absence of the $\sqrt{\ln(2d)}$ term. Another setting where our bound is a $\Theta(\sqrt{\ln d})$ factor away from optimality is that of more classical random matrices (cf. the end of Section 3.1 below). It would be interesting if one could sharpen the proof of Theorem 1 in order to reobtain these results. [Related issues are raised by Vershynin [18].]
Acknowledgement: we thank the anonymous referees for providing extra references, pointing out several typos and suggesting a number of small improvements to this note.
2 Preliminaries
We let $\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ denote the set of $d\times d$ Hermitian matrices, which is a subset of the set $\mathbb{C}^{d\times d}$ of all $d\times d$ matrices with complex entries. The spectral theorem states that all $A \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$ have $d$ real eigenvalues (possibly with repetitions) that correspond to an orthonormal set of eigenvectors. $\lambda_{\max}(A)$ is the largest eigenvalue of $A$. The spectrum of $A$, denoted by $\mathrm{spec}(A)$, is the multiset of all eigenvalues, where each eigenvalue appears a number of times equal to its multiplicity. We let
$$\|C\| \equiv \max_{v \in \mathbb{C}^d,\ |v|=1} |Cv|$$
denote the operator norm of $C \in \mathbb{C}^{d\times d}$ ($|\cdot|$ is the Euclidean norm). By the spectral theorem,
$$\forall A \in \mathbb{C}^{d\times d}_{\mathrm{Herm}},\quad \|A\| = \max\{\lambda_{\max}(A),\lambda_{\max}(-A)\}.$$
Moreover, $\mathrm{Tr}(A)$ (the trace of $A$) is the sum of the eigenvalues of $A$.
2.1 Spectral mapping
Let $f: \mathbb{C} \to \mathbb{C}$ be an entire analytic function with a power-series representation $f(z) \equiv \sum_{n\ge 0} c_n z^n$ ($z \in \mathbb{C}$). If all $c_n$ are real, the expression:
$$f(A) \equiv \sum_{n\ge 0} c_n A^n$$
corresponds to a map from $\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ to itself. We will sometimes use the so-called spectral mapping property:
$$\mathrm{spec}\,f(A) = f(\mathrm{spec}(A)). \qquad (4)$$
By this we mean that the eigenvalues of $f(A)$ are the numbers $f(\lambda)$ with $\lambda \in \mathrm{spec}(A)$. Moreover, the multiplicity of $\xi \in \mathrm{spec}\,f(A)$ is the sum of the multiplicities of all preimages of $\xi$ under $f$ that lie in $\mathrm{spec}(A)$.
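In practice one evaluates $f(A)$ not from the power series but from the spectral theorem, which also makes (4) transparent. A minimal numpy sketch of ours (random matrix and the choice $f = \exp$ are arbitrary):

```python
import numpy as np

def f_of_A(A, f):
    """Apply an entire function with real coefficients to a Hermitian A
    via the spectral theorem: f(A) = U diag(f(lambda)) U^*."""
    w, U = np.linalg.eigh(A)
    return (U * f(w)) @ U.conj().T

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = (X + X.conj().T) / 2

# Spectral mapping (4): the eigenvalues of f(A) are f(spec(A)).
f = np.exp
lhs = np.sort(np.linalg.eigvalsh(f_of_A(A, f)))
rhs = np.sort(f(np.linalg.eigvalsh(A)))
print(np.allclose(lhs, rhs))  # True
```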
2.2 The positive-semidefinite order
We will use the notation $A \succeq 0$ to say that $A$ is positive-semidefinite, i.e. $A \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$ and its eigenvalues are non-negative. This is equivalent to saying that $(v, Av) \ge 0$ for all $v \in \mathbb{C}^d$, where $(\cdot,\cdot)$ is the standard Euclidean inner product.
If $A, B \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$, we write $A \succeq B$ or $B \preceq A$ to say that $A - B \succeq 0$. Notice that "$\succeq$" is a partial order and that:
$$\forall A, B, A', B' \in \mathbb{C}^{d\times d}_{\mathrm{Herm}},\quad (A \succeq A') \wedge (B \succeq B') \Rightarrow A + B \succeq A' + B'. \qquad (5)$$
Moreover, spectral mapping (4) implies that:
$$\forall A \in \mathbb{C}^{d\times d}_{\mathrm{Herm}},\quad A^2 \succeq 0. \qquad (6)$$
We will also need the following simple fact.
Proposition 1. For all $A, B, C \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$:
$$(C \succeq 0) \wedge (A \preceq B) \Rightarrow \mathrm{Tr}(CA) \le \mathrm{Tr}(CB). \qquad (7)$$
Proof: To prove this, assume the LHS and observe that the RHS is equivalent to $\mathrm{Tr}(C\Delta) \ge 0$ where $\Delta \equiv B - A$. By assumption, $\Delta \succeq 0$, hence it has a Hermitian square root $\Delta^{1/2}$. The cyclic property of the trace implies:
$$\mathrm{Tr}(C\Delta) = \mathrm{Tr}(\Delta^{1/2} C \Delta^{1/2}).$$
Since the trace is the sum of the eigenvalues, we will be done once we show that $\Delta^{1/2} C \Delta^{1/2} \succeq 0$. But, since $\Delta^{1/2}$ is Hermitian and $C \succeq 0$,
$$\forall v \in \mathbb{C}^d,\quad (v, \Delta^{1/2}C\Delta^{1/2}v) = ((\Delta^{1/2}v), C(\Delta^{1/2}v)) = (w, Cw) \ge 0 \quad (\text{with } w = \Delta^{1/2}v),$$
which shows that $\Delta^{1/2}C\Delta^{1/2} \succeq 0$, as desired.
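A quick numeric illustration of Proposition 1 (our sketch with arbitrary random matrices; $C = XX^T \succeq 0$ and $B = A + YY^T \succeq A$ hold by construction):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 6
X = rng.standard_normal((d, d)); C = X @ X.T          # C >= 0
A = rng.standard_normal((d, d)); A = (A + A.T) / 2    # Hermitian A
Y = rng.standard_normal((d, d)); B = A + Y @ Y.T      # B - A = Y Y^T >= 0, so A <= B

# Proposition 1: Tr(CA) <= Tr(CB) whenever C >= 0 and A <= B.
print(np.trace(C @ A) <= np.trace(C @ B) + 1e-12)  # True
```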
2.3 Probability with matrices
Assume $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space and $Z: \Omega \to \mathbb{C}^{d\times d}_{\mathrm{Herm}}$ is measurable with respect to $\mathcal{F}$ and the Borel $\sigma$-field on $\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ (this is equivalent to requiring that all entries of $Z$ be complex-valued random variables). $\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ is a metrically complete vector space and one can naturally define an expected value $\mathbb{E}[Z] \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$. This turns out to be the matrix $\mathbb{E}[Z] \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$ whose $(i,j)$-entry is the expected value of the $(i,j)$-th entry of $Z$. [Of course, $\mathbb{E}[Z]$ is only defined if all entries of $Z$ are integrable, but this will always be the case in this paper.]
The definition of expectations implies that traces and expectations commute:
$$\mathbb{E}[\mathrm{Tr}(Z)] = \mathrm{Tr}(\mathbb{E}[Z]). \qquad (8)$$
Moreover, one can check that the usual product rule is satisfied:
$$\text{If } Z, W: \Omega \to \mathbb{C}^{d\times d}_{\mathrm{Herm}} \text{ are measurable and independent, } \mathbb{E}[ZW] = \mathbb{E}[Z]\,\mathbb{E}[W]. \qquad (9)$$
Finally, the inequality:
$$\text{If } Z: \Omega \to \mathbb{C}^{d\times d}_{\mathrm{Herm}} \text{ satisfies } Z \succeq 0 \text{ a.s., then } \mathbb{E}[Z] \succeq 0 \qquad (10)$$
is an easy consequence of another easily checked fact: $(v, \mathbb{E}[Z]v) = \mathbb{E}[(v, Zv)]$, $v \in \mathbb{C}^d$.
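Both (8) and (10) are easy to see on an example; the sketch below (ours, not the paper's) averages rank-one matrices $Z = VV^*$, which are positive-semidefinite almost surely:

```python
import numpy as np

rng = np.random.default_rng(5)
d, samples = 4, 10000

# Z = V V^* with V a random complex vector: Z >= 0 almost surely.
V = rng.standard_normal((samples, d)) + 1j * rng.standard_normal((samples, d))
Z = np.einsum("si,sj->sij", V, V.conj())
EZ = Z.mean(axis=0)  # entrywise expectation, as in the text

# (8): traces and expectations commute.
print(np.isclose(np.trace(Z, axis1=1, axis2=2).mean().real, np.trace(EZ).real))
# (10): E[Z] >= 0 -- all eigenvalues of the sample mean are non-negative.
print(np.linalg.eigvalsh(EZ).min() >= -1e-12)
```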
3 Proof of Theorem 1
Proof: [of Theorem 1] The usual Bernstein trick implies that:
$$\forall t \ge 0,\quad \mathbb{P}(\|Z_n\| \ge t) \le \inf_{s>0} e^{-st}\,\mathbb{E}\,e^{s\|Z_n\|}.$$
Notice that
$$\mathbb{E}\,e^{s\|Z_n\|} \le \mathbb{E}\,e^{s\lambda_{\max}(Z_n)} + \mathbb{E}\,e^{s\lambda_{\max}(-Z_n)} = 2\,\mathbb{E}\,e^{s\lambda_{\max}(Z_n)} \qquad (11)$$
since $\|Z_n\| = \max\{\lambda_{\max}(Z_n),\lambda_{\max}(-Z_n)\}$ and $-Z_n$ has the same law as $Z_n$.
The function $x \mapsto e^{sx}$ is monotone non-decreasing and positive for all $s \ge 0$. It follows from the spectral mapping property (4) that for all $s \ge 0$, the largest eigenvalue of $e^{sZ_n}$ is $e^{s\lambda_{\max}(Z_n)}$ and all eigenvalues of $e^{sZ_n}$ are non-negative. Using the equality "trace = sum of eigenvalues" implies that for all $s \ge 0$,
$$\mathbb{E}\,e^{s\lambda_{\max}(Z_n)} = \mathbb{E}\,\lambda_{\max}\left(e^{sZ_n}\right) \le \mathbb{E}\,\mathrm{Tr}\,e^{sZ_n}.$$
As a result, we have the inequality:
$$\forall t \ge 0,\quad \mathbb{P}(\|Z_n\| \ge t) \le 2\inf_{s\ge 0} e^{-st}\,\mathbb{E}\,\mathrm{Tr}\,e^{sZ_n}. \qquad (12)$$
Up to now, our proof has followed Ahlswede and Winter’s argument. The next lemma, however, will require new ideas.
Lemma 2. For all $s \in \mathbb{R}$,
$$\mathbb{E}\,\mathrm{Tr}(e^{sZ_n}) \le \mathrm{Tr}\left(e^{\frac{s^2\sum_{i=1}^n A_i^2}{2}}\right).$$
This lemma is proven below. We will now show how it implies Rudelson's bound. Let
$$\sigma^2 \equiv \left\|\sum_{i=1}^n A_i^2\right\| = \lambda_{\max}\left(\sum_{i=1}^n A_i^2\right).$$
[The second equality follows from $\sum_{i=1}^n A_i^2 \succeq 0$, which holds because of (5) and (6).] We note that:
$$\mathrm{Tr}\left(e^{\frac{s^2\sum_{i=1}^n A_i^2}{2}}\right) \le d\,\lambda_{\max}\left(e^{\frac{s^2\sum_{i=1}^n A_i^2}{2}}\right) = d\,e^{\frac{s^2\sigma^2}{2}}$$
where the equality is yet another application of spectral mapping (4) and the fact that "$x \mapsto e^{s^2x/2}$" is monotone non-decreasing. We deduce from the Lemma and (12) that:
$$\forall t \ge 0,\quad \mathbb{P}(\|Z_n\| \ge t) \le 2d\,\inf_{s\ge 0} e^{-st+\frac{s^2\sigma^2}{2}} = 2d\,e^{-\frac{t^2}{2\sigma^2}}. \qquad (13)$$
This implies that for any $p \ge 1$,
$$\left(\mathbb{E}\,\|Z_n\|^p\right)^{1/p} \le \sigma\sqrt{2\ln(2d)} + \left(\mathbb{E}\left(\|Z_n\| - \sigma\sqrt{2\ln(2d)}\right)_+^p\right)^{1/p} \le \sigma\sqrt{2\ln(2d)} + \sigma C_p,$$
where the first step is the triangle inequality in $L^p$ and the second uses (13), which implies $\mathbb{P}((\|Z_n\| - \sigma\sqrt{2\ln(2d)})_+ \ge u) \le e^{-u^2/2\sigma^2}$ for all $u \ge 0$, together with $\mathbb{E}\,X_+^p = \int_0^{+\infty} p\,u^{p-1}\,\mathbb{P}(X_+ \ge u)\,du$. This is precisely the bound in Theorem 1.
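Incidentally, the substitution $u = t^2/2$ gives the closed form $C_p = \sqrt{2}\,\Gamma(p/2+1)^{1/p}$, which makes the $c\sqrt{p}$ growth claimed in Theorem 1 transparent via Stirling's formula. A small self-contained cross-check against direct quadrature (our illustration, not the paper's):

```python
import math

def C_p(p: float) -> float:
    """C_p = (p * int_0^inf t^{p-1} e^{-t^2/2} dt)^{1/p} = sqrt(2) * Gamma(p/2 + 1)^{1/p}."""
    return math.sqrt(2.0) * math.gamma(p / 2 + 1) ** (1 / p)

# Numeric quadrature cross-check of the closed form, plus the sqrt(p) growth rate.
for p in (1, 2, 4, 8, 16):
    steps, T = 200000, 60.0
    dt = T / steps
    integral = sum((i * dt) ** (p - 1) * math.exp(-((i * dt) ** 2) / 2) * dt
                   for i in range(1, steps))
    quad = (p * integral) ** (1 / p)
    print(f"p={p:2d}  C_p={C_p(p):.4f}  quadrature={quad:.4f}  "
          f"C_p/sqrt(p)={C_p(p) / math.sqrt(p):.3f}")
```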
To finish, we now prove Lemma 2.
Proof: [of Lemma 2] Define $D_0 \equiv \frac{s^2}{2}\sum_{i=1}^n A_i^2$ and, for $1 \le j \le n$,
$$D_j \equiv \sum_{i=1}^j s\varepsilon_i A_i + \frac{s^2}{2}\sum_{i=j+1}^n A_i^2,$$
so that $D_n = sZ_n$. We will show that for each $1 \le j \le n$:
$$\mathbb{E}\,\mathrm{Tr}(e^{D_j}) \le \mathbb{E}\,\mathrm{Tr}(e^{D_{j-1}}). \qquad (14)$$
Notice that this implies $\mathbb{E}\,\mathrm{Tr}(e^{D_n}) \le \mathrm{Tr}(e^{D_0})$, which is precisely the Lemma. To prove (14), fix $1 \le j \le n$. Notice that $D_{j-1}$ is independent from $s\varepsilon_jA_j - s^2A_j^2/2$ since the $\{\varepsilon_i\}_{i=1}^n$ are independent. This implies that:
$$\mathbb{E}\,\mathrm{Tr}\exp\left(D_j\right) = \mathbb{E}\,\mathrm{Tr}\exp\left(D_{j-1} + s\varepsilon_jA_j - \frac{s^2A_j^2}{2}\right) \le \mathbb{E}\,\mathrm{Tr}\left(e^{D_{j-1}}\,e^{s\varepsilon_jA_j - \frac{s^2A_j^2}{2}}\right)$$
(use Golden-Thompson (3)). The matrices $s\varepsilon_jA_j$ and $-s^2A_j^2/2$ always commute, hence the exponential of the sum is the product of the exponentials. Applying (9) and noting that $e^{-s^2A_j^2/2}$ is constant, we see that:
$$\mathbb{E}\,\mathrm{Tr}(e^{D_j}) \le \mathrm{Tr}\left(\mathbb{E}\left[e^{D_{j-1}}\right]\,\mathbb{E}\left[e^{s\varepsilon_jA_j}\right]e^{-\frac{s^2A_j^2}{2}}\right).$$
By Proposition 1 (and (8)), the desired inequality (14) will follow once we show that:
$$\mathbb{E}\left[e^{s\varepsilon_jA_j}\right]e^{-\frac{s^2A_j^2}{2}} \preceq I. \qquad (15)$$
In the Gaussian case, an explicit calculation shows that $\mathbb{E}\exp\left(s\varepsilon_jA_j\right) = e^{s^2A_j^2/2}$, hence (15) holds. In the Rademacher case, we have:
$$\mathbb{E}\left[e^{s\varepsilon_jA_j}\right]e^{-\frac{s^2A_j^2}{2}} = f(A_j)$$
where $f(z) = \cosh(sz)\,e^{-s^2z^2/2}$. It is a classical fact that $0 \le \cosh(x) \le e^{x^2/2}$ for all $x \in \mathbb{R}$ (just compare the Taylor expansions); this implies that $0 \le f(\lambda) \le 1$ for all eigenvalues $\lambda$ of $A_j$. Using spectral mapping (4), we see that:
$$\mathrm{spec}\,f(A_j) = f(\mathrm{spec}(A_j)) \subset [0,1],$$
which implies that $f(A_j) \preceq I$. This proves (15) in this case and finishes the proof of (14) and of the Lemma.
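The key step (15) in the Rademacher case can also be checked numerically: $\mathbb{E}\,e^{s\varepsilon A} = \cosh(sA)$, and its product with $e^{-s^2A^2/2}$ should have spectrum inside $[0,1]$. A sketch of ours (random $A$ and the value of $s$ are arbitrary):

```python
import numpy as np

def expm_h(A):
    """Matrix exponential of a Hermitian matrix via eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return (U * np.exp(w)) @ U.conj().T

rng = np.random.default_rng(6)
d, s = 5, 0.7
X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
A = (X + X.conj().T) / 2

# Rademacher average: E exp(s*eps*A) = (e^{sA} + e^{-sA}) / 2 = cosh(sA).
mgf = (expm_h(s * A) + expm_h(-s * A)) / 2
# f(A) = cosh(sA) e^{-s^2 A^2 / 2}; (15) says its spectrum lies in [0, 1].
fA = mgf @ expm_h(-(s ** 2) * (A @ A) / 2)
eig = np.linalg.eigvalsh((fA + fA.conj().T) / 2)  # fA is Hermitian up to roundoff
print(eig.min() >= -1e-12, eig.max() <= 1 + 1e-12)  # True True
```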
3.1 Remarks on the original AW approach
A direct adaptation of the original argument of Ahlswede and Winter [2] would lead to an inequality of the form:
$$\mathbb{E}\,\mathrm{Tr}(e^{sZ_n}) \le \mathrm{Tr}\left(\mathbb{E}\left[e^{s\varepsilon_nA_n}\right]\mathbb{E}\left[e^{sZ_{n-1}}\right]\right).$$
One sees that:
$$\mathbb{E}\,e^{s\varepsilon_nA_n} \preceq e^{\frac{s^2A_n^2}{2}} \preceq e^{\frac{s^2\|A_n^2\|}{2}}\,I.$$
However, only the second inequality seems to be useful, as there is no obvious relationship between
$$\mathrm{Tr}\left(e^{\frac{s^2A_n^2}{2}}\,\mathbb{E}\left[e^{sZ_{n-1}}\right]\right)$$
and
$$\mathrm{Tr}\left(\mathbb{E}\left[e^{s\varepsilon_{n-1}A_{n-1}}\right]\mathbb{E}\left[e^{sZ_{n-2}+\frac{s^2A_n^2}{2}}\right]\right),$$
which is what we would need to proceed with induction. [Note that Golden-Thompson (3) cannot be undone and fails for three summands, [15].] The best one can do with the second inequality is:
$$\mathbb{E}\,\mathrm{Tr}(e^{sZ_n}) \le d\,e^{\frac{s^2\sum_{i=1}^n\|A_i\|^2}{2}}.$$
This would give a version of Theorem 1 with $\sum_{i=1}^n \|A_i\|^2$ replacing $\|\sum_{i=1}^n A_i^2\|$. This modified result is always worse than the actual Theorem, and can be dramatically so. For instance, consider the case of a Wigner matrix where:
$$Z_n \equiv \sum_{1\le i<j\le m} \varepsilon_{ij}A_{ij}$$
with the $\varepsilon_{ij}$ i.i.d. standard Gaussian and each $A_{ij}$ has ones at positions $(i,j)$ and $(j,i)$ and zeros elsewhere (we take $d = m$ and $n = \binom{m}{2}$ in this case). Direct calculation reveals:
$$\left\|\sum_{ij} A_{ij}^2\right\| = \|(m-1)I\| = m-1 \ll \binom{m}{2} = \sum_{ij}\|A_{ij}\|^2.$$
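The gap between the two quantities is easy to reproduce; a tiny numpy sketch of ours with the arbitrary choice $m = 50$:

```python
import numpy as np
from math import comb

m = 50
# A_{ij} (i < j) has ones at (i, j) and (j, i); the sum of squares is (m-1) * I.
S = np.zeros((m, m))
for i in range(m):
    for j in range(i + 1, m):
        Aij = np.zeros((m, m))
        Aij[i, j] = Aij[j, i] = 1.0
        S += Aij @ Aij

print(np.linalg.norm(S, 2))  # || sum A_ij^2 || = m - 1 = 49
print(comb(m, 2))            # sum ||A_ij||^2 = C(m, 2) = 1225 >> 49
```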
4 Concentration for rank-one operators
In this section we prove Lemma 1.
Proof: [of Lemma 1] Let
$$\varphi(s) \equiv \mathbb{E}\exp\left(s\left\|\frac{1}{n}\sum_{i=1}^n Y_iY_i^* - \mathbb{E}\,Y_1Y_1^*\right\|\right),\qquad s \ge 0.$$
We will show below that:
$$\forall s \ge 0,\quad \varphi(s) \le 2\min\{d,n\}\,e^{\frac{2M^2s^2}{n}}\,\varphi\!\left(\frac{2M^2s^2}{n}\right); \qquad (16)$$
the Lemma then follows from the choice
$$s \equiv \frac{n\,\min\{2,t\}}{8M^2}$$
and a few simple calculations. [Notice that $2M^2s \le n/2$ with this choice, hence $1/(1-2M^2s/n) \le 2$ and $2M^2s^2/(n-2M^2s) \le 4M^2s^2/n$.]
To prove (16), we begin with symmetrization (see e.g. Lemma 6.3 in Chapter 6 of [9]):
$$\varphi(s) \le \mathbb{E}\exp\left(2s\left\|\frac{1}{n}\sum_{i=1}^n \varepsilon_iY_iY_i^*\right\|\right),$$
where $\{\varepsilon_i\}_{i=1}^n$ is a Rademacher sequence independent of the $\{Y_i\}_{i=1}^n$. Conditioning on the $\{Y_i\}_{i=1}^n$ and applying the same argument as in (11), we notice that the conditional expectation obeys a bound as in Lemma 2, in which the dimensional factor may be taken to be $\min\{d,n\}$ rather than $d$, since $\sum_{i=1}^n \varepsilon_iY_iY_i^*$ has rank at most $\min\{d,n\}$. Therefore:
$$\mathbb{E}\left[\left.\exp\left(2s\left\|\frac{1}{n}\sum_{i=1}^n \varepsilon_iY_iY_i^*\right\|\right)\,\right|\,Y_1,\dots,Y_n\right] \le 2\min\{d,n\}\exp\left(\frac{2M^2s^2}{n}\left(\left\|\frac{1}{n}\sum_{i=1}^n Y_iY_i^* - \mathbb{E}\,Y_1Y_1^*\right\| + 1\right)\right).$$
[We used $(Y_iY_i^*)^2 = |Y_i|^2\,Y_iY_i^* \preceq M^2\,Y_iY_i^*$ and $\|\mathbb{E}\,Y_1Y_1^*\| \le 1$ in the last inequality.] Plugging this into the conditional expectation above and integrating, we obtain (16):
$$\varphi(s) \le 2\min\{d,n\}\,e^{\frac{2M^2s^2}{n}}\,\mathbb{E}\exp\left(\frac{2M^2s^2}{n}\left\|\frac{1}{n}\sum_{i=1}^n Y_iY_i^* - \mathbb{E}\,Y_1Y_1^*\right\|\right) = 2\min\{d,n\}\,e^{\frac{2M^2s^2}{n}}\,\varphi\!\left(\frac{2M^2s^2}{n}\right).$$
5 Proof sketch for Golden-Thompson inequality
As promised in the Introduction, we sketch an elementary proof of inequality (3). We will need the Trotter-Lie formula, a simple consequence of the Taylor formula for $e^X$:
$$\forall A, B \in \mathbb{C}^{d\times d}_{\mathrm{Herm}},\quad \lim_{n\to+\infty}\left(e^{A/n}e^{B/n}\right)^n = e^{A+B}. \qquad (17)$$
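The convergence in (17) can be watched numerically; the sketch below (ours, with arbitrary random Hermitian $A, B$) shows the error decaying as $n$ grows:

```python
import numpy as np

def expm_h(A):
    """Matrix exponential of a Hermitian matrix via eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return (U * np.exp(w)) @ U.conj().T

rng = np.random.default_rng(7)
d = 4
X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
Y = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
A, B = (X + X.conj().T) / 2, (Y + Y.conj().T) / 2

target = expm_h(A + B)
for n in (1, 10, 100, 1000):
    trotter = np.linalg.matrix_power(expm_h(A / n) @ expm_h(B / n), n)
    print(n, np.linalg.norm(trotter - target, 2))  # error shrinks roughly like 1/n
```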
The second ingredient is the inequality:
$$\forall k \in \mathbb{N},\ \forall X, Y \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}:\quad X, Y \succeq 0 \Rightarrow \mathrm{Tr}\left((XY)^{2^{k+1}}\right) \le \mathrm{Tr}\left((X^2Y^2)^{2^k}\right). \qquad (18)$$
This is proven in [6] via an argument using the existence of positive-semidefinite square roots for positive-semidefinite matrices, and the Cauchy-Schwarz inequality for the standard inner product over $\mathbb{C}^{d\times d}$. Iterating (18) implies:
$$\mathrm{Tr}\left(\left(e^{A/2^k}e^{B/2^k}\right)^{2^k}\right) \le \mathrm{Tr}\left(e^Ae^B\right).$$
Inequality (3) follows from letting $k \to +\infty$, using (17) and noticing that $\mathrm{Tr}(\cdot)$ is continuous.
References
[1] Radoslaw Adamczak, Alexander E. Litvak, Alain Pajor and Nicole Tomczak-Jaegermann.
"Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles." Journal of the American Mathematical Society 23: 535–561, 2010.
[2] Rudolf Ahlswede and Andreas Winter. "Strong converse for identification via quantum channels." IEEE Transactions on Information Theory 48(3): 569–579, 2002. MR1889969
[3] Artur Buchholz. "Operator Khintchine inequality in non-commutative probability." Mathematische Annalen 319: 1–16, 2001.
[4] Emmanuel Candès and Justin Romberg. "Sparsity and incoherence in compressive sampling." Inverse Problems 23: 969–985, 2007.
[5] Zoltan Füredi and János Komlós. "The eigenvalues of random symmetric matrices." Combinatorica 1(3): 233–241, 1981. MR0637828
[6] Sidney Golden. "Lower Bounds for the Helmholtz Function." Physical Review 137: B1127–B1128, 1965.
[7] Nathan Halko, Per-Gunnar Martinsson and Joel A. Tropp. "Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions." Preprint arXiv:0909.4061.
[8] Zeph Landau and Alexander Russell. "Random Cayley graphs are expanders: a simplified proof of the Alon-Roichman theorem." Electronic Journal of Combinatorics, Research Paper 62, 6 pp. (electronic), 2004. MR2097328
[9] Michel Ledoux and Michel Talagrand. Probability in Banach Spaces. Springer, 1991. MR1102015
[10] Françoise Lust-Piquard and Gilles Pisier. "Non commutative Khintchine and Paley inequalities." Arkiv för Matematik 29(2): 241–260, 1991. MR1150376
[11] Shahar Mendelson and Alain Pajor. "On singular values of matrices with independent rows." Bernoulli 12(5): 761–773, 2006. MR2265341
[12] Mark Rudelson. "Random vectors in the isotropic position." Journal of Functional Analysis 164(1): 60–72, 1999. MR1694526
[13] Mark Rudelson and Roman Vershynin. "Sampling from large matrices: an approach through geometric functional analysis." Journal of the ACM 54(4): Article 21, 2007. MR2351844
[14] Daniel Spielman and Nikhil Srivastava. “Graph sparsification by effective resistances." In Proceedings of the 40th annual ACM symposium on Theory of Computing (STOC 2008).
[15] Colin J. Thompson. "Inequality with applications in statistical mechanics." Journal of Mathematical Physics 6: 1812–1813, 1965. MR0189688
[16] Joel A. Tropp. "On the conditioning of random subdictionaries." Applied and Computational Harmonic Analysis 25(1): 1–24, 2008. MR2419702
[17] Roman Vershynin. "Frame expansions with erasures: an approach through the non-commutative operator theory." Applied and Computational Harmonic Analysis 18: 167–176, 2005. MR2121508