• Tidak ada hasil yang ditemukan

getdoce7e9. 246KB Jun 04 2011 12:05:07 AM

N/A
N/A
Protected

Academic year: 2017

Membagikan "getdoce7e9. 246KB Jun 04 2011 12:05:07 AM"

Copied!
8
0
0

Teks penuh

(1)

in PROBABILITY

TRANSPORTATION APPROACH TO SOME

CONCEN-TRATION INEQUALITIES IN PRODUCT SPACES

AMIR DEMBO1

Department of Mathematics and Department of Statistics, Stanford University, Stanford, CA 94305

e-mail: [email protected]

OFER ZEITOUNI2 Department of Electrical Engineering, Technion, Haifa 32000, ISRAEL

e-mail: [email protected]

Received: January 17, 1996. Revised: October 24, 1996

AMS 1991 Subject classifications: 60E15,28A35

Keywords and phrases: Concentration Inequalities, Product Spaces, Transportation.

Abstract

Using a transportation approach we prove that for every probability measuresP, Q1, Q2 onΩN

withP a product measure there exist r.c.p.d. νj such thatR

νj(·|x)dP(x) =Qj(·)and

Z

dP(x)

Z dP

dQ1

(y)β dP dQ2

(z)β(1 +β(12β))fN(x,y,z)

1(y|x)dν2(z|x)≤1,

for every β ∈ (0,1/2). Here fN counts the number of coordinates k for which xk 6=yk and xk 6=zk. In caseQ1 =Q2 one may take ν1=ν2. In the special case of Qj(·) =P(·|A) we recover some of Talagrand’s sharper concentration inequalities in product spaces.

In [Tal95, Tal96a], Talagrand provides a variety of concentration of measure inequalities which apply in every product space ΩN (with Ω Polish) equipped with a Borel product (probability) measureP. These inequalities are extremely useful in combinatorial applications such as the longest common/increasing subsequence, in statistical physics applications such as the study of spin glass models, and in areas touching upon functional analysis such as probability in Banach spaces (c.f. [Tal95, Tal96a] and the references therein). The proofs of these inequalities are all based on an induction onN, where in order to prove the concentration of measure result for a generic set A⊂ΩN+1 one applies the induction hypothesis for theN dimensional sets

A(ω) ={(y1, . . . , yN) : (y1, . . . , yN, ω)∈A},ω∈Ω fixed, andB=∪ωA(ω).

Marton, in [Mar96a, Mar96b], building upon [Mar86], extends some of Talagrand’s results to the context of contracting Markov chains. In these works concentration inequalities related to the “distance” between a set A and a pointx are viewed as consequences of inequalities

1Partially supported by an NSF DMS-9403553 grant and by a US-ISRAEL BSF grant

2Partially supported by a US-ISRAEL BSF grant and by the Technion E.& J.Bishop research fund

(2)

involving the appropriate “distance” between probability measures, in the special case that some of these measures are supported on A. Since the latter “distance” between measures involves an appropriate coupling, that is, finding the optimal way of “transporting” mass from one measure to another, it has been dubbed the transportation approach. This approach is applied in [De96, Tal96c], where other transportation problems are solved and some new information inequalities are derived in order to recover the inequalities of [Tal95] in their sharpest form and to obtain a few new variants.

A third, different approach to some of the inequalities of [Tal95] via Poincar´e and logarithmic Sobolev inequalities is provided in [Led96, BL96].

Common to all inequalities in [Tal95, Tal96a] is their additive nature, allowing for a relatively easy transfer from the transportation problem for N = 1 to the general case. In doing so, the relative entropy between probability measures plays a prominent role in [De96, Mar96a, Mar96b, Tal96c].

In contrast, Talagrand in [Tal96b] provides proofs by his induction method of a new family of inequalities, stronger than those of [Tal95, Tal96a], and in which the additive structure of the latter is replaced by a multiplicative one.

In this short note we adapt the transportation approach, avoiding the use of relative entropies, and recover as special cases some of the new inequalities of [Tal96b].

Specifically, forx= (x1, x2,· · ·, xN)∈ΩN, y= (y1,· · ·, yN)∈ΩN andz= (z1,· · ·, zN)∈ΩN

let

fN(x, y, z) = N

X

k=1

f1(xk, yk, zk) =

N

X

k=1

1xk6=yk,xk6=zk. (1)

Then, [Tal96b, Theorem 2.1] is the following concentration inequality:

Theorem 1 For every N ≥1,β ∈(0,∞), a product measure P on ΩN and AN there exist ν, such thatν(A|x) = 1for every x∈ΩN and for non-negative aβ/(2β+ 1),

Z

ΩN dP(x)

Z

A×A

(1 +a)fN(x,y,z)dν(y|x)dν(z|x) ≤P(A)−2β (2)

This inequality is applied in [Tal96b, Theorem 1.2] to control the tails of the norm of quadratic forms in Rademacher variables.

LetQ1, Q2 be Borel probability measures on ΩN, and letν1, ν2denote any r.c.p.d. such that

forj = 1,2

Z

ΩN

νj(·|x)dP(x) =Qj(·), (3)

and for j = 1,2, let ˆΩj ⊂ ΩN be such that Qj( ˆΩj) = 1 and dQjdP exists on ˆΩj, setting νj( ˆΩcj|x) = 0 for everyx∈ΩN.

The next theorem is the main result of this note. As pointed out in Remark 1 below, it implies Talagrand’s Theorem 1, at least for a certain range of parameters a, β.

Theorem 2 For every probability measuresP, Q1, Q2onΩN,N≥1such thatP is a product

measure, there exist νj satisfying (3) such that for every β ∈ (0,1/2) and non-negative a≤ β(1−2β),

Z

ΩN dP(x)

Z

ˆ Ω1×Ωˆ2

dP dQ1

(y)β dP dQ2

(z)β(1 +a)fN(x,y,z)dν1(y|x)dν2(z|x)≤1. (4)

(3)

Remark 1 Considering (4) withQj(·) =P(·|A) for which dQjdP (y) =P(A) for a.e. y∈Aand then takingν1=ν2 we recover (2) apart from the fact that now onlyβ ∈(0,1/2) is allowed

with a≤β(1−2β). In contrast to all previous examples, the constants in the concentration of measure inequality (2) differ from the best constants in the corresponding transportation inequality (4), where the choice a=β(1−2β) can not be improved upon, even for N = 1,

The same proof we provide allows for extending the inequality (4) toq >2 different measures Qj and the corresponding νj satisfying (3) with f1(xk, yk1,· · ·, y

q

k) = 1xk∈{/ yjk,j=1,...,q}. The extended inequality then holds for all choices ofβ, asuch that

Z 1

for every P and Q satisfying (9) and (10) below and J determined via (11) below. Setting Qj =P(·|A) one may hope to recover [Tal96b, Theorem 5.1], corresponding to (2) forq >2. However, a necessary condition for (5) to hold is a ≤βq ≤ 1, while the essence of [Tal96b, Theorem 5.1] is that forq >>2 one may takeβ= 1 anda/q bounded below away from zero. Key to the proof of Theorem 2 is the next proposition which is of some independent interest.

Proposition 1 For every probability measures P and Q on Ω, let Ωˆ be such that Q( ˆΩ) = 1 and dP/dQexists onΩ. Then, there exists a r.c.p.d.ˆ ν such that

Z

Remark 2 Proposition 1 is proved using the r.c.p.d. ν∗(·|x), independent ofβ, given in (8)

(4)

The proof of Proposition 1 relies upon the following technical lemma.

Lemma 1 Let hJ(u) = (u1−β+ (1−u)J)2+a(1−u)2J2. Suppose β < 1/2 and 0< a≤ β(1−2β). Then, hJ(·)is concave for every J ∈[1−β,1]and hJ(u)≤1for every u∈[0,1] and every J ∈[0,1−β].

Proof of Lemma 1: One checks that

h(3)J (x) = 2(1−β)βx−(β+2)[J((1 +β) + (2−β)x)−2(1−2β)x1−β].

Sincex1−β (1β)x+β for allx[0,1] it follows that

h(3)1β(x)≥2(1−β)βx−(β+2)[(1−β)3βx+ (1 + 3β2−2β)]≥0,

for every x∈[0,1] andβ ∈[0,0.5]. Sinceh(3)J (x) is monotone increasing inJ, it follows that h(3)J (x)≥0 for every J ∈[1−β,1],x∈[0,1]. Also,

h(2)J (1) = 2[(1−β)(1−2β)−2J(1−β) + (a+ 1)J2],

and for a≤β(1−2β)≤β/(1−β) both h(2)1 (1)≤0 andh(2)1β(1)≤0. Hence,h(2)J (1)≤0 for every J∈[1−β,1]. Withh(3)J (x)≥0 andh(2)J (1)≤0 we deduce thathJ(·) is concave in [0,1] for J ∈[1−β,1]. Finally,hJ(1) = 1 whileh′

J(1) = 2(1−β−J), implying that the concave functionh1−β(·) is bounded above byh1−β(1) = 1. SincehJ(·) is monotone increasing inJ,

the same applies to everyJ ∈[0,1−β].

Proof of Proposition 1: The case of P =Qis trivially settled by taking ν(·|x) = δx(·). Hence, assume hereafter that P 6=Qand without loss of generality assume also thata≥0. Let ˜Ω be such thatP( ˜Ω) = 1 and dQdP exists on ˜Ω. Set

ν∗(·|x) =

1∧dQ

dP(x)

δx(·) +

1−dQ

dP(x)

+

(Q−P)+(·)

(Q−P)+(Ω)

, x∈Ω˜, (8)

where (Q−P)+ denotes the positive part of the signed measureQ−P. Note thatν∗(·|x) for

x6∈Ω is irrelevant to the proof of Proposition 1. The r.c.p.d.˜ ν∗(·|x) satisfies (6) since for all

Γ⊂Ω

Z

ν∗(Γ|x)dP(x) =

Z

Γ∩Ω˜

1∧ dQ

dP(x)

dP(x) + (Q−P)+(Γ) (Q−P)+(Ω)

Z

˜ Ω

1−dQ

dP(x)

+

dP(x)

= P∧Q(Γ) +(P−Q)+(Ω) (Q−P)+(Ω)

(Q−P)+(Γ) =Q(Γ),

where P ∧Q =P−(P −Q)+ =Q−(Q−P)+. Letf = dQdP on ˆΩ andg = dQdP on ˜Ω, with

Q=Q◦f−1 and P =Pg−1 denoting the induced probability measures on [0,). Noting

that fg= 1 for P (and Q) a.e. x∈ Ωˆ∩Ω while˜ f = 0 for Qa.e.x∈Ωˆ∩Ω˜c andg = 0 for P a.e.x∈ΩˆcΩ, it follows that˜

Z 1

0

udP(u) +

Z 1−

0

(5)

and

Z 1

0

dP(u) +

Z 1−

0

udQ(u) =P( ˜Ω) = 1. (10)

Using the above definitions we see that

I(ν∗) =

Z 1

0

hJ(u)dP(u) +

Z 1−

0

v2β+1dQ(v)

withhJ(·) as is in Lemma 1 and

J =

R1−

0 vβ(1−v)dQ(v) R1−

0 (1−v)dQ(v)

. (11)

In view of Lemma 1,I(ν∗)1 for everyQfor whichJ 1β. FixingQsuch thatJ >(1β),

the concavity ofhJ(·) implies that

F(Q)=△ sup

{P:(9)and (10)hold}

I(ν∗) =phJ(α) +

Z 1−

0

v2β+1dQ(v)

where

Z 1−

0

dQ(u) = 1−αp,

Z 1−

0

udQ(u) = 1−p .

Since

J = 1 (1−α)p

Z 1−

0

(vβ−vβ+1)dQ(v),

it follows from Dubbins’ theorem [Du62] that suffices in evaluating supQF(Q) to consider atomicQwith at most three atoms. Fixingα, p∈[0,1) and the mass qi>0 of the atoms of Q(such thatP3

i=1qi= 1−αp), we arrive at

F(Q) =F(v) =phJ(v)(α) + 3 X

i=1

v2iβ+1qi

where

J(v) =

3 X

i=1

(vβi −vβi+1)qi/(p−αp)

andv= (v1, v2, v3)∈[0,1)3satisfies the linear constraintP3i=1viqi= 1−p.

Note that

1 qi

∂F

∂vi =CJ(v)(α)(βv β−1

i −(β+ 1)vβi) + (2β+ 1)v2iβ where

CJ(α) = 2(α1−β+ (1 +a)(1−α)J)≥2

p

hJ(α).

IfCJ(v)(α)≤2 then hJ(v)(α)≤1 implying thatF(v)≤p+P3i=1viqi= 1. Moreover, when

(6)

Thus, by Lagrange multipliers, to prove that sup we see that per givenv, q, p, the value ofF(v) increases inα, hence the maximum is obtained when q+αp= 1. Then, withp= (1−v)/(1−αv) andq= (1−α)/(1−αv) we getJ =vβ

Proof of Theorem 2: Forνj satisfying (3), let

(7)

Since Theorem 2 also holds forN= 1, there exist r.c.p.d. νj(n)which depend upon ˜yn,zn˜ and ˜

xn such that almost surelyµj(·|xn)Pn˜ −1(˜xn) both

Z

dP(n)(xn)

Z dP(n)

dQ(1n)(yn) βdP(n)

dQ(2n)(zn)

β(1 +a)f1(xn,yn,zn)

1(n)(yn|xn)dν2(n)(zn|xn)≤1, (15)

and Z

νj(n)(·|xn)dP(n)(xn) =Q(jn)(·). (16)

Note that (14) and (16) imply that the r.c.p.d. νj(y|x) =νj(n)(yn|xn)µj(˜yn|x˜n) satisfy (3) for j= 1,2. SincePn=P(n)×Pn−1, by (13) and (15) also

Z

Ωn dPn(x)

Z dP

n dQ1(y)

βdPn dQ2(z)

β(1 +a)fn(x,y,z)

1(y|x)dν2(z|x) =

Z

Ωn−1

dPn−1(˜xn)

Z dP

n−1

dQ1,n−1

(˜yn)β dPn−1

dQ2,n−1

(˜zn)β(1 +a)fn−1(˜xn,yn,˜ zn˜ )dµ

1(˜yn|x˜n)dµ2(˜zn|x˜n)·

Z

dP(n)(xn)

Z dP(n)

dQ(1n)(yn) βdP(n)

dQ(2n)(zn)

β(1 +a)f1(xn,yn,zn)

1(n)(yn|xn)dν2(n)(zn|xn)≤1.

Thus, by induction, (4) holds for allN ≥1.

References

[BHJ92] A.D. Barbour, L. Holst and S. Janson (1992): Poisson approximation. Oxford Uni-versity Press, Oxford.

[BL96] S. Bobkov and M. Ledoux (1996): Poincar´e’s inequalities and Talagrand’s concen-tration phenomenon for the exponential distribution. (preprint)

[De96] A. Dembo (1996): Information inequalities and concentration of measure. Ann. Probab.(to appear)

[Du62] L.E. Dubins (1962): On extreme points of convex sets. J. Math. Anal. Appl. 5, 237-244.

[Led96] M. Ledoux (1996): On Talagrand’s deviation inequalities for product measures. ESAIM: Probability and Statistics, 1, 63-87.

[Mar86] K. Marton (1986): A simple proof of the blowing-up Lemma. IEEE Trans. on Information Theory IT-32, 445-446.

[Mar96a] K. Marton (1996): Bounding ¯d-distance by information divergence: a method to prove measure concentration. Ann. Probab.,24, 857-866.

[Mar96b] K. Marton (1996): A measure concentration inequality for contracting Markov chains. GAFA,6, 556-571.

(8)

[Tal96a] M. Talagrand (1996): A new look at independence.Ann. Probab.24, 1-34.

[Tal96b] M. Talagrand (1996): New concentration inequalities in product spaces. Invent. Math.(to appear)

Referensi

Dokumen terkait

arbitrary pendent vertex of the star) is the unique tree that maximizes the distance spectral radius, and conjecture the structure of a tree which minimizes the distance

The situation is only slightly more complicated when one of the polynomials φj is a binomial, since the latter has exactly one non-zero root..

We then extend the well-known theorem of Schoenberg [7] for Euclidean distance matrices to block distance matrices and characterize a block distance matrix by the eigenvalues of

We will prove that if a limit measure is not absolutely continuous with respect to the Lebesgue measure then the corresponding random walk on the self similar graph does not have

To prove our functional central limit theorem for rescaled martingales, we do not need to work in the canonical setup of the latter result of Jacod and Shiryaev (1987) nor do we have

For those readers that are not familiar with the theory of multi armed bandits and multi parameter time changes, we provide another weak solution, which follows along the lines of

L., Central limit theorem for fluctuations in the high temperature region of the Sherrington-Kirkpatrick spin glass model. L., Quadratic replica coupling in the

But by the central limit theorem for m-dependent stationary sequence (see for example Brockwell and Davis, 1990, page 213), the latter is asymptotically normal with mean zero