
Electronic Journal of Probability

Vol. 14 (2009), Paper no. 28, pages 752–779.
Journal URL: http://www.math.washington.edu/~ejpecp/

Exponential inequalities for sums of weakly dependent variables

Bernard Delyon

IRMAR, Université Rennes 1

Campus de Beaulieu, 35042 Rennes Cedex, France

bernard.delyon@univ-rennes1.fr

Abstract

We give new exponential inequalities and Gaussian approximation results for sums of weakly dependent variables. These results lead to generalizations of the Bernstein and Hoeffding inequalities, where an extra control term is added; this term contains conditional moments of the variables.


1 Introduction

In the whole paper $(X_i)_{1\le i\le n}$ is a sequence of centered random variables. Our objective is to give new exponential inequalities and Gaussian approximation results for the sum $S = X_1 + \cdots + X_n$ (and other functions of $(X_1,\dots,X_n)$) in the case where first and second order mixing conditions are assumed (first order mixing conditions involve conditional means and second order ones involve conditional covariances).

The essential application of exponential inequalities is to give small event probabilities; typically, we would like here to extend the Hoeffding and Bernstein inequalities to mixing processes, that is

$$P(S \ge A) \le \exp\left(-\frac{2A^2}{\sum_i b_i^2 + \rho}\right), \qquad a_i \le X_i \le a_i + b_i \tag{1}$$

$$P(S \ge A) \le \exp\left(-\frac{A^2}{2E[S^2] + 2Am/3 + \rho}\right), \qquad m = \sup_k \|X_k\|_\infty \tag{2}$$

where $\rho = 0$ if the variables are independent. We shall obtain here a value of $\rho$ which depends on conditional moments of the variables. We also want to provide inequalities which generalize what is already known for martingales. Actually Equation (2) is not satisfied for a martingale ($E[S^2]$ has to be replaced with a bound on the total variation); this will lead us to two different alternatives

$$P(S \ge A) \le \exp\left(-\frac{A^2}{2v + 2Am/3 + \rho}\right), \tag{3}$$

where $v$ is a bound on some kind of quadratic variation and $\rho = 0$ in the case of a martingale (cf. Theorem 4 for a precise statement), or

$$P(S \ge A) \le \exp\left(-\frac{A^2}{2E[S^2] + 2\sqrt{2A\rho_1/3}}\right), \tag{4}$$

where $\rho_1$ is a third order quantity (i.e. if $S$ is replaced with $tS$, $\rho_1$ becomes $t^3\rho_1$) which involves conditional moments and is in the independent case $\rho_1 = \sum_k \operatorname{Var}(X_k)\|X_k\|_\infty + \|X_k^3\|_\infty/2$ (Theorem 7 and Theorem 9). For instance, if we are close to the independent case and $S$ is a normalized sum, that is $X_k = U_k/\sqrt{n}$ where $(U_k)_k$ is a sequence of weakly dependent bounded random variables, $\rho_1$ has order $1/\sqrt{n}$ and the second term in the denominator is residual as long as $A \ll \sqrt{n}$.

Bounds like Equation (3) will be obtained through what we call here the first order approach whereas (4) will require the second order approach. We present the main ideas below.
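To make the shape of (1) and (2) concrete, here is a minimal numerical sketch (ours, not from the paper) that evaluates both bounds in the independent case $\rho = 0$ and compares them with a Monte Carlo estimate of $P(S \ge A)$; the distributional choices below are illustrative assumptions.

```python
# Sketch: Hoeffding (1) and Bernstein (2) with rho = 0 (independent case),
# compared with a Monte Carlo estimate of P(S >= A).
import numpy as np

rng = np.random.default_rng(0)
n = 200
# X_i uniform on [-1, 1]: centered, a_i = -1, b_i = 2, m = 1
X = rng.uniform(-1.0, 1.0, size=(100_000, n))
S = X.sum(axis=1)

ES2 = n / 3.0        # E[S^2] = sum of Var(X_i) = n/3 for Uniform[-1, 1]
m = 1.0              # m = sup_k ||X_k||_inf
sum_b2 = n * 2.0**2  # sum_i b_i^2 with b_i = 2

for A in (10.0, 20.0, 30.0):
    mc = (S >= A).mean()
    hoeffding = np.exp(-2 * A**2 / sum_b2)                 # Eq. (1)
    bernstein = np.exp(-A**2 / (2 * ES2 + 2 * A * m / 3))  # Eq. (2)
    print(f"A={A:4.0f}  MC={mc:.2e}  Hoeffding={hoeffding:.2e}  "
          f"Bernstein={bernstein:.2e}")
```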

First order approach. Considering the sequence $X_1,\dots,X_n$ as a time series, for instance a martingale, it is natural to introduce the $\sigma$-fields

$$\mathcal{F}_k = \sigma(X_1,\dots,X_k). \tag{5}$$


For any $1 \le k \le n$ is given a sequence $X_j^k$, $j = 1,\dots,k$, which is a reordering of $(X_j,\ j = 1,\dots,k)$ with $X_k^k = X_k$. We attach to each $k$ a family of $\sigma$-algebras $(\mathcal{F}_j^k)_{j \le k}$ such that

$$\mathcal{F}_j^k \supset \sigma(X_i^k,\ i \le j), \qquad j \le k. \tag{6}$$

In particular, we have $\mathcal{F}_k^k \supset \mathcal{F}_k$ and $\mathcal{F}_{k-1}^k \supset \mathcal{F}_{k-1}$.

If $(X_i)$ is a time series, it is natural to set $X_i^k = X_i$ and $\mathcal{F}_j^k = \mathcal{F}_j = \sigma(X_i,\ i \le j)$: the superscripts can be dropped. Later on, the term "time series" will refer to this situation, whereas the general case will rather be referred to as "random fields".

[Figure: the linear filtration $\mathcal{F}_k$ generated by $X_1,\dots,X_k$ (time series case), and the $\sigma$-fields $\mathcal{F}_j^k$ obtained by reordering the variables around $X_k$.]

When dealing with mixing random fields of $\mathbb{R}^d$, each index $k$ corresponds to some point $P_k$ of the space where $X_k$ sits; for each $k$, the sequence $(X_j^k)$ will typically be obtained by sorting the original sequence $(X_j)_{j \le k}$ in decreasing order of the distance $d(P_j, P_k)$. A simple example is the case of $m$-dependent fields indexed by $\mathbb{Z}^d$, that is, a process $X_a$, $a \in \mathbb{Z}^d$, such that the set of variables $X_A = \{X_a : a \in A\}$ is independent of $X_B$ if the sets $A$ and $B$ have distance at least $m$; they are typically fields of the form $X_a = h(Y_{a+C})$ where $Y_a$ is an i.i.d. random field, $C$ a finite neighborhood of 0 in $\mathbb{Z}^d$, and $h$ a measurable function of $|C|$ variables.
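The distance-based reordering just described is easy to make concrete; the following sketch (ours, with an arbitrary point configuration) builds, for each $k$, the sequence of earlier indices sorted in decreasing order of $d(P_j, P_k)$, so that the $\sigma$-fields $\mathcal{F}_j^k$ drop the variables closest to $X_k$ first.

```python
# Sketch: building the reordering X^k_j for a random field in R^2.
import numpy as np

rng = np.random.default_rng(1)
points = rng.uniform(0, 10, size=(8, 2))  # P_1, ..., P_n (illustrative)

def reordering(k):
    """Return indices {0,...,k-1} sorted by decreasing distance to P_k,
    followed by k itself, so that X^k_k = X_k."""
    d = np.linalg.norm(points[:k] - points[k], axis=1)
    return list(np.argsort(-d)) + [k]     # most distant first

for k in range(2, 5):
    print(k, reordering(k))
```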

We would like to point out that this framework covers quite different situations. For instance, in the Erdős–Rényi model of an unoriented random graph with $n$ vertices, edges are represented by $\binom{n}{2}$ i.i.d. Bernoulli variables $Y_{ab}$, $1 \le a < b \le n$, with the convention $Y_{ab} = Y_{ba}$ and $Y_{aa} = 0$ (see [2] and references therein). The number of triangles (for instance) in such a model is

$$\sum_{a,b,c} Y_{ab}Y_{bc}Y_{ac} = \sum_{a,b,c} X_{abc}.$$

The process $X$ is here an $m$-dependent process on the set of three-element subsets of $\{1,\dots,n\}$. We shall treat this example in Section 3.5. As pointed out by Lumley [14], a similar situation appears in the case of U-statistics; we shall however treat them with a martingale method.
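As a sanity check of this example, one can simulate the model and compare the triangle count with its mean $\binom{n}{3}p^3$; the sketch below (ours) does this for a small graph, where two triples $X_{abc}$ are dependent exactly when they share an edge.

```python
# Sketch: triangle count Z = sum_{a<b<c} Y_ab Y_bc Y_ac in G(n, p).
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 0.2
Y = np.triu(rng.random((n, n)) < p, k=1)
Y = Y | Y.T                              # convention Y_ab = Y_ba, Y_aa = 0

A = Y.astype(int)
Z = np.trace(A @ A @ A) // 6             # each triangle is counted 6 times
EZ = p**3 * n * (n - 1) * (n - 2) / 6    # E[Z] = C(n, 3) p^3
print(Z, EZ)
```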


[…] and Theorem 3). The bound of Theorem 2 involves $v = \sum_k \|E[X_k^2 \mid \mathcal{F}_{k-1}^k]\|_\infty$, and a remainder term involving conditional expectations $\|E[X_k \mid \mathcal{F}_i^k]\|_\infty$. This is slightly unsatisfactory since it is known that the key quantity in the case of a martingale is the quadratic variation $\langle X \rangle = \sum_{k=1}^n E[X_k^2 \mid \mathcal{F}_{k-1}]$, and in most cases effective bounds will actually involve $\|\langle X \rangle\|_\infty$, which is smaller than $v$. This is corrected in Theorem 1, where we give a result which generalizes what is known from martingale theory and improves on classical papers concerned with mixing [6]. However, inspection of the bounds shows that this improvement is really effective only if the conditional expectations $|E[X_k \mid \mathcal{F}_i^k]|$ are significantly smaller than $|X_k|$; if not, the only way to improve accuracy is to use the second order approach of Section 3 briefly discussed below.

Second order results. By this terminology, we mean the following fact: the Hoeffding inequality (Equation (1) with $\rho = 0$) for instance is obtained from the exponential inequality

$$E[e^{tS}] \le e^{t^2 \sum_i b_i^2 / 8}. \tag{7}$$

One obvious drawback of this upper bound is that when $t$ tends to 0, it does not look like $1 + t^2 E[S^2]$. One would rather expect something like

$$E[e^{tS}] \le e^{t^2 E[S^2]/2 + Ct^3} \tag{8}$$

which has more interesting scaling properties; this approach would hopefully lead to significant improvements in a moderate deviation domain; this is what has been done in [5], but there $S$ is an arbitrary function of independent variables, or of a Markov chain. In order to get closer, like in Equations (34) and (41) below, we have to pay with higher order extra terms: the remainder terms will not only contain conditional expectations $E[X_k \mid \mathcal{F}_i^k]$ but also conditional covariances; this will force us to consider for each pair of indices $(i,j)$ another reordering of the sequence which corresponds to increasing dependence with the pair $(X_i, X_j)$, and to introduce $\sigma$-fields $\mathcal{H}_k^{ij}$; we postpone details to Section 3.

In this context we shall give exponential inequalities and Gaussian approximation; we give in particular a bound for $|E[h(X)] - E[h(N)]|$ where $h$ is any function of $n$ variables with all third derivatives bounded and $N$ is a Gaussian random variable with the same covariance matrix as $X$.

The paper is organized as follows. The two forthcoming sections deal with first order and second order exponential inequalities. A classical use of the exponential inequalities leads to Theorem 4 which generalizes the Bernstein and Hoeffding inequalities. An application to concentration inequalities and triangle counts is given in Section 2.2.

Section 3 is concerned with the second order approach, with applications to bounded difference inequalities and triangle counts.

In Section 4 we give some estimates under mixing assumptions.

2 First order approach

2.1 General bounds


In Theorem 1 we present bounds which generalize known results concerning martingales. Since they only use the linear sequence of $\sigma$-fields $\mathcal{F}_k$, they are essentially interesting for time series.

In Theorem 2 we give a Hoeffding bound which is valid in both cases (time series and random fields), and Theorem 3 gives a Bennett bound for random fields which does not exactly generalize (9) because the quadratic variation $\langle X \rangle$ is changed into the more drastic upper bound $v$.

The applicability of the following theorem depends on the way one can bound the quadratic variations involved. In the forthcoming examples, we shall consider only Equation (9) through a bound on $\|\langle X \rangle\|_\infty$; however Equations (10) and (11) have the advantage of not involving $m$.

Theorem 1. We are in the setting described in the introduction, with the filtration defined by (5). The variables $X_k$ are centered. We define

$$m = \sup_k \|X_k\|_\infty\, […]$$

REMARK. In the martingale case $q = 0$. We recommend [1] for an account on recent work concerning exponential inequalities for martingales.

Proof. Consider a pair of functions $\theta(x)$ and $\psi(x)$ such that

$$e^{x - \theta(x)} \le 1 + x + \psi(x). \tag{12}$$

These functions are meant to be $O(x^2)$ in the neighborhood of 0. Three examples of such functions are

$$\theta(x) = 0, \qquad \psi(x) = e^x - x - 1$$
$$\theta(x) = \zeta(x_+), \qquad \psi(x) = \zeta(x_-), \qquad \zeta(x) = e^{-x} + x - 1$$
$$\theta(x) = \frac{x^2}{6}, \qquad \psi(x) = \frac{x^2}{3}.$$

Inequality (12) for these functions is proved in Proposition 12 of the Appendix. Set

$$T_k = \sum_{i=1}^k \big( X_i - \theta(X_i) - \log(1 + \xi_i) \big), \qquad \text{where } \xi_i = E[\psi(X_i) \mid \mathcal{F}_{i-1}].$$

Then

$$\begin{aligned}
E[e^{T_n}] &= E[e^{X_n - \theta(X_n)}(1+\xi_n)^{-1} e^{T_{n-1}}] \\
&\le E[(1 + X_n + \psi(X_n))(1+\xi_n)^{-1} e^{T_{n-1}}] \\
&= E[X_n (1+\xi_n)^{-1} e^{T_{n-1}}] + E[e^{T_{n-1}}] \\
&= E[X_n ((1+\xi_n)^{-1} - 1) e^{T_{n-1}}] + E\Big[\sum_{i=1}^{n-1} X_n (e^{T_i} - e^{T_{i-1}})\Big] + E[e^{T_{n-1}}] \\
&= r_1 + r_2 + E[e^{T_{n-1}}].
\end{aligned}$$

In the martingale case, the first two terms are zero; here we have

$$r_1 = -E\big[E[X_n \mid \mathcal{F}_{n-1}]\, e^{T_{n-1}}\, \xi_n/(1+\xi_n)\big] \le E\big[|E[X_n \mid \mathcal{F}_{n-1}]|\, e^{T_{n-1}}\big]\,(\|\psi(X_n)\|_\infty \wedge 1).$$

The above defined function $\psi$ is convex with $\psi(0) = 0$, $\psi(1) \le 1$ and $\psi(-1) \le 1$. Hence $|\psi(x)| \wedge 1 \le |x|$ and therefore

$$r_1 \le \|X_n\|_\infty \|E[X_n \mid \mathcal{F}_{n-1}]\|_\infty E[e^{T_{n-1}}].$$

Let $\Delta_i = T_i - T_{i-1}$; the second remainder is bounded as follows:

$$\begin{aligned}
r_2 &= E\Big[\sum_{i=1}^{n-1} X_n \tanh(\Delta_i/2)(e^{T_i} + e^{T_{i-1}})\Big] \\
&\le E\Big[\sum_{i=1}^{n-1} |E[X_n \mid \mathcal{F}_i]\,\Delta_i|\,(e^{T_i} + e^{T_{i-1}})/2\Big] \\
&\le \sum_{i=1}^{n-1} \|E[X_n \mid \mathcal{F}_i]\,\Delta_i\|_\infty \sup_{j \le n-1} E[e^{T_j}].
\end{aligned}$$

Equation (53) of the Appendix implies that $|\tanh(\Delta_i)| \le 3\|X_i\|_\infty$, hence

$$r_2 \le 3\rho_n \sup_{i \le n-1} E[e^{T_i}], \qquad \rho_n = \sum_{i=1}^{n-1} \|X_i\|_\infty \|E[X_n \mid \mathcal{F}_i]\|_\infty.$$

Finally

$$E[e^{T_n}] \le (1 + 4\rho_n) \sup_{i \le n-1} E[e^{T_i}] \le \exp(4\rho_n) \sup_{i \le n-1} E[e^{T_i}]$$

and we get by induction that

$$\sup_{i \le k} E[e^{T_i}] \le \exp\Big(4\sum_{i=1}^k \rho_i\Big).$$

In particular

$$E\Big[\exp\Big\{\sum_{i=1}^n X_i - \theta(X_i) - \log(1 + E[\psi(X_i) \mid \mathcal{F}_{i-1}])\Big\}\Big] \le \exp(4q)$$

hence

$$E\Big[\exp\Big\{\sum_{i=1}^n X_i - \theta(X_i) - E[\psi(X_i) \mid \mathcal{F}_{i-1}]\Big\}\Big] \le \exp(4q).$$

This leads to the three bounds by using the three pairs of functions and by noticing that for $m \ge 0$ and $x \le m$

$$\varphi(x) \le \varphi(m), \qquad \varphi(x) = \frac{e^x - x - 1}{x^2} \tag{13}$$

which is a consequence of L'Hospital's rule for monotonicity [?], and that for $x \ge 0$

$$\zeta(x) \le \frac{x^2}{2}$$

since the function $x^2/2 - \zeta(x)$ has a non-negative derivative.
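Two elementary facts used in this proof are easy to confirm numerically; the following check (ours, not from the paper) verifies the hyperbolic identity behind the bound on $r_2$ and the monotonicity (13) of $\varphi$.

```python
# Sketch: check e^a - e^b = tanh((a-b)/2)(e^a + e^b), and that
# phi(x) = (e^x - x - 1)/x^2 is increasing (inequality (13)).
import numpy as np

rng = np.random.default_rng(3)
a, b = rng.normal(size=1000), rng.normal(size=1000)
lhs = np.exp(a) - np.exp(b)
rhs = np.tanh((a - b) / 2) * (np.exp(a) + np.exp(b))
assert np.allclose(lhs, rhs)

x = np.linspace(-5, 5, 2001)
x = x[x != 0]                      # phi extends continuously by phi(0) = 1/2
phi = (np.exp(x) - x - 1) / x**2
assert np.all(np.diff(phi) > 0)    # increasing, so phi(x) <= phi(m) for x <= m
print("identity and monotonicity confirmed")
```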

Theorem 2. Assume that we are in the setting described in the introduction, with a family of $\sigma$-fields satisfying (6). The variables $X_k$ are centered. We define now $q$ as

$$q = \sum_{k=1}^n \sum_{i=1}^{k-1} \|X_i^k\|_\infty \|E[X_k \mid \mathcal{F}_i^k]\|_\infty$$

(this is consistent with the definition in Theorem 1). If the variables are lower and upper bounded with probability one:

$$a_i \le X_i \le a_i + b_i \tag{14}$$

the following inequality holds

$$E\Big[\exp\Big(S - \frac{1}{8}\sum_i b_i^2\Big)\Big] \le e^{8q}. \tag{15}$$

In the martingale case ($\mathcal{F}_i^k = \mathcal{F}_i$) one has $q = 0$ […]

Proof. We assume first that $a_i$ and $b_i$ are deterministic. We start, as in the proof of the Hoeffding inequality, with the following inequality based on the majoration of the exponential function by the chord over the curve on $[a, a+b]$:

$$e^x \le \frac{(a+b)e^a - a\, e^{a+b}}{b} + x\,\frac{e^{a+b} - e^a}{b}, \qquad a \le x \le a+b.$$

[Figure: the exponential and its chord on $[a, a+b]$; $e^c$ denotes the value of the chord at $x = 0$.]

It is well known that the first term of the right hand side, $e^c$ on the figure, is smaller than $\exp(b^2/8)$ independently of $a$ (this is a key step for proving the Hoeffding inequality, see for instance Appendix B of [?]). On the other hand, it is clear that $c \le a+b$ (see the figure or bound $e^a$ with $e^{a+b}$ in the expression of $e^c$). Hence, if we define $c_i$ and $d_i$ with the equations

$$c_i = \min\Big(\frac{b_i^2}{8},\, a_i + b_i\Big), \qquad d_i = \frac{e^{a_i}(e^{b_i} - 1)}{b_i}$$

we have

$$e^x \le e^{c_i} + d_i x, \qquad a_i \le x \le a_i + b_i.$$
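The definition of $c_i$ and $d_i$ can be checked directly; the sketch below (ours) samples pairs $(a_i, b_i)$ with $a_i \le 0 \le a_i + b_i$, as in the proof, and verifies the bound $e^x \le e^{c_i} + d_i x$ on a grid.

```python
# Sketch: verify e^x <= e^c + d x on [a, a+b] with
# c = min(b^2/8, a+b) and d = e^a (e^b - 1)/b.
import numpy as np

rng = np.random.default_rng(4)
for _ in range(1000):
    b = rng.uniform(0.01, 5.0)
    a = rng.uniform(-b, 0.0)              # ensures a <= 0 <= a + b
    c = min(b**2 / 8, a + b)
    d = np.exp(a) * (np.exp(b) - 1) / b
    x = np.linspace(a, a + b, 200)
    assert np.all(np.exp(x) <= np.exp(c) + d * x + 1e-9)
print("chord bound holds on all samples")
```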

Now let the random variables $T_j$ and $T_j^k$ be defined as

$$T_k = \sum_{i=1}^k (X_i - c_i), \qquad T_j^k = \sum_{i=1}^j (X_i^k - c_i^k)$$

where $(c_i^k)_{i \le k}$ is the corresponding reordering of the sequence $(c_i)_{i \le k}$. We obtain

$$E[e^{T_n}] = E[e^{X_n} e^{T_{n-1} - c_n}] \le E[(e^{c_n} + d_n X_n)\, e^{T_{n-1} - c_n}].$$

In the martingale case, the term involving $d_n$ vanishes and this equation gives immediately the result. We assume now that we are not necessarily in this case but $a_i$ and $b_i$ are deterministic. We can assume in addition, without loss of generality, that $a_i$ and $b_i$ are chosen so that (14) is tight. Notice that in this case we also have $a_i \le 0 \le a_i + b_i$ since $E[X_i] = 0$. The previous equation implies

$$E[e^{T_n}] \le d_n e^{-c_n} E[X_n e^{T_{n-1}}] + E[e^{T_{n-1}}] = d_n e^{-c_n} E\Big[\sum_{i=1}^{n-1} X_n (e^{T_i^n} - e^{T_{i-1}^n})\Big] + E[e^{T_{n-1}}] = d_n e^{-c_n} r_2 + E[e^{T_{n-1}}]. \tag{16}$$

Let $\Delta_i = T_i^n - T_{i-1}^n = X_i^n - c_i^n$; bounding $r_2$ as in the proof of Theorem 1 ($b_i^n$ is the difference between the essential supremum and the essential infimum) we get that

$$r_2 \le 2\rho_n \sup_{i \le n-1} E[e^{T_i}] \; […]$$

and (16) becomes finally

$$E[e^{T_n}] \le (1 + 8\rho_n) […]$$

The bound we obtained is still obviously valid for $T_n(\alpha)$ since the replacement of $X_i$ with $\alpha_i X_i$ does not increase $\rho_n$, hence $\sup […]$

Theorem 3. Assume that we are in the setting described in the introduction, with a family of $\sigma$-fields satisfying (6). The variables $X_k$ are centered. We define […]

Proof. We set

$$S_i = X_1^n + X_2^n + \cdots + X_i^n, \qquad i \le n,$$

and $S_0 = 0$. Equation (13) implies that

$$e^{X_n} \le 1 + X_n\, […]$$

where $q_n$ and $v_n$ are the terms corresponding to $k = n$ in the definition of $q$ and $v$. This proves the result by induction.

2.2 Applications

2.2.1 Deviation bounds

In this section we give the deviation inequalities that can be deduced from the preceding exponential inequalities. We generalize the Bernstein inequality in Equations (18) and (22), and the Hoeffding inequality in Equation (21); one could get Bennett inequalities through a similar process, we refer to Appendix B of [?]. In the martingale case, Equations (19) and (20) do not assume that the variables are bounded, but sums of squares are involved.

Theorem 4. With the notations of Theorem 1 we have for any $A, y > 0$ […]

With the notations of Theorems 2 and 3 we have for any $A, y > 0$

$$P\Big(S \ge A,\ \sum_i b_i^2 \le 4y\Big) \le \exp\left(-\frac{A^2}{2y + 32q}\right) \tag{21}$$

$$P(S \ge A) \le \exp\left(-\frac{A^2}{2(v + 2q) + 2Am/3}\right). \tag{22}$$

In the martingale case, (21) remains true if we allow $a_i$ and $b_i$ to be $\mathcal{F}_{i-1}$-measurable random variables.

REMARK. Equation (21) is analogous to Corollary 3(a) of [6].

Proof. Applying the bound (9) to the variables $tX_i$ for some $t > 0$, we get

$$\begin{aligned}
\log P(S \ge A,\ \langle X \rangle \le y) &\le \log E\Big[\exp\Big\{t(S - A) - \frac{e^{tm} - tm - 1}{m^2}(\langle X \rangle - y)\Big\}\Big] \\
&\le 4t^2 q + \frac{y}{m^2}(e^{tm} - tm - 1) - tA \\
&\le \frac{y + 8q}{m^2}(e^{tm} - tm - 1) - tA.
\end{aligned}$$

The optimization of this expression w.r.t. $t \ge 0$ is classical in the theory of Bennett and Bernstein inequalities and delivers (18); see for instance Appendix B of [?]. The second inequality is deduced from (10) with the same method: for $V = [X^+] + \langle X^- \rangle$ or $V = ([X] + 2\langle X \rangle)/3$ one has

$$\log P(S \ge A,\ V \le y) \le \log E\big[e^{tS - tA - t^2(V - y)/2}\big] \le 4t^2 q + \frac{yt^2}{2} - tA$$

and we take $t = A/(y + 8q)$.

Equations (21) and (22) are obtained similarly.

2.2.2 Bounded difference inequalities

The above results lead straightforwardly to bounded difference inequalities by using a classical martingale argument of Maurey [?]. Equation (26) is the McDiarmid inequality [?]. Equation (25) is a Bernstein inequality in the same context.

Theorem 5. Let $Y = (Y_1,\dots,Y_n)$ be a zero-mean sequence of independent variables with values in some measured space $E$. Let $f$ be a measurable function on $E^n$ with real values. Set

$$S = f(Y) - E[f(Y)]$$
$$D_k(y, z) = f(Y_1,\dots,Y_{k-1}, y, Y_{k+1},\dots,Y_n) - f(Y_1,\dots,Y_{k-1}, z, Y_{k+1},\dots,Y_n)$$
$$\Phi_k = \sup_{y,z} E[D_k(y, z) \mid Y_1,\dots,Y_{k-1}] \tag{23}$$
$$\Delta_k = f(Y) - E[f(Y) \mid Y_1,\dots,Y_{k-1}, Y_{k+1},\dots,Y_n] = E[D_k(Y_k, Y_k') \mid Y], \qquad m = \sup_k […]$$

where $Y_k'$ is an independent copy of $Y_k$. We assume the measurability of $\Phi_k$. Then for any $A, y > 0$

$$P\Big(S \ge A,\ \sum_k \Phi_k^2 \le 4y\Big) \le \exp\left(-\frac{A^2}{2y}\right) \tag{24}$$

$$P\Big(S \ge A,\ \sum_k E[\Delta_k^2 \mid Y_1,\dots,Y_{k-1}] \le y\Big) \le \exp\left(-\frac{A^2}{2y + 2Am/3}\right). \tag{25}$$

In particular

$$P(S \ge A) \le \exp\left(-\frac{2A^2}{\sum_k \delta_k^2}\right), \qquad \delta_k = \|D_k(Y_k, Y_k')\|_\infty. \tag{26}$$

REMARK. Let us mention that if $f$ has the form $f(Y) = \sup_{g \in \Gamma} g(Y)$ for some finite class of functions $\Gamma$ then, with obvious notations,

$$\begin{aligned}
D_k(y, z) &= \sup_{g \in \Gamma} g(Y_1,\dots,Y_{k-1}, y, Y_{k+1},\dots,Y_n) - \sup_{g \in \Gamma} g(Y_1,\dots,Y_{k-1}, z, Y_{k+1},\dots,Y_n) \\
&\le \sup_{g \in \Gamma} \{ g(Y_1,\dots,Y_{k-1}, y, Y_{k+1},\dots,Y_n) - g(Y_1,\dots,Y_{k-1}, z, Y_{k+1},\dots,Y_n) \} \\
&= \sup_{g \in \Gamma} D_k^g(y, z),
\end{aligned}$$

in particular $\delta_k \le \sup_{g \in \Gamma} \delta_k^g$. This is a classical argument in the theory of concentration inequalities.
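As an illustration of (26) in the spirit of this remark, the following sketch (ours, with an arbitrary choice of $f$) takes $f(Y) = \max_k (Y_1 + \cdots + Y_k)$, a supremum of the finite class $\Gamma = \{y \mapsto y_1 + \cdots + y_k\}$, for which $\delta_k \le 2$ when $Y_i$ is uniform on $[-1, 1]$.

```python
# Sketch: McDiarmid bound (26) for f(Y) = max of the partial sums.
import numpy as np

rng = np.random.default_rng(5)
n, trials = 100, 200_000
Y = rng.uniform(-1, 1, size=(trials, n))
f = np.max(np.cumsum(Y, axis=1), axis=1)  # f(Y) = max_k (Y_1 + ... + Y_k)
S = f - f.mean()                           # E[f(Y)] replaced by sample mean

sum_delta2 = n * 2.0**2                    # delta_k <= 2 for each coordinate
for A in (10.0, 15.0, 20.0):
    print(A, (S >= A).mean(), np.exp(-2 * A**2 / sum_delta2))
```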

Proof. We shall utilize (21) and (18) with

$$X_k = E[f(Y) \mid \mathcal{F}_k] - E[f(Y) \mid \mathcal{F}_{k-1}], \qquad \mathcal{F}_k = \sigma(Y_1,\dots,Y_k).$$

We have already pointed out that $q = 0$ since $X_k$ is a martingale difference. Let us define the random variables

$$L_k = \inf_y E[F_k(y) \mid Y_1,\dots,Y_{k-1}], \qquad U_k = \sup_y E[F_k(y) \mid Y_1,\dots,Y_{k-1}].$$

The equation

$$L_k \le E[f(Y) \mid \mathcal{F}_k] \le U_k$$

implies

$$L_k - E[f(Y) \mid \mathcal{F}_{k-1}] \le X_k \le U_k - E[f(Y) \mid \mathcal{F}_{k-1}]$$

and since $U_k - L_k = \Phi_k$ we can apply (21) with $b_k = \Phi_k$ and get (24).

Clearly $X_k$ rewrites

$$X_k = E[\Delta_k \mid \mathcal{F}_k]$$

hence $E[X_k^2 \mid \mathcal{F}_{k-1}] \le E[\Delta_k^2 \mid Y_1,\dots,Y_{k-1}]$ […]

2.2.3 Inequalities for suprema of U-statistics

For some problems of adaptive estimation and testing, it is very important to be able to control the supremum of U-statistics [9]. We give here a bound in this direction.

Consider a sequence of i.i.d. random variables $Y_1,\dots,Y_n$ with values in some measurable space $E$ and a finite family $\mathcal{H}$ of measurable symmetric functions on $E^d$, and set for $h \in \mathcal{H}$

$$Y_I = \{Y_i,\ i \in I\}, \qquad I \subset \{1,\dots,n\}$$
$$Z_h(Y) = \binom{n}{d}^{-1} \sum_{I \subset \{1,\dots,n\}} h(Y_I)$$
$$S = \sup_{h \in \mathcal{H}} Z_h(Y) - E\big[\sup_{h \in \mathcal{H}} Z_h(Y)\big], \tag{27}$$

where the sum is restricted to the subsets with cardinality $d$; since the kernel $h$ is symmetric there is no ambiguity regarding the notation $h(Y_A)$. We assume that $h$ is centered:

$$E[h(Y_1,\dots,Y_d)] = 0.$$

It is well known that if $Z_h$ is non degenerate, that is

$$E[h(Y_1,\dots,Y_d) \mid Y_1] \not\equiv 0,$$

then the variance of $Z_h$ has order $n^{-1}$, cf. [13] p. 12. We give in the following corollary a deviation bound for $S$ which corresponds to a Gaussian approximation with variance of the same order of magnitude. In the case of degenerate U-statistics we do not get good bounds; this is apparent in the case $\mathcal{H}$ has only one element since an abundant literature exists concerning deviation of degenerate U-statistics [11; 4].

The function $L$ below may be bounded by $2\|h\|_\infty$, which should be generally good enough unless $Y_1$ takes a specific value with high probability, in which case (29) may become significantly better:

Corollary 6. The symmetric function $h$ satisfies for some function $L(x, y)$

$$|h(y_1,\dots,y_d) - h(y_1', y_2,\dots,y_d)| \le L(y_1, y_1').$$

Then, for any $A > 0$

$$P(S \ge A) \le \exp\left(-\frac{2nA^2}{\mu^2}\right), \qquad \mu = d\,\operatorname{ess\,sup} L(Y_1, Y_2) \tag{28}$$

$$P(S \ge A) \le \exp\left(-\frac{nA^2}{d^2 E[L(Y_1, Y_1')^2] + 2A\mu/3}\right). \tag{29}$$

Proof. With the notation of Theorem 5, we have with $Y^k = (Y_1,\dots,Y_{k-1}, Y_k', Y_{k+1},\dots,Y_n)$

$$\Delta_k \le \sup_h Z_h(Y) - \sup_h Z_h(Y^k) \le \sup_h\,\big(Z_h(Y) - Z_h(Y^k)\big)$$

and

$$|Z_h(Y) - Z_h(Y^k)| \le \binom{n}{d}^{-1} \Big|\sum_{I \ni k} h(Y_I) - h(Y_I^k)\Big| \le \binom{n}{d}^{-1}\binom{n-1}{d-1} L(Y_k, Y_k') = \frac{d}{n}\, L(Y_k, Y_k').$$

Hence $\Delta_k \le \frac{d}{n} L(Y_k, Y_k')$ and the result is now just a consequence of Theorem 5, noticing that

$$v \le \frac{d^2}{2n}\, E[L(Y_1, Y_1')^2], \qquad \|\Delta_k\|_\infty \le \frac{\mu}{n}.$$
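A toy check of (28) (ours, with an artificial kernel): take $\mathcal{H} = \{h, -h\}$ with $h(y_1, y_2) = (y_1 + y_2)/2$ and $Y_i$ uniform on $[-1, 1]$, so that $d = 2$, $L(y, y') = |y - y'|/2 \le 1$ and $\mu = 2$; for this kernel $Z_h(Y)$ reduces to the sample mean.

```python
# Sketch: bound (28) on a supremum of two non-degenerate U-statistics.
import numpy as np

rng = np.random.default_rng(6)
n, trials = 50, 100_000
Y = rng.uniform(-1, 1, size=(trials, n))

Zh = Y.mean(axis=1)          # Z_h for h(y1, y2) = (y1 + y2)/2
sup = np.abs(Zh)             # sup over H = {h, -h}
S = sup - sup.mean()

mu = 2.0                     # mu = d * ess sup L = 2 * 1
for A in (0.2, 0.3, 0.4):
    print(A, (S >= A).mean(), np.exp(-2 * n * A**2 / mu**2))
```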

3 Second order approach

As mentioned in the introduction, in order to get better results, we shall have to control conditional expectations of products $X_k X_j$; actually, our procedure will rather lead to products like $X_k X_j^k$; hence we are led to introduce for any pair $(k, j)$ a sequence $X_i^{k,j}$ corresponding to increasing dependence with $(X_k, X_j^k)$. More precisely:

INHOMOGENEOUS SETTING

(i) For any $1 \le k \le n$ is given a sequence $X_j^k$, $j = 1,\dots,k$, which is a reordering of $(X_j,\ j = 1,\dots,k)$ with $X_k^k = X_k$. We attach to each $k$ the $\sigma$-algebras

$$\mathcal{F}_j^k = \sigma(X_i^k,\ i \le j), \qquad j \le k. \tag{30}$$

(ii) For $1 \le j \le k \le n$, sequences $(X_i^{k,j})_{i<j}$ and $(\mathcal{H}_i^{k,j})_{i<j}$ are defined as above from the sequence $(X_i^k)_{i<j}$.

In other words the $\sigma$-field $\mathcal{F}_j^k$ is made by taking off $\mathcal{F}_{j+1}^k$ the "closest" variable to $X_k$. The $\sigma$-field $\mathcal{H}_i^{k,j}$ is made by continuing this process after $\mathcal{F}_j^k$ (hence $i < j$) in a way which may depend on $k$ and $j$.

This setup is essentially, in a somewhat more general context, what is considered in [6]. For time series, we have $X_i^k = X_i$ and $\mathcal{F}_i^k = \mathcal{H}_i^{k,j} = \mathcal{F}_i = \sigma(X_l,\ l \le i)$.

It happens commonly in the theory of random fields that there is no natural order on the variables (this is however typically not the case if there is a time variable). In those cases, instead of building the $\mathcal{F}_i^k$ by reordering the variables which are "before" $X_k$, it is not more restrictive to assume that for any $k$ there exists a reordering of all variables for which the conditional expectations will decrease rapidly enough. In this case the $\mathcal{F}_i^k$ […]

HOMOGENEOUS SETTING

This setting is adequate for dealing with mixing random fields, in which case each index $k$ corresponds to some point $P_k$ of the space; for each $k$, the sequence $(X_j^k)_j$ will be obtained by sorting […] is typically good enough for random fields over the Euclidean space (see Section 4).

3.1 The homogeneous setting

The following theorem states a Gaussian approximation result (33) and an exponential bound (34). Equation (33) may seem rather overtechnical; the reader can think first of the case $p = 1$, $q = \infty$, which is natural if $h$ has bounded third derivatives; however the proof of (34) utilizes the case […]

Theorem 7. Let $h$ be a three times differentiable function of $n$ variables, and $N$ be a centered Gaussian vector, independent […]

[…] Under mixing assumptions, $r_p/\sigma^2$ is expected to tend to 0 as the number of variables $n$ tends to infinity; in this case, (33) leads in particular to the central limit theorem for $S/\sigma$.

For the sake of clarity, we shall start the proof with a preparatory lemma. We know that if $(Y, Z)$ is a Gaussian vector and $Y$ is scalar centered, for any differentiable function $g$ one has

$$E[Y g(Z)] = \sum_i E[Y Z_i]\, E[g_i'(Z)], \qquad \text{where } g_i'(x) = \frac{d}{dx_i} g(x),$$

under appropriate integrability conditions. We shall see that, when the variables are not Gaussian, bounds on the difference between both sides when $Y$ varies among the coordinates of $Z$ provide an efficient measure, in some sense, of how far $Z$ is from a Gaussian vector of the same covariance (this will appear in Equation (37) below).
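The Gaussian identity above is easy to test by simulation; the following sketch (ours, with an arbitrary smooth $g$) compares both sides for a two-dimensional Gaussian vector with $Y = Z_1$.

```python
# Sketch: Monte Carlo check of E[Y g(Z)] = sum_i E[Y Z_i] E[g_i'(Z)].
import numpy as np

rng = np.random.default_rng(7)
cov = np.array([[1.0, 0.5], [0.5, 2.0]])
Z = rng.multivariate_normal([0.0, 0.0], cov, size=2_000_000)
Y = Z[:, 0]

g = np.sin(Z[:, 0]) * Z[:, 1]
g1 = np.cos(Z[:, 0]) * Z[:, 1]   # dg/dz_1
g2 = np.sin(Z[:, 0])             # dg/dz_2

lhs = (Y * g).mean()
rhs = cov[0, 0] * g1.mean() + cov[0, 1] * g2.mean()
print(lhs, rhs)                   # agree up to Monte Carlo error
```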

The next lemma provides a way to estimate such a bound:

Lemma 8. Let $Y$ be a centered random variable and $Z$ be a random vector on $\mathbb{R}^n$. For any $1 \le j \le n$ […] Then for any function $g$ two times differentiable on $\mathbb{R}^n$, with first (resp. second) order derivatives denoted by $g_j'$ (resp. $g_{ij}''$), and $1 \le p \le \infty$: […]

Proof. We set

$$\bar{Z}_j = (Z_1,\dots,Z_j, 0,\dots,0)$$

and $\bar{Z}_i^j$ is defined by setting to zero all entries in $\bar{Z}_j$ which are not a $Z_l^j$ for some $l \le i$; hence $\bar{Z}_{j-1}^j = \bar{Z}_{j-1}$ and $\bar{Z}_0^j = 0$.

The key observation is that if all terms of the sums below are integrable we have

$$\begin{aligned}
E[Y g(Z)] - \sum_{j=1}^n E[Y Z_j]\, E[g_j(Z)] &= E\Big[\sum_{j=1}^n Y\big(g(\bar{Z}_j) - g(\bar{Z}_{j-1}) - Z_j g_j(\bar{Z}_{j-1})\big)\Big] \\
&\quad + E\Big[\sum_{j=1}^n \sum_{i=1}^{j-1} (Y Z_j - E[Y Z_j])\big(g_j(\bar{Z}_i^j) - g_j(\bar{Z}_{i-1}^j)\big)\Big] \\
&\quad - E\Big[\sum_{j=1}^n \sum_{i \ge j} E[Y Z_j]\big(g_j(\bar{Z}_i) - g_j(\bar{Z}_{i-1})\big)\Big]. \tag{36}
\end{aligned}$$

Clearly indeed, for any $j$, the factors of $E[Y Z_j]$ are the same on both sides, except a residual $E[Y Z_j] g_j(0)$ on the right hand side which compensates for a remaining term from the otherwise vanishing sum of the $E[Y Z_j g_j(\bar{Z}_i^j)]$ (namely the term $i = 0$); finally the terms $Y g(\bar{Z}_j)$ of the right hand side sum up to $E[Y g(Z)]$ since $E[Y] = 0$.

The general identities

$$f(x+h) - f(x) - h f'(x) = h^2 \int_0^1 (1-t)\, f''(x+th)\, dt$$
$$f'(x+h) - f'(x) = h \int_0^1 f''(x+th)\, dt$$

imply that ($e_i$ is the $i$-th vector of the canonical basis of $\mathbb{R}^n$)

$$g(\bar{Z}_j) - g(\bar{Z}_{j-1}) - Z_j g_j(\bar{Z}_{j-1}) = Z_j^2 \int_0^1 (1-t)\, g_{jj}''(\bar{Z}_{j-1} + t Z_j e_j)\, dt$$
$$g_j(\bar{Z}_i^j) - g_j(\bar{Z}_{i-1}^j) = Z_i^j \int_0^1 g_{ij}''(\bar{Z}_{i-1}^j + t Z_i^j e_i)\, dt,$$

hence by the Minkowski inequality

$$\|Z_j^{-2}\big(g(\bar{Z}_j) - g(\bar{Z}_{j-1}) - Z_j g_j(\bar{Z}_{j-1})\big)\|_q \le \sup_\alpha \|g_{jj}''(\alpha_1 Z_1,\dots,\alpha_n Z_n)\|_q \int_0^1 (1-t)\, dt$$
$$\|(Z_i^j)^{-1}\big(g_j(\bar{Z}_i^j) - g_j(\bar{Z}_{i-1}^j)\big)\|_q \le \sup_\alpha \|g_{ij}''(\alpha_1 Z_1,\dots,\alpha_n Z_n)\|_q.$$

Proof of Theorem 7. Set for $0 \le t \le 1$

$$\varphi(t) = E\big[h\big(tX + \sqrt{1 - t^2}\, N\big)\big].$$

Using that for any differentiable function $f$ with bounded derivatives $E[N_k f(N)] = \sum_i E[N_k N_i]\, E[f_i'(N)]$ […]

Concerning (34), notice first that an elementary scaling argument reduces to the case $t = 1$. We consider now the function $h(x) = \exp(\sum_i x_i)$:

(the reason for the choice of $\sigma_*$ rather than $\sigma$ will appear soon). Equation (38) implies

$$\varphi(t) \le \varphi(0) + r\, […]$$

[…] attains necessarily its maximum on extremal points of $[0,1]^n$. Hence (39) is again valid if instead of $X$ we put any $Y \in \mathcal{X}$ in the definition of $\varphi$ since $r$ and $\sigma_*$ would be smaller, thus

$$\psi(t) \le \psi(0) + r \int_0^t s^2 \psi(s)\, ds$$

which implies via the Gronwall lemma that

$$\psi(t) \le \psi(0) \exp(r t^3/3) = \exp(\sigma_*^2/2 + r t^3/3).$$


3.2 The inhomogeneous setting

In this section we give a companion to Theorem 7 in the inhomogeneous case.

Theorem 9. Set […]

[…] in the expression of $w_q$. Integrating (42) we get […]

If $\lambda = ia$, $a \in \mathbb{R}$, taking $p = \infty$ in (43), (44) implies

$$\big|\varphi(t)\, e^{a^2 \sigma(t)^2/2} - \varphi(0)\, e^{a^2 \sigma(0)^2/2}\big| \le |a|^3 w_{1,n} \exp\Big(a^2 \sup_{0 \le t \le 1} \sigma(t)^2/2\Big).$$

Now since the function $\sigma(t)$ is convex, its supremum over $[0,1]$ is either $\sigma(0)$ or $\sigma(1)$, hence

$$\big|\varphi(1)\, e^{a^2 \sigma(1)^2/2} - \varphi(0)\, e^{a^2 \sigma(0)^2/2}\big| \le |a|^3 w_{1,n}\, e^{a^2 \sigma_m^2/2}$$

which implies (40) by induction on $n$. Now for a fixed real $\lambda \in \mathbb{R}$, let

$$\varphi(t) = \sup_{Y \in \mathcal{X}} E\big[e^{\lambda(Y_1 + \cdots + Y_{n-1} + t Y_n)}\big].$$

Equations (43) and (44) with $p = 1$ imply

$$\varphi(t) \le \varphi(0)\, e^{\lambda^2(\sigma(t)^2 - \sigma(0)^2)/2} + |\lambda|^3 w_n \int_0^t e^{\lambda^2(\sigma(t)^2 - \sigma(s)^2)/2}\, \varphi(s)\, ds.$$

We have for $t \ge s$

$$\sigma(t)^2 - \sigma(s)^2 = 2(t-s)\, E[X_n S_{n-1}] + (t^2 - s^2)\, E[X_n^2] \le 2(t-s)\, E[X_n S_{n-1}]_+ + (t^2 - s^2)\, E[X_n^2].$$

Hence, if we set $u(t) = t\, E[X_n S_{n-1}]_+ + t^2 E[X_n^2]/2$,

$$\varphi(t) \le \varphi(0)\, e^{\lambda^2(u(t) - u(0))} + |\lambda|^3 w_n \int_0^t e^{\lambda^2(u(t) - u(s))}\, \varphi(s)\, ds.$$

For any $Y \in \mathcal{X}$, the same bound holds if $X$ is replaced by $Y$ in the definition of $\varphi$, since the corresponding values of $w_n$ and $u(t) - u(s)$ will be smaller (either $\alpha_n = 0$ and the corresponding value of $u(t) - u(s)$ is zero, or $Y_n = \alpha_n X_n$ and $Y_i = X_i$, $i < n$). Hence

$$\varphi(t) \le \varphi(0)\, e^{\lambda^2(u(t) - u(0))} + |\lambda|^3 w_n \int_0^t e^{\lambda^2(u(t) - u(s))}\, \varphi(s)\, ds,$$

or

$$\varphi(t)\, e^{-\lambda^2 u(t)} \le \varphi(0)\, e^{-\lambda^2 u(0)} + |\lambda|^3 w_n \int_0^t e^{-\lambda^2 u(s)}\, \varphi(s)\, ds,$$

and by Gronwall's Lemma:

$$\varphi(t)\, e^{-\lambda^2 u(t)} \le \varphi(0)\, e^{-\lambda^2 u(0)}\, e^{t|\lambda|^3 w_n}$$

which gives for $t = 1$

$$\varphi(1) \le \varphi(0)\, e^{\lambda^2(E[X_n S_{n-1}]_+ + E[X_n^2]/2) + |\lambda|^3 w_n}.$$


3.3 Applications to deviation bounds

In this section we give the deviation inequalities that can be deduced from the preceding exponential inequalities. The bound (46) is analogous to the bound of Corollary 3(b) in [6] p. 85, but in this paper the variance $v$ is amplified with an extra factor $2e^2$ and $w$ takes a slightly different value.

Theorem 10. With the notations of Theorem 7 and Theorem 9, we have for any $A > 0$

$$P(S \ge A) \le \exp\left(-\frac{A^2}{2\sigma_*^2 + 2\sqrt{2Ar/3}}\right), \tag{45}$$

$$P(S \ge A) \le \exp\left(-\frac{A^2}{2v + 2\sqrt{2Aw/3}}\right). \tag{46}$$

Proof. Using (34) we get

$$\log P(S \ge A) \le \log E[e^{t(S-A)}] \le \frac{\sigma_*^2 t^2}{2} + \frac{r t^3}{3} - tA.$$

We choose $t = A/(\sigma_*^2 + \sqrt{2Ar/3})$; in particular $t \le \sqrt{3A/2r}$ (value of $t$ if $\sigma_* = 0$), hence

$$\log P(S \ge A) \le \frac{\sigma_*^2 t^2}{2} + \sqrt{2Ar/3}\,\frac{t^2}{2} - tA = -\frac{tA}{2} = -\frac{A^2}{2\sigma_*^2 + 2\sqrt{2Ar/3}}.$$

Equation (46) is obtained similarly on the basis of (41).

3.4 More on bounded difference inequalities

We give a second order variant of Equation (26).

Theorem 11. With the notation of Theorem 5, the following inequality holds true

$$P(S \ge A) \le \exp\left(-\frac{A^2}{2\operatorname{Var}(S) + 2\sqrt{A \sum_k \delta_k^3}}\right).$$

Proof. As in the proof of Theorem 5, we notice that $S$ is the sum of the following martingale increments

$$X_k = E[f(Y) \mid \mathcal{F}_k] - E[f(Y) \mid \mathcal{F}_{k-1}] = E[\Delta_k \mid \mathcal{F}_k], \qquad \mathcal{F}_k = \sigma(Y_1,\dots,Y_k),$$

but now we use Equation (46). As pointed out right after Theorem 9, one has in the martingale case:

$$w = \sum_k \operatorname{Var}(X_k)\|X_k\|_\infty + \|X_k^3\|_\infty/2 \le \frac{3}{2} \sum_k \|X_k^3\|_\infty \; […]$$
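To see when Theorem 11 improves on McDiarmid's inequality (26), one can compare the two denominators; in the sketch below (ours, with an illustrative normalized sum) $\operatorname{Var}(S)$ is much smaller than $\frac{1}{4}\sum_k \delta_k^2$ and the cubic term $\sum_k \delta_k^3$ shrinks like $n^{-1/2}$.

```python
# Sketch: denominators of McDiarmid (26) vs Theorem 11 for
# f(Y) = (Y_1 + ... + Y_n)/sqrt(n), Y_i ~ Uniform[-1, 1].
import numpy as np

n = 100
delta = 2.0 / np.sqrt(n)    # changing one Y_i moves f by at most 2/sqrt(n)
var_S = 1.0 / 3.0           # Var(S) = n * (1/3) / n
sum_d2 = n * delta**2       # = 4
sum_d3 = n * delta**3       # = 8/sqrt(n), vanishes as n grows

for A in (0.5, 1.0, 1.5):
    mcdiarmid = np.exp(-2 * A**2 / sum_d2)
    second_order = np.exp(-A**2 / (2 * var_S + 2 * np.sqrt(A * sum_d3)))
    print(A, mcdiarmid, second_order)
```

As $n$ grows the square root term vanishes and the second bound approaches the Gaussian rate, which beats McDiarmid's bound whenever $\operatorname{Var}(S)$ is much smaller than $\frac{1}{4}\sum_k \delta_k^2$.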

3.5 Triangle counts

We shall show that the standard Gaussian approximation is asymptotically valid for triangle counts in the moderate deviation domain.

In the Erdős–Rényi model of an unoriented random graph with $n$ vertices, edges are represented by $\binom{n}{2}$ i.i.d. Bernoulli variables $Y_{ab}$, $1 \le a < b \le n$, with the convention $Y_{ab} = Y_{ba}$ and $Y_{aa} = 0$. The number of triangles in such a model is

$$Z = \sum_{a,b,c} Y_{ab}Y_{bc}Y_{ac}\, […]$$

Recall that $r$ rewrites, with these notations, […]

[…] term vanishes, independently of the construction of $\mathcal{H}_i^{\tau,j}$.

We consider now the case $j > |T| - n_1$. In this case, and if $j \ne |T|$, $X_j^\tau = X_{\tau_j}$ for some $\tau_j$ which has two points in common with $\tau$, for instance $\{a, b, d\}$. We define the $\sigma$-fields $\mathcal{H}_i^{\tau,j}$ by excluding first […] $1 + 5(n-3)$ points to exclude. If $j = |T|$, we take $\mathcal{H}_i^{\tau,|T|} = \mathcal{G}_i^\tau$, and this contributes to at most $n_1$ non-zero terms in the middle sum; finally:

$$\sum […]$$

In this case it is easily verified that $\sigma^2 = \operatorname{Var}(S)$ since covariances are non-negative. Let us recall that (see [2]) […]

This has order $n^4$ due to the covariance terms. Let us briefly compare with the bound of [2]; this paper delivers a bound for $P(S \ge A)$ which is slightly larger than

$$\exp […]$$

The square root term in (48) is residual if $A \ll n^3$; this is the moderate deviation case since the centering term in (47) has order $n^3$ (notice that $S \le n^3$ w.p. 1), and we get the right variance. In (49), a change occurs when $n^{5/2} \ll A$, and if we set $A = Bn^{5/2}$ with $B$ large, (49) leads to $\exp(-cnB)$ while (47) behaves like $\exp(-cnB^2)$.

4 Evaluation of constants under mixing assumptions

We give here informally some arguments to convince the reader that under standard $\varphi$-mixing assumptions the constant $q$ has the same order as the variance of the sum, and that $w$ and $r$ will be small; under $\beta$-mixing assumptions one can control only $r_s$ for $s < \infty$ and $w_1$.


4.1 $\varphi$-mixing

The $\varphi$-mixing constant between two $\sigma$-fields $\mathcal{A}$ and $\mathcal{B}$ is defined as

$$\varphi(\mathcal{A}, \mathcal{B}) = \sup_{A \in \mathcal{A},\, B \in \mathcal{B}} |P(B \mid A) - P(B)|.$$

It is well known, see reference [?] p. 27 or [12] p. 278, that this implies that if $Z$ is a zero-mean $\mathcal{B}$-measurable random variable

$$\|E[Z \mid \mathcal{A}]\|_\infty \le 2\varphi(\mathcal{A}, \mathcal{B})\, \|Z\|_\infty.$$
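For a concrete instance of this contraction property, the sketch below (ours) takes a two-state Markov chain started from its stationary law, with $\mathcal{A} = \sigma(X_0)$ and $\mathcal{B} = \sigma(X_k)$, and compares $\|E[Z \mid \mathcal{A}]\|_\infty$ with $2\varphi(\mathcal{A}, \mathcal{B})\|Z\|_\infty$.

```python
# Sketch: ||E[Z | sigma(X_0)]||_inf <= 2 phi ||Z||_inf for a 2-state chain.
import numpy as np

P = np.array([[0.9, 0.1], [0.2, 0.8]])  # transition matrix (illustrative)
pi = np.array([2 / 3, 1 / 3])           # stationary distribution of P

k = 5
Pk = np.linalg.matrix_power(P, k)
# phi = sup_{x, B} |P(X_k in B | X_0 = x) - P(X_k in B)|; singletons suffice here
phi = max(abs(Pk[x, j] - pi[j]) for x in range(2) for j in range(2))

Z = np.array([1.0, -1.0])
Z = Z - pi @ Z                           # center Z under the law of X_k
cond = Pk @ Z                            # E[Z(X_k) | X_0 = x]
print(np.abs(cond).max(), 2 * phi * np.abs(Z).max())
```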

Assume that $X$ is a field over a part of $\mathbb{Z}^d$: each variable $X_k$ of the field sits on some $P_k \in \mathbb{Z}^d$. $\mathcal{G}_j^k$ is the $\sigma$-field generated by the $j$ most distant points from $P_k$, and we take for simplicity $\mathcal{H}_i^{k,j} = \mathcal{G}_i^k$. Notice that the distance of $P_k$ to $P_j^k$, the $j$-th closest point, is at least $c j^{1/d}$, for some constant $c$. This implies that standard $\varphi$-mixing assumptions between $\sigma(X_k)$ and $\mathcal{G}_j^k$ rewrite

$$\varphi(\mathcal{G}_j^k, \sigma(X_k)) \le \varphi_{\infty,1}\big((n-j)^{1/d}\big) \tag{50}$$

for some decreasing function $\varphi_{\infty,1}$ (for example exponential decay holds for finite range¹ shift-invariant Gibbs random fields [10] pp. 158–159; this contains a lot of examples). The subscripts $\infty$ and 1 on $\varphi$ mean that there is no restriction on the number of random variables contained in the first $\sigma$-field, $\mathcal{G}_j^k$, and there is only 1 variable in the second, $\sigma(X_k)$; we use this traditional notation, in particular for compatibility with [6].

On the other hand for $i < j$, $j - i$ is smaller than the number of points in the annulus $\{x : \|P_j^k - P_k\| \le \|x - P_k\| \le \|P_i^k - P_k\|\}$; in particular, for some $c$,

$$j - i \le c\, \|P_i^k - P_k\|^{d-1}\, \|P_j^k - P_i^k\|.$$

Hence $\|P_j^k - P_i^k\|$ is at least $c (j-i)(n-i)^{1/d - 1}$ for some $c$. This implies that standard $\varphi$-mixing assumptions between $\sigma(X_k, X_j^k)$ and $\mathcal{G}_i^k$ rewrite

$$\varphi(\mathcal{G}_i^k, \sigma(X_k, X_j^k)) \le \varphi_{\infty,2}\big((j-i)(n-i)^{1/d - 1}\big), \qquad i \le j \tag{51}$$

for some decreasing function $\varphi_{\infty,2}$.

Equations (50) and (51) will imply, for $i \le j \le k$, and any measurable bounded functions $f$ and $g$,

$$\|E[f(X_k) \mid \mathcal{G}_j^k]\|_\infty \le 2\|f(X_k)\|_\infty\, \varphi_{\infty,1}\big((n-j)^{1/d}\big)$$
$$\|E[g(X_k, X_j^k) \mid \mathcal{G}_i^k] - E[g(X_k, X_j^k)]\|_\infty \le 2\|g(X_k, X_j^k)\|_\infty\, \varphi_{\infty,2}\big((j-i)(n-i)^{1/d - 1}\big).$$

The first equation leads to

$$\|E[X_k \mid \mathcal{G}_j^k]\|_\infty \le 2\|X_k\|_\infty\, \varphi_{\infty,1}\big((n-j)^{1/d}\big)$$
$$\|E[X_k X_j^k \mid \mathcal{G}_i^k] - E[X_k X_j^k]\|_\infty \le 2m\, \|E[X_k \mid \mathcal{G}_j^k]\|_\infty \le 4m\|X_k\|_\infty\, \varphi_{\infty,1}\big((n-j)^{1/d}\big)$$

¹This means the existence of a constant $c$ such that if $k, l$ […]

with $m = \sup_i \|X_i\|_\infty$, and the second to

$$\|E[X_k X_j^k \mid \mathcal{G}_i^k] - E[X_k X_j^k]\|_\infty \le 2m\|X_k\|_\infty\, \varphi_{\infty,2}\big((j-i)(n-i)^{1/d - 1}\big).$$

Hence

$$\|E[X_k X_j^k \mid \mathcal{G}_i^k] - E[X_k X_j^k]\|_\infty \le 4m\|X_k\|_\infty \min\Big(\varphi_{\infty,2}\big((j-i)(n-j)^{1/d-1}\big),\ \varphi_{\infty,1}\big((n-j)^{1/d}\big)\Big).$$

We get with $m_1 = \sum_k \|X_k\|_\infty$, and setting $\varphi(x) = \varphi([x])$,

$$q \le \sum_{k,i} \|X_i^k\|_\infty \|E[X_k \mid \mathcal{G}_i^k]\|_\infty \le 2 m_1 m \int_0^\infty \varphi_{\infty,1}(x^{1/d})\, dx = 2d\, m_1 m \int_0^\infty \varphi_{\infty,1}(x)\, x^{d-1}\, dx.$$

The integral is essentially the quantity $B(\varphi)$ of [6], and Equations (21) and (22) may be seen as improvements over (b)(i) and (ii) of Corollary 4 of [6]. We get for $r$

$$\begin{aligned}
r &\le 3d\, m^2 m_1 \int_0^\infty \varphi_{\infty,1}(y)\, y^{d-1}\, dy + 4 m^2 m_1 \int_0^\infty\!\!\int_0^\infty \min\Big(\varphi_{\infty,2}\big((y-z)y^{1/d-1}\big),\ \varphi_{\infty,1}(z^{1/d})\Big)\, \mathbf{1}_{z<y}\, dy\, dz \\
&\le 3d\, m^2 m_1 \int_0^\infty \varphi_{\infty,1}(y)\, y^{d-1}\, dy + 4 m^2 m_1 \int_0^\infty\!\!\int_0^\infty \Big(\varphi_{\infty,2}\big((y-z)y^{1/d-1}\big)\mathbf{1}_{y>2z} + \varphi_{\infty,1}(z^{1/d})\mathbf{1}_{z<y<2z}\Big)\, dy\, dz \\
&\le C\, m^2 m_1 \int_0^\infty \big(\varphi_{\infty,1}(y) + \varphi_{\infty,2}(y)\big)\, y^{2d-1}\, dy
\end{aligned}$$

for some constant $C$ which depends only on $d$. The integral on the right hand side is essentially the $D(\varphi)$ of [6] page 86. If we refer to the independent case ($X_k$ of order $1/\sqrt{n}$), the factor $m^2 m_1$ is of order $1/\sqrt{n}$, which makes the factor of $t^3$ in (34) residual as far as $t$ is smaller than $\sqrt{n}$ (moderate deviations).

The same estimates can be obtained for $w$.

4.2 $\beta$-mixing

$\varphi$-mixing is not always a realistic assumption: for a Markov chain, $\varphi$-mixing typically implies a Doeblin condition; it is satisfied for ergodic finite state Markov chains, but on the other hand, a non-trivial Gaussian autoregressive process is not $\varphi$-mixing.

$\beta$-mixing is a much more satisfactory measure of dependence, cf. [3]. The $\beta$-mixing constant between two $\sigma$-fields $\mathcal{A}$ and $\mathcal{B}$ represents the total variation distance between the actual measure on $\mathcal{A} \otimes \mathcal{B}$ and the independent one (product of the marginal measures). A non-singular autoregressive process, as most Markov chains, is $\beta$-mixing with exponentially decreasing coefficients [3]. If $X$ is $\mathcal{A}$-measurable and $\beta(\mathcal{A}, \mathcal{B})$ is the $\beta$-mixing constant of these $\sigma$-fields, one has

$$\|E[X \mid \mathcal{B}]\|_s \le 2\beta(\mathcal{A}, \mathcal{B})^{1/q}\, \|X\|_p, \qquad \frac{1}{p} + \frac{1}{q} = \frac{1}{s}$$

(elementary consequence of Theorem 1.4(a) of [?]). This implies that $w_1$ as well as $r_s$, $s < \infty$, can […]

A Technical inequalities

Proposition 12. The three pairs of functions

$$\theta(x) = 0, \qquad \psi(x) = e^x - x - 1$$
$$\theta(x) = \zeta(x_+), \qquad \psi(x) = \zeta(x_-), \qquad \zeta(x) = e^{-x} + x - 1$$
$$\theta(x) = \frac{x^2}{6}, \qquad \psi(x) = \frac{x^2}{3}$$

satisfy

$$e^{x - \theta(x)} \le 1 + x + \psi(x). \tag{52}$$

We have also

$$\big|\tanh\big(x - \theta(x) - \log(1 + \psi(y))\big)\big| \le 3m, \qquad |x|, |y| \le m. \tag{53}$$

Proof. Everything will be more or less based on the inequality $e^x \ge 1 + x$.

Equation (52) is obvious for the first pair of functions. In the second case we have only to check for $x > 0$; since in this case $x - \theta(x) = 1 - e^{-x}$, this reduces to proving that

$$1 - e^{-x} \le \log(1 + x).$$

The function $\log(1+x) + e^{-x} - 1$ has a derivative $(1+x)^{-1} - e^{-x}$ which is $\ge 0$ since $e^x \ge 1 + x$; hence the inequality is satisfied. The third case is the non-negativity of the function

$$f(x) = 1 + x + \frac{x^2}{3} - e^{x - x^2/6}.$$

This function satisfies

$$f'(x) = 1 + \frac{2x}{3} - \Big(1 - \frac{x}{3}\Big)\, e^{x - x^2/6}$$
$$f''(x) = \frac{2}{3} - \Big(\Big(1 - \frac{x}{3}\Big)^2 - \frac{1}{3}\Big)\, e^{x - x^2/6} = \frac{2}{3}\Big(1 - (1 - x + x^2/6)\, e^{x - x^2/6}\Big)$$

which is non-negative since $1 - (1-u)e^u \ge 0$; $f$ is convex. Since $f'(0) = 0$ and $f(0) = 0$, we conclude that $f$ is non-negative.

For the last inequality, we start with an upper bound on $\psi(y)$. In the first case

$$\psi(y) \le \psi(|y|) \le \psi(m) \le e^m - 1.$$

In the second case $\psi(y) \le \psi(-m) \le e^m - 1$ and in the third case $\psi(y) \le m^2/3 \le e^m - 1$ (because of the expansion of the exponential); in any case we have

$$\log(1 + \psi(y)) \le m. \tag{54}$$

On the one hand […] and on the other hand, thanks to (54), using that $\theta(x) \le x^2/2$,

$$\tanh\big(\log(1 + \psi(y)) - x + \theta(x)\big) \le \tanh(2m + m^2/2) \le \min(2m + m^2/2,\ 1) \le 3m.$$
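A grid check (ours, not from the paper) of inequality (52) for the three pairs:

```python
# Sketch: verify e^{x - theta(x)} <= 1 + x + psi(x) for the three pairs.
import numpy as np

x = np.linspace(-4, 4, 4001)

def zeta(t):
    return np.exp(-t) + t - 1

pairs = [
    (np.zeros_like(x),       np.exp(x) - x - 1),
    (zeta(np.maximum(x, 0)), zeta(np.maximum(-x, 0))),
    (x**2 / 6,               x**2 / 3),
]
for theta, psi in pairs:
    assert np.all(np.exp(x - theta) <= 1 + x + psi + 1e-12)
print("inequality (52) holds for the three pairs on the grid")
```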

Proposition 13. For any $a \le 0 \le b$ one has

$$e^a\, \frac{e^b - 1}{b} \le 4 \exp\left(\min\left(\frac{b^2}{8},\ a + b\right)\right). \tag{55}$$

Proof. We have indeed

$$\begin{aligned}
\exp\Big(-\frac{b^2}{8}\Big)\, \frac{e^b - 1}{b} &= \exp\Big(-\frac{b^2}{8}\Big)\, \frac{(e^{b/2} + 1)(e^{b/2} - 1)}{b} \\
&= e^{-b^2/8}\, (e^{b/2} + 1)^2\, \frac{\tanh(b/4)}{b} \\
&\le \big(e^{b/2 - b^2/16} + e^{-b^2/16}\big)^2\, \frac{1}{4} \\
&\le \frac{1}{4}(e + 1)^2 < 4.
\end{aligned}$$

Hence

$$e^a\, \frac{e^b - 1}{b} \le \frac{e^b - 1}{b} \le 4 \exp\Big(\frac{b^2}{8}\Big).$$

And clearly

$$e^{-a-b}\, e^a\, \frac{e^b - 1}{b} = \frac{1 - e^{-b}}{b} \le 1.$$
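A quick grid check (ours) of (55):

```python
# Sketch: verify e^a (e^b - 1)/b <= 4 exp(min(b^2/8, a + b)) for a <= 0 <= b.
import numpy as np

b = np.linspace(0.01, 20.0, 400)
a = np.linspace(-30.0, 0.0, 400)
A, B = np.meshgrid(a, b)
lhs = np.exp(A) * (np.exp(B) - 1) / B
rhs = 4 * np.exp(np.minimum(B**2 / 8, A + B))
assert np.all(lhs <= rhs)
print("Proposition 13 confirmed on the grid")
```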

References

[1] B. Bercu, A. Touati, Exponential inequalities for self-normalized martingales with applications, Ann. Appl. Probab. 18 (2008), no. 5, 1848–1869. MR2462551

[2] S. Boucheron, G. Lugosi, P. Massart, Concentration inequalities using the entropy method, Ann. Probab. 31 (2003), no. 3, 1583–1614. MR1989444

[4] J. Bretagnolle, A new large deviation inequality for U-statistics of order 2, ESAIM: Probability and Statistics 3 (1999), 151–162. MR1742613

[5] O. Catoni, Laplace transform estimates and deviation inequalities, Ann. Inst. H. Poincaré Probab. Statist. 39 (2003), no. 1, 1–26. MR1959840

[6] J. Dedecker, Exponential inequalities and functional central limit theorems for random fields, ESAIM 5 (2001), 77–104. MR1875665

[7] P. Doukhan, Mixing: Properties and Examples, Lecture Notes in Statistics 85, 1994. MR1312160

[8] P. Doukhan, P. Massart, E. Rio, Invariance principles for absolutely regular empirical processes, Ann. Inst. Henri Poincaré, Probab. Stat. 31 (1995), no. 2, 393–427. MR1324814

[9] M. Fromont, B. Laurent, Adaptive goodness-of-fit tests in a density model, Ann. Statist. 34 (2006), no. 2, 680–720. MR2281881

[10] H.-O. Georgii, Gibbs Measures and Phase Transitions, de Gruyter Studies in Mathematics 9, Walter de Gruyter & Co., 1988. MR0956646

[11] E. Giné, R. Latała, J. Zinn, Exponential and moment inequalities for U-statistics, in: High Dimensional Probability II, 13–38, Progr. Probab. 47, Birkhäuser Boston, Boston, MA, 2000. MR1857312

[12] P. Hall, C. C. Heyde, Martingale Limit Theory and its Applications, Academic Press, 1980. MR0624435

[13] A. J. Lee, U-Statistics: Theory and Practice, Statistics: Textbooks and Monographs 110, Marcel Dekker, 1990. MR1075417
