• Tidak ada hasil yang ditemukan

getdocbbe2. 135KB Jun 04 2011 12:04:54 AM

N/A
N/A
Protected

Academic year: 2017

Membagikan "getdocbbe2. 135KB Jun 04 2011 12:04:54 AM"

Copied!
10
0
0

Teks penuh

(1)

ELECTRONIC COMMUNICATIONS in PROBABILITY

QUANTITATIVE ASYMPTOTICS OF GRAPHICAL

PROJECTION PURSUIT

ELIZABETH S. MECKES1

220 Yost Hall, Department of Mathematics, Case Western Reserve University, 10900 Euclid Ave., Cleveland, OH 44106.

email: ese3@cwru.edu

SubmittedJanuary 12, 2009, accepted in final formApril 14, 2009 AMS 2000 Subject classification: 60E15,62E20

Keywords: Projection pursuit, concentration inequalities, Stein’s method, Lipschitz distance

Abstract

There is a result of Diaconis and Freedman which says that, in a limiting sense, for large collections of high-dimensional data most one-dimensional projections of the data are approximately Gaus-sian. This paper gives quantitative versions of that result. For a set of deterministic vectors{xi}n

i=1 inRd withnanddfixed, letθ Sd−1be a random point of the sphere and letµθ

n denote the ran-dom measure which puts mass 1n at each of the points

x1,θ

, . . . ,

xn,θ

. For a fixed bounded Lipschitz test function f, Z a standard Gaussian random variable andσ2a suitable constant, an explicit bound is derived for the quantity P

– Z

f dµθ

n−Ef(σZ)

> ε

™

. A bound is also given

forP”dB L(µθ

n,N(0,σ

2))> ε—

, wheredB Ldenotes the bounded-Lipschitz distance, which yields a lower bound on the waiting time to finding a non-Gaussian projection of the{xi}if directions are tried independently and uniformly onSd−1.

1

Introduction

A foundational tool of data analysis is the projection of high-dimensional data to a one- or two-dimensional subspace in order to visually represent the data, and, ideally, identify underlying structure. The question immediately arises: which projections are interesting? One would like to answer by saying that those projections which exhibit structure are interesting, however, identify-ing which projections those are is not quite as straightforward as one might think. In particular, there are several reasons that have led to the idea that one should mainly look for projections which are far from Gaussian in behavior; that Gaussian projections in fact do not generally exhibit interesting structure. One justification for this idea is the following result due to Persi Diaconis and David Freedman.

1RESEARCH SUPPORTED BY AN AMERICAN INSTITUTE OF MATHEMATICS FIVE-YEAR FELLOWSHIP

(2)

Theorem 1(Diaconis-Freedman[1]). Let x1, . . . ,xnbe deterministic vectors inRd. Suppose that n,

d and the xi depend on a hidden indexν, so that asνtends to infinity, so do n and d. Suppose that there is aσ2>0such that, for allε >0,

1 n

¦

jn:|xj|2−σ2d > εd

©

ν→∞

−−→0, (1)

and suppose that

1 n2

¦

j,kn: ¬

xj,xk

> εd

©

ν→∞

−−→0. (2)

LetθSd−1be distributed uniformly on the sphere, and consider the random measureµθ

ν which puts

mass 1n at each of the points

θ,x1

, . . . ,

θ,xn

. Then asνtends to infinity, the measuresµθ ν tend

toN(0,σ2)weakly in probability.

Heuristically, Theorem 1 can be interpreted as saying that, for a large number of high-dimensional data vectors, as long as they have nearly the same lengths and are nearly orthogonal, most one-dimensional projections are close to Gaussian regardless of the structure of the data. It is important to note that the conditions (1) and (2) are not too strong; in particular, even though only d vectors can be exactly orthogonal inRd, the 2dvertices of a unit cube centered at the origin satisfy

condition (2) for “rough orthogonality”.

A failing of the usual interpretation of Theorem 1 is that sometimes, projections of data look nearly Gaussian for a reason; that is, it is not always due to the central-limit type effect described by the theorem. Thus the question arises: is there a way to tell whether a Gaussian projection is interesting? A possible answer lies in quantifying the theorem, and then saying that a nearly-Gaussian projection is interesting if it is “too close” to nearly-Gaussian to simply be the result of the phenomenon described by Theorem 1. By way of analogy, one has the Berry-Esséen theorem stating that the rate of convergence to normal of the sum ofnindependent, identically distributed random variables is of the orderp1n; if one has a sum ofnrandom variables converging to Gaussian significantly faster, it must be happening for some reason other than just the usual central-limit theorem. In order to implement this idea, it is necessary (as with the Berry-Esséen theorem) to have a sharp quantitative version of the limit theorem in question.

A second motivation for proving a quantitative version of Theorem 1 is the application to waiting times for discovering an interesting direction on which to project data. If a sequence of indepen-dent random projection directions is tried until the empirical distribution of the projected data is more than some threshhold away from Gaussian (in some metric on measures), and N is the number of trials needed to find such a direction, a one can easily give a lower bound forEN from

the type of quantitative theorem proved below.

Thus the goal of this paper is to provide a quantitative version of Theorem 1 in a fixed dimension d and for a fixed number of data vectorsn. To do this, it is first necessary to replace conditions (1) and (2) with non-asymptotic conditions. The conditions we will use are the following. Letσ2 be defined by 1

n

Pn

i=1|xi|2=σ2d. Suppose there existAandB, such that 1

n n

X

i=1

σ−2|xi|2−d

A, (3)

and, for allθSd−1,

1 n

n

X

i=1

θ,xi

2

(3)

For a little perspective on the restrictiveness of these conditions, note that, as for the conditions of Diaconis and Freedman, they hold for the vertices of a unit cube inRd (withA=0 andB= 1

4). Under these assumptions, the following theorems hold.

Theorem 2. Let{xi}ni=1be deterministic vectors inR

d, subject to conditions(3)and(4)above. For a pointθSd−1, let the measureµθ

n put equal mass at each of the points

random variable,θchosen uniformly on the sphere,σdefined as above, andε >maxp2πpB

d−1,

d, subject to conditions(3)and(4)above, and again consider the measuresµθ

n. Ifθis chosen uniformly fromS

(i) It should be emphasized that the key difference between the results proved here and the result of Diaconis and Freedman is that Theorems 2 and 3 hold forfixeddimensiond and number of data vectorsn; there are no limits in the statements of the theorems.

(ii) It is not necessary for Aand B to be absolute constants; for the the results above to be of interest asd → ∞, it is easy to see from the statements that it is only necessary that A=o(d)andB = o(d)for Theorem 2 whileB = o(pd)for Theorem 3. The reader may also be wondering where the dependence onnis in the statements above; it is built into the definition ofB. Note that, by definition,B≥|xi|2

n for eachi; in particular,B

σ2d

n . It is thus necessary thatn→ ∞asd→ ∞for Theorem 2 andnpdfor Theorem 3.

(iii) For Theorem 2, consider the special case thatε2= C2

·25B

d or smaller.

(4)

whereC=9·216C B2andC′′=48pπC−3/10. Thus, roughly speaking, the bounded Lips-chitz distance from the random measureµθ

n to the Gaussian measure with mean zero and varianceσ2is unlikely to be more than a large multiple oflog(d−1)

d−1

1/5

. We make no claims of the sharpness of this result.

Theorem 3 can easily be used to give an estimate on the waiting time until a non-Gaussian di-rection is found, if didi-rections are tried randomly and independently. Specifically, we have the following corollary.

Corollary 4. Letθ1,θ2,θ3, . . .be a sequence of independent, uniformly distributed random points on

Sd−1. Let Tε:=min{j:dB L(µθnj,N(0,σ2)> ε}.Then there are constants c,csuch that

ETε

3/2

p

B exp

‚

c(d1)ε5 B2

Œ

.

2

Proofs

This section is mainly devoted to the proofs of Theorems 2 and 3, with some additional remarks following the proofs. For the proof of Theorem 2, several auxiliary results are needed. The first is an abstract normal approximation for bounding the distance of a random variable to a Gaussian random variable in the presence of a continuous family of exchangeable pairs. The theorem is an abstraction of an idea used by Stein in [6] to bound the distance to Gaussian of the trace of a power of a random orthogonal matrix.

Theorem 5 (Meckes [4]). Suppose that (W,Wε) is a family of exchangeable pairs defined on a common probability space, such thatEW =0andEW2=σ2. LetF be aσ-algebra on this space

with σ(W) ⊆ F. Suppose there is a function λ(ε)and random variables E,Emeasurable with respect toF, such that

(i) 1

λ(ε)E ”

W

F

— L1

−−→ε

→0 −W+E.

(ii) 1 2λ(ε)σ2E

”

(W)2

F

— L1

−−→ε

→0 1+E. (iii) λ1(ε)E|W

εW|3 ε→0

−−→0.

Then if Z is a standard normal random variable,

dT V(W,σZ)EE +

Ç

π 2E

E

.

The next result gives expressions for some mixed moments of entries of a Haar-distributed orthog-onal matrix. See[3], Lemma 3.3 and Theorem 1.6 for a detailed proof.

Lemma 6. If Uui j

—d

i,j=1is an orthogonal matrix distributed according to Haar measure, then

E hQ

uri j

i j

i

is non-zero if and only if ri•:=

Pd

j=1ri j and rj:=

Pd

(5)

(i) For all i,j,

E h

u2i ji=1

d. (ii) For all i,j,r,s,α,β,λ,µ,

E

ui jursuαβuλµ

= 1

(d−1)d(d+2)

h

δi rδαλδjβδsµ+δi rδαλδjµδsβ+δiαδrλδjsδβµ

+δiαδrλδjµδβs+δiλδrαδjsδβµ+δiλδrαδjβδsµi

+ d+1

(d1)d(d+2)

h

δi rδαλδjsδβµ+δiαδrλδjβδsµ+δiλδrαδjµδsβi.

(iii) For the matrix Q=

qi j

d

i,j=1defined by qi j:=ui1uj2ui2uj1,and for all i,j,ℓ,p,

qi jqp—= 2

d(d1)

δiℓδj pδi pδjℓ

.

Finally, we will need to make use of the concentration of measure on the sphere, in the form of the following lemma.

Lemma 7 (Lévy, see [5]). For a function F :Sd−1 R, let M

F denote its median with respect to the uniform measure (that is, for θ distributed uniformly on Sd−1, PF(θ) M

F

≥ 12 and

P

F(θ)MF

≥ 12) and let L denote its Lipschitz constant. Then

F(θ)−MF > ε

— ≤

Ç

π 2exp

–

−(d−1)ε

2

2L2

™

.

With these results, it is now possible to give the proof of Theorem 2.

Proof of Theorem 2. The proof divides into two parts. First, an “annealed” version of the theorem is proved using the infinitesimal version of Stein’s method given by Theorem 5. Then, for a fixed test

function f andZa standard Gaussian random variable, the quantityP •

R

f dµθ

ν −Ef(σZ)

> ε

˜

is bounded using the annealed theorem together with the concentration of measure phenomenon.

Letθ be a uniformly distributed random point ofSd−1Rd, and let I be a uniformly distributed

element of{1, . . . ,n}, independent ofθ. Consider the random variableW:=

θ,xI

. ThenEW=

0 by symmetry andEW2=σ2by the condition1

n

Pn

i=1|xi|2=σ2d. Theorem 5 will be used to bound the total variation distance fromWtoσZ, whereZis a standard Gaussian random variable. The family of exchangeable pairs needed to apply the theorem is constructed as follows. Forε >0 fixed, let

:=

 p

1ε2 ε

ε p1ε2

⊕Id2 = Id+ 

−

ε2

2 +δ ε

ε ε2 2 +δ

⊕0d2,

whereδ=O(ε4). LetUbe a Haar-distributedd×drandom orthogonal matrix, independent ofθ andI, and let=

UAεUTθ,xI

(6)

LetK be the d×2 matrix made of the first two columns of UandC2=

KC2KT (note that this is the sameQas in part (iii) of Theorem 6). Then by the construction of ,

The conditions of Theorem 5 can be verified using the expressions in Lemma 6 as follows. By the lemma,EK KT=2

Condition (ii) of Theorem 5 is thus satisfied withE= 1

d−1

. Condition (iii) of the theorem is trivial by (5); it follows that

dT V(W,σZ)≤ This is the annealed statement referred to at the beginning of the proof.

We next use the concentration of measure on the sphere to show that, for a large measure of θSd−1, the random measureµθ

nwhich puts mass 1

nat each of the

θ,xi

is close to the average behavior. To do this, we make use of Lévy’s Lemma (Lemma 7). Let f :R R be such that

(7)

Sd−1. Then, usingkfk

B L≤1 together with equation (4),

thus the Lipschitz constant ofF is bounded bypB. It follows from Lemma 7 that

F(θ)−MF

d−1, we may use concentration about the median ofF to obtain concentration about the mean, with only a loss in constants.

(8)

Proof of Theorem 3. The first two steps of the proof of Theorem 3 were essentially done already in the proof of Theorem 2. From that proof, we have that ifW=

θ,xI

forθdistributed uniformly onSd−1andI independent ofθ and uniformly distributed in{1, . . . ,n}, then

dT V(W,σZ) A+2

d−1, (8)

forAas in equation (3). Furthermore, it follows from equation (7) in the proof of Theorem 2 that forF(θ):=R f dµθnandε > pπpB

d−1, then

P[|F(θ)EF(θ)|> ε]F(θ)−MF > ε

Mf −EF(θ) —

≤P –

F(θ)−MF > ε

πpB 2pd−1

™

≤ Ç

π 2e

−(d8B−1)ε 2

.

(9)

In this proof, this last statement is used together with a series of successive approximations of arbitrary bounded Lipschitz functions as used by Guionnet and Zeitouni[2]to obtain a bound for

d

B L(µθn,N(0,σ

2))> ε—

. By definition,

dB L(µθ

n,Eµ

θ

n)> ε

—

=P

–

sup kfkB L≤1

Z

f dµθnE Z

f dµθn

> ε

™

.

First consider the subclassFB L,K ={f :kfkB L 1,supp(f)K}for a compact set KR. Let

∆ =ε

4; for f ∈ FB L,K, define the approximationf∆as in Guionnet and Zeitouni[2]as follows. Let xo=infKand let

g(x) =

0 x0; x 0≤x≤∆;

x∆.

ForxK, define frecursively by f(xo) =0 and

f∆(x) =

x−∆xo

X

i=0

2If(x

o+ (i+1)∆)≥f∆(xo+i∆)

−1g(xxoi∆).

That is, the function f∆ is just an approximation off by a function which is piecewise linear and

has slope 1 or−1 on each of the intervals[xo+i∆,xo+ (i+1)∆]. Note that, becausekfkB L≤1, it follows thatkff∆k∞≤∆and the number of distinct functions whose linear span is used to approximatef in this way is bounded by|K|

∆, where|K|is the diameter ofK. If{hk}

N

(9)

set of functions used in the approximation f∆andεktheir coefficients, then forε2>8π|K|

(10)

assuming thatBε. Recall thatER f dµθ

n=Ef(W)forW=

θ,xI

, and so by the bound (8),

sup f∈FB L

E

Z

f dµθnEf(σZ)

A+2 d1, thus forεbounded below as above and also satisfyingε > 2(A+2)

d−1 ,

P

dB L(W,σZ)> ε

≤ 48

p

πB ε3/2 exp

–

−(d−1)ε

5

9·216B2

™

.

Proof of Corollary 4. The proof is essentially trivial. Note that

P[Tε>m] –

1c1

p

B ε3/2 exp

‚

c2(d−1)ε5 B2

Ϊm

by independence of theθjand Theorem 3, sinceTε>mif and only if dB L(µ

θj

n,N(0,σ2)≤εfor all 1≤jm. This bound can be used in the identityET

ε=

P∞

m=0P[Tε>m]to obtain the bound

in the corollary.

Remark:One of the features of the proofs given above is that they can be generalized to the case ofk-dimensional projections of thed-dimensional data vectors{xi}, withkfixed or even growing with d. The proof of the higher-dimensional analog of Theorem 2 goes through essentially the same way. However, the analog of the proof of Theorem 3 from Theorem 2 is rather more involved in the multivariate setting and will be the subject of a future paper.

References

[1] Persi Diaconis and David Freedman. Asymptotics of graphical projection pursuit.Ann. Statist., 12(3):793–815, 1984. MR0751274

[2] A. Guionnet and O. Zeitouni. Concentration of the spectral measure for large matrices. Elec-tron. Comm. Probab., 5:119–136 (electronic), 2000. MR1781846

[3] Elizabeth Meckes. An infinitesimal version of Stein’s method of exchangeable pairs. Doctoral dissertation, Stanford University, 2006.

[4] Elizabeth Meckes. Linear functions on the classical matrix groups. Trans. Amer. Math. Soc., 360(10):5355–5366, 2008. MR2415077

[5] Vitali D. Milman and Gideon Schechtman. Asymptotic theory of finite-dimensional normed spaces, volume 1200 ofLecture Notes in Mathematics. Springer-Verlag, Berlin, 1986. With an appendix by M. Gromov. MR0856576

Referensi

Dokumen terkait

Recently, understanding the important position of the province as the representative of the national government, under the new law on local gov- ernment (Law No.23/2014), the

Pengujian perfor- mansinya meliputi pembangkitan panas tungku biomassa dari bahan bakar tongkol jagung, analisis pindah panas dan eisiensi serta efektivitas

Hasil penelitian menunjukkan bahwa aktivitas antioksidan pigmen Oscillatoria stabil pada pH 7 dan suhu 28 ° c, namun aktivitasnya cenderung menurun saat pH buffer medium

Sehubungan dengan tahapan proses Seleksi Umum Kegiatan Penyusunan Lanjutan Penyempurnaan Rencana Zonasi Wilayah Pesisir dan Pulau-pulau Kecil (RZWP3K) Kota Makassar Tahun 2014 yang

Membawa dokumen asli dan 1 (satu) Set Fotocopy dari data-data formulir isian kualifikasi yang diinput di dalam Sistem Pengadaan Secara Elektronik (SPSE) pada

data, dan entitas baik berupa masukan untuk sistem ataupun hasil dari

[r]

[r]