and $\operatorname{Var}(X_{jk}) = a_{jk}^2/p - a_{jk}^2$. Consequently, the first term in Corollary 3.9 satisfies
\[
\sum_k \Bigl( \sum_j \operatorname{Var}(X_{jk}) \Bigr)^{1/2}
= \sum_k \Bigl( \frac{1}{p}\|a_k\|_2^2 - \|a_k\|_2^2 \Bigr)^{1/2}
= \Bigl( \frac{1-p}{p} \Bigr)^{1/2} \|A\|_{\mathrm{col}}
= O\biggl( \Bigl( \frac{1-p}{p} \Bigr)^{1/2} n\sqrt{n} \biggr),
\]
and likewise the second term satisfies
\[
\sum_j \Bigl( \sum_k \operatorname{Var}(X_{jk}) \Bigr)^{1/2}
= O\biggl( \Bigl( \frac{1-p}{p} \Bigr)^{1/2} n\sqrt{n} \biggr).
\]
Therefore the relative $\infty\to 1$ norm error satisfies
\[
\frac{\sum_k \bigl( \sum_j \operatorname{Var}(X_{jk}) \bigr)^{1/2} + \sum_j \bigl( \sum_k \operatorname{Var}(X_{jk}) \bigr)^{1/2}}{\|A\|_{\infty\to 1}}
= O\biggl( \Bigl( \frac{1-p}{pn} \Bigr)^{1/2} \biggr).
\]
It follows that the expected relative error $\mathbb{E}\|A - X\|_{\infty\to 1}/\|A\|_{\infty\to 1}$ is smaller than $\gamma$ for $p$ on the order of $(1 + n\gamma^2)^{-1}$ or larger. The expected number of nonzero entries in $X$ is $pn^2$, so for matrices with this structure, we can sparsify with a relative $\infty\to 1$ norm error smaller than $\gamma$ while reducing the expected number of nonzero entries to as few as $O\bigl(n^2/(1 + n\gamma^2)\bigr) = O(n/\gamma^2)$. Intuitively, this sparsification result is optimal in the dimension: it seems we must keep on average at least one entry per row and column if we are to faithfully approximate $A$.
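As a concrete illustration of this scheme, the variance terms above are easy to evaluate numerically. The following sketch is ours, not part of the original argument: the helper `sparsify` and all parameter values are illustrative, and the absolute constants in the $O(\cdot)$ bounds are suppressed. Since $A$ is entrywise positive, $\|A\|_{\infty\to 1}$ is simply the sum of the entries.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparsify(A, p):
    """Keep each entry with probability p and rescale by 1/p, so that E[X] = A."""
    mask = rng.random(A.shape) < p
    return np.where(mask, A / p, 0.0)

n, p = 500, 0.05
A = rng.uniform(0.5, 1.0, size=(n, n))     # entries bounded away from zero

# For X_jk ~ (a_jk / p) Bern(p):  Var(X_jk) = a_jk^2 (1 - p) / p.
V = A**2 * (1 - p) / p

term_cols = np.sqrt(V.sum(axis=0)).sum()   # sum_k (sum_j Var)^{1/2}
term_rows = np.sqrt(V.sum(axis=1)).sum()   # sum_j (sum_k Var)^{1/2}
rel_bound = (term_cols + term_rows) / A.sum()   # ||A||_{inf->1} = sum of entries

X = sparsify(A, p)
print(f"kept fraction of entries: {np.mean(X != 0):.3f} (target p = {p})")
print(f"variance-term bound on the relative inf->1 error: {rel_bound:.3f}")
print(f"predicted scaling ((1-p)/(p n))^(1/2):            {np.sqrt((1 - p) / (p * n)):.3f}")
```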
Theorem 3.12. Suppose that $Z$ is a random matrix with independent, zero-mean entries. Then
\[
\mathbb{E}\|Z\|_{\infty\to 2} \le 2\,\mathbb{E}\|Z\|_F + 2 \min_D \mathbb{E}\bigl\|Z D^{-1}\bigr\|_{2\to\infty},
\]
where $D$ is a positive diagonal matrix that satisfies $\operatorname{Tr}(D^2) = 1$.
Proof. Apply Theorem 3.5 to get
\[
\mathbb{E}\|Z\|_{\infty\to 2} \le 2\,\mathbb{E}\Bigl\| \sum_k \varepsilon_k Z^{(k)} \Bigr\|_2 + 2 \max_{\|u\|_2 = 1} \mathbb{E} \sum_k \Bigl| \sum_j \varepsilon_j Z_{jk} u_j \Bigr| =: F + S. \tag{3.5.1}
\]
Expand the first term, and use Jensen’s inequality to move the expectation with respect to the Rademacher variables inside the square root:
\[
F = 2\,\mathbb{E}\Bigl( \sum_j \Bigl( \sum_k \varepsilon_k Z_{jk} \Bigr)^2 \Bigr)^{1/2}
\le 2\,\mathbb{E}_Z \Bigl( \sum_j \mathbb{E}_\varepsilon \Bigl( \sum_k \varepsilon_k Z_{jk} \Bigr)^2 \Bigr)^{1/2}.
\]
The independence of the Rademacher variables implies that the cross terms cancel, so
\[
F \le 2\,\mathbb{E}\Bigl( \sum_j \sum_k Z_{jk}^2 \Bigr)^{1/2} = 2\,\mathbb{E}\|Z\|_F.
\]
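Explicitly, for each fixed row index $j$, independence of the Rademacher variables gives $\mathbb{E}_\varepsilon[\varepsilon_k \varepsilon_\ell] = \delta_{k\ell}$, so
\[
\mathbb{E}_\varepsilon \Bigl( \sum_k \varepsilon_k Z_{jk} \Bigr)^2
= \sum_{k,\ell} Z_{jk} Z_{j\ell}\, \mathbb{E}_\varepsilon[\varepsilon_k \varepsilon_\ell]
= \sum_k Z_{jk}^2.
\]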
We use the Cauchy–Schwarz inequality to replace the $\ell_1$ norm with an $\ell_2$ norm in the second term of (3.5.1). A direct application would introduce a possibly suboptimal factor of $\sqrt{n}$ (where $n$ is the number of columns in $Z$), so instead we choose $d_k > 0$ such that $\sum_k d_k^2 = 1$ and use the corresponding weighted $\ell_2$ norm:
\[
S = 2 \max_{\|u\|_2 = 1} \mathbb{E} \sum_k \frac{\bigl| \sum_j \varepsilon_j Z_{jk} u_j \bigr|}{d_k}\, d_k
\le 2 \max_{\|u\|_2 = 1} \mathbb{E} \Bigl( \sum_k \frac{\bigl( \sum_j \varepsilon_j Z_{jk} u_j \bigr)^2}{d_k^2} \Bigr)^{1/2}.
\]
Move the expectation with respect to the Rademacher variables inside the square root and observe that the cross terms cancel:
\[
S \le 2 \max_{\|u\|_2 = 1} \mathbb{E}_Z \Bigl( \sum_k \frac{\mathbb{E}_\varepsilon \bigl( \sum_j \varepsilon_j Z_{jk} u_j \bigr)^2}{d_k^2} \Bigr)^{1/2}
= 2 \max_{\|u\|_2 = 1} \mathbb{E} \Bigl( \sum_{j,k} \frac{Z_{jk}^2 u_j^2}{d_k^2} \Bigr)^{1/2}.
\]
Use Jensen’s inequality to pass the maximum through the expectation, and note that if $\|u\|_2 = 1$ then the vector formed by elementwise squaring $u$ lies on the $\ell_1$ unit ball, thus
\[
S \le 2\,\mathbb{E} \max_{\|u\|_1 = 1} \Bigl( \sum_j u_j \cdot \sum_k (Z_{jk}/d_k)^2 \Bigr)^{1/2}.
\]
Clearly this maximum is achieved when $u$ is chosen so $u_j = 1$ at an index $j$ for which $\sum_k (Z_{jk}/d_k)^2$ is maximal and $u_j = 0$ otherwise. Consequently, the maximum is the largest of the $\ell_2$ norms of the rows of $ZD^{-1}$, where $D = \operatorname{diag}(d_1, \ldots, d_n)$. Recall that this quantity is, by definition, $\|ZD^{-1}\|_{2\to\infty}$. Therefore $S \le 2\,\mathbb{E}\|ZD^{-1}\|_{2\to\infty}$. The theorem follows by optimizing our choice of $D$ and introducing our estimates for $F$ and $S$ into (3.5.1).
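To make the statement concrete, here is a small Monte Carlo sketch (ours, not from the original text). It computes $\|Z\|_{\infty\to 2}$ exactly by enumerating sign vectors, which is feasible only for small $n$, and compares against the right-hand side of Theorem 3.12 with the uniform choice $d_k^2 = 1/n$ in place of the true minimizer; this substitution only weakens the bound.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

def norm_inf_to_2(Z):
    """||Z||_{inf->2} = max over u in {-1,+1}^n of ||Z u||_2 (enumeration; small n only)."""
    n = Z.shape[1]
    return max(np.linalg.norm(Z @ np.array(u)) for u in itertools.product((-1.0, 1.0), repeat=n))

m, n, trials = 8, 8, 200
d_inv = np.full(n, np.sqrt(n))           # D = n^{-1/2} I, so Tr(D^2) = 1
lhs = rhs = 0.0
for _ in range(trials):
    Z = rng.standard_normal((m, n))      # independent, zero-mean entries
    lhs += norm_inf_to_2(Z)
    row_norms = np.sqrt(((Z * d_inv) ** 2).sum(axis=1))   # row norms of Z D^{-1}
    rhs += 2 * np.linalg.norm(Z, "fro") + 2 * row_norms.max()

print(f"E||Z||_inf->2 ~ {lhs / trials:.2f}  <=  bound ~ {rhs / trials:.2f}")
```

Replacing the uniform $D$ with the true minimizer can only tighten the second term.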
Taking $Z = A - X$ in Theorem 3.12, we have
\[
\mathbb{E}\|A - X\|_{\infty\to 2} \le 2\,\mathbb{E}\Bigl( \sum_{j,k} (X_{jk} - a_{jk})^2 \Bigr)^{1/2} + 2 \min_D \mathbb{E} \max_j \Bigl( \sum_k \frac{(X_{jk} - a_{jk})^2}{d_k^2} \Bigr)^{1/2}. \tag{3.5.2}
\]
We now derive a bound which depends only on the variances of the $X_{jk}$.
Corollary 3.13. Fix the $m \times n$ matrix $A$ and let $X$ be a random matrix with independent entries so that $\mathbb{E}X = A$. Then
\[
\mathbb{E}\|A - X\|_{\infty\to 2} \le 2 \Bigl( \sum_{j,k} \operatorname{Var}(X_{jk}) \Bigr)^{1/2} + 2\sqrt{m} \min_D \Bigl( \max_j \sum_k \frac{\operatorname{Var}(X_{jk})}{d_k^2} \Bigr)^{1/2},
\]
where $D$ is a positive diagonal matrix with $\operatorname{Tr}(D^2) = 1$.
Proof. Let $F$ and $S$ denote, respectively, the first and second term of (3.5.2). An application of Jensen’s inequality shows that $F \le 2 \bigl( \sum_{j,k} \operatorname{Var}(X_{jk}) \bigr)^{1/2}$. A second application shows that
\[
S \le 2 \min_D \Bigl( \mathbb{E} \max_j \sum_k \frac{(X_{jk} - a_{jk})^2}{d_k^2} \Bigr)^{1/2}.
\]
Bound the maximum with a sum:
\[
S \le 2 \min_D \Bigl( \sum_j \mathbb{E} \sum_k \frac{(X_{jk} - a_{jk})^2}{d_k^2} \Bigr)^{1/2}.
\]
The sum is controlled by a multiple of its largest term, so
\[
S \le 2\sqrt{m} \min_D \Bigl( \max_j \sum_k \frac{\operatorname{Var}(X_{jk})}{d_k^2} \Bigr)^{1/2},
\]
where $m$ is the number of rows of $A$.
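As an illustration (ours, not the text’s), the bound of Corollary 3.13 is easy to evaluate for the Bernoulli scheme used throughout, again taking the generally suboptimal uniform weights $d_k^2 = 1/n$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 0.1
A = rng.uniform(0.5, 1.0, size=(n, n))
m = n                                     # square example: m rows, n columns

V = A**2 * (1 - p) / p                    # Var(X_jk) for X_jk ~ (a_jk/p) Bern(p)

term1 = 2 * np.sqrt(V.sum())              # 2 (sum_{j,k} Var(X_jk))^{1/2}
# With d_k^2 = 1/n, sum_k Var(X_jk)/d_k^2 = n * (row sum of variances).
term2 = 2 * np.sqrt(m) * np.sqrt(n * V.sum(axis=1).max())
print(f"Corollary 3.13 bound on E||A - X||_inf->2: {term1 + term2:.0f}")
print(f"for comparison, ||A||_inf->2 is about:     {np.linalg.norm(A.sum(axis=1)):.0f}")
```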
3.5.1 Optimality
We now show that Theorem 3.12 gives an optimal bound, in the sense that each of its terms is necessary. In the following, we reserve the letter $D$ for a positive diagonal matrix with $\operatorname{Tr}(D^2) = 1$.
First, we establish the necessity of the Frobenius term by identifying a class of random matrices whose $\infty\to 2$ norms are larger than their weighted $2\to\infty$ norms but comparable to their Frobenius norms. Let $Z$ be a random $m \times \sqrt{m}$ matrix such that the entries in the first column of $Z$ are equally likely to be positive or negative ones, and all other entries are zero. With this choice, $\mathbb{E}\|Z\|_{\infty\to 2} = \mathbb{E}\|Z\|_F = \sqrt{m}$. Meanwhile, $\mathbb{E}\|ZD^{-1}\|_{2\to\infty} = d_{11}^{-1}$, so $\min_D \mathbb{E}\|ZD^{-1}\|_{2\to\infty} = 1$, which is much smaller than $\mathbb{E}\|Z\|_{\infty\to 2}$. Clearly, the Frobenius term is necessary.
Similarly, to establish the necessity of the weighted $2\to\infty$ norm term, we consider a class of matrices whose $\infty\to 2$ norms are larger than their Frobenius norms but comparable to their weighted $2\to\infty$ norms. Consider a $\sqrt{n} \times n$ matrix $Z$ whose entries are all equally likely to be positive or negative ones. It is a simple task to confirm that $\mathbb{E}\|Z\|_{\infty\to 2} \ge n$ and $\mathbb{E}\|Z\|_F = n^{3/4}$; it follows that the weighted $2\to\infty$ norm term is necessary. In fact,
\[
\min_D \mathbb{E}\bigl\|ZD^{-1}\bigr\|_{2\to\infty}
= \min_D \mathbb{E} \max_{j = 1, \ldots, \sqrt{n}} \Bigl( \sum_{k=1}^n \frac{Z_{jk}^2}{d_{kk}^2} \Bigr)^{1/2}
= \min_D \Bigl( \sum_{k=1}^n \frac{1}{d_{kk}^2} \Bigr)^{1/2}
= n,
\]
so we see that $\mathbb{E}\|Z\|_{\infty\to 2}$ and the weighted $2\to\infty$ norm term are comparable.
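The final minimization is a standard Cauchy–Schwarz computation: subject to $\sum_k d_{kk}^2 = 1$,
\[
n = \sum_{k=1}^n d_{kk} \cdot \frac{1}{d_{kk}}
\le \Bigl( \sum_{k=1}^n d_{kk}^2 \Bigr)^{1/2} \Bigl( \sum_{k=1}^n \frac{1}{d_{kk}^2} \Bigr)^{1/2}
= \Bigl( \sum_{k=1}^n \frac{1}{d_{kk}^2} \Bigr)^{1/2},
\]
with equality when $d_{kk}^2 = 1/n$ for every $k$.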
3.5.2 An example application
From Theorem 3.12 we infer that a good scheme for sparsifying a matrix $A$ while minimizing the expected relative $\infty\to 2$ norm error is one which drastically increases the sparsity of $X$ while keeping the relative error
\[
\frac{\mathbb{E}\|Z\|_F + \min_D \mathbb{E}\bigl\|ZD^{-1}\bigr\|_{2\to\infty}}{\|A\|_{\infty\to 2}}
\]
small, where $Z = A - X$.
As before, consider the case where $A$ is an $n \times n$ matrix all of whose entries are positive and in an interval bounded away from zero. Let $\gamma$ be a desired bound on the expected relative $\infty\to 2$ norm error. We choose the randomization strategy $X_{jk} \sim (a_{jk}/p)\,\mathrm{Bern}(p)$ and ask how much we can sparsify while respecting our bound on the relative error. That is, how small can $p$ be? We appeal to Theorem 3.12. In this case,
\[
\|A\|_{\infty\to 2} = \Bigl( \sum_j \sum_k a_{jk}^2 + 2 \sum_j \sum_{\ell < m} a_{j\ell} a_{jm} \Bigr)^{1/2}
= O\Bigl( \bigl( n^2 + n^2(n-1) \bigr)^{1/2} \Bigr).
\]
By Jensen’s inequality,
\[
\mathbb{E}\|Z\|_F \le \|A\|_F + \mathbb{E}\|X\|_F \le \Bigl( 1 + \frac{1}{\sqrt{p}} \Bigr) \|A\|_F = O\Bigl( n \Bigl( 1 + \frac{1}{\sqrt{p}} \Bigr) \Bigr).
\]
We bound the other term in the numerator, also using Jensen’s inequality:
\[
\min_D \mathbb{E}\bigl\|ZD^{-1}\bigr\|_{2\to\infty} \le \sqrt{n}\,\mathbb{E}\|Z\|_{2\to\infty}
\le \sqrt{n} \Bigl( 1 + \frac{1}{\sqrt{p}} \Bigr) \|A\|_{2\to\infty}
= O\Bigl( n \Bigl( 1 + \frac{1}{\sqrt{p}} \Bigr) \Bigr)
\]
to get
\[
\frac{\mathbb{E}\|Z\|_F + \min_D \mathbb{E}\bigl\|ZD^{-1}\bigr\|_{2\to\infty}}{\|A\|_{\infty\to 2}}
= O\Bigl( \frac{1}{\sqrt{n}} + \frac{1}{\sqrt{pn}} \Bigr) = O\Bigl( \frac{1}{\sqrt{pn}} \Bigr).
\]
We conclude that, for this class of matrices and this family of sparsification schemes, we can reduce the expected number of nonzero entries to $O(n/\gamma^2)$ while maintaining an expected $\infty\to 2$ norm relative error of $\gamma$.
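A minimal end-to-end sketch of this conclusion follows. It is our construction, not the text’s: the constant in the choice of $p$ is taken to be one, so the measured error tracks the $\gamma$ scaling only up to constants, and the $\infty\to 2$ norm is computed by brute-force enumeration, which restricts us to small $n$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)

def norm_inf_to_2(M):
    """Exact ||M||_{inf->2} by enumerating sign vectors (small n only)."""
    n = M.shape[1]
    return max(np.linalg.norm(M @ np.array(u)) for u in itertools.product((-1.0, 1.0), repeat=n))

n, gamma = 12, 0.75
p = min(1.0, 1.0 / (gamma**2 * n))        # p ~ (gamma^2 n)^{-1}, constant taken to be 1
A = rng.uniform(0.5, 1.0, size=(n, n))

errs = [norm_inf_to_2(A - np.where(rng.random((n, n)) < p, A / p, 0.0))
        for _ in range(100)]
print(f"p = {p:.3f}; expected nonzeros p n^2 = {p * n * n:.0f} (= n / gamma^2)")
print(f"estimated relative inf->2 error: {np.mean(errs) / norm_inf_to_2(A):.3f}")
```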