In this section, we use the structural results from Section 6.4 to bound the approximation errors of Nyström extensions. To obtain our results, we use Theorems 6.2, 6.6, and 6.7 in conjunction with the estimate of $\|\Omega_1^\dagger\|_2^2$ provided by the following lemma.
Lemma 6.8. Let $U$ be an $n \times k$ matrix with orthonormal columns, and take $\mu$ to be the coherence of $U$. Select $\varepsilon \in (0,1)$ and a nonzero failure probability $\delta$. Let $S$ be a random matrix distributed as the first $\ell$ columns of a uniformly random permutation matrix of size $n$, where
$$\ell \geq \frac{2\mu}{(1-\varepsilon)^2}\, k \log\frac{k}{\delta}.$$
Then with probability exceeding $1-\delta$, the matrix $U^T S$ has full row rank and satisfies
$$\big\|(U^T S)^\dagger\big\|_2^2 \leq \frac{n}{\varepsilon\ell}.$$
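As a numerical illustration (not part of the original text), the lemma can be exercised directly: draw a random orthonormal $U$, sample $\ell$ columns of a random permutation, and compare $\|(U^T S)^\dagger\|_2^2$ against $n/(\varepsilon\ell)$. The dimensions, $\varepsilon$, and $\delta$ below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, eps, delta = 2000, 5, 0.5, 0.1

# Random n x k matrix with orthonormal columns, via thin QR.
U = np.linalg.qr(rng.standard_normal((n, k)))[0]

# Coherence of U: mu = (n/k) * max_i ||i-th row of U||^2.
mu = (n / k) * np.max(np.sum(U**2, axis=1))

# Sample size suggested by the lemma.
ell = int(np.ceil(2 * mu / (1 - eps) ** 2 * k * np.log(k / delta)))

# U^T S, where S is the first ell columns of a random permutation:
# equivalently, pick ell rows of U uniformly without replacement.
idx = rng.choice(n, size=ell, replace=False)
UTS = U[idx].T

# ||(U^T S)^dagger||_2^2 = 1 / sigma_min(U^T S)^2.
sigma_min = np.linalg.svd(UTS, compute_uv=False)[-1]
pinv_norm_sq = 1.0 / sigma_min**2

print(pinv_norm_sq, n / (eps * ell))
```

With probability greater than $1-\delta$ the first printed value should not exceed the second; in practice the Chernoff bound is loose, so the inequality typically holds with a comfortable margin.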
Proof of Lemma 6.8. Note that $U^T S$ has full row rank if $\lambda_k(U^T S S^T U) > 0$. Furthermore,
$$\big\|(U^T S)^\dagger\big\|_2^2 = \lambda_k^{-1}(U^T S S^T U).$$
Thus, to obtain both conclusions of the lemma, it suffices to verify that
$$\lambda_k(U^T S S^T U) \geq \frac{\varepsilon\ell}{n}$$
when $\ell$ is as stated.
We apply the lower Chernoff bound given as (4.1.1) to bound the probability that this inequality fails. Let $u_i$ denote the $i$th column of $U^T$ and let $\mathcal{X} = \{u_i u_i^T\}_{i=1,\dots,n}$. Then
$$\lambda_k(U^T S S^T U) = \lambda_k\Big(\sum_{i=1}^{\ell} X_i\Big),$$
where the $X_i$ are chosen uniformly at random, without replacement, from $\mathcal{X}$. Clearly
$$B = \max_i \|u_i\|^2 = \frac{k}{n}\mu \quad \text{and} \quad \mu_{\min} = \ell \cdot \lambda_k(\mathbb{E} X_1) = \frac{\ell}{n}\lambda_k(U^T U) = \frac{\ell}{n}.$$
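For the reader's convenience, the lower matrix Chernoff bound for sums of PSD matrices sampled without replacement, restated in the form in which it is applied here (this is a paraphrase of what (4.1.1) must assert, with the deviation parameter written as $1-\varepsilon$ to match the exponent below):

```latex
% Lower matrix Chernoff bound, as applied in this proof:
% B = max_i ||u_i||^2 and mu_min = ell * lambda_k(E X_1).
\mathbb{P}\Big[\lambda_k\Big(\textstyle\sum_{i=1}^{\ell} X_i\Big)
    \leq \varepsilon\,\mu_{\min}\Big]
  \leq k \cdot \exp\!\Big(-\frac{(1-\varepsilon)^2 \mu_{\min}}{2B}\Big).
```

Substituting $\mu_{\min} = \ell/n$ and $B = k\mu/n$ gives the exponent $(1-\varepsilon)^2\ell/(2k\mu)$ appearing in the next display.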
The Chernoff bound yields
$$\mathbb{P}\Big[\lambda_k\big(U^T S S^T U\big) \leq \frac{\varepsilon\ell}{n}\Big] \leq k \cdot e^{-(1-\varepsilon)^2 \ell/(2k\mu)}.$$
We require enough samples that
$$\lambda_k(U^T S S^T U) \geq \frac{\varepsilon\ell}{n}$$
with probability greater than $1-\delta$, so we set
$$k \cdot e^{-(1-\varepsilon)^2 \ell/(2k\mu)} \leq \delta$$
and solve for $\ell$, finding
$$\ell \geq \frac{2\mu}{(1-\varepsilon)^2}\, k \log\frac{k}{\delta}.$$
Thus, for values of $\ell$ satisfying this inequality, we achieve the stated spectral error bound and ensure that $U^T S$ has full row rank.
Theorem 6.9 establishes that the error incurred by the simple Nyström extension process is small when an appropriate number of columns is sampled. Specifically, the bounds in the theorem hold if the number of sampled columns is proportional to the coherence of the top eigenspace of the matrix and grows with the desired target rank as $k \log k$.
Note that the theorem provides two spectral-norm error bounds. The first is a relative-error bound that compares the spectral-norm error with the smallest error achievable when approximating $A$ with a rank-$k$ matrix; it uses no information about the spectrum of $A$ other than the value of the $(k+1)$st eigenvalue. The second bound uses information about the entire tail of the spectrum of $A$. If the spectrum of $A$ decays quickly, the second bound is much tighter than the first; if the spectrum of $A$ is flat, the first bound is tighter.
Theorem 6.9. Let $A$ be an SPSD matrix of size $n$, and let $p \geq 1$ be an integer. Given an integer $k \leq n$, partition $A$ as in (6.2.1). Let $\mu$ denote the coherence of $U_1$. Fix a failure probability $\delta \in (0,1)$ and an accuracy factor $\varepsilon \in (0,1)$. If $S$ comprises
$$\ell \geq 2\mu\varepsilon^{-2} k \log(k/\delta)$$
columns of the identity matrix, sampled uniformly at random without replacement, and $C = A^p S$ and $W = S^T A^{2p-1} S$, then the corresponding SPSD sketch satisfies
$$\big\|A - CW^\dagger C^T\big\|_2 \leq \Big(1 + \frac{n}{(1-\varepsilon)\ell}\Big)^{1/(2p-1)} \|A - A_k\|_2,$$
$$\big\|A - CW^\dagger C^T\big\|_2 \leq \|A - A_k\|_2 + \Big(\frac{1}{\delta^2(1-\varepsilon)}\operatorname{Tr}\big((A - A_k)^{2p-1}\big)\Big)^{1/(2p-1)},$$
$$\big\|A - CW^\dagger C^T\big\|_F \leq \|A - A_k\|_F + \bigg(\frac{\gamma^{p-1}}{\delta}\sqrt{\frac{2}{1-\varepsilon}} + \frac{\gamma^{2p-2}}{\delta^2(1-\varepsilon)}\bigg)\operatorname{Tr}(A - A_k), \quad \text{and}$$
$$\operatorname{Tr}\big(A - CW^\dagger C^T\big) \leq \Big(1 + \frac{\gamma^{2p-2}}{\delta^2(1-\varepsilon)}\Big)\operatorname{Tr}(A - A_k),$$
simultaneously, with probability at least $1 - 4\delta$. If, additionally, $k \geq \operatorname{rank}(A)$, then
$$A = CW^\dagger C^T$$
with probability exceeding $1-\delta$.
Remark 6.10. Theorem 6.9, like the main result of [TR10], promises exact recovery with probability at least $1-\delta$ when $A$ is exactly rank $k$ and has small coherence, given a sample of $O(k\log(k/\delta))$ columns. Unlike the result in [TR10], Theorem 6.9 is also applicable when $A$ is full-rank but has a sufficiently fast-decaying spectrum.
Remark 6.11. Let $p = 1$. The first spectral-norm error bound in Theorem 6.9 simplifies to
$$\big\|A - CW^\dagger C^T\big\|_2 \leq \Big(1 + \frac{n}{(1-\varepsilon)\ell}\Big)\|A - A_k\|_2.$$
The multiplicative factor in this relative-error bound is optimal in terms of its dependence on $n$ and $\ell$. This fact follows from the connection between Nyström extensions and the column subset selection problem.
Indeed, [BDMI11] establishes the following lower bound for the column subset selection problem: for any $\alpha > 0$ there exist matrices $M_\alpha$ such that for any $k \geq 1$ and any $\ell \geq 1$, the error of approximating $M_\alpha$ with $P_D M_\alpha$, where $D$ may be any subset of $\ell$ columns of $M_\alpha$, satisfies
$$\big\|M_\alpha - P_D M_\alpha\big\|_2 \geq \sqrt{\frac{n + \alpha^2}{\ell + \alpha^2}} \cdot \big\|M_\alpha - (M_\alpha)_k\big\|_2,$$
where $(M_\alpha)_k$ is the rank-$k$ matrix that best approximates $M_\alpha$ in the spectral norm. We get a lower bound on the error of the Nyström extension by taking $A_\alpha = M_\alpha^T M_\alpha$: it follows from the remark following Theorem 6.2 that for any $k \geq 1$ and $\ell \geq 1$, any Nyström extension formed using $C = A_\alpha S$ consisting of $\ell$ columns sampled from $A_\alpha$ satisfies
$$\big\|A_\alpha - CW^\dagger C^T\big\|_2 \geq \frac{n + \alpha^2}{\ell + \alpha^2} \cdot \lambda_{k+1}(A_\alpha).$$
Proof of Theorem 6.9. The sketching matrix $S$ is formed by taking the first $\ell$ columns of a uniformly sampled random permutation matrix. Recall that $\Omega_1 = U_1^T S$ and $\Omega_2 = U_2^T S$. By Lemma 6.8, $\Omega_1$ has full row rank with probability at least $1-\delta$, so the bounds in Theorems 6.2, 6.6, and 6.7 are applicable. In particular, if $k \geq \operatorname{rank}(A)$ then $A = CW^\dagger C^T$.
First we use Theorems 6.2 and 6.7 to develop the first spectral-norm error bound and the trace-norm error bound. To apply these theorems, we need estimates of the quantities
$$\big\|\Sigma_2^{p-1/2}\Omega_2\Omega_1^\dagger\big\|_2^2 \quad \text{and} \quad \big\|\Sigma_2^{1/2}\Omega_2\Omega_1^\dagger\big\|_F.$$
Applying Lemma 6.8 (with the roles of $\varepsilon$ and $1-\varepsilon$ exchanged, to match the sample size assumed in the theorem), we see that $\|\Omega_1^\dagger\|_2^2 \leq n/((1-\varepsilon)\ell)$ with probability exceeding $1-\delta$. Observe that $\|\Omega_2\|_2 \leq \|U_2\|_2 \|S\|_2 \leq 1$, so
$$\big\|\Sigma_2^{p-1/2}\Omega_2\Omega_1^\dagger\big\|_2^2 \leq \big\|\Sigma_2^{p-1/2}\big\|_2^2 \big\|\Omega_1^\dagger\big\|_2^2 \leq \frac{n}{(1-\varepsilon)\ell}\,\|\Sigma_2\|_2^{2p-1}. \quad (6.5.1)$$
Likewise,
$$\big\|\Sigma_2^{1/2}\Omega_2\Omega_1^\dagger\big\|_F \leq \big\|\Sigma_2^{1/2}\Omega_2\big\|_F \big\|\Omega_1^\dagger\big\|_2 \leq \sqrt{\frac{n}{(1-\varepsilon)\ell}}\,\big\|\Sigma_2^{1/2}\Omega_2\big\|_F. \quad (6.5.2)$$
These estimates hold simultaneously with probability at least 1−δ.
To further develop (6.5.2), observe that since $S$ selects $\ell$ columns uniformly at random,
$$\mathbb{E}\,\big\|\Sigma_2^{1/2}\Omega_2\big\|_F^2 = \mathbb{E}\,\big\|\Sigma_2^{1/2}U_2^T S\big\|_F^2 = \sum_{i=1}^{\ell} \mathbb{E}\,\|x_i\|^2,$$
where the summands $x_i$ are distributed uniformly at random over the columns of $\Sigma_2^{1/2}U_2^T$. The summands all have the same expectation:
$$\mathbb{E}\,\|x_i\|^2 = \frac{1}{n}\sum_{j=1}^{n}\big\|\big(\Sigma_2^{1/2}U_2^T\big)^{(j)}\big\|^2 = \frac{1}{n}\big\|\Sigma_2^{1/2}U_2^T\big\|_F^2 = \frac{1}{n}\big\|\Sigma_2^{1/2}\big\|_F^2 = \frac{1}{n}\operatorname{Tr}\Sigma_2.$$
Consequently,
$$\mathbb{E}\,\big\|\Sigma_2^{1/2}\Omega_2\big\|_F^2 = \frac{\ell}{n}\operatorname{Tr}\Sigma_2,$$
so by Jensen's inequality
$$\mathbb{E}\,\big\|\Sigma_2^{1/2}\Omega_2\big\|_F \leq \Big(\mathbb{E}\,\big\|\Sigma_2^{1/2}\Omega_2\big\|_F^2\Big)^{1/2} = \sqrt{\frac{\ell}{n}\operatorname{Tr}\Sigma_2}.$$
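The expectation identity above says only that sampling $\ell$ of $n$ columns uniformly picks up, on average, an $\ell/n$ fraction of the squared Frobenius norm. A quick Monte Carlo check (illustrative only; the matrix below is a random stand-in for $\Sigma_2^{1/2}U_2^T$, with arbitrary dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, ell, trials = 60, 8, 15, 20000

B = rng.standard_normal((m, n))      # stand-in for Sigma_2^{1/2} U_2^T
exact = (ell / n) * np.sum(B**2)     # (ell/n) * ||B||_F^2

# Average ||B S||_F^2 over many draws of ell columns w/o replacement.
est = np.mean([
    np.sum(B[:, rng.choice(n, size=ell, replace=False)]**2)
    for _ in range(trials)
])
print(abs(est - exact) / exact)      # small relative error
```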
Now applying Markov's inequality to (6.5.2), we see that
$$\big\|\Sigma_2^{1/2}\Omega_2\Omega_1^\dagger\big\|_F \leq \frac{1}{\delta}\,\mathbb{E}\,\big\|\Sigma_2^{1/2}\Omega_2\Omega_1^\dagger\big\|_F \leq \frac{1}{\delta}\sqrt{\frac{1}{1-\varepsilon}\operatorname{Tr}\Sigma_2}$$
with probability at least $1-2\delta$. One can similarly show that
$$\big\|\Sigma_2^{p-1/2}\Omega_2\Omega_1^\dagger\big\|_F \leq \frac{1}{\delta}\sqrt{\frac{1}{1-\varepsilon}\operatorname{Tr}\Sigma_2^{2p-1}} \quad (6.5.3)$$
with probability at least $1-2\delta$.
Thus far, we have established that $\Omega_1$ has full row rank and that the estimates
$$\big\|\Sigma_2^{p-1/2}\Omega_2\Omega_1^\dagger\big\|_2^2 \leq \frac{n}{(1-\varepsilon)\ell}\,\|\Sigma_2\|_2^{2p-1} \quad (6.5.4)$$
and
$$\big\|\Sigma_2^{1/2}\Omega_2\Omega_1^\dagger\big\|_F \leq \frac{1}{\delta}\sqrt{\frac{1}{1-\varepsilon}\operatorname{Tr}\Sigma_2} \quad (6.5.5)$$
hold simultaneously with probability at least $1-2\delta$. These estimates, used in Theorems 6.2 and 6.7, yield the trace-norm error bound and the first spectral-norm error bound.
To develop the second spectral-norm error bound, observe that by (6.5.3) we also have the estimate
$$\big\|\Sigma_2^{p-1/2}\Omega_2\Omega_1^\dagger\big\|_2^2 \leq \big\|\Sigma_2^{p-1/2}\Omega_2\Omega_1^\dagger\big\|_F^2 \leq \frac{1}{\delta^2(1-\varepsilon)}\operatorname{Tr}\Sigma_2^{2p-1},$$
which holds with probability at least $1-2\delta$. This estimate, used in Theorem 6.2, yields the second spectral-norm bound.
To develop the Frobenius-norm bound, observe that Theorem 6.6 can be weakened to the assertion that, when $\Omega_1$ has full row rank, the estimate
$$\big\|A - CW^\dagger C^T\big\|_F \leq \|\Sigma_2\|_F + \gamma^{p-1}\big\|\Sigma_2^{1/2}\Omega_2\Omega_1^\dagger\big\|_F \sqrt{2\operatorname{Tr}\Sigma_2} + \gamma^{2p-2}\big\|\Sigma_2^{1/2}\Omega_2\Omega_1^\dagger\big\|_F^2 \quad (6.5.6)$$
holds. Recall that, with probability at least $1-2\delta$, the estimate (6.5.5) holds and $\Omega_1$ has full row rank. Inserting the estimate (6.5.5) into (6.5.6) establishes the claimed Frobenius-norm error bound.