In this section, we use the structural results from Section 6.4 to bound the approximation errors of Nyström extensions. To obtain our results, we use Theorems 6.2, 6.6, and 6.7 in conjunction with the estimate of $\|\Omega_1^\dagger\|_2^2$ provided by the following lemma.
Lemma 6.8. Let $U$ be an $n \times k$ matrix with orthonormal columns, and take $\mu$ to be the coherence of $U$. Select $\varepsilon \in (0,1)$ and a nonzero failure probability $\delta$. Let $S$ be a random matrix distributed as the first $\ell$ columns of a uniformly random permutation matrix of size $n$, where
$$\ell \geq \frac{2\mu}{(1-\varepsilon)^2}\, k \log\frac{k}{\delta}.$$
Then with probability exceeding $1-\delta$, the matrix $U^T S$ has full row rank and satisfies
$$\big\|(U^T S)^\dagger\big\|_2^2 \leq \frac{n}{\varepsilon\ell}.$$
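As a numerical illustration (not part of the original text), the lemma can be exercised directly: draw a random orthonormal $U$, sample $\ell$ columns of a random permutation, and compare $\|(U^T S)^\dagger\|_2^2$ against $n/(\varepsilon\ell)$. The dimensions, $\varepsilon$, and $\delta$ below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, eps, delta = 2000, 5, 0.5, 0.1

# Random n x k matrix with orthonormal columns, via thin QR.
U = np.linalg.qr(rng.standard_normal((n, k)))[0]

# Coherence of U: mu = (n/k) * max_i ||i-th row of U||^2.
mu = (n / k) * np.max(np.sum(U**2, axis=1))

# Sample size suggested by the lemma.
ell = int(np.ceil(2 * mu / (1 - eps) ** 2 * k * np.log(k / delta)))

# U^T S, where S is the first ell columns of a random permutation:
# equivalently, pick ell rows of U uniformly without replacement.
idx = rng.choice(n, size=ell, replace=False)
UTS = U[idx].T

# ||(U^T S)^dagger||_2^2 = 1 / sigma_min(U^T S)^2.
sigma_min = np.linalg.svd(UTS, compute_uv=False)[-1]
pinv_norm_sq = 1.0 / sigma_min**2

print(pinv_norm_sq, n / (eps * ell))
```

With probability greater than $1-\delta$ the first printed value should not exceed the second; in practice the Chernoff bound is loose, so the inequality typically holds with a comfortable margin.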
Proof of Lemma 6.8. Note that $U^T S$ has full row rank if $\lambda_k(U^T S S^T U) > 0$. Furthermore,
$$\big\|(U^T S)^\dagger\big\|_2^2 = \lambda_k^{-1}(U^T S S^T U).$$
Thus, to obtain both conclusions of the lemma, it suffices to verify that
$$\lambda_k(U^T S S^T U) \geq \frac{\varepsilon\ell}{n}$$
when $\ell$ is as stated.
We apply the lower Chernoff bound given as (4.1.1) to bound the probability that this inequality fails. Let $u_i$ denote the $i$th column of $U^T$ and let $\mathcal{X} = \{u_i u_i^T\}_{i=1,\dots,n}$. Then
$$\lambda_k(U^T S S^T U) = \lambda_k\Big(\sum_{i=1}^{\ell} X_i\Big),$$
where the $X_i$ are chosen uniformly at random, without replacement, from $\mathcal{X}$. Clearly
$$B = \max_i \|u_i\|^2 = \frac{k}{n}\mu \quad \text{and} \quad \mu_{\min} = \ell \cdot \lambda_k(\mathbb{E} X_1) = \frac{\ell}{n}\lambda_k(U^T U) = \frac{\ell}{n}.$$
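For the reader's convenience, the lower matrix Chernoff bound for sums of PSD matrices sampled without replacement, restated in the form in which it is applied here (this is a paraphrase of what (4.1.1) must assert, with the deviation parameter written as $1-\varepsilon$ to match the exponent below):

```latex
% Lower matrix Chernoff bound, as applied in this proof:
% B = max_i ||u_i||^2 and mu_min = ell * lambda_k(E X_1).
\mathbb{P}\Big[\lambda_k\Big(\textstyle\sum_{i=1}^{\ell} X_i\Big)
    \leq \varepsilon\,\mu_{\min}\Big]
  \leq k \cdot \exp\!\Big(-\frac{(1-\varepsilon)^2 \mu_{\min}}{2B}\Big).
```

Substituting $\mu_{\min} = \ell/n$ and $B = k\mu/n$ gives the exponent $(1-\varepsilon)^2\ell/(2k\mu)$ appearing in the next display.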
The Chernoff bound yields
$$\mathbb{P}\Big[\lambda_k\big(U^T S S^T U\big) \leq \frac{\varepsilon\ell}{n}\Big] \leq k \cdot e^{-(1-\varepsilon)^2 \ell/(2k\mu)}.$$
We require enough samples that
$$\lambda_k(U^T S S^T U) \geq \frac{\varepsilon\ell}{n}$$
with probability greater than $1-\delta$, so we set
$$k \cdot e^{-(1-\varepsilon)^2 \ell/(2k\mu)} \leq \delta$$
and solve for $\ell$, finding
$$\ell \geq \frac{2\mu}{(1-\varepsilon)^2}\, k \log\frac{k}{\delta}.$$
Thus, for values of $\ell$ satisfying this inequality, we achieve the stated spectral error bound and ensure that $U^T S$ has full row rank.
Theorem 6.9 establishes that the error incurred by the simple Nyström extension process is small when an appropriate number of columns is sampled. Specifically, the bounds in the theorem hold if the number of sampled columns is proportional to the coherence of the top eigenspace of the matrix and grows with the desired target rank as $k \log k$.
Note that the theorem provides two spectral-norm error bounds. The first is a relative-error bound that compares the spectral-norm error with the smallest error achievable when approximating $A$ with a rank-$k$ matrix; it uses no information about the spectrum of $A$ other than the value of the $(k+1)$st eigenvalue. The second bound uses information about the entire tail of the spectrum of $A$. If the spectrum of $A$ decays quickly, the second bound is much tighter than the first; if the spectrum of $A$ is flat, the first bound is tighter.
Theorem 6.9. Let $A$ be an SPSD matrix of size $n$, and let $p \geq 1$ be an integer. Given an integer $k \leq n$, partition $A$ as in (6.2.1). Let $\mu$ denote the coherence of $U_1$. Fix a failure probability $\delta \in (0,1)$ and an accuracy factor $\varepsilon \in (0,1)$. If $S$ comprises
$$\ell \geq 2\mu\varepsilon^{-2} k \log(k/\delta)$$
columns of the identity matrix, sampled uniformly at random without replacement, and $C = A^p S$ and $W = S^T A^{2p-1} S$, then the corresponding SPSD sketch satisfies
$$\big\|A - CW^\dagger C^T\big\|_2 \leq \Big(1 + \frac{n}{(1-\varepsilon)\ell}\Big)^{1/(2p-1)} \|A - A_k\|_2,$$
$$\big\|A - CW^\dagger C^T\big\|_2 \leq \|A - A_k\|_2 + \Big(\frac{1}{\delta^2(1-\varepsilon)}\operatorname{Tr}\big((A - A_k)^{2p-1}\big)\Big)^{1/(2p-1)},$$
$$\big\|A - CW^\dagger C^T\big\|_F \leq \|A - A_k\|_F + \bigg(\frac{\gamma^{p-1}}{\delta}\sqrt{\frac{2}{1-\varepsilon}} + \frac{\gamma^{2p-2}}{\delta^2(1-\varepsilon)}\bigg)\operatorname{Tr}(A - A_k), \quad \text{and}$$
$$\operatorname{Tr}\big(A - CW^\dagger C^T\big) \leq \Big(1 + \frac{\gamma^{2p-2}}{\delta^2(1-\varepsilon)}\Big)\operatorname{Tr}(A - A_k),$$
simultaneously, with probability at least $1 - 4\delta$. If, additionally, $k \geq \operatorname{rank}(A)$, then
$$A = CW^\dagger C^T$$
with probability exceeding $1-\delta$.
Remark 6.10. Theorem 6.9, like the main result of [TR10], promises exact recovery with probability at least $1-\delta$ when $A$ is exactly rank $k$ and has small coherence, given a sample of $O(k\log(k/\delta))$ columns. Unlike the result in [TR10], Theorem 6.9 is also applicable when $A$ is full-rank but has a sufficiently fast-decaying spectrum.
Remark 6.11. Let $p = 1$. The first spectral-norm error bound in Theorem 6.9 simplifies to
$$\big\|A - CW^\dagger C^T\big\|_2 \leq \Big(1 + \frac{n}{(1-\varepsilon)\ell}\Big)\|A - A_k\|_2.$$
The multiplicative factor in this relative-error bound is optimal in terms of its dependence on $n$ and $\ell$. This fact follows from the connection between Nyström extensions and the column subset selection problem.
Indeed, [BDMI11] establishes the following lower bound for the column subset selection problem: for any $\alpha > 0$ there exist matrices $M_\alpha$ such that for any $k \geq 1$ and any $\ell \geq 1$, the error of approximating $M_\alpha$ with $P_D M_\alpha$, where $D$ may be any subset of $\ell$ columns of $M_\alpha$, satisfies
$$\big\|M_\alpha - P_D M_\alpha\big\|_2 \geq \sqrt{\frac{n + \alpha^2}{\ell + \alpha^2}} \cdot \big\|M_\alpha - (M_\alpha)_k\big\|_2,$$
where $(M_\alpha)_k$ is the rank-$k$ matrix that best approximates $M_\alpha$ in the spectral norm. We get a lower bound on the error of the Nyström extension by taking $A_\alpha = M_\alpha^T M_\alpha$: it follows from the remark following Theorem 6.2 that for any $k \geq 1$ and $\ell \geq 1$, any Nyström extension formed using $C = A_\alpha S$ consisting of $\ell$ columns sampled from $A_\alpha$ satisfies
$$\big\|A_\alpha - CW^\dagger C^T\big\|_2 \geq \frac{n + \alpha^2}{\ell + \alpha^2} \cdot \lambda_{k+1}(A_\alpha).$$
Proof of Theorem 6.9. The sketching matrix $S$ is formed by taking the first $\ell$ columns of a uniformly sampled random permutation matrix. Recall that $\Omega_1 = U_1^T S$ and $\Omega_2 = U_2^T S$. By Lemma 6.8, $\Omega_1$ has full row rank with probability at least $1-\delta$, so the bounds in Theorems 6.2, 6.6, and 6.7 are applicable. In particular, if $k \geq \operatorname{rank}(A)$ then $A = CW^\dagger C^T$.
First we use Theorems 6.2 and 6.7 to develop the first spectral-norm error bound and the trace-norm error bound. To apply these theorems, we need estimates of the quantities
$$\big\|\Sigma_2^{p-1/2}\Omega_2\Omega_1^\dagger\big\|_2^2 \quad \text{and} \quad \big\|\Sigma_2^{1/2}\Omega_2\Omega_1^\dagger\big\|_F.$$
Applying Lemma 6.8 (with the roles of $\varepsilon$ and $1-\varepsilon$ exchanged, to match the sample size assumed in the theorem), we see that $\|\Omega_1^\dagger\|_2^2 \leq n/((1-\varepsilon)\ell)$ with probability exceeding $1-\delta$. Observe that $\|\Omega_2\|_2 \leq \|U_2\|_2 \|S\|_2 \leq 1$, so
$$\big\|\Sigma_2^{p-1/2}\Omega_2\Omega_1^\dagger\big\|_2^2 \leq \big\|\Sigma_2^{p-1/2}\big\|_2^2 \big\|\Omega_1^\dagger\big\|_2^2 \leq \frac{n}{(1-\varepsilon)\ell}\,\|\Sigma_2\|_2^{2p-1}. \quad (6.5.1)$$
Likewise,
$$\big\|\Sigma_2^{1/2}\Omega_2\Omega_1^\dagger\big\|_F \leq \big\|\Sigma_2^{1/2}\Omega_2\big\|_F \big\|\Omega_1^\dagger\big\|_2 \leq \sqrt{\frac{n}{(1-\varepsilon)\ell}}\,\big\|\Sigma_2^{1/2}\Omega_2\big\|_F. \quad (6.5.2)$$
These estimates hold simultaneously with probability at least 1−δ.
To further develop (6.5.2), observe that since $S$ selects $\ell$ columns uniformly at random,
$$\mathbb{E}\,\big\|\Sigma_2^{1/2}\Omega_2\big\|_F^2 = \mathbb{E}\,\big\|\Sigma_2^{1/2}U_2^T S\big\|_F^2 = \sum_{i=1}^{\ell} \mathbb{E}\,\|x_i\|^2,$$
where the summands $x_i$ are distributed uniformly at random over the columns of $\Sigma_2^{1/2}U_2^T$. The summands all have the same expectation:
$$\mathbb{E}\,\|x_i\|^2 = \frac{1}{n}\sum_{j=1}^{n}\big\|\big(\Sigma_2^{1/2}U_2^T\big)^{(j)}\big\|^2 = \frac{1}{n}\big\|\Sigma_2^{1/2}U_2^T\big\|_F^2 = \frac{1}{n}\big\|\Sigma_2^{1/2}\big\|_F^2 = \frac{1}{n}\operatorname{Tr}\Sigma_2.$$
Consequently,
$$\mathbb{E}\,\big\|\Sigma_2^{1/2}\Omega_2\big\|_F^2 = \frac{\ell}{n}\operatorname{Tr}\Sigma_2,$$
so by Jensen's inequality
$$\mathbb{E}\,\big\|\Sigma_2^{1/2}\Omega_2\big\|_F \leq \Big(\mathbb{E}\,\big\|\Sigma_2^{1/2}\Omega_2\big\|_F^2\Big)^{1/2} = \sqrt{\frac{\ell}{n}\operatorname{Tr}\Sigma_2}.$$
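The expectation identity above says only that sampling $\ell$ of $n$ columns uniformly picks up, on average, an $\ell/n$ fraction of the squared Frobenius norm. A quick Monte Carlo check (illustrative only; the matrix below is a random stand-in for $\Sigma_2^{1/2}U_2^T$, with arbitrary dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, ell, trials = 60, 8, 15, 20000

B = rng.standard_normal((m, n))      # stand-in for Sigma_2^{1/2} U_2^T
exact = (ell / n) * np.sum(B**2)     # (ell/n) * ||B||_F^2

# Average ||B S||_F^2 over many draws of ell columns w/o replacement.
est = np.mean([
    np.sum(B[:, rng.choice(n, size=ell, replace=False)]**2)
    for _ in range(trials)
])
print(abs(est - exact) / exact)      # small relative error
```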
Now applying Markov's inequality to (6.5.2), we see that
$$\big\|\Sigma_2^{1/2}\Omega_2\Omega_1^\dagger\big\|_F \leq \frac{1}{\delta}\,\mathbb{E}\,\big\|\Sigma_2^{1/2}\Omega_2\Omega_1^\dagger\big\|_F \leq \frac{1}{\delta}\sqrt{\frac{1}{1-\varepsilon}\operatorname{Tr}\Sigma_2}$$
with probability at least $1-2\delta$. One can similarly show that
$$\big\|\Sigma_2^{p-1/2}\Omega_2\Omega_1^\dagger\big\|_F \leq \frac{1}{\delta}\sqrt{\frac{1}{1-\varepsilon}\operatorname{Tr}\Sigma_2^{2p-1}} \quad (6.5.3)$$
with probability at least $1-2\delta$.
Thus far, we have established that $\Omega_1$ has full row rank and that the estimates
$$\big\|\Sigma_2^{p-1/2}\Omega_2\Omega_1^\dagger\big\|_2^2 \leq \frac{n}{(1-\varepsilon)\ell}\,\|\Sigma_2\|_2^{2p-1} \quad (6.5.4)$$
and
$$\big\|\Sigma_2^{1/2}\Omega_2\Omega_1^\dagger\big\|_F \leq \frac{1}{\delta}\sqrt{\frac{1}{1-\varepsilon}\operatorname{Tr}\Sigma_2} \quad (6.5.5)$$
hold simultaneously with probability at least $1-2\delta$. These estimates, used in Theorems 6.2 and 6.7, yield the trace-norm error bound and the first spectral-norm error bound.
To develop the second spectral-norm error bound, observe that by (6.5.3) we also have the estimate
$$\big\|\Sigma_2^{p-1/2}\Omega_2\Omega_1^\dagger\big\|_2^2 \leq \big\|\Sigma_2^{p-1/2}\Omega_2\Omega_1^\dagger\big\|_F^2 \leq \frac{1}{\delta^2(1-\varepsilon)}\operatorname{Tr}\Sigma_2^{2p-1},$$
which holds with probability at least $1-2\delta$. This estimate, used in Theorem 6.2, yields the second spectral-norm bound.
To develop the Frobenius-norm bound, observe that Theorem 6.6 can be weakened to the assertion that, when $\Omega_1$ has full row rank, the estimate
$$\big\|A - CW^\dagger C^T\big\|_F \leq \|\Sigma_2\|_F + \gamma^{p-1}\big\|\Sigma_2^{1/2}\Omega_2\Omega_1^\dagger\big\|_F \sqrt{2\operatorname{Tr}\Sigma_2} + \gamma^{2p-2}\big\|\Sigma_2^{1/2}\Omega_2\Omega_1^\dagger\big\|_F^2 \quad (6.5.6)$$
holds. Recall that, with probability at least $1-2\delta$, the estimate (6.5.5) holds and $\Omega_1$ has full row rank. Inserting the estimate (6.5.5) into (6.5.6) establishes the claimed Frobenius-norm error bound.