1.4 Contributions
1.4.4 Randomized SPSD sketches
Chapter 6 considers the problem of forming a low-rank approximation to a symmetric positive-semidefinite matrix $A \in \mathbb{R}^{n \times n}$ using "SPSD sketches." Let $S$ be a matrix of size $n \times \ell$, where $\ell \ll n$. Then the SPSD sketch of $A$ corresponding to $S$ is $C W^\dagger C^T$, where
\[
C = AS \quad \text{and} \quad W = S^T A S.
\]
Sketches formed according to this model have rank at most $\ell$ and are also symmetric positive-semidefinite. The simplest such SPSD sketches are formed by taking $S$ to contain random columns sampled uniformly without replacement from the appropriate identity matrix. These sketches, known as Nyström extensions, are popular in applications where it is expensive or undesirable to have full access to $A$: Nyström extensions require only knowledge of $\ell$ columns of $A$.
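To make the construction concrete, the following is a minimal NumPy sketch of a Nyström extension; the function name, the pseudoinverse-based evaluation, and the random SPSD test matrix are illustrative choices rather than prescriptions from the text.
\begin{verbatim}
import numpy as np

def nystrom_extension(A, ell, seed=None):
    """Rank-ell Nystrom extension of an SPSD matrix A.

    S consists of ell columns of the identity chosen uniformly
    without replacement, so C = A S is just ell columns of A and
    W = S^T A S is the corresponding ell x ell principal submatrix.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    idx = rng.choice(n, size=ell, replace=False)
    C = A[:, idx]                         # C = A S
    W = A[np.ix_(idx, idx)]               # W = S^T A S
    return C @ np.linalg.pinv(W) @ C.T    # C W^+ C^T

# Example: sketch a random SPSD matrix and check the spectral-norm error.
G = np.random.default_rng(0).standard_normal((100, 100))
A = G @ G.T                               # SPSD test matrix
A_sketch = nystrom_extension(A, ell=20, seed=1)
print(np.linalg.norm(A - A_sketch, 2))
\end{verbatim}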
The accuracy of SPSD sketches can be increased using the so-called power method, wherein one takes the sketching matrix to be $S = A^{p-1} S_0$ for some integer $p \geq 2$, where $S_0$ is a sketching matrix. The corresponding SPSD sketch is $A^p S_0 (S_0^T A^{2p-1} S_0)^\dagger S_0^T A^p$.
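Under the same assumptions, a sketch of the power-method variant might look as follows; the Gaussian choice of $S_0$ is illustrative, one of several distributions one might sample from.
\begin{verbatim}
import numpy as np

def power_sketch(A, ell, p, seed=None):
    """SPSD sketch of A via the power method: S = A^(p-1) S0."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    S0 = rng.standard_normal((n, ell))          # illustrative Gaussian S0
    S = np.linalg.matrix_power(A, p - 1) @ S0   # S = A^(p-1) S0
    C = A @ S                                   # C = A^p S0
    W = S.T @ A @ S                             # W = S0^T A^(2p-1) S0
    return C @ np.linalg.pinv(W) @ C.T
\end{verbatim}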
Chapter 6 establishes a framework for the analysis of SPSD sketches, and supplies spectral, Frobenius, and trace-norm error bounds for SPSD sketches corresponding to random $S$ sampled from several distributions. The error bounds obtained are asymptotically smaller than the other bounds available in the literature for SPSD sketching schemes. Our bounds apply to sketches constructed using the power method, and we see that the errors of these sketches decrease like $(\lambda_{k+1}(A)/\lambda_k(A))^p$.
In particular, our framework supplies an optimal spectral-norm error bound for Nyström extensions. Because they are based on uniform column sampling, Nyström extensions perform best when the information in the top $k$-dimensional eigenspace is distributed evenly throughout the columns of $A$. One way to quantify this idea uses the concept of coherence, taken from the matrix completion literature [CR09]. Let $\mathcal{S}$ be a $k$-dimensional subspace of $\mathbb{R}^n$. The coherence of $\mathcal{S}$ is
\[
\mu(\mathcal{S}) = \frac{n}{k} \max_i \, (P_{\mathcal{S}})_{ii},
\]
where $P_{\mathcal{S}}$ denotes the orthogonal projection onto $\mathcal{S}$.
The coherence of the dominant $k$-dimensional eigenspace of $A$ is a measure of how much comparative influence the individual columns of $A$ have on this subspace: if $\mu$ is small, then all columns have essentially the same influence; if $\mu$ is large, then it is possible that there is a single column in $A$ which alone determines one of the top $k$ eigenvectors of $A$.
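Since $(P_{\mathcal{S}})_{ii}$ equals the squared norm of the $i$th row of any matrix whose columns form an orthonormal basis for $\mathcal{S}$, coherence is straightforward to compute from the top $k$ eigenvectors. The following NumPy sketch (the function name is our own) assumes $A$ is symmetric.
\begin{verbatim}
import numpy as np

def coherence(A, k):
    """Coherence of the dominant k-dimensional eigenspace of symmetric A.

    If U holds an orthonormal basis for the eigenspace, the projection
    P = U U^T has diagonal entries P_ii = ||U[i, :]||^2, so
    mu = (n / k) * max_i ||U[i, :]||^2.
    """
    n = A.shape[0]
    # eigh returns eigenvalues in ascending order; keep the top k eigenvectors.
    _, V = np.linalg.eigh(A)
    U = V[:, -k:]
    leverage = np.sum(U**2, axis=1)   # diagonal of the projection onto the eigenspace
    return (n / k) * leverage.max()
\end{verbatim}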
Talwalkar and Rostamizadeh were the first to use coherence in the analysis of Nyström extensions. Let $A$ have rank exactly $k$, and let $\mu$ denote the coherence of its top $k$-dimensional eigenspace. In [TR10], they show that if one samples on the order of $\mu k \log(k/\delta)$ columns to form a Nyström extension, then with probability at least $1 - \delta$ the Nyström extension is exactly $A$. The framework provided in Chapter 6 allows us to extend this result to matrices of arbitrary rank. Specifically, we show that when $\ell = O(\mu k \log k)$, then
\[
\big\| A - C W^\dagger C^T \big\|_2 \leq \left( 1 + \frac{n}{\ell} \right) \big\| A - A_k \big\|_2
\]
with constant probability. This bound is shown to be optimal in the worst case.
Low-rank approximations computed using the SPSD sketching model are not guaranteed to be numerically stable: if $W$ is ill-conditioned, then instabilities may arise in forming the product $C W^\dagger C^T$. A regularization scheme proposed in [WS01] suggests avoiding numerical ill-conditioning issues by using an SPSD sketch constructed from the matrix $A + \rho I$, where $\rho > 0$ is a regularization parameter. In Chapter 6, we provide the first error analysis of this regularization scheme, and compare it empirically to another regularization scheme introduced in [CD11].
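As a rough illustration of this regularization idea, one sketches $A + \rho I$ rather than $A$; in the sketch below, the function name and the use of a direct solve (valid because $W$ is positive definite when $\rho > 0$) are our own choices.
\begin{verbatim}
import numpy as np

def regularized_nystrom(A, ell, rho, seed=None):
    """Nystrom-type SPSD sketch formed from the shifted matrix A + rho*I.

    Sketching A + rho*I keeps W = S^T (A + rho*I) S away from
    singularity, mitigating the instability of forming C W^+ C^T.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    B = A + rho * np.eye(n)          # regularized matrix
    idx = rng.choice(n, size=ell, replace=False)
    C = B[:, idx]                    # C = (A + rho*I) S
    W = B[np.ix_(idx, idx)]          # W = S^T (A + rho*I) S
    # W is positive definite for rho > 0, so a solve replaces the pseudoinverse.
    return C @ np.linalg.solve(W, C.T)
\end{verbatim}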
Finally, in addition to theoretical results, Chapter 6 provides a detailed suite of empirical results on the performance of SPSD sketching schemes applied to matrices culled from data analysis and machine learning applications.
Chapter 2
Bounds for all eigenvalues of sums of Hermitian random matrices
2.1 Introduction
The classical tools of nonasymptotic random matrix theory can sometimes give quite sharp estimates of the extreme eigenvalues of a Hermitian random matrix, but they are not readily adapted to the study of the interior eigenvalues. This is because, while the extremal eigenvalues are the maxima and minima of a random process, more delicate and challenging minimax problems must be solved to obtain the interior eigenvalues.
This chapter introduces a simple method, based upon the variational characterization of eigenvalues, that parlays bounds on the extreme eigenvalues of sums of random Hermitian matrices into bounds that apply to all the eigenvalues.\footnote{The content of this chapter is adapted from the technical report [GT09], co-authored with Joel Tropp.} This technique extends the matrix Laplace transform method detailed in [Tro12]. We combine these ideas to extend several of the inequalities in [Tro12] to address the fluctuations of interior eigenvalues. Specifically, we provide eigenvalue analogs of the classical multiplicative Chernoff bounds and of the Bennett and Bernstein inequalities.
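For reference, the variational characterization in question is the Courant--Fischer minimax theorem: if the eigenvalues of a Hermitian matrix $A \in \mathbb{C}^{n \times n}$ are arranged in decreasing order $\lambda_1(A) \geq \cdots \geq \lambda_n(A)$, then
\[
\lambda_k(A)
= \max_{\substack{V \subseteq \mathbb{C}^n \\ \dim V = k}} \; \min_{\substack{x \in V \\ \|x\|_2 = 1}} x^* A x
= \min_{\substack{V \subseteq \mathbb{C}^n \\ \dim V = n-k+1}} \; \max_{\substack{x \in V \\ \|x\|_2 = 1}} x^* A x.
\]
The extreme cases $k = 1$ and $k = n$ reduce to the familiar Rayleigh-quotient characterizations of the largest and smallest eigenvalues; for the interior eigenvalues, an optimization over subspaces and an optimization over vectors are both in play.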
In this technique, the delicacy of the minimax problems which implicitly define the eigenvalues of Hermitian matrices is encapsulated in terms that reflect the fluctuations of the summands in the appropriate eigenspaces. In particular, we see that the fluctuations of the $k$th eigenvalue of the sum above and below the $k$th eigenvalue of the expected sum are controlled by two different quantities. This satisfies intuition: for instance, given samples from a nondegenerate stationary random process with finite covariance matrix, one expects that the smallest eigenvalue of the sample covariance matrix is more likely to be an underestimate of the smallest eigenvalue of the covariance matrix than it is to be an overestimate.
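A quick simulation illustrates this intuition; the distribution, dimensions, and trial count below are arbitrary choices of ours.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, m, trials = 10, 40, 2000       # dimension, samples per trial, trials

# True covariance is the identity, so every population eigenvalue equals 1.
under = 0
for _ in range(trials):
    X = rng.standard_normal((m, n))         # m i.i.d. samples from N(0, I_n)
    sample_cov = X.T @ X / m
    if np.linalg.eigvalsh(sample_cov)[0] < 1.0:
        under += 1

# The smallest sample eigenvalue underestimates 1 in nearly every trial.
print(f"underestimated in {under}/{trials} trials")
\end{verbatim}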
We provide two illustrative applications of our eigenvalue tail bounds: Theorem 2.14 quantifies the behavior of the singular values of matrices obtained by sampling columns from a short, fat matrix; and Theorem 2.15 quantifies the convergence of the eigenvalues of Wishart matrices.