Low-rank approximation with

Both of the algorithms discussed in this chapter are based on the intuition that, when [...] from the rest of the arguments used in this chapter, so the changes would propagate, mutatis mutandis, through the remaining results in this chapter. We conclude the chapter with an experimental evaluation of the SRHT low-rank approximation algorithms in Section 5.5.

Their choice of $p_{jk}$, in particular the insertion of the factor $(8\log n)^4/n$, is an artifact of their method of proof. Thus, Theorem 3.14 and the analysis specific to the scheme of Achlioptas and McSherry give results of the same order in [...] and $p$. Finally, we use Theorem 3.15 to estimate the error of the scheme from [AHK06], which simultaneously quantizes and sparsifies.

Arora, Hazan, and Kale establish that this scheme guarantees $\|A - X\|_2 = O(\delta)$ with probability at least $1 - \exp(-\Omega(n))$, so we see that our general bound recovers a bound of the same order. Then the authors show that, with probability at least $1 - n^{-1}$, the error of the approximation satisfies $\|A - X\|_2 \le \varepsilon$.
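For a sense of what such entrywise schemes do, the following sketch (an illustration in Python, not the [AHK06] scheme itself; the keep-probability $p$ and the test matrix are arbitrary choices) sparsifies a matrix by keeping each entry independently with probability $p$, rescales so the result is unbiased, and measures the spectral-norm error $\|A - X\|_2$ incurred.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 512, 0.1                       # matrix size and keep-probability (arbitrary choices)

A = rng.standard_normal((n, n))

# Keep each entry independently with probability p and rescale by 1/p,
# so that E[X] = A; the remaining entries are set to zero.
mask = rng.random((n, n)) < p
X = np.where(mask, A / p, 0.0)

print("||A - X||_2   :", np.linalg.norm(A - X, 2))
print("||A||_2       :", np.linalg.norm(A, 2))
print("fraction kept :", mask.mean())
```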

Preliminaries for the investigation of low-rank approximation algorithms

Probabilistic tools

  • Concentration of convex functions of Rademacher variables
  • Chernoff bounds for sums of random matrices sampled without replacement
  • Frobenius-norm error bounds for matrix multiplication

However, sometimes one desires Chernoff bounds that do not require the summands to be independent. The following Chernoff bounds are useful in the case where the summands are drawn without replacement from a set of bounded random matrices. The result obtained here differs in that it applies to the sampling-without-replacement model, and it provides bounds on the error that hold with high probability, rather than simply an estimate of the expected error.
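As a numerical illustration of this sampling model (a sketch with synthetic matrices; the population, its size, and the sample size are our choices, not taken from the text), one can draw a subset of bounded PSD matrices without replacement and observe how tightly the spectral norm of the partial sum concentrates around its expectation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, ell, trials = 10, 200, 40, 2000   # dimension, population size, sample size, trials

# A finite population of symmetric PSD matrices with spectral norm exactly 1.
population = np.empty((N, d, d))
for j in range(N):
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    population[j] = np.outer(v, v)       # rank-one projector, spectral norm 1
pop_mean = population.mean(axis=0)

deviations = np.empty(trials)
for t in range(trials):
    idx = rng.choice(N, size=ell, replace=False)     # sampling WITHOUT replacement
    partial_sum = population[idx].sum(axis=0)
    # Spectral-norm deviation of the sample sum from its expectation ell * pop_mean.
    deviations[t] = np.linalg.norm(partial_sum - ell * pop_mean, 2)

print("mean deviation      :", deviations.mean())
print("99th percentile     :", np.percentile(deviations, 99))
print("maximum over trials :", deviations.max())
```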

In the proof of Theorem 12 in [Gro11], Gross notes that any random variable $Z$ whose mgf is bounded by the right-hand side of (4.1.2) satisfies a tail inequality of the form [...]. The conclusion of the lemma follows when we use Lemma 4.4 to bound the quantity on the right-hand side. The first inequality is Jensen's, and the following equality holds because the terms of the sequence $\{V_i' - V_i''\}$ are symmetric and independent.
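The mechanism behind such statements is the standard Laplace-transform (Markov) argument: for any $\theta > 0$,
$$\Pr[Z \ge t] = \Pr\!\big[e^{\theta Z} \ge e^{\theta t}\big] \le e^{-\theta t}\,\mathbb{E}\,e^{\theta Z},$$
so any upper bound on the mgf $\mathbb{E}\,e^{\theta Z}$ converts into an exponential tail bound after optimizing over $\theta$.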

This estimate is completed by conditioning and using the orthogonality of the Rademacher variables. The tail bound given in the statement of the lemma follows from applying Lemma 4.4 to our estimates for $B$, $\sigma^2$, and $\mu$.

Linear Algebra notation and results

  • Column-based low-rank approximation
    • Matrix Pythagoras and generalized least-squares regression
    • Low-rank approximations restricted to subspaces
  • Structural results for low-rank approximation
    • A geometric interpretation of the sampling interaction matrix

The vector $e_i$ is the $i$th element of the standard Euclidean basis (whose dimension will be clear from the context). Structural results allow us to relate the errors of low-rank approximations formed using projection schemes to the optimal errors $\|A - A_k\|_\xi$ for $\xi = 2, F$. The following result, which appears as Lemma 7 in [BMD09], gives an upper bound on the residual error of the low-rank matrix approximation obtained via projection onto a subspace.

Let $\Omega_1 = V_1^T S$ and $\Omega_2 = V_2^T S$ denote the interactions of the sampling matrix with the top and bottom right singular spaces of $A$. It is clear from Lemmas 4.8 and 4.9 that the quality of the low-rank approximations depends on the norm of the sampling interaction matrix. To give the sampling interaction matrix a geometric interpretation, we first recall the definition of the largest angle between two subspaces.

When $S$ has orthonormal columns and $V_1^T S$ has full row rank, $\|\Omega_2 \Omega_1^\dagger\|_2$ is the tangent of the largest angle between the range of $S$ and the top right singular space spanned by $V_1$. We note that $\tan(S, V_1)$ also arises in the classical bounds on the convergence of the orthogonal iteration algorithm for approximating the top $k$-dimensional singular spaces of a matrix (see, e.g., [GV96, Theorem 8.2.2]).
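This identity is easy to check numerically. The following sketch (in Python, with synthetic orthonormal matrices standing in for $V_1$, $V_2$, and $S$; the dimensions are arbitrary) compares $\|\Omega_2\Omega_1^\dagger\|_2$ with the tangent of the largest principal angle computed directly from the two subspaces:

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(1)
m, k, ell = 64, 5, 12                 # ambient dimension, target rank, number of samples

# An arbitrary orthonormal basis of R^m, split into the "top" k directions V1
# and the remaining directions V2 (synthetic stand-ins for the right singular spaces).
V, _ = np.linalg.qr(rng.standard_normal((m, m)))
V1, V2 = V[:, :k], V[:, k:]

# A sampling matrix S with orthonormal columns.
S, _ = np.linalg.qr(rng.standard_normal((m, ell)))

Omega1 = V1.T @ S                     # k x ell, full row rank with probability one
Omega2 = V2.T @ S                     # (m - k) x ell

lhs = np.linalg.norm(Omega2 @ np.linalg.pinv(Omega1), 2)
rhs = np.tan(np.max(subspace_angles(S, V1)))
print(lhs, rhs)                       # the two quantities agree to rounding error
```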

Low-rank approximation with

Introduction

In this case, it is often more efficient to use iterative methods (e.g., Krylov subspace methods) to obtain approximations to $A_k$. It is difficult to give an exact guarantee on the number of arithmetic operations performed by Krylov methods, but one iteration of a Krylov method requires $\Omega(mnk)$ operations (assuming there is no special structure that can be exploited to speed up the computation of the matrix-vector products). Thus, an optimistic estimate of the number of operations needed to compute an approximate truncated SVD using a Krylov method is $\Omega(mnk\log n)$.

Our discussion so far has concerned only the arithmetic cost of computing truncated SVDs, but the issue of communication costs is equally or more important: bandwidth costs (proportional to the number of storage accesses) and latency costs (proportional to the cost of transferring information across a network or through the levels of a memory hierarchy) [BDHS11]. If we want to parallelize an algorithm, the complexity of the required information exchange must also be taken into account. The randomized algorithms discussed in this chapter, Algorithms 5.1 and 5.2, are interesting because they produce low-rank approximations after $\Omega(mnk\max\{\log n, \log k\})$ arithmetic operations and have low communication costs.

In particular, each element of $A$ is accessed only twice, and the algorithms are simple enough to be parallelized directly. The guarantees provided are probabilistic, and they allow a trade-off between the number of operations the algorithms perform and their accuracy and error probability.
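To make the shape of these algorithms concrete, here is a small Python sketch of SRHT-based low-rank approximation under one common convention for the SRHT, $S = \sqrt{n/\ell}\,DHR$, with random column signs $D$, the normalized Hadamard matrix $H$, and a uniform restriction $R$ to $\ell$ columns. The helper names, the parameter choices, and the use of the dense Hadamard matrix instead of a fast $O(n\log n)$ transform are illustrative assumptions, not the exact statements of Algorithms 5.1 and 5.2.

```python
import numpy as np
from scipy.linalg import hadamard

def srht_sample(A, ell, rng):
    """Return Y = A @ S for an SRHT sampling matrix S = sqrt(n/ell) * D H R.

    One common SRHT convention; n must be a power of two.  For clarity this
    uses the dense Hadamard matrix rather than a fast transform.
    """
    m, n = A.shape
    D = rng.choice([-1.0, 1.0], size=n)             # random column signs
    H = hadamard(n) / np.sqrt(n)                     # normalized Hadamard matrix
    cols = rng.choice(n, size=ell, replace=False)    # uniform column sample
    mixed = (A * D) @ H                              # A @ diag(D) @ H
    return np.sqrt(n / ell) * mixed[:, cols]

def srht_low_rank(A, k, ell, rng):
    """Rank-k approximation of A built from an SRHT column sample (illustrative)."""
    Y = srht_sample(A, ell, rng)                     # Y = A S
    Q, _ = np.linalg.qr(Y)                           # orthonormal basis for range(A S)
    B = Q.T @ A
    # Rank-k truncation of the projected matrix, i.e. Q (Q^T A)_k.
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
m, n, k, ell = 300, 256, 10, 40                      # n must be a power of two
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n)) \
    + 0.01 * rng.standard_normal((m, n))             # numerically low-rank test matrix

A_hat = srht_low_rank(A, k, ell, rng)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]                 # optimal rank-k approximation

print("residual error ratio:", np.linalg.norm(A - A_hat) / np.linalg.norm(A - A_k))
print("forward  error ratio:", np.linalg.norm(A_k - A_hat) / np.linalg.norm(A - A_k))
```

Dropping the rank-$k$ truncation of $Q^T A$ gives the plain projection $QQ^T A$ of $A$ onto the range of the sample; keeping it gives the rank-restricted approximation of the form $Q(Q^T A)_k$.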

Randomized approximate truncated SVD

  • Matrix computations with SRHT matrices
    • Detailed comparison with prior work
    • SRHTs applied to orthonormal matrices
    • SRHTs applied to general matrices
  • Proof of the quality of approximation guarantees
  • Experiments
    • The test matrices
    • Empirical comparison of the SRHT and Gaussian algorithms
    • Empirical evaluation of our error bounds

We now present a detailed comparison of the guarantees given in Theorem 5.4 with those available in the existing literature. An analysis of the Frobenius-norm error of an SRHT-based low-rank matrix approximation algorithm appeared in Nguyen et al. [NDT09]. We have conditioned on $E$, the event that the squared norms of the columns of $M$ are all less than $B$.

After conditioning on $D$, we note that the remaining randomness on the right-hand side of (5.3.11) is due to the choice of the $X_i$, which is determined by $R$. We recall that the stable rank $\mathrm{sr}(A) = \|A\|_F^2/\|A\|_2^2$ reflects the decay of the spectrum of the matrix $A$. The first term on the right-hand side of this inequality is simply the forward error of the approximation $\Pi^F_{AS,k}(A)$.
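The stable rank is straightforward to compute from its definition; a minimal helper (ours, for illustration, with a synthetic example matrix):

```python
import numpy as np

def stable_rank(A):
    """sr(A) = ||A||_F^2 / ||A||_2^2, a smooth surrogate for the rank of A."""
    return np.linalg.norm(A, 'fro') ** 2 / np.linalg.norm(A, 2) ** 2

# A matrix with one dominant direction plus small noise has stable rank close to 1.
rng = np.random.default_rng(0)
A = np.outer(rng.standard_normal(50), rng.standard_normal(80)) \
    + 0.01 * rng.standard_normal((50, 80))
print(stable_rank(A))
```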

Our second result bounds the Frobenius-norm errors of SRHT-based low-rank approximation algorithms. We complete the estimate by bounding the second term on the right-hand side of the above. Summing the failure probabilities of the three estimates used in (5.4.7), we conclude that the bound given in (5.4.8) holds with probability at least $1 - \delta^{C^2\log(k/\delta)/4} - 7\delta$. Estimates (i) and (iii) hold with this probability.

The coherence of their right singular spaces summarizes the relevant difference between the singular spaces of $B$ and $C$. The rotation lowers the coherence of the right singular spaces and therefore increases the chance of obtaining an accurate low-rank approximation.

Because the matrices $B$ and $C$ have the same singular values but the singular spaces of $C$ are less coherent, the difference in the residual errors of the $B$ and $C$ approximations is evidence that the spectral-norm accuracy of the SRHT approximations improves on less coherent datasets; the same holds, to a lesser extent, for the Frobenius-norm accuracy. Only on the highly coherent matrix $B$ do we see a notable reduction in residual errors when Gaussian sampling is used instead of an SRHT; but even in this case, the residual errors of the SRHT approximations are [...]. Figure 5.2 shows the relative forward errors of the Gaussian and SRHT algorithms ($\|M_k - P_{MS}M\|_\xi/\|M - M_k\|_\xi$ and $\|M_k - \Pi^F_{MS,k}(M)\|_\xi/\|M - M_k\|_\xi$ for $\xi = 2, F$) for the non-rank-restricted and rank-restricted approximations.
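The coherence being discussed can also be computed directly. The sketch below (illustrative Python with a synthetic, maximally coherent subspace, not the thesis's test matrices $A$, $B$, $C$) measures the coherence of a right singular basis before and after the randomized Hadamard mixing step that the SRHT applies to the columns of the data matrix:

```python
import numpy as np
from scipy.linalg import hadamard

def coherence(V1):
    """Coherence of the subspace spanned by the orthonormal columns of V1:
    (n/k) * max_i ||row i of V1||^2, a quantity between 1 and n/k."""
    n, k = V1.shape
    return (n / k) * np.max(np.sum(V1 ** 2, axis=1))

rng = np.random.default_rng(0)
n, k = 256, 8

# A maximally coherent right singular space: spanned by k standard basis vectors.
V1 = np.eye(n)[:, :k]
print("coherence before mixing:", coherence(V1))       # equals n/k = 32

# Randomized Hadamard rotation: random sign flips followed by the normalized
# Hadamard transform, as applied to the data matrix's columns by the SRHT.
D = rng.choice([-1.0, 1.0], size=n)
H = hadamard(n) / np.sqrt(n)
V1_rot = H @ (D[:, None] * V1)       # new right singular basis (H D) V1, still orthonormal
print("coherence after mixing :", coherence(V1_rot))   # drops to (or near) its minimum, 1
```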

We observe that the forward errors of both algorithms, for both choices of sampling matrix, are on the scale of the norm of $M_k$. For each value of $k$, the empirical spectral-norm residual error is plotted as the average of the errors over 30 trials of the low-rank approximation algorithms.

Note from Figure 5.4 that, with this choice of $\ell$, the spectral-norm residual errors of the rank-restricted and non-rank-restricted SRHT approximations are essentially the same.

Figure 5.1: RESIDUAL ERRORS OF LOW-RANK APPROXIMATION ALGORITHMS. Relative spectral- and Frobenius-norm residual errors of the SRHT and Gaussian low-rank approximation algorithms ($\|M - P_{MS}M\|_\xi/\|M - M_k\|_\xi$ and $\|M - \Pi^F_{MS,k}(M)\|_\xi/\|M - M_k\|_\xi$ for $\xi = 2, F$) as a function of the target rank $k$ for the three matrices $M = A, B, C$.

Figure 5.2: FORWARD ERRORS OF LOW-RANK APPROXIMATION ALGORITHMS. Relative spectral- and Frobenius-norm forward errors of the SRHT and Gaussian low-rank approximation algorithms ($\|M_k - P_{MS}M\|_\xi/\|M - M_k\|_\xi$ and $\|M_k - \Pi^F_{MS,k}(M)\|_\xi/\|M - M_k\|_\xi$ for $\xi = 2, F$) as a function of the target rank $k$ for the three matrices $M = A, B, C$.

Figure 5.3: THE NUMBER OF COLUMN SAMPLES REQUIRED FOR RELATIVE-ERROR FROBENIUS-NORM APPROXIMATIONS.

Figure 5.4: EMPIRICAL VERSUS PREDICTED SPECTRAL-NORM RESIDUAL ERRORS OF LOW-RANK APPROXIMATIONS.
