5.3 Matrix computations with SRHT matrices
5.3.2 SRHTs applied to general matrices
$Y_{ii} = (1-\sigma_i^2(X))/\sigma_i(X)$. We conclude the proof with the series of estimates
\[
\|Y\|_2 = \max_{1\le i\le k} Y_{ii} = \max_{1\le i\le k} \frac{1-\sigma_i^2(X)}{\sigma_i(X)}
= \max\left\{ \frac{1-\sigma_1^2(X)}{\sigma_1(X)},\; \frac{1-\sigma_k^2(X)}{\sigma_k(X)} \right\}
\le \frac{\sqrt{\varepsilon}}{\sqrt{1-\sqrt{\varepsilon}}} \le 1.54\sqrt{\varepsilon}.
\]
The second-to-last inequality follows from the lower bound in (5.3.6).
Lemma 5.8. Let $A \in \mathbb{R}^{m\times n}$ with $n$ a power of 2, let $H \in \mathbb{R}^{n\times n}$ be a normalized Walsh-Hadamard matrix, and let $D \in \mathbb{R}^{n\times n}$ be a diagonal matrix of Rademacher random variables. Then for every $t \ge 0$,
\[
\mathbb{P}\left\{ \max_{j=1,\dots,n} \big\|(ADH^T)^{(j)}\big\|_2 \le \frac{1}{\sqrt{n}}\|A\|_F + \frac{t}{\sqrt{n}}\|A\|_2 \right\} \ge 1 - n\cdot e^{-t^2/8}.
\]
Recall that $(ADH^T)^{(j)}$ denotes the $j$th column of $ADH^T$.
Proof. Our proof of Lemma 5.8 is essentially that of Lemma 5.6 in [Tro11b], with attention paid to the fact that $A$ is no longer assumed to have orthonormal columns.
Lemma 5.8 follows immediately from the observation that the norm of any one column of $ADH^T$ is a convex Lipschitz function of a Rademacher vector. Consider the norm of the $j$th column of $ADH^T$ as a function of $\varepsilon$, where $D = \operatorname{diag}(\varepsilon)$:
\[
f_j(\varepsilon) = \|ADH^T e_j\| = \big\|A\operatorname{diag}(\varepsilon)\,h_j\big\|_2 = \big\|A\operatorname{diag}(h_j)\,\varepsilon\big\|_2,
\]
where $h_j$ denotes the $j$th column of $H^T$. Evidently $f_j$ is convex. Furthermore,
\[
|f_j(x) - f_j(y)| \le \big\|A\operatorname{diag}(h_j)(x-y)\big\|_2 \le \|A\|_2\,\big\|\operatorname{diag}(h_j)\big\|_2\,\|x-y\|_2 = \frac{1}{\sqrt{n}}\,\|A\|_2\,\|x-y\|_2,
\]
where we used the triangle inequality and the fact that $\|\operatorname{diag}(h_j)\|_2 = \|h_j\|_\infty = n^{-1/2}$. Thus $f_j$ is convex and Lipschitz with Lipschitz constant at most $n^{-1/2}\|A\|_2$.
We calculate
\[
\mathbb{E}\, f_j(\varepsilon) \le \big(\mathbb{E}\, f_j(\varepsilon)^2\big)^{1/2}
= \Big(\operatorname{Tr}\big(A\operatorname{diag}(h_j)\,\mathbb{E}[\varepsilon\varepsilon^*]\,\operatorname{diag}(h_j)A^T\big)\Big)^{1/2}
= \Big(\operatorname{Tr}\big(\tfrac{1}{n}AA^T\big)\Big)^{1/2}
= \frac{1}{\sqrt{n}}\,\|A\|_F.
\]
It now follows from Lemma 4.1 that, for all $j = 1, 2, \dots, n$, the norm of the $j$th column of $ADH^T$ satisfies the tail bound
\[
\mathbb{P}\Big\{ \|ADH^T e_j\|_2 \ge \frac{1}{\sqrt{n}}\|A\|_F + \frac{t}{\sqrt{n}}\|A\|_2 \Big\} \le e^{-t^2/8}.
\]
Taking a union bound over all columns of $ADH^T$, we conclude that
\[
\mathbb{P}\Big\{ \max_{j=1,\dots,n} \big\|(ADH^T)^{(j)}\big\|_2 \ge \frac{1}{\sqrt{n}}\|A\|_F + \frac{t}{\sqrt{n}}\|A\|_2 \Big\} \le n\cdot e^{-t^2/8}.
\]
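As a purely numerical illustration of Lemma 5.8 (it plays no role in the argument), the following Python sketch builds $ADH^T$ for a test matrix $A$ with deliberately uneven column norms and compares the largest resulting column norm against the bound $(\|A\|_F + t\|A\|_2)/\sqrt{n}$ with $t = \sqrt{8\log(n/\delta)}$. The test matrix, the choice $\delta = 0.01$, and the use of SciPy's Hadamard routine are illustrative assumptions, not part of the text.

# Numerical sketch for Lemma 5.8: the transform A D H^T equalizes column norms.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
m, n, delta = 50, 256, 0.01                  # n must be a power of 2
A = rng.standard_normal((m, n)) * np.linspace(1, 20, n)   # very uneven column norms

H = hadamard(n) / np.sqrt(n)                 # normalized Walsh-Hadamard matrix
eps = rng.choice([-1.0, 1.0], size=n)        # Rademacher diagonal of D
ADHt = (A * eps) @ H.T                       # A @ diag(eps) @ H.T without forming D

t = np.sqrt(8 * np.log(n / delta))           # makes n * exp(-t^2/8) equal to delta
bound = (np.linalg.norm(A, 'fro') + t * np.linalg.norm(A, 2)) / np.sqrt(n)

print("max column norm of A             :", np.linalg.norm(A, axis=0).max())
print("max column norm of A D H^T       :", np.linalg.norm(ADHt, axis=0).max())
print("Lemma 5.8 bound (prob >= 1-delta):", bound)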
Our next lemma shows that the SRHT does not substantially increase the spectral norm of a matrix.
Lemma 5.9 (SRHT-based subsampling in the spectral norm). Let $A \in \mathbb{R}^{m\times n}$ have rank $\rho$, and assume that $n$ is a power of 2. For some $\ell < n$, let $\Theta \in \mathbb{R}^{\ell\times n}$ be an SRHT matrix. Fix a failure probability $0 < \delta < 1$. Then,
\[
\mathbb{P}\left\{ \|A\Theta^T\|_2^2 \le 5\|A\|_2^2 + \frac{\log(\rho/\delta)}{\ell}\Big(\|A\|_F + \sqrt{8\log(n/\delta)}\,\|A\|_2\Big)^2 \right\} \ge 1 - 2\delta.
\]
To establish Lemma 5.9, we use the upper Chernoff bound for sums of matrices stated in Lemma 4.2.
Proof of Lemma 5.9. Write the SVD of $A$ as $U\Sigma V^T$, where $\Sigma \in \mathbb{R}^{\rho\times\rho}$, and observe that the spectral norm of $A\Theta^T$ is the same as that of $\Sigma V^T\Theta^T$.
We control the norm of $\Sigma V^T\Theta^T$ by considering the maximum singular value of its Gram matrix. Define $M = \Sigma V^T D H^T$, so that $\Sigma V^T\Theta^T = \sqrt{n/\ell}\cdot MR^T$, and let $G$ be the $\rho\times\rho$ Gram matrix of $MR^T$:
\[
G = MR^T(MR^T)^T.
\]
Evidently
\[
\lambda_{\max}(G) = \frac{\ell}{n}\,\big\|\Sigma V^T\Theta^T\big\|_2^2. \qquad (5.3.8)
\]
Recall that $M^{(j)}$ denotes the $j$th column of $M$. If we denote by $C$ the random set of $\ell$ coordinates to which $R$ restricts, then
\[
G = \sum_{j\in C} M^{(j)} M^{(j)T}.
\]
Thus $G$ is a sum of $\ell$ random matrices $X_1, \dots, X_\ell$ sampled without replacement from the set $\mathcal{X} = \{M^{(j)}M^{(j)T} : j = 1, 2, \dots, n\}$. There are two sources of randomness in $G$: the subsampling matrix $R$ and the Rademacher random variables on the diagonal of $D$.
Set
\[
B = \frac{1}{n}\Big(\|\Sigma\|_F + \sqrt{8\log(n/\delta)}\,\|\Sigma\|_2\Big)^2
\]
and let $E$ be the event
\[
\max_{j=1,\dots,n} \|M^{(j)}\|_2^2 \le B.
\]
By Lemma 5.8 (applied to $\Sigma V^T$, whose Frobenius and spectral norms agree with those of $\Sigma$), $E$ occurs with probability at least $1-\delta$. When $E$ holds, for all $j = 1, 2, \dots, n$,
\[
\lambda_{\max}\big(M^{(j)}M^{(j)T}\big) = \|M^{(j)}\|_2^2 \le B,
\]
so $G$ is a sum of random positive-semidefinite matrices, each of whose norms is bounded by $B$. Note that the event $E$ is entirely determined by the random matrix $D$; in particular, $E$ is independent of $R$.
Conditioning on $E$, the randomness in $R$ allows us to use the upper matrix Chernoff bound of Lemma 4.2 to control the maximum eigenvalue of $G$. We observe that
\[
\mu_{\max} = \ell\cdot\lambda_{\max}\big(\mathbb{E}\,X_1\big) = \frac{\ell}{n}\,\lambda_{\max}\Big(\sum_{j=1}^n M^{(j)}M^{(j)T}\Big) = \frac{\ell}{n}\,\|\Sigma\|_2^2.
\]
Take the parameter $\nu$ in Lemma 4.2 to be
\[
\nu = 4 + \frac{B\log(\rho/\delta)}{\mu_{\max}}
\]
to obtain the relation
\[
\begin{aligned}
\mathbb{P}\big\{\lambda_{\max}(G) \ge 5\mu_{\max} + B\log(\rho/\delta) \;\big|\; E\big\}
&\le \rho\cdot e^{[\nu-(1+\nu)\log(1+\nu)]\,\mu_{\max}/B} \\
&\le \rho\cdot e^{(1-(5/4)\log 5)\,\nu\,\mu_{\max}/B} \\
&\le \rho\cdot e^{(1-(5/4)\log 5)\log(\rho/\delta)} < \delta. \qquad (5.3.9)
\end{aligned}
\]
The second inequality holds because $\nu \ge 4$ implies that $(1+\nu)\log(1+\nu) \ge \nu\cdot(5/4)\log 5$; the third holds because $\nu \ge B\log(\rho/\delta)/\mu_{\max}$, and the final strict inequality holds because $(5/4)\log 5 - 1 > 1$ and $\rho/\delta > 1$.
We have conditioned on $E$, the event that the squared norms of the columns of $M$ are all smaller than $B$. Thus, substituting the values of $B$ and $\mu_{\max}$ into (5.3.9), we find that
\[
\mathbb{P}\left\{ \lambda_{\max}(G) \ge \frac{\ell}{n}\left[ 5\|\Sigma\|_2^2 + \frac{\log(\rho/\delta)}{\ell}\Big(\|\Sigma\|_F + \sqrt{8\log(n/\delta)}\,\|\Sigma\|_2\Big)^2 \right] \right\} \le 2\delta.
\]
Use equation (5.3.8), together with $\|\Sigma\|_2 = \|A\|_2$ and $\|\Sigma\|_F = \|A\|_F$, to wrap up.
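For intuition, the next sketch draws a single SRHT and compares $\|A\Theta^T\|_2^2$ for a low-rank test matrix with the bound of Lemma 5.9. The explicit construction $\Theta = \sqrt{n/\ell}\,RHD$, with $R$ sampling $\ell$ rows uniformly without replacement, is an assumption consistent with the scaling used in the proof above; the test matrix and parameters are arbitrary.

# Numerical sketch for Lemma 5.9: spectral norm of A Theta^T versus the stated bound.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(1)
m, n, l, delta = 40, 512, 64, 0.05
A = rng.standard_normal((m, 10)) @ rng.standard_normal((10, n))   # rank rho = 10
rho = np.linalg.matrix_rank(A)

H = hadamard(n) / np.sqrt(n)
eps = rng.choice([-1.0, 1.0], size=n)                 # diagonal of D
rows = rng.choice(n, size=l, replace=False)           # the action of R
Theta = np.sqrt(n / l) * (H[rows, :] * eps)           # assumed SRHT: sqrt(n/l) R H D

lhs = np.linalg.norm(A @ Theta.T, 2) ** 2
t = np.sqrt(8 * np.log(n / delta))
rhs = 5 * np.linalg.norm(A, 2) ** 2 + (np.log(rho / delta) / l) * (
    np.linalg.norm(A, 'fro') + t * np.linalg.norm(A, 2)) ** 2
print("||A Theta^T||_2^2 :", lhs)
print("Lemma 5.9 bound   :", rhs)   # holds with probability at least 1 - 2*delta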
Similarly, the SRHT is unlikely to substantially increase the Frobenius norm of a matrix.
Lemma 5.10 (SRHT-based subsampling in the Frobenius norm). Assume $n$ is a power of 2. Let $A \in \mathbb{R}^{m\times n}$, and let $\Theta \in \mathbb{R}^{\ell\times n}$ be an SRHT matrix for some $\ell < n$. Fix a failure probability $0 < \delta < 1$.
Then, for any $\eta \ge 0$,
\[
\mathbb{P}\left\{ \|A\Theta^T\|_F^2 \le (1+\eta)\|A\|_F^2 \right\} \ge 1 - \left(\frac{e^\eta}{(1+\eta)^{1+\eta}}\right)^{\ell/\big(1+\sqrt{8\log(n/\delta)}\big)^2} - \delta.
\]
Proof. Let $c_j = (n/\ell)\cdot\big\|(ADH^T)^{(j)}\big\|_2^2$ denote the squared norm of the $j$th column of $\sqrt{n/\ell}\cdot ADH^T$. Then, since right multiplication by $R^T$ samples columns uniformly at random without replacement,
\[
\|A\Theta^T\|_F^2 = \frac{n}{\ell}\,\|ADH^TR^T\|_F^2 = \sum_{i=1}^{\ell} X_i, \qquad (5.3.10)
\]
where the random variables $X_i$ are chosen randomly without replacement from the set $\{c_j\}_{j=1}^n$. There are two independent sources of randomness in this sum: the choice of summands, which is determined by $R$, and the magnitudes of the $\{c_j\}$, which are determined by $D$.
To bound this sum, we first condition on the event $E$ that each $c_j$ is bounded by a quantity $B$ which depends only on the random matrix $D$. Then
\[
\mathbb{P}\left\{ \sum_{i=1}^{\ell} X_i \ge (1+\eta)\sum_{i=1}^{\ell} \mathbb{E}\,X_i \right\}
\le \mathbb{P}\left\{ \sum_{i=1}^{\ell} X_i \ge (1+\eta)\sum_{i=1}^{\ell} \mathbb{E}\,X_i \;\Big|\; E \right\} + \mathbb{P}(E^c).
\]
To select $B$, we observe that Lemma 5.8 implies that, with probability $1-\delta$, the entries of $D$ are such that
\[
\max_j c_j \le \frac{n}{\ell}\cdot\frac{1}{n}\Big(\|A\|_F + \sqrt{8\log(n/\delta)}\,\|A\|_2\Big)^2 \le \frac{1}{\ell}\Big(1+\sqrt{8\log(n/\delta)}\Big)^2\|A\|_F^2.
\]
Accordingly, we take
\[
B = \frac{1}{\ell}\Big(1+\sqrt{8\log(n/\delta)}\Big)^2\|A\|_F^2,
\]
thereby arriving at the bound
\[
\mathbb{P}\left\{ \sum_{i=1}^{\ell} X_i \ge (1+\eta)\sum_{i=1}^{\ell} \mathbb{E}\,X_i \right\}
\le \mathbb{P}\left\{ \sum_{i=1}^{\ell} X_i \ge (1+\eta)\sum_{i=1}^{\ell} \mathbb{E}\,X_i \;\Big|\; E \right\} + \delta. \qquad (5.3.11)
\]
After conditioning on $D$, we observe that the randomness remaining on the right-hand side of (5.3.11) is due to the choice of the summands $X_i$, which is determined by $R$. We address this randomness by applying a scalar Chernoff bound (Lemma 4.2 with $k=1$). To do so, we need $\mu_{\max}$, the expected value of the sum; this is an elementary calculation:
\[
\mathbb{E}\,X_1 = n^{-1}\sum_{j=1}^n c_j = \frac{1}{\ell}\,\|A\|_F^2,
\]
so $\mu_{\max} = \ell\,\mathbb{E}\,X_1 = \|A\|_F^2$.
Applying Lemma 4.2 conditioned on $E$, and combining the result with (5.3.11), we conclude that
\[
\mathbb{P}\left\{ \|A\Theta^T\|_F^2 \ge (1+\eta)\|A\|_F^2 \right\} \le \left(\frac{e^\eta}{(1+\eta)^{1+\eta}}\right)^{\ell/\big(1+\sqrt{8\log(n/\delta)}\big)^2} + \delta
\]
for $\eta \ge 0$.
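The same construction gives a quick numerical check of Lemma 5.10; the sketch below (the test data and the parameters $\ell$, $\delta$, $\eta$ are arbitrary choices) reports $\|A\Theta^T\|_F^2$, the threshold $(1+\eta)\|A\|_F^2$, and the failure probability predicted by the lemma.

# Numerical sketch for Lemma 5.10: Frobenius norm of A Theta^T versus (1+eta)||A||_F^2.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(2)
m, n, l, delta, eta = 30, 2048, 512, 0.05, 1.0
A = rng.standard_normal((m, n))

H = hadamard(n) / np.sqrt(n)
eps = rng.choice([-1.0, 1.0], size=n)
rows = rng.choice(n, size=l, replace=False)
Theta = np.sqrt(n / l) * (H[rows, :] * eps)           # assumed SRHT: sqrt(n/l) R H D

fro_sq = np.linalg.norm(A @ Theta.T, 'fro') ** 2
threshold = (1 + eta) * np.linalg.norm(A, 'fro') ** 2
p_fail = (np.exp(eta) / (1 + eta) ** (1 + eta)) ** (
    l / (1 + np.sqrt(8 * np.log(n / delta))) ** 2) + delta

print("||A Theta^T||_F^2        :", fro_sq)
print("(1 + eta) * ||A||_F^2    :", threshold)
print("Lemma 5.10 failure bound :", p_fail)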
Finally, we prove a novel result on approximate matrix multiplication involving SRHT matrices.
Lemma 5.11 (SRHT for approximate matrix multiplication). Assume $n$ is a power of 2. Let $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times p}$. For some $\ell < n$, let $\Theta \in \mathbb{R}^{\ell\times n}$ be an SRHT matrix. Fix a failure probability $0 < \delta < 1$. Assume $R$ satisfies $0 \le R \le \sqrt{\ell}/\big(1+\sqrt{8\log(n/\delta)}\big)$. Then,
\[
\mathbb{P}\left\{ \|A\Theta^T\Theta B - AB\|_F \le 2(R+1)\,\frac{\|A\|_F\|B\|_F + \sqrt{8\log(n/\delta)}\,\|A\|_F\|B\|_2}{\sqrt{\ell}} \right\} \ge 1 - e^{-R^2/4} - 2\delta.
\]
Remark 5.12. Recall that the stable rank $\operatorname{sr}(A) = \|A\|_F^2/\|A\|_2^2$ reflects the decay of the spectrum of the matrix $A$. The event under consideration in Lemma 5.11 can be rewritten as a bound on the relative error of the approximation $A\Theta^T\Theta B$ to the product $AB$:
\[
\frac{\|A\Theta^T\Theta B - AB\|_F}{\|AB\|_F} \le \frac{\|A\|_F\|B\|_F}{\|AB\|_F}\cdot\frac{2(R+1)}{\sqrt{\ell}}\cdot\left(1 + \sqrt{\frac{8\log(n/\delta)}{\operatorname{sr}(B)}}\right).
\]
In this form, we see that the relative error is controlled by the deterministic condition number for the matrix multiplication problem as well as by the stable rank of $B$ and the number of column samples $\ell$. Since the roles of $A$ and $B$ in this bound can be interchanged, we in fact have the bound
\[
\frac{\|A\Theta^T\Theta B - AB\|_F}{\|AB\|_F} \le \frac{\|A\|_F\|B\|_F}{\|AB\|_F}\cdot\frac{2(R+1)}{\sqrt{\ell}}\cdot\left(1 + \sqrt{\frac{8\log(n/\delta)}{\max\{\operatorname{sr}(A),\,\operatorname{sr}(B)\}}}\right).
\]
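To make the quantities in Remark 5.12 concrete, the following sketch computes the observed relative error of $A\Theta^T\Theta B$ together with the multiplication condition number $\|A\|_F\|B\|_F/\|AB\|_F$ and the stable rank of $B$. The synthetic matrices and the SRHT construction are the same assumptions as in the earlier sketches.

# Numerical sketch for Remark 5.12: relative error, condition number, and stable rank.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(3)
m, n, p, l = 60, 2048, 40, 256
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))

H = hadamard(n) / np.sqrt(n)
eps = rng.choice([-1.0, 1.0], size=n)
rows = rng.choice(n, size=l, replace=False)
Theta = np.sqrt(n / l) * (H[rows, :] * eps)           # assumed SRHT: sqrt(n/l) R H D

AB = A @ B
rel_err = np.linalg.norm(A @ Theta.T @ (Theta @ B) - AB, 'fro') / np.linalg.norm(AB, 'fro')
cond = np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro') / np.linalg.norm(AB, 'fro')
sr_B = np.linalg.norm(B, 'fro') ** 2 / np.linalg.norm(B, 2) ** 2

print("relative error                  :", rel_err)
print("multiplication condition number :", cond)
print("stable rank of B                :", sr_B)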
To prove the lemma, we use Lemma 4.3, our result for approximate matrix multiplication via uniform sampling (without replacement) of the columns and rows of the two matrices involved in the product. Lemma 5.11 is simply a specific application of this generic result. We mention that Lemma 3.2.8 in [Dri02] provides a similar result for approximate matrix multiplication; however, it bounds the expected value of the error term, while our Lemma 5.11 gives a comparable bound that holds with high probability.
Proof of Lemma 5.11. Let $X = ADH^T$ and $Y = HDB$, and form $\hat{X}$ and $\hat{Y}$ according to Lemma 4.3.
Then $XY = AB$ and
\[
\|A\Theta^T\Theta B - AB\|_F = \|\hat{X}\hat{Y} - XY\|_F.
\]
To apply Lemma 4.3, we first condition on the event that the SRHT equalizes the column norms of $X$ and the row norms of $Y$. Namely, we observe that, from Lemma 5.8, with probability at least $1-2\delta$,
\[
\max_i \|X^{(i)}\|_2 \le \frac{1}{\sqrt{n}}\Big(\|A\|_F + \sqrt{8\log(n/\delta)}\,\|A\|_2\Big) \quad\text{and}\quad
\max_i \|Y_{(i)}\|_2 \le \frac{1}{\sqrt{n}}\Big(\|B\|_F + \sqrt{8\log(n/\delta)}\,\|B\|_2\Big), \qquad (5.3.12)
\]
where $Y_{(i)}$ denotes the $i$th row of $Y$; the second bound follows from Lemma 5.8 applied to $B^T$.
We choose the parameters $\sigma$ and $B$ in Lemma 4.3. Set
\[
\sigma^2 = \frac{4}{\ell}\Big(\|B\|_F + \sqrt{8\log(n/\delta)}\,\|B\|_2\Big)^2\|A\|_F^2. \qquad (5.3.13)
\]
In view of (5.3.12) and the identities $\|X\|_F = \|A\|_F$, $\|Y\|_F = \|B\|_F$, and $\|Y\|_2 = \|B\|_2$,
\[
\sigma^2 = \frac{4n}{\ell}\cdot\frac{\big(\|Y\|_F + \sqrt{8\log(n/\delta)}\,\|Y\|_2\big)^2}{n}\,\|X\|_F^2 \ge \frac{4n}{\ell}\sum_{i=1}^n \|X^{(i)}\|_2^2\,\|Y_{(i)}\|_2^2,
\]
so this choice of $\sigma$ satisfies the inequality condition of Lemma 4.3. Next we choose
\[
B = \frac{2}{\ell}\Big(\|A\|_F + \sqrt{8\log(n/\delta)}\,\|A\|_2\Big)\Big(\|B\|_F + \sqrt{8\log(n/\delta)}\,\|B\|_2\Big).
\]
Again, because of (5.3.12), $B$ satisfies the requirement $B \ge \frac{2n}{\ell}\max_i \|X^{(i)}\|_2\,\|Y_{(i)}\|_2$. For simplicity, abbreviate $\gamma = 8\log(n/\delta)$. With these choices for $\sigma^2$ and $B$,
\[
\frac{\sigma^2}{B}
= \frac{2\|A\|_F^2\big(\|B\|_F + \sqrt{\gamma}\,\|B\|_2\big)^2}{\big(\|A\|_F + \sqrt{\gamma}\,\|A\|_2\big)\big(\|B\|_F + \sqrt{\gamma}\,\|B\|_2\big)}
\ge \frac{2\|A\|_F^2\big(\|B\|_F + \sqrt{\gamma}\,\|B\|_2\big)^2}{\big(\|A\|_F + \sqrt{\gamma}\,\|A\|_F\big)\big(\|B\|_F + \sqrt{\gamma}\,\|B\|_2\big)}
= \frac{2\|A\|_F\big(\|B\|_F + \sqrt{\gamma}\,\|B\|_2\big)}{1+\sqrt{\gamma}}.
\]
Now, referring to (5.3.13), identify the numerator as $\sqrt{\ell}\,\sigma$ to see that
\[
\frac{\sigma^2}{B} \ge \frac{\sqrt{\ell}\,\sigma}{1+\sqrt{8\log(n/\delta)}}.
\]
Apply Lemma 4.3 to see that, when (5.3.12) holds and $0 \le R\sigma \le \sigma^2/B$,
\[
\mathbb{P}\big\{ \|A\Theta^T\Theta B - AB\|_F \ge (R+1)\sigma \big\} \le \exp\!\big(-R^2/4\big).
\]
From our lower bound on $\sigma^2/B$, we know that the condition $R\sigma \le \sigma^2/B$ is satisfied when
\[
R \le \frac{\sqrt{\ell}}{1+\sqrt{8\log(n/\delta)}}.
\]
We established above that (5.3.12) holds with probability at least $1-2\delta$. From these two facts, it follows that, when $0 \le R \le \sqrt{\ell}/\big(1+\sqrt{8\log(n/\delta)}\big)$,
\[
\mathbb{P}\big\{ \|A\Theta^T\Theta B - AB\|_F \ge (R+1)\sigma \big\} \le \exp\!\big(-R^2/4\big) + 2\delta.
\]
The tail bound given in the statement of Lemma 5.11 follows when we substitute our estimate (5.3.13) of $\sigma$.
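Finally, a small Monte Carlo sketch (again with synthetic data and the assumed SRHT construction $\Theta = \sqrt{n/\ell}\,RHD$) compares the empirical frequency of the failure event in Lemma 5.11 with the bound $e^{-R^2/4} + 2\delta$; since the bound is conservative, the empirical rate is typically far smaller.

# Monte Carlo sketch for the tail bound of Lemma 5.11.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(4)
m, n, p, l, delta, trials = 20, 1024, 15, 512, 0.05, 200
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))
AB = A @ B

gamma = 8 * np.log(n / delta)
sigma = 2 / np.sqrt(l) * np.linalg.norm(A, 'fro') * (
    np.linalg.norm(B, 'fro') + np.sqrt(gamma) * np.linalg.norm(B, 2))
R = 2.0
assert R <= np.sqrt(l) / (1 + np.sqrt(gamma)), "R must lie in the admissible range"

H = hadamard(n) / np.sqrt(n)
failures = 0
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    rows = rng.choice(n, size=l, replace=False)
    Theta = np.sqrt(n / l) * (H[rows, :] * eps)       # fresh SRHT each trial
    err = np.linalg.norm(A @ Theta.T @ (Theta @ B) - AB, 'fro')
    failures += int(err >= (R + 1) * sigma)

print("empirical failure rate:", failures / trials)
print("Lemma 5.11 bound      :", np.exp(-R ** 2 / 4) + 2 * delta)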