Chapter V: Incomplete Cholesky factorization
5.4 Proof of stability of ICHOL(0)
Theorem 17 implies that the application of Algorithm 3 to a suitable $O(\epsilon)$-perturbation $\Theta - E$ returns an $O(\epsilon)$-accurate Cholesky factorization of $\Theta$ in computational complexity $O\bigl(N \log^2(N) \log^{2d}(N/\epsilon)\bigr)$. In practice, we do not have access to $E$, so we need to rely on the stability of Algorithm 3 to deduce that $\Theta$ and $\Theta - E$ (used as inputs) would yield similar outputs for sufficiently small $E$. Even though such a stability property of ICHOL(0) would also be required by prior works on incomplete LU factorization such as [94], we did not find this type of result in the literature. We also found it surprisingly difficult to prove (and were unable to do so) when using the maximin ordering and sparsity pattern, although we always observed stability of Algorithm 3 in practice, for reasonable values of $\rho$.
The key problem is that the standard perturbation bounds for Schur complements are multiplicative. Therefore, applying them $N$ times (once after each elimination) results in a possible growth of the approximation error that is exponential in $N$ and cannot be compensated for by a logarithmic increase of $\rho$.
However, we have already seen in Section 5.2 that, when using a supernodal multicolor ordering, the incomplete Cholesky factorization can be expressed as a smaller number of groups of independent dense linear algebra operations. In this section, we prove rigorously that the number of colors used by the multicolor ordering is bounded above by $O(\log(N))$ and that this allows us to control the approximation error of the supernodal factorization by invoking only $O(\log(N))$ Schur complement perturbation bounds. Therefore, the error amplification is polynomial in $N$ and can be controlled by choosing $\rho \gtrsim \log(N)$. By relating the ordinary and supernodal Cholesky factorizations, we are able to deduce the same error bounds for the ordinary Cholesky factorization when using a supernodal multicolor ordering and sparsity pattern.
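To make the contrast concrete, the following display is a purely schematic illustration (not part of the formal argument), with $\delta$ standing for a generic per-step amplification factor incurred by one application of a Schur complement perturbation bound:
$$
\underbrace{(1+\delta)^{N}\, \epsilon}_{\text{one bound per elimination}} = e^{\Omega(N)}\, \epsilon
\qquad \text{versus} \qquad
\underbrace{(1+\delta)^{O(\log N)}\, \epsilon}_{\text{one bound per color/level group}} = N^{O(1)}\, \epsilon,
$$
so that in the grouped setting the amplification is only polynomial in $N$ and can be compensated for by choosing the initial accuracy $\epsilon$ smaller by a polynomial factor, which increases $\rho$ only logarithmically in $N$.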
5.4.2 Revisiting the supernodal multicolor ordering
We begin by reintroducing the supernodal multicolor ordering of Section 5.2 in slightly different notation.
For $r > 0$, $1 \le k \le q$ and $i \in J^{(k)}$, write
$$
B^{(k)}_{r}(i) \coloneqq \bigl\{ j \in J^{(k)} \;\big|\; d(i,j) \le r \bigr\}. \tag{5.11}
$$

Construction 3 (Supernodal multicolor ordering and sparsity pattern). Let $\Theta \in \mathbb{R}^{I \times I}$ with $I \coloneqq \bigcup_{1 \le k \le q} J^{(k)}$ and let $d(\cdot,\cdot)$ be a hierarchical pseudometric. For $\rho \ge 1$, define the supernodal multicolor ordering $\prec_\rho$ and sparsity pattern $S_\rho$ as follows. For each $k \in \{1, \dots, q\}$, select a subset $\tilde{J}^{(k)} \subset J^{(k)}$ of indices such that
$$
\forall\, \tilde{i}, \tilde{j} \in \tilde{J}^{(k)}, \quad \tilde{i} \neq \tilde{j} \;\Longrightarrow\; B^{(k)}_{\rho/2}(\tilde{i}) \cap B^{(k)}_{\rho/2}(\tilde{j}) = \emptyset, \tag{5.12}
$$
$$
\forall\, i \in J^{(k)}, \;\exists\, \tilde{i} \in \tilde{J}^{(k)} : i \in B^{(k)}_{\rho}(\tilde{i}). \tag{5.13}
$$
Assign every index in $J^{(k)}$ to the element of $\tilde{J}^{(k)}$ closest to it, using an arbitrary method to break ties. That is, writing $i \rightsquigarrow \tilde{i}$ for the assignment of $i$ to $\tilde{i}$,
$$
\tilde{i} \in \operatorname*{arg\,min}_{\tilde{i}' \in \tilde{J}^{(k)}} d\bigl(i, \tilde{i}'\bigr), \tag{5.14}
$$
for all $i \in J^{(k)}$ and $\tilde{i} \in \tilde{J}^{(k)}$ such that $i \rightsquigarrow \tilde{i}$. Define $\tilde{I} \coloneqq \bigcup_{1 \le k \le q} \tilde{J}^{(k)}$ and define the auxiliary sparsity pattern $\tilde{S}_\rho \subset \tilde{I} \times \tilde{I}$ by
$$
\tilde{S}_\rho \coloneqq \Bigl\{ \bigl(\tilde{i}, \tilde{j}\bigr) \in \tilde{I} \times \tilde{I} \;\Big|\; \exists\, i \rightsquigarrow \tilde{i},\, j \rightsquigarrow \tilde{j} : d(i,j) \le \rho \Bigr\}. \tag{5.15}
$$
Define the sparsity pattern $S_\rho \subset I \times I$ as
$$
S_\rho \coloneqq \Bigl\{ (i, j) \in I \times I \;\Big|\; \exists\, \tilde{i}, \tilde{j} \in \tilde{I} : i \rightsquigarrow \tilde{i},\, j \rightsquigarrow \tilde{j},\, \bigl(\tilde{i}, \tilde{j}\bigr) \in \tilde{S}_\rho \Bigr\} \tag{5.16}
$$
and call the elements of $\tilde{J}^{(k)}$ supernodes. Color each $\tilde{i} \in \tilde{J}^{(k)}$ in one of $p^{(k)}$ colors such that no $\tilde{i}, \tilde{j} \in \tilde{J}^{(k)}$ with $(\tilde{i}, \tilde{j}) \in \tilde{S}_\rho$ have the same color. For $i \in J^{(k)}$, write $\mathrm{node}(i)$ for the $\tilde{i} \in \tilde{J}^{(k)}$ such that $i \rightsquigarrow \tilde{i}$ and write $\mathrm{color}(\tilde{i})$ for the color of $\tilde{i}$. Define the supernodal multicolor ordering $\prec_\rho$ by reordering the elements of $I$ such that
(1) $i \prec_\rho j$ for $i \in J^{(k)}$, $j \in J^{(l)}$ and $k < l$;
(2) within each level $J^{(k)}$, we order the elements of supernodes colored in the same color consecutively, i.e. given $i, j \in J^{(k)}$ such that $\mathrm{color}(\mathrm{node}(i)) \neq \mathrm{color}(\mathrm{node}(j))$, $i \prec_\rho j \implies i' \prec_\rho j'$ for all $i', j'$ with $\mathrm{color}(\mathrm{node}(i')) = \mathrm{color}(\mathrm{node}(i))$ and $\mathrm{color}(\mathrm{node}(j')) = \mathrm{color}(\mathrm{node}(j))$; and
(3) the elements of each supernode appear consecutively, i.e. given $i, j \in J^{(k)}$ such that $\mathrm{node}(i) \neq \mathrm{node}(j)$, $i \prec_\rho j \implies i' \prec_\rho j'$ for all $i', j'$ with $\mathrm{node}(i') = \mathrm{node}(i)$ and $\mathrm{node}(j') = \mathrm{node}(j)$.
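To make the patterns (5.15) and (5.16) of Construction 3 concrete, here is a minimal Python sketch that builds both from the assignment map; the function name, the pair list `close_pairs`, and the dictionary `node` are our own illustrative choices and not part of the construction.

# Sketch: build the auxiliary pattern (5.15) and the induced pattern (5.16)
# from the assignment map i ~> node(i) and the pairs within pseudodistance rho.
def supernodal_patterns(close_pairs, node):
    """close_pairs: iterable of pairs (i, j) with d(i, j) <= rho;
    node: dict mapping each index i in I to its supernode node(i)."""
    S_tilde = set()
    for i, j in close_pairs:                 # (5.15): some i ~> i~, j ~> j~ with d(i, j) <= rho
        S_tilde.add((node[i], node[j]))
        S_tilde.add((node[j], node[i]))      # keep the pattern symmetric

    members = {}                             # supernode -> indices assigned to it
    for i, i_tilde in node.items():
        members.setdefault(i_tilde, []).append(i)

    S = set()
    for i_tilde, j_tilde in S_tilde:         # (5.16): lift the supernodal pattern back to I x I
        for i in members[i_tilde]:
            for j in members[j_tilde]:
                S.add((i, j))
    return S_tilde, S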
Starting from a hierarchical ordering and sparsity pattern, the modified ordering and sparsity pattern can be obtained efficiently:
Lemma 14. In the setting of Examples 1 and 2, given $\{(i,j) \mid d(i,j) \le \rho\}$, there exist constants $C$ and $p_{\max}$ depending only on the dimension $d$ and the cost of computing $d(\cdot,\cdot)$ such that the ordering and sparsity pattern presented in Construction 3 can be constructed with $p^{(k)} \le p_{\max}$, for each $1 \le k \le q$, in computational complexity $C N \rho^{d}$.
Proof. The aggregation into supernodes can be performed by a greedy algorithm that keeps a list of all nodes not yet within distance $\rho/2$ of a supernode and selects them one at a time as new supernodes. After each selection, we go through the $\rho$-neighbourhood of the new supernode and remove the points within distance $\rho/2$ from our list of candidates for future supernodes.
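A minimal Python sketch of this greedy sweep is given below; it assumes the $\rho$-neighbourhood lists of Lemma 14 are available, uses the exclusion radius $\rho/2$ from the argument above, and all function and variable names are our own.

# Sketch of the greedy supernode aggregation described above, for a single level J^(k).
# neighbors[i] is assumed to hold the precomputed pairs (j, d(i, j)) with d(i, j) <= rho.
def select_supernodes(J_k, neighbors, rho):
    candidates = set(J_k)          # nodes not yet within rho/2 of any chosen supernode
    assignment, best_dist = {}, {}
    while candidates:
        s = candidates.pop()       # promote an arbitrary remaining candidate to a supernode
        assignment[s], best_dist[s] = s, 0.0
        for j, dist in neighbors[s]:
            if dist <= rho / 2:
                candidates.discard(j)   # j is now covered, so it cannot become a supernode
            if dist < best_dist.get(j, float("inf")):
                best_dist[j], assignment[j] = dist, s   # track the closest supernode, cf. (5.14)
    return assignment              # maps each i in J^(k) to a supernode node(i)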
To create the coloring, we use the greedy graph coloring of [125] on the undirected graph $G$ with vertices $\tilde{J}^{(k)}$ and edges $\bigl\{ (\tilde{i}, \tilde{j}) \in \tilde{S}_\rho \;\big|\; \tilde{i}, \tilde{j} \in \tilde{J}^{(k)} \bigr\}$. Defining $\deg(G)$ as the maximum number of edges connected to any vertex of $G$, the computational complexity of greedy graph coloring is bounded above by $\deg(G)\, \#\tilde{J}^{(k)}$ and the number of colors used by $\deg(G) + 1$. A sphere-packing argument shows that $\deg(G)$ is at most a constant depending only on the dimension $d$, which yields the result.
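For concreteness, the greedy coloring step can be sketched in a few lines of Python; `adjacency` is assumed to encode, for each supernode of one level, its neighbours in the graph $G$ above, and the names are illustrative rather than taken from [125].

# Greedy graph coloring of the supernodal graph G of one level.
# Each vertex receives the smallest color not used by its already-colored neighbours,
# so at most deg(G) + 1 colors are used in total.
def greedy_coloring(vertices, adjacency):
    color = {}
    for v in vertices:
        taken = {color[w] for w in adjacency[v] if w in color}
        c = 0
        while c in taken:       # v has at most deg(G) neighbours, so c <= deg(G)
            c += 1
        color[v] = c
    return color                # colors 0, 1, ..., at most deg(G)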
5.4.3 Proof of stability of incomplete Cholesky factorization in the supernodal multicolor ordering
We will now bound the approximation error of the Cholesky factors obtained from Algorithm 3, using the supernodal multicolor ordering and sparsity pattern described in Construction 3. For $\tilde{i}, \tilde{j} \in \tilde{I}$, let $\Theta_{\tilde{i},\tilde{j}}$ be the submatrix $(\Theta_{ij})_{i \in \tilde{i},\, j \in \tilde{j}}$ and let $\sqrt{M}$ be the (dense and lower-triangular) Cholesky factor of a matrix $M$.
Algorithm 3 with supernodal multicolor ordering $\prec_\rho$ and sparsity pattern $S_\rho$ is equivalent to the block-incomplete Cholesky factorization described in Algorithm 9, where the function $\mathrm{Restrict!}(\Theta, S_\rho)$ sets all entries of $\Theta$ outside of $S_\rho$ to zero.
Algorithm 9 Supernodal incomplete Cholesky factorization
Input: $\Theta \in \mathbb{R}^{I \times I}$ symmetric
Output: $L \in \mathbb{R}^{I \times I}$ lower triangular
  $\mathrm{Restrict!}(\Theta, S_\rho)$
  for $\tilde{i} \in \tilde{I}$ do
    $L_{:,\tilde{i}} \leftarrow \Theta_{:,\tilde{i}} \big/ \sqrt{\Theta_{\tilde{i},\tilde{i}}}^{\,\top}$
    for $\tilde{j} : (\tilde{j}, \tilde{i}) \in \tilde{S}_\rho$ do
      for $\tilde{k} : (\tilde{k}, \tilde{i}), (\tilde{k}, \tilde{j}) \in \tilde{S}_\rho$ do
        $\Theta_{\tilde{k},\tilde{j}} \leftarrow \Theta_{\tilde{k},\tilde{j}} - \Theta_{\tilde{k},\tilde{i}} \bigl(\Theta_{\tilde{i},\tilde{i}}\bigr)^{-1} \Theta_{\tilde{i},\tilde{j}}$
      end for
    end for
  end for
  return $L$
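The following NumPy sketch mirrors Algorithm 9 on a dense array and is meant only to make the block operations concrete; the inputs `blocks`, `S_tilde`, and `mask` are our own illustrative encodings of the supernodes, the auxiliary pattern $\tilde{S}_\rho$, and the pattern $S_\rho$, and a practical implementation would of course use sparse storage to attain the stated complexity.

import numpy as np

# Dense sketch of Algorithm 9 (illustrative only).
# blocks:  list of integer index arrays, one per supernode, ordered according to <_rho
# S_tilde: set of supernode-index pairs (a, b) in the auxiliary pattern
# mask:    boolean I x I array encoding the sparsity pattern S_rho
def supernodal_ichol(Theta, blocks, S_tilde, mask):
    Theta = np.where(mask, Theta, 0.0)                 # Restrict!(Theta, S_rho)
    L = np.zeros_like(Theta)
    for a, ia in enumerate(blocks):
        C = np.linalg.cholesky(Theta[np.ix_(ia, ia)])  # sqrt of the diagonal block
        rows = np.concatenate(blocks[a:])              # not-yet-eliminated rows
        # L[rows, ia] <- Theta[rows, ia] C^{-T}
        L[np.ix_(rows, ia)] = np.linalg.solve(C, Theta[np.ix_(rows, ia)].T).T
        inv_blk = np.linalg.inv(Theta[np.ix_(ia, ia)])
        for b in range(a + 1, len(blocks)):
            if (b, a) not in S_tilde:
                continue
            for c in range(a + 1, len(blocks)):
                if (c, a) not in S_tilde or (c, b) not in S_tilde:
                    continue
                ib, ic = blocks[b], blocks[c]
                # Schur complement update of the (c, b) block
                Theta[np.ix_(ic, ib)] -= Theta[np.ix_(ic, ia)] @ inv_blk @ Theta[np.ix_(ia, ib)]
    return L

Grouping the iterations of the outer loop by level and color, as in Algorithm 10 below, merges the inner updates of a group into a single dense Schur complementation per $(k, c)$ block, which is the form used in the stability analysis.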
We will now reformulate the above algorithm using the fact that the elimination of nodes of the same color, on the same level of the hierarchy, happens consecutively.
Let $p$ be the maximal number of colors used on any level of the hierarchy. We can then write $I = \bigcup_{1 \le k \le q,\, 1 \le c \le p} J^{(k,c)}$, where $J^{(k,c)}$ is the set of indices on level $k$ colored in the color $c$. Let $\Theta_{(k,c),(l,b)}$ be the restriction of $\Theta$ to $J^{(k,c)} \times J^{(l,b)}$ and write $(l, b) \prec (k, c) \iff l < k$ or ($l = k$ and $b < c$). We can then rewrite Algorithm 9 as:
Algorithm 10 Supernodal incomplete Cholesky factorization
Input: $\Theta \in \mathbb{R}^{I \times I}$ symmetric
Output: $L \in \mathbb{R}^{I \times I}$ lower triangular
  for $1 \le k \le q$ do
    for $1 \le c \le p$ do
      $\mathrm{Restrict!}(\Theta, S_\rho)$
      $L_{(:,:),(k,c)} \leftarrow \Theta_{(:,:),(k,c)} \big/ \sqrt{\Theta_{(k,c),(k,c)}}^{\,\top}$
      $\Theta \leftarrow \Theta - \Theta_{(:,:),(k,c)} \bigl(\Theta_{(k,c),(k,c)}\bigr)^{-1} \Theta_{(k,c),(:,:)}$
    end for
  end for
  return $L$
For $1 \le k \le q$, $1 \le c \le p$ and a matrix $M \in \mathbb{R}^{I \times I}$ with $M_{(:,:),(l,b)}, M_{(l,b),(:,:)} = 0$ for all $(l, b) \prec (k, c)$, let $\mathcal{S}[M]$ be the matrix obtained by applying $\mathrm{Restrict!}(M, S_\rho)$ followed by the Schur complementation $M \leftarrow M - M_{(:,:),(k,c)} \bigl(M_{(k,c),(k,c)}\bigr)^{-1} M_{(k,c),(:,:)}$. We now prove a stability estimate for the operator $\mathcal{S}$. Let $M_{k,(l,b)}$ be the restriction of a matrix $M \in \mathbb{R}^{I \times I}$ to $J^{(k)} \times J^{(l,b)}$.
Lemma 15. For $1 \le k^\circ \le q$ and $1 \le c^\circ \le p$, let $\Theta, E \in \mathbb{R}^{I \times I}$ be such that
$$
\Theta_{(:,:),(l,b)}, \Theta_{(l,b),(:,:)} = 0 \quad \text{for all } (l, b) \prec (k^\circ, c^\circ), \tag{5.17}
$$
and (writing $\Theta_{k,l}$ for the $J^{(k)} \times J^{(l)}$ submatrix of $\Theta$ and $\sigma_{\max}$ for maximal singular values) define
$$
\sigma_{\min} \coloneqq \sigma_{\min}\bigl(\Theta_{k^\circ, k^\circ}\bigr), \qquad \sigma_{\max} \coloneqq \max_{k^\circ \le k \le q} \sigma_{\max}\bigl(\Theta_{k^\circ, k}\bigr). \tag{5.18}
$$
If
$$
\max_{k^\circ \le k, l \le q} \| E_{k,l} \|_{\mathrm{Fro}} \le \epsilon \le \frac{\sigma_{\min}}{2}, \tag{5.19}
$$
then the following perturbation estimate holds:
$$
\max_{k^\circ \le k, l \le q} \bigl\| \bigl( \mathcal{S}[\Theta] - \mathcal{S}[\Theta + E] \bigr)_{k,l} \bigr\|_{\mathrm{Fro}} \le \left( \frac{3}{2} + \frac{2\sigma_{\max}}{\sigma_{\min}} + \frac{8\sigma_{\max}^2}{\sigma_{\min}^2} \right) \epsilon. \tag{5.20}
$$
Proof. Write $\hat{\Theta}, \hat{E}$ for the versions of $\Theta, E$ set to zero outside of $S_\rho$. For $k^\circ \le k, l \le q$,
$$
\begin{aligned}
&\bigl( \mathcal{S}[\Theta + E] - \mathcal{S}[\Theta] \bigr)_{k,l} && (5.21)\\
&\quad = \hat{\Theta}_{k,l} + \hat{E}_{k,l} - \bigl(\hat{\Theta} + \hat{E}\bigr)_{k,(k^\circ,c^\circ)} \bigl(\hat{\Theta} + \hat{E}\bigr)^{-1}_{(k^\circ,c^\circ),(k^\circ,c^\circ)} \bigl(\hat{\Theta} + \hat{E}\bigr)_{(k^\circ,c^\circ),l} && (5.22)\\
&\qquad - \hat{\Theta}_{k,l} + \hat{\Theta}_{k,(k^\circ,c^\circ)} \hat{\Theta}^{-1}_{(k^\circ,c^\circ),(k^\circ,c^\circ)} \hat{\Theta}_{(k^\circ,c^\circ),l} && (5.23)\\
&\quad = \hat{E}_{k,l} + \bigl(\hat{\Theta} + \hat{E}\bigr)_{k,(k^\circ,c^\circ)} \bigl(\hat{\Theta} + \hat{E}\bigr)^{-1}_{(k^\circ,c^\circ),(k^\circ,c^\circ)} \hat{E}_{(k^\circ,c^\circ),(k^\circ,c^\circ)} \hat{\Theta}^{-1}_{(k^\circ,c^\circ),(k^\circ,c^\circ)} \bigl(\hat{\Theta} + \hat{E}\bigr)_{(k^\circ,c^\circ),l} && (5.24)\\
&\qquad - \bigl(\hat{\Theta} + \hat{E}\bigr)_{k,(k^\circ,c^\circ)} \hat{\Theta}^{-1}_{(k^\circ,c^\circ),(k^\circ,c^\circ)} \bigl(\hat{\Theta} + \hat{E}\bigr)_{(k^\circ,c^\circ),l} + \hat{\Theta}_{k,(k^\circ,c^\circ)} \hat{\Theta}^{-1}_{(k^\circ,c^\circ),(k^\circ,c^\circ)} \hat{\Theta}_{(k^\circ,c^\circ),l} && (5.25)\\
&\quad = \hat{E}_{k,l} + \bigl(\hat{\Theta} + \hat{E}\bigr)_{k,(k^\circ,c^\circ)} \bigl(\hat{\Theta} + \hat{E}\bigr)^{-1}_{(k^\circ,c^\circ),(k^\circ,c^\circ)} \hat{E}_{(k^\circ,c^\circ),(k^\circ,c^\circ)} \hat{\Theta}^{-1}_{(k^\circ,c^\circ),(k^\circ,c^\circ)} \bigl(\hat{\Theta} + \hat{E}\bigr)_{(k^\circ,c^\circ),l} && (5.26)\\
&\qquad - \hat{E}_{k,(k^\circ,c^\circ)} \hat{\Theta}^{-1}_{(k^\circ,c^\circ),(k^\circ,c^\circ)} \hat{\Theta}_{(k^\circ,c^\circ),l} - \hat{\Theta}_{k,(k^\circ,c^\circ)} \hat{\Theta}^{-1}_{(k^\circ,c^\circ),(k^\circ,c^\circ)} \hat{E}_{(k^\circ,c^\circ),l} && (5.27)\\
&\qquad - \hat{E}_{k,(k^\circ,c^\circ)} \hat{\Theta}^{-1}_{(k^\circ,c^\circ),(k^\circ,c^\circ)} \hat{E}_{(k^\circ,c^\circ),l}, && (5.28)
\end{aligned}
$$
where the second equality follows from the matrix identity
$$
(A + B)^{-1} = A^{-1} - (A + B)^{-1} B A^{-1}. \tag{5.29}
$$
Now recall that, for all $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{m \times r}$, we have $\|A\| \le \|A\|_{\mathrm{Fro}}$ and $\|A B\|_{\mathrm{Fro}} \le \|A\|\, \|B\|_{\mathrm{Fro}}$. Therefore, $\bigl\| \bigl(\hat{\Theta} + \hat{E}\bigr)^{-1}_{(k^\circ,c^\circ),(k^\circ,c^\circ)} \bigr\| \le 2/\sigma_{\min}$ and $\bigl\| \bigl(\hat{\Theta} + \hat{E}\bigr)_{k,(k^\circ,c^\circ)} \bigr\| \le 2\sigma_{\max}$. Combining these estimates and using the triangle inequality yields
$$
\begin{aligned}
&\bigl\| \bigl( \mathcal{S}[\Theta + E] - \mathcal{S}[\Theta] \bigr)_{k,l} \bigr\|_{\mathrm{Fro}} && (5.30)\\
&\quad \le \|E_{k,l}\|_{\mathrm{Fro}} + \frac{8\sigma_{\max}^2}{\sigma_{\min}^2} \|E_{k^\circ,k^\circ}\|_{\mathrm{Fro}} + \frac{\sigma_{\max}}{\sigma_{\min}} \bigl( \|E_{k,k^\circ}\|_{\mathrm{Fro}} + \|E_{k^\circ,l}\|_{\mathrm{Fro}} \bigr) && (5.31)\\
&\qquad + \sigma_{\min}^{-1} \|E_{k,k^\circ}\|_{\mathrm{Fro}} \|E_{k^\circ,l}\|_{\mathrm{Fro}} && (5.32)\\
&\quad \le \left( 1 + \frac{8\sigma_{\max}^2}{\sigma_{\min}^2} + \frac{2\sigma_{\max}}{\sigma_{\min}} + \frac{\epsilon}{\sigma_{\min}} \right) \epsilon && (5.33)\\
&\quad \le \left( \frac{3}{2} + \frac{2\sigma_{\max}}{\sigma_{\min}} + \frac{8\sigma_{\max}^2}{\sigma_{\min}^2} \right) \epsilon. && (5.34)
\end{aligned}
$$
Recursive application of the above lemma gives a stability result for the incomplete Cholesky factorization.
Lemma 16. For $\rho > 0$, let $\prec_\rho$ and $S_\rho$ be a supernodal ordering and sparsity pattern such that the maximal number of colors used on each level is at most $p$. Let $L^{\rho}$ be an invertible lower-triangular matrix with nonzero pattern $S_\rho$ and define $A \coloneqq L^{\rho} L^{\rho,\top}$. Assume that $A$ satisfies Condition 2 with constant $\kappa$. Then there exists a universal constant $C$ such that, for all $0 < \epsilon < \frac{\lambda_{\min}(A)}{2 q^2 (C\kappa)^{2pq}}$ and all $E \in \mathbb{R}^{I \times I}$ with $\|E\|_{\mathrm{Fro}} \le \epsilon$,
$$
\bigl\| A - \tilde{L}^{\rho} \tilde{L}^{\rho,\top} \bigr\|_{\mathrm{Fro}} \le q^2 (C\kappa)^{2pq}\, \epsilon, \tag{5.35}
$$
where $\tilde{L}^{\rho}$ is the Cholesky factor obtained by applying Algorithm 10 to $A + E$.

Proof. The result follows from applying Lemma 15 at each step of Algorithm 10.
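Schematically, and suppressing the bookkeeping of how the singular-value ratio of Lemma 15 is controlled in terms of the Condition 2 constant along the factorization, the recursion behind this proof reads as follows: if $\epsilon_m$ denotes the maximal blockwise Frobenius error after $m$ of the at most $pq$ color/level elimination steps of Algorithm 10, then Lemma 15 gives
$$
\epsilon_{m+1} \le \left( \frac{3}{2} + \frac{2\sigma_{\max}}{\sigma_{\min}} + \frac{8\sigma_{\max}^2}{\sigma_{\min}^2} \right) \epsilon_m \le (C\kappa)^{2}\, \epsilon_m,
\qquad \text{hence} \qquad
\epsilon_{pq} \le (C\kappa)^{2pq}\, \epsilon_0,
$$
which is the amplification factor appearing in (5.35); the remaining factor $q^2$ can be read as the cost of passing from the blockwise bounds (one per pair of levels) to the full Frobenius norm.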
5.4.4 Conclusion
Using the stability result in Lemma 16, we can finally prove that, when using the supernodal multicolor ordering and sparsity pattern, the incomplete Cholesky factorization applied to $\Theta$ attains an $\epsilon$-accurate Cholesky factorization in computational complexity $O\bigl(N \log^2(N) \log^{2d}(N/\epsilon)\bigr)$.
Theorem 18. In the setting of Examples 1 and 2, there exists a constant $C$ depending only on $d$, $\Omega$, $\|\mathcal{L}\|$, $\|\mathcal{L}^{-1}\|$, $s$, and $\delta$ such that, given the ordering $\prec_\rho$ and sparsity pattern $S_\rho$ defined as in Construction 3 with $\rho \ge C \log(N/\epsilon)$, the incomplete Cholesky factor $L$ obtained from Algorithm 3 has accuracy
$$
\| L L^{\top} - \Theta \|_{\mathrm{Fro}} \le \epsilon. \tag{5.36}
$$
Furthermore, Algorithm 3 has complexity at most $C N \rho^{2d} \log^2(N)$ in time and at most $C N \rho^{d} \log(N)$ in space.
Proof of Theorem 18. Theorem 9 implies that, by choosing $\rho \ge \tilde{C} \log(N/\epsilon)$, there exists a lower-triangular matrix $\tilde{L}^{\rho}$ with sparsity pattern $S_\rho$ such that $\bigl\| \Theta - \tilde{L}^{\rho} \tilde{L}^{\rho,\top} \bigr\|_{\mathrm{Fro}} \le \epsilon$. Theorem 7 implies that Examples 1 and 2 satisfy $\lambda_{\min}(\Theta) \ge 1/\mathrm{poly}(N)$. Therefore, choosing $\rho \ge \tilde{C} \log N$ ensures that $\epsilon < \frac{\lambda_{\min}(\Theta)}{2}$ and thus that $\tilde{\Theta} \coloneqq \tilde{L}^{\rho} \tilde{L}^{\rho,\top}$ satisfies Condition 2 with constant $2\kappa$, where $\kappa$ is the corresponding constant for $\Theta$. By possibly increasing $\tilde{C}$ again, $\rho \ge \tilde{C} \log N$ ensures that
$$
\epsilon \le \frac{\lambda_{\min}(\Theta)}{2 q^2 (2 C \kappa)^{2pq}},
$$
where $C$ is the constant of Lemma 16, since $q \lesssim \log N$ and, by Lemma 14, $p$ is bounded independently of $N$. Thus, by Lemma 16, the Cholesky factor $L^{\rho}$ obtained from applying Algorithm 10 to $\Theta = \tilde{\Theta} + \bigl( \Theta - \tilde{\Theta} \bigr)$ satisfies
$$
\bigl\| \tilde{\Theta} - L^{\rho} L^{\rho,\top} \bigr\|_{\mathrm{Fro}} \le q^2 (4 C \kappa)^{2pq}\, \epsilon \le \mathrm{poly}(N)\, \epsilon, \tag{5.37}
$$
where the polynomial depends only on $C$, $\kappa$, and $p$. Together with $\bigl\| \Theta - \tilde{\Theta} \bigr\|_{\mathrm{Fro}} \le \epsilon$ and the triangle inequality, this bounds $\bigl\| \Theta - L^{\rho} L^{\rho,\top} \bigr\|_{\mathrm{Fro}}$ by $\mathrm{poly}(N)\, \epsilon$; replacing $\epsilon$ by $\epsilon/\mathrm{poly}(N)$ throughout only increases the constant in the requirement $\rho \ge C \log(N/\epsilon)$. Since, for the ordering $\prec_\rho$ and sparsity pattern $S_\rho$, the Cholesky factors obtained via Algorithms 3 and 10 coincide, we obtain the result.
This result holds for both the element-wise and the supernodal factorization, in either its left-, up-, or right-looking form. As remarked in Section 5.3.1, using the two-way supernodal sparsity pattern for the factorization of $\Theta^{-1}$ in the fine-to-coarse ordering degrades the asymptotic complexity. Therefore, the above result does not immediately prove the accuracy of the Cholesky factorization in this setting at optimal complexity. However, the column-supernodal factorization described in Section 5.3 can similarly be described in terms of $O(\log(N))$ Schur complementations. Thus, the above proof can be modified to show that, when using the column-supernodal multicolor ordering and sparsity pattern, ICHOL(0) applied to $\Theta^{-1}$ computes an $\epsilon$-approximation in computational complexity $O(N \log(N/\epsilon))$.
5.5 Numerical example: Compression of dense kernel matrices