sparsity pattern will grow as O(N log(N) ρ^d), allowing us to compute the factors in time O(N log²(N) ρ^{2d}), as detailed in Chapter 5. As we will discuss in Chapter 4, the approximation error ε will usually decay exponentially, as log(ε) ≲ log(N) − ρ, allowing us to compute an ε-approximation in complexity O(N log²(N) log^{2d}(N/ε)). We have seen in Section 3.3 that when factorizing the precision matrix A = Γ⁻¹, sparsity of a given column is equivalent to independence of the Gaussian process conditional on the degrees of freedom appearing after it in the elimination ordering. Thus, we factorize the precision matrices of smooth Gaussian processes by using the reverse maximin ordering as the elimination ordering.
Definition 6 (Reverse maximin ordering). A reverse maximin ordering of a point set {x_i}_{i∈I} ⊂ R^d with starting set C is obtained by reversing a maximin ordering of {x_i}_{i∈I} ⊂ R^d with starting set C. The associated length scale ℓ_i is then given by the (#I + 1 − i)-th length scale of the maximin ordering.
As before, the associated sparsity pattern only contains interactions between points within a distance of ρ times the associated length scale.
Definition 7 (Reverse maximin sparsity pattern). Given a reverse maximin ordering of the point set {x_i}_{i∈I} ⊂ R^d with length scales {ℓ_i}_{1≤i≤N} and a sparsity parameter ρ ∈ R_+, its sparsity set S ⊂ I × I is

$$S \coloneqq \left\{ (i, j) \ \text{s.t.} \ i \succeq j \ \text{and} \ \operatorname{dist}(x_i, x_j) \leq \rho \ell_j \right\}.$$

However, as shown in Figure 3.6, there is an important difference. Since each degree of freedom only interacts with degrees of freedom appearing later in the ordering, and the largest length scales now appear last in the ordering, the reverse maximin sparsity pattern has only O(N ρ^d) entries. As we will see in Chapters 5 and 6, this allows us to compute the Cholesky factors of A in only O(N ρ^{2d}) time.
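To make Definitions 6 and 7 concrete, the following is a minimal O(N²) sketch in Python (the choice of the first point, the omission of the starting set C, and the brute-force distance computations are simplifications of this example; Chapter 5 discusses how to obtain near-linear complexity):

```python
import numpy as np

def reverse_maximin(points):
    """Reverse maximin ordering of `points` (Definition 6), by brute force.

    Returns (order, lengths): the k-th point of the reverse maximin ordering
    is points[order[k]] and has length scale lengths[k]. Simplification:
    the starting set C is ignored and the first maximin point is arbitrary.
    """
    N = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    order, lengths = [0], [dist[0].max()]   # arbitrary first point
    d = dist[0].copy()                      # distance to the chosen set
    for _ in range(N - 1):
        i = int(np.argmax(d))               # farthest-away point: maximin
        order.append(i)
        lengths.append(d[i])                # its maximin length scale
        d = np.minimum(d, dist[i])
    # Reverting the ordering and its length scales gives the reverse maximin
    # ordering, with l_i the (#I + 1 - i)-th maximin length scale.
    return np.array(order[::-1]), np.array(lengths[::-1])

def reverse_maximin_pattern(points, order, lengths, rho):
    """Sparsity set S of Definition 7, in the ordered indexing:
    (i, j) is in S iff i comes after j and dist(x_i, x_j) <= rho * l_j."""
    pts = points[order]
    return [(i, j) for j in range(len(pts)) for i in range(j, len(pts))
            if np.linalg.norm(pts[i] - pts[j]) <= rho * lengths[j]]

x = np.random.default_rng(0).uniform(size=(200, 2))
order, lengths = reverse_maximin(x)
S = reverse_maximin_pattern(x, order, lengths, rho=2.0)
print(f"{len(S)} entries, about {len(S) / len(x):.1f} per column")
```

Running the sketch for increasing N shows the roughly constant number of nonzeros per column claimed above.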
3.6 Cholesky factorization, numerical homogenization, and gamblets
Figure 3.6: Reverse maximin sparsity pattern. Each column of the reverse maximin sparsity pattern includes interactions within a factor ρ of the corresponding length scale ℓ_i. The number of nonzeros per column is approximately constant.
Figure 3.7: A Haar-type multiresolution basis. We begin the construction by forming averages on different scales. On all but the coarsest scale, we obtain the basis functions as linear combinations of nearby averages on a given scale, chosen to be orthogonal to all averages on the coarser scale. In the figure, we show basis functions on the three coarsest scales.
By computing the orthogonal complement of V^(k−1) in V^(k) for 1 ≤ k ≤ q, we obtain the orthogonal subspaces W^(k) that form an orthogonal multiresolution splitting of V^(q) = R^N, in the sense that for all 1 ≤ k ≤ q we have

$$V^{(k)} = \bigoplus_{1 \leq l \leq k} W^{(l)}.$$
Based on the above, the maximin ordering can be thought of as a rudimentary multiresolution basis, a generalization of the so-called "lazy wavelets" to multivariate, irregularly sampled data. Instead of using subsampling with density ≈ ℓ_k^{−d}, we could have obtained V^(k) by forming averages over regions of size ≈ ℓ_k and obtained W^(k) as the orthogonal complement of V^(k−1) in V^(k), resulting in a Haar-type multiresolution basis, as illustrated in Figure 3.7.
Analogs of the screening effect illustrated in Figure 3.3 hold true for a wide range of multiresolution systems, in the sense that after measuring a smooth Gaussian process against the elements of V^(k), its conditional correlations decay rapidly on the scale ℓ_k of the measurements. In particular, the Green's and stiffness matrices of elliptic PDEs have almost sparse Cholesky factors when represented in a multiresolution basis, using a coarse-to-fine (Green's matrix) or fine-to-coarse (stiffness matrix) elimination ordering. The connection to multiresolution analysis will also allow us to use tools from numerical homogenization to provide rigorous proofs of the screening effect.
3.6.2 Cholesky, Nyström, and numerical homogenization
Let us now represent the Green's matrix Γ and its inverse A in a two-scale basis, with the first blocks Γ_{1,1}, A_{1,1} representing the coarse scale, the second blocks Γ_{2,2}, A_{2,2} representing the fine scale, and Γ_{1,2}, Γ_{2,1}, A_{1,2}, A_{2,1} representing the interactions between the two scales.
Figure 3.8: The hidden multiscale structure of the maximin ordering. The index sets I^(k) ≔ {i : h^k ≤ ℓ_i/ℓ_1} of the maximin ordering can be interpreted as the scale spaces V^(k) of a multiresolution basis. The index sets J^(k) ≔ I^(k) \ I^(k−1) can then be viewed as the resulting orthogonal multiresolution decomposition. In this figure, from left to right, we display I^(1), I^(2), and I^(3), plotting J^(1) in red, J^(2) in orange, and J^(3) in blue.
An important problem in numerical analysis is to compute a low-rank approximation of Γ that correctly captures its coarse-scale behavior. Given access to Γ, this problem can be solved by using the Nyström approximation
$$
\begin{pmatrix} \Gamma_{1,1} & \Gamma_{1,2} \\ \Gamma_{2,1} & \Gamma_{2,2} \end{pmatrix}
\approx
\begin{pmatrix} \mathrm{Id} \\ \Gamma_{2,1}\Gamma_{1,1}^{-1} \end{pmatrix}
\Gamma_{1,1}
\begin{pmatrix} \mathrm{Id}, & \Gamma_{1,1}^{-1}\Gamma_{1,2} \end{pmatrix}.
\tag{3.8}
$$
A classical method in numerical analysis [23, Chapter 3], the Nyström approximation has recently been used to compress kernel matrices arising in machine learning [87, 222, 250]. Following the discussion in Section 3.3, this approximation amounts to assuming that the fine-scale behavior is deterministic, conditional on the coarse-scale behavior. As described in [199], many popular sparse Gaussian process models arise from refinements of this assumption.
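As a quick numerical illustration of (3.8), here is a minimal sketch (the squared-exponential kernel, the small nugget term, and the choice of the first k indices as the "coarse" degrees of freedom are assumptions of this example, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(size=(300, 2))
# Smooth kernel matrix standing in for the Green's matrix Gamma; the tiny
# nugget only keeps the dense linear algebra well conditioned.
D2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
Gamma = np.exp(-D2 / (2 * 0.3**2)) + 1e-10 * np.eye(len(x))

k = 30                                    # coarse degrees of freedom
G11, G21 = Gamma[:k, :k], Gamma[k:, :k]
W = np.linalg.solve(G11, G21.T).T         # Gamma_21 Gamma_11^{-1}
# Nystrom (3.8): [Id; W] Gamma_11 [Id, Gamma_11^{-1} Gamma_12]
approx = np.block([[G11, G21.T], [G21, W @ G21.T]])
rel = np.linalg.norm(Gamma - approx, 2) / np.linalg.norm(Gamma, 2)
print(f"rank-{k} Nystrom, relative operator-norm error {rel:.2e}")
```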
In physical problems described by elliptic PDEs, we typically do not have access to Γ, but rather to its inverse A, the stiffness matrix arising from a discretization of the differential operator. The field of numerical homogenization is concerned with computing low-rank approximations of Γ that capture its coarse-scale behavior from access only to A. At the same time, it tries to preserve (some of) the sparsity of the stiffness matrix A that arises from the locality of the partial differential operator.
The seminal work of [163] solves this problem by constructing an operator-adapted set of basis functions that achieve (up to a constant factor) the optimal discretization error while being (up to an exponentially small error) localized in space.
To understand the relationship of numerical homogenization to sparse Cholesky factorization, we remind ourselves of the following linear algebraic identity (see Lemma 4.15; we give indexing [·]_{i,j} precedence over inversion [·]^{−1})
$$
\begin{pmatrix} \Gamma_{1,1} & \Gamma_{1,2} \\ \Gamma_{2,1} & \Gamma_{2,2} \end{pmatrix}
= \left(
\begin{pmatrix} \mathrm{Id} & A_{1,2}A_{2,2}^{-1} \\ 0 & \mathrm{Id} \end{pmatrix}
\begin{pmatrix} A_{1,1} - A_{1,2}A_{2,2}^{-1}A_{2,1} & 0 \\ 0 & A_{2,2} \end{pmatrix}
\begin{pmatrix} \mathrm{Id} & 0 \\ A_{2,2}^{-1}A_{2,1} & \mathrm{Id} \end{pmatrix}
\right)^{-1}
\tag{3.9}
$$

$$
= \begin{pmatrix} \mathrm{Id} & 0 \\ A_{2,2}^{-1}A_{2,1} & \mathrm{Id} \end{pmatrix}^{-1}
\begin{pmatrix} \left(A_{1,1} - A_{1,2}A_{2,2}^{-1}A_{2,1}\right)^{-1} & 0 \\ 0 & A_{2,2}^{-1} \end{pmatrix}
\begin{pmatrix} \mathrm{Id} & A_{1,2}A_{2,2}^{-1} \\ 0 & \mathrm{Id} \end{pmatrix}^{-1}
\tag{3.10}
$$

$$
= \begin{pmatrix} \mathrm{Id} & 0 \\ -A_{2,2}^{-1}A_{2,1} & \mathrm{Id} \end{pmatrix}
\begin{pmatrix} \left(A_{1,1} - A_{1,2}A_{2,2}^{-1}A_{2,1}\right)^{-1} & 0 \\ 0 & A_{2,2}^{-1} \end{pmatrix}
\begin{pmatrix} \mathrm{Id} & -A_{1,2}A_{2,2}^{-1} \\ 0 & \mathrm{Id} \end{pmatrix}.
\tag{3.11}
$$
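Identity (3.11) is pure linear algebra and easy to sanity-check numerically; here is a minimal sketch with a random symmetric positive-definite A (sizes and seeds arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)            # random SPD "precision matrix"
Gamma, k = np.linalg.inv(A), 3         # coarse block of size k

A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
S = A11 - A12 @ np.linalg.solve(A22, A21)   # Schur complement of A22
lower = np.block([[np.eye(k), np.zeros((k, 8 - k))],
                  [-np.linalg.solve(A22, A21), np.eye(8 - k)]])
middle = np.block([[np.linalg.inv(S), np.zeros((k, 8 - k))],
                   [np.zeros((8 - k, k)), np.linalg.inv(A22)]])
# (3.11): Gamma equals lower * middle * lower^T
assert np.allclose(Gamma, lower @ middle @ lower.T)
```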
We can now recover the Nyström approximation as

$$
\begin{pmatrix} \Gamma_{1,1} & \Gamma_{1,2} \\ \Gamma_{2,1} & \Gamma_{2,2} \end{pmatrix}
\approx
\begin{pmatrix} \mathrm{Id} \\ \Gamma_{2,1}\Gamma_{1,1}^{-1} \end{pmatrix}
\Gamma_{1,1}
\begin{pmatrix} \mathrm{Id}, & \Gamma_{1,1}^{-1}\Gamma_{1,2} \end{pmatrix}
\tag{3.12}
$$

$$
= \begin{pmatrix} \mathrm{Id} \\ -A_{2,2}^{-1}A_{2,1} \end{pmatrix}
\left( A_{1,1} - A_{1,2}A_{2,2}^{-1}A_{2,1} \right)^{-1}
\begin{pmatrix} \mathrm{Id}, & -A_{1,2}A_{2,2}^{-1} \end{pmatrix}
\tag{3.13}
$$

$$
= \Psi \left( \Psi^{\top} A \Psi \right)^{-1} \Psi^{\top},
\quad \text{for } \Psi \coloneqq \begin{pmatrix} \mathrm{Id} \\ -A_{2,2}^{-1}A_{2,1} \end{pmatrix}.
\tag{3.14}
$$
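In the same toy setting as the sketch above (the same hypothetical random SPD matrix A), one can confirm that (3.14), assembled from A alone, reproduces the Nyström approximation (3.12) assembled from Γ:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)            # same hypothetical SPD matrix as above
Gamma, k = np.linalg.inv(A), 3
A21, A22 = A[k:, :k], A[k:, k:]

Psi = np.vstack([np.eye(k), -np.linalg.solve(A22, A21)])  # Psi of (3.14)
homog = Psi @ np.linalg.solve(Psi.T @ A @ Psi, Psi.T)     # Psi (Psi^T A Psi)^{-1} Psi^T
G11, G21 = Gamma[:k, :k], Gamma[k:, :k]
nystrom = np.block([[G11, G21.T],
                    [G21, G21 @ np.linalg.solve(G11, G21.T)]])
assert np.allclose(homog, nystrom)     # (3.12) == (3.14), exactly
```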
If we interpret the columns of Ψ as a new set of operator-adapted finite element functions, we obtain the Nyström approximation by projecting the problem onto the space spanned by these adapted finite element functions, inverting the resulting stiffness matrix, and expressing the resulting solution again in terms of the original set of finite element functions. The authors of [163] show that, for a two-scale splitting similar to the one in Figure 3.7 and for A the stiffness matrix of a second-order elliptic PDE with L^∞-coefficients, the following holds:
1. The matrices Ψ and Ψ^⊤AΨ are sparse up to exponentially small terms (localization).

2. As we decrease the fine scale, and hence increase the dimension of Ψ, the resulting Nyström approximation of Γ improves at the optimal rate (homogenization).
3.6.3 Gamblets as block-Cholesky factorization
The work of [163] provides an efficient, near-optimal coarse-graining of A, but it does not provide us with a fast solver. In fact, computing Ψ requires inverting the matrix A_{2,2}, which could be almost as difficult as inverting A.
Motivated by ideas from game and decision theory, as well as Gaussian process regression, [188] extends the construction of Ψ in the last section to multiple scales. The resulting family of operator-adapted wavelets, called gamblets, can be constructed in near-linear time. Once constructed, they can be used to invert A in near-linear time.
In order to relate gamblets to Cholesky factorization, we assume that we are given an orthogonal multiresolution decomposition {W^(k)}_{1≤k≤q} and that, for 1 ≤ k, l ≤ q, the matrix blocks Γ_{k,l}, A_{k,l} represent the restrictions of Γ, A onto W^(k) × W^(l). A multiscale extension of Equation (3.9) can then be obtained (see Lemma 1) by writing
$$
\Gamma = \Psi B^{-1} \Psi^{\top} = \Psi \left( \Psi^{\top} A \Psi \right)^{-1} \Psi^{\top}
\tag{3.15}
$$

with

$$
B \coloneqq \begin{pmatrix}
B^{(1)} & 0 & \cdots & 0 \\
0 & B^{(2)} & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & B^{(q)}
\end{pmatrix},
\qquad
\Psi \coloneqq \begin{pmatrix}
\mathrm{Id} & 0 & \cdots & 0 \\
B^{(2),-1}A^{(2)}_{2,1} & \mathrm{Id} & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
B^{(q),-1}A^{(q)}_{q,1} & \cdots & B^{(q),-1}A^{(q)}_{q,q-1} & \mathrm{Id}
\end{pmatrix}^{-1},
\tag{3.16}
$$

where $A^{(k)} \coloneqq \left(\Gamma_{1:k,1:k}\right)^{-1}$ and $B^{(k)} \coloneqq A^{(k)}_{k,k}$.
The columns of Ψ represent the gamblet basis functions, and B is the (block-diagonal) stiffness matrix obtained when representing A in the gamblet basis. In the same setting as [163], [188] proves that
1. The matrices Ψ and Ψ^⊤AΨ are sparse up to exponentially small terms (localization).

2. The approximation of A obtained by using only the leading k block-columns of Ψ improves at the optimal rate when increasing k (homogenization).

3. The matrices B^(k) have uniformly bounded condition numbers.
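The multiscale identity (3.15)-(3.16) can be checked just as mechanically as its two-scale counterpart. The sketch below (a random SPD Γ and an arbitrary three-block partition, purely for illustration) assembles B and Ψ from the definitions A^(k) ≔ (Γ_{1:k,1:k})^{-1}, B^(k) ≔ A^(k)_{k,k} and verifies Γ = Ψ B^{-1} Ψ^⊤:

```python
import numpy as np

rng = np.random.default_rng(3)
sizes = [2, 3, 4]                       # |J^(1)|, |J^(2)|, |J^(3)|
N, q = sum(sizes), len(sizes)
M = rng.standard_normal((N, N))
Gamma = M @ M.T + N * np.eye(N)         # random SPD "Green's matrix"
edges = np.cumsum([0] + sizes)          # block boundaries in the ordering

T = np.eye(N)                           # the matrix inverted in (3.16)
B = np.zeros((N, N))
for k in range(q):
    n = edges[k + 1]
    A_k = np.linalg.inv(Gamma[:n, :n])  # A^(k) := (Gamma_{1:k,1:k})^{-1}
    Bk = A_k[edges[k]:n, edges[k]:n]    # B^(k) := A^(k)_{k,k}
    B[edges[k]:n, edges[k]:n] = Bk
    for l in range(k):                  # off-diagonal blocks B^{(k),-1} A^(k)_{k,l}
        T[edges[k]:n, edges[l]:edges[l + 1]] = np.linalg.solve(
            Bk, A_k[edges[k]:n, edges[l]:edges[l + 1]])

Psi = np.linalg.inv(T)                  # Psi of (3.16)
# (3.15): Gamma = Psi B^{-1} Psi^T holds exactly, for any SPD Gamma
assert np.allclose(Gamma, Psi @ np.linalg.solve(B, Psi.T))
```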
Using and extending proof techniques of [142], [189, 190] extend these results to higher-order operators and to more general classes of multiresolution splittings {W^(k)}_{1≤k≤q}. We build upon these results in order to prove the sparsity of the Cholesky factors of Γ. To this end, the main analytic contribution of our work is an extension of the homogenization estimates for gamblets to simplistic multiresolution splittings such as the one implied by the maximin ordering.
Chapter 4
PROVING EXPONENTIAL DECAY OF CHOLESKY FACTORS
4.1 Overview
In Chapter 3, we used the intuition provided by the screening effect to obtain elimination orderings and sparsity patterns that lead to near-linear complexity solvers based on sparse Cholesky factorization. But the resulting sparse Cholesky factor merely approximates the original Γ or A, with the approximation accuracy depending on ρ.
This raises the question: at what rate does the error decay as ρ increases? How strong is the screening effect, and when does it hold? Despite numerous attempts [26, 227, 228], a rigorous understanding has proven largely elusive. Existing results (see [228] for an overview) are asymptotic and do not provide explicit rates.
A central contribution of this thesis is to provide a proof of an exponential screening effect for Gaussian processes with covariance functions given as Green's functions of elliptic partial differential equations. This allows us to prove that, up to exponentially small errors, the Cholesky factors of Γ and its inverse A can be approximated by sparse matrices with only O(N log(N) ρ^d) and O(N ρ^d) nonzero entries, respectively, for an approximation error ε satisfying log(ε) ≲ log(N) − ρ. These results, which are summarized in Section 4.6, can be specialized to:
Theorem 1. Let Ω ⊂ R^d be a bounded Lipschitz domain and let L be a linear elliptic partial differential operator of order 2s > d. Let {x_i}_{1≤i≤N} be roughly uniformly distributed in Ω and ordered in the maximin [reverse maximin] ordering. For G = L^{-1} the Dirichlet Green's function, define Γ_{ij} ≔ G(x_i, x_j) and let L be the Cholesky factor of Γ [of A ≔ Γ^{-1}]. Let S ⊂ {1, . . . , N}² be the maximin [reverse maximin] sparsity pattern with starting set ∂Ω and sparsity parameter ρ. Defining L^S_{ij} ≔ 1_{(i,j)∈S} L_{ij}, we then have

$$
\log\left( \big\| L^{S} L^{S,\top} - L L^{\top} \big\|_{\mathrm{Fro}} \right) \lesssim \log(N) - \rho.
\tag{4.1}
$$
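In code, L^S is just an entrywise restriction of L to S, and the decay in (4.1) can be observed directly. A sketch under stated assumptions (a Matérn kernel standing in for a Green's function, and the brute-force reverse_maximin helper from the sketch after Definition 7, reversed again to obtain the forward maximin ordering):

```python
import numpy as np

# Hypothetical illustration of (4.1). Reuses `reverse_maximin` from the
# sketch after Definition 7; the Matern-3/2 kernel is an assumption.
rng = np.random.default_rng(4)
x = rng.uniform(size=(400, 2))
order, lengths = reverse_maximin(x)
order, lengths = order[::-1], lengths[::-1]     # back to (forward) maximin
r = np.linalg.norm(x[order][:, None] - x[order][None, :], axis=-1)
Gamma = (1 + 5 * r) * np.exp(-5 * r)            # Matern-3/2 kernel matrix
L = np.linalg.cholesky(Gamma + 1e-8 * np.eye(len(x)))

for rho in (1.0, 2.0, 3.0, 4.0):
    # Maximin pattern: keep (i, j), i >= j, with dist(x_i, x_j) <= rho * l_j.
    LS = np.where(np.tril(r <= rho * lengths[None, :]), L, 0.0)
    err = np.linalg.norm(LS @ LS.T - L @ L.T, "fro")
    print(f"rho = {rho:.0f}: Frobenius error {err:.2e}")
```

The printed errors shrink rapidly with ρ, which is the empirical face of the exponential decay claimed in (4.1).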
A key step for proving this result is the derivation of a novel low-rank approximation result that is interesting in its own right.
Theorem 2. In the setting of Theorem 1, let L_{:,1:k} be the rank-k matrix defined by the first k columns of the Cholesky factor L of Γ in the maximin ordering. Denoting by ‖·‖ the operator norm and by ℓ_k the length scales of the ordering (so that, for roughly uniformly distributed points, k^{−2s/d} ≂ ℓ_k^{2s}), we have

$$
\left\| \Gamma - L_{:,1:k} L_{:,1:k}^{\top} \right\| \lesssim \|\Gamma\| \, k^{-2s/d}.
\tag{4.2}
$$
Following the discussion in Section 3.6.2, this low-rank approximation also yields a numerical homogenization result for L.
Theorem 3. In the setting of Theorem 1, let L_{:,1:k} be the rank-k matrix defined by the first k columns of the Cholesky factor L of A ≔ Γ^{-1} in the reverse maximin ordering. We then have

$$
\left\| \Gamma - \Psi \left( L_{1:k,1:k} L_{1:k,1:k}^{\top} \right)^{-1} \Psi^{\top} \right\| \lesssim \|\Gamma\| \, k^{-2s/d},
\qquad \text{for } \Psi = \begin{pmatrix} \mathrm{Id} \\ -L_{(k+1):N,1:k} L_{1:k,1:k}^{-1} \end{pmatrix}.
\tag{4.3}
$$

Here, it is important to note that the resulting approximation of Γ can be applied efficiently to a vector using matrix-vector multiplication with L and substitution (see Algorithm 2) in L_{1:k,1:k}, without forming another matrix.
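A sketch of how that application can look in practice (scipy's solve_triangular playing the role of the substitution; the data is hypothetical, and with k = N the formula is exact, which makes for a convenient self-check):

```python
import numpy as np
from scipy.linalg import solve_triangular

def apply_homogenized(L, k, v):
    """Apply Psi (L_{1:k,1:k} L_{1:k,1:k}^T)^{-1} Psi^T from (4.3) to v,
    using only triangular solves and products with blocks of L; the
    homogenized operator is never formed as a dense matrix."""
    L11, L21 = L[:k, :k], L[k:, :k]
    # w = Psi^T v = v_1 - L11^{-T} (L21^T v_2)
    w = v[:k] - solve_triangular(L11, L21.T @ v[k:], lower=True, trans="T")
    # z = (L11 L11^T)^{-1} w via forward, then backward substitution
    z = solve_triangular(L11, solve_triangular(L11, w, lower=True),
                         lower=True, trans="T")
    # Psi z = [z; -L21 L11^{-1} z]
    return np.concatenate([z, -L21 @ solve_triangular(L11, z, lower=True)])

# Self-check on hypothetical data: for k = N, the result is exactly Gamma v.
rng = np.random.default_rng(5)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)
L, v = np.linalg.cholesky(A), rng.standard_normal(50)
assert np.allclose(apply_homogenized(L, 50, v), np.linalg.solve(A, v))
```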
Rigorous forms of the above theorems are stated and proved as Theorems 11, 12, and 13 in Section 4.6.
For s ≤ d/2, Theorems 2 and 3 are false, and while Theorem 1 seems to hold empirically, a proof of it remains elusive. However, following the discussion in Section 3.6.1, we will prove analogues of Theorems 1, 2, and 3 where the maximin ordering is replaced by a simple averaging-based multiresolution scheme.
While we present our results in terms of the exact Green's matrix, analogous results for the approximate Green's matrix Γ_discrete, obtained as the inverse of a Galerkin discretization A_discrete of L using locally supported basis functions, could be obtained by repeating the proof in this setting.
4.2 Setting and notation
4.2.1 The class of elliptic operators
For our rigorous, a priori, complexity-vs.-accuracy estimates, we assume that G is the Green's function of an elliptic operator L of order 2s (s, d ∈ N), defined on a bounded Lipschitz domain Ω ⊂ R^d, and acting on H^s_0(Ω), the Sobolev space of (zero boundary value) functions having derivatives of order s in L²(Ω).
Figure 4.1: A regularity criterion. We measure the regularity of the distribution of measurement points as the ratio of δ_min, the smallest distance between neighboring points or between points and the boundary, and δ_max, the radius of the largest ball that does not contain any points.
More precisely, writing H^{−s}(Ω) for the dual space of H^s_0(Ω) with respect to the L²(Ω) scalar product, our rigorous estimates will be stated for an arbitrary linear bijection

$$
\mathcal{L}: H^s_0(\Omega) \to H^{-s}(\Omega)
\tag{4.4}
$$

that is symmetric (i.e., ∫_Ω uLv dx = ∫_Ω vLu dx), positive (i.e., ∫_Ω uLu dx ≥ 0), and local in the sense that

$$
\int_{\Omega} u \mathcal{L} v \, \mathrm{d}x = 0 \quad \text{for all } u, v \in H^s_0(\Omega) \text{ such that } \operatorname{supp} u \cap \operatorname{supp} v = \emptyset.
\tag{4.5}
$$

Let ‖L‖ ≔ sup_{u∈H^s_0} ‖Lu‖_{H^{−s}}/‖u‖_{H^s_0} and ‖L^{-1}‖ ≔ sup_{f∈H^{−s}} ‖L^{-1}f‖_{H^s_0}/‖f‖_{H^{−s}} denote the operator norms of L and L^{-1}. The complexity and accuracy estimates for our algorithm will depend on (and only on) s, d, Ω, ‖L‖, ‖L^{-1}‖, and the parameter

$$
\delta \coloneqq \frac{\delta_{\min}}{\delta_{\max}} \coloneqq \frac{\min_{i \neq j \in I} \operatorname{dist}\left(x_i, \{x_j\} \cup \partial\Omega\right)}{\max_{x \in \Omega} \operatorname{dist}\left(x, \{x_i\}_{i \in I} \cup \partial\Omega\right)},
\tag{4.6}
$$

the geometric meaning of which is illustrated in Figure 4.1.
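The parameter δ is straightforward to estimate numerically; here is a brute-force sketch (hypothetical points in the unit square, with δ_max estimated on a dense grid; the domain and the grid resolution are assumptions of this example):

```python
import numpy as np

def mesh_ratio(x, boundary_dist, grid_n=100):
    """Estimate delta = delta_min / delta_max from (4.6) by brute force.
    `boundary_dist(p)` must return dist(p, boundary of Omega) row-wise."""
    r = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
    np.fill_diagonal(r, np.inf)
    delta_min = min(r.min(), boundary_dist(x).min())
    # delta_max: radius of the largest point-free ball, via a dense grid.
    g = np.linspace(0.0, 1.0, grid_n)
    g = np.stack(np.meshgrid(g, g), -1).reshape(-1, 2)
    d_pts = np.linalg.norm(g[:, None] - x[None, :], axis=-1).min(1)
    delta_max = np.minimum(d_pts, boundary_dist(g)).max()
    return delta_min / delta_max

square_bd = lambda p: np.minimum(p, 1 - p).min(axis=-1)  # Omega = (0, 1)^2
x = np.random.default_rng(6).uniform(size=(100, 2))
print(f"delta = {mesh_ratio(x, square_bd):.3f}")
```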
4.2.2 Discretization in the abstract

Before talking about computation, we need to discretize the infinite-dimensional spaces H^s_0(Ω) and H^{−s}(Ω) by approximating them with finite-dimensional vector spaces. We first introduce this procedure in the abstract.
For B a separable Banach space with dual space B* (such as H^s_0(Ω) and H^{−s}(Ω)), we write [·, ·] for the duality product between B* and B. Let L: B → B* be a linear bijection and let G ≔ L^{-1}. Assume L to be symmetric and positive (i.e., [Lu, v] = [Lv, u] and [Lu, u] ≥ 0 for u, v ∈ B). Let ‖·‖ be the quadratic (energy) norm defined by ‖u‖² ≔ [Lu, u] for u ∈ B, and let ‖·‖_* be its dual norm, defined by

$$
\|f\|_* \coloneqq \sup_{0 \neq u \in \mathcal{B}} \frac{[f, u]}{\|u\|} = \sqrt{[f, \mathcal{G} f]} \quad \text{for } f \in \mathcal{B}^*.
\tag{4.7}
$$

Let {φ_i}_{i∈I} be linearly independent elements of B* (known as measurement functions) and let Γ ∈ R^{I×I} be the symmetric positive-definite matrix defined by

$$
\Gamma_{ij} \coloneqq [\phi_i, \mathcal{G} \phi_j] \quad \text{for } i, j \in I.
\tag{4.8}
$$

We assume that we are given q ∈ N and a partition I = ⋃_{1≤k≤q} J^(k) of I. We represent I × I matrices as q × q block matrices according to this partition. Given an I × I matrix M, we write M_{k,l} for the (k, l)-th block of M, and M_{k₁:k₂,l₁:l₂} for the sub-matrix of M defined by the blocks ranging from k₁ to k₂ and from l₁ to l₂. Unless specified otherwise, we write L for the lower-triangular Cholesky factor of Γ, and define

$$
\Gamma^{(k)} \coloneqq \Gamma_{1:k,1:k}, \quad A^{(k)} \coloneqq \Gamma^{(k),-1}, \quad B^{(k)} \coloneqq A^{(k)}_{k,k} \quad \text{for } 1 \leq k \leq q.
\tag{4.9}
$$

We interpret the {J^(k)}_{1≤k≤q} as labelling a hierarchy of scales, with J^(1) representing the coarsest and J^(q) the finest. We write I^(k) for ⋃_{1≤k′≤k} J^(k′).

Throughout this section, we assume that the ordering of the set I of indices is compatible with the partition I = ⋃_{k=1}^{q} J^(k), i.e., i < j, i ∈ J^(k), and j ∈ J^(l) together imply k ≤ l. We will write L or chol(Γ) for the Cholesky factor of Γ in that ordering.
4.2.3 Discretization of H^s_0(Ω) and H^{−s}(Ω)

While similar results hold for a wide range of measurements {φ_i} ⊂ B* = H^{−s}(Ω), we will restrict our attention to two archetypal examples, given by pointwise evaluations and nested averages.
We will assume (without loss of generality, after rescaling) that diam(Ω) ≤ 1. As described in Figure 3.8, successive points of the maximin ordering can be gathered into levels so that, after appropriate rescaling of the measurements, the Cholesky factorization in the maximin ordering falls into the setting of Example 1.
Example 1. Let s > d/2. For h, δ ∈ (0, 1), let {x_i}_{i∈I^(1)} ⊂ {x_i}_{i∈I^(2)} ⊂ · · · ⊂ {x_i}_{i∈I^(q)} be a nested hierarchy of points in Ω that are homogeneously distributed at each scale, in the sense of the following three inequalities:

(1) sup_{x∈Ω} min_{i∈I^(k)} |x − x_i| ≤ h^k,
(2) min_{i∈I^(k)} inf_{x∈∂Ω} |x − x_i| ≥ δh^k, and
(3) min_{i,j∈I^(k): i≠j} |x_i − x_j| ≥ δh^k.
Let J^(1) ≔ I^(1) and J^(k) ≔ I^(k) \ I^(k−1) for k ∈ {2, . . . , q}. Let 𝛿 denote the unit Dirac delta function and choose

$$
\phi_i \coloneqq h^{\frac{kd}{2}} \, \delta(x - x_i) \quad \text{for } i \in J^{(k)} \text{ and } k \in \{1, \dots, q\}.
\tag{4.10}
$$

The discretization chosen in Example 1 is not applicable for s < d/2, since in this case functions in H^s_0(Ω) are not defined pointwise and thus 𝛿 ∉ B*. A possible alternative is to replace φ_i ≔ h^{kd/2} δ(x − x_i) with φ_i ≔ h^{−kd/2} 1_{B_{h^k}(0)}(x − x_i). However, while the exponential decay result of Theorem 4 seems to hold empirically for this choice of measurements, we are unable to prove it for s < d/2. Furthermore, the numerical homogenization result of Theorem 7 is false for this choice of measurements and s < d/2. However, our results can still be recovered by choosing measurements obtained as a hierarchy of local averages.
Given subsets Ĩ, J̃ ⊂ I, we extend a matrix M ∈ R^{Ĩ×J̃} to an element of R^{I×I} by padding it with zeros.
Example 2. (See Figure 4.2.) For h, δ ∈ (0, 1), let (τ^(k)_i)_{i∈I^(k)} be uniformly Lipschitz convex sets forming a regular nested partition of Ω in the following sense. For k ∈ {1, . . . , q}, Ω = ⋃_{i∈I^(k)} τ^(k)_i is a disjoint union, except for the boundaries. The I^(k) form a nested set of indices, i.e., I^(k) ⊂ I^(k+1) for k ∈ {1, . . . , q − 1}. For k ∈ {2, . . . , q} and i ∈ I^(k−1), there exists a subset c_i ⊂ I^(k) such that i ∈ c_i and τ^(k−1)_i = ⋃_{j∈c_i} τ^(k)_j. Assume that each τ^(k)_i contains a ball B_{δh^k}(x^(k)_i) of center x^(k)_i and radius δh^k, and is contained in the ball B_{h^k}(x^(k)_i).

Figure 4.2: Hierarchical averaging. We illustrate the construction described in Example 2 in the case d = 2. On the left, we see the nested partition of the domain; on the right, we see (the signs of) a possible choice for φ_1, φ_5, and φ_6.

For k ∈ {2, . . . , q} and i ∈ I^(k−1), let the submatrices A^(k),i ∈ R^{(c_i\{i})×c_i} satisfy ∑_{l∈c_i} A^(k),i_{j,l} A^(k),i_{j̄,l} |τ^(k)_l| = δ_{j j̄} and ∑_{l∈c_i} A^(k),i_{j,l} |τ^(k)_l| = 0 for each j, j̄ ∈ c_i \ {i}, where |τ^(k)_l| denotes the volume of τ^(k)_l. Let J^(1) ≔ I^(1) and J^(k) ≔ I^(k) \ I^(k−1) for k ∈ {2, . . . , q}. Let W^(1) be the J^(1) × I^(1) matrix defined by W^(1)_{ij} ≔ δ_{ij}, and let W^(k) be the J^(k) × I^(k) matrix defined by W^(k) ≔ ∑_{i∈I^(k−1)} A^(k),i for k ≥ 2, where we set

$$
\phi_i \coloneqq h^{-\frac{kd}{2}} \sum_{j \in I^{(k)}} W^{(k)}_{i,j} 1_{\tau^{(k)}_j} \quad \text{for each } i \in J^{(k)}
\tag{4.11}
$$

and define [φ_i, u] ≔ ∫_Ω φ_i u dx. In order to keep track of the distances between the different φ_i of Example 2, we choose an arbitrary set of points {x_i}_{i∈I} ⊂ Ω with the property that x_i ∈ supp(φ_i) for each i ∈ I.
In the above, we have discretized the Green's functions of the elliptic operators, resulting in the Green's matrix Γ as the fundamental discrete object. The inverse A of Γ can be interpreted as the stiffness matrix obtained from the Galerkin discretization of L using the basis given by

$$
\psi_i \coloneqq \sum_{j} A_{ij} \mathcal{G} \phi_j \in \mathcal{B}.
\tag{4.12}
$$

These types of basis functions are referred to as gamblets in the prior works [188-190] that form the basis for the proofs in this chapter. While exponentially decaying, these basis functions are nonlocal and unknown a priori; hence they cannot be used to discretize a partial differential operator with unknown Green's function. For Γ the inverse of a Galerkin discretization of L in a local basis, analogous results can be obtained by repeating the proofs of Theorems 4 and 7 in the discrete setting.
In the setting of Examples 1 and 2, denoting by L the lower-triangular Cholesky factor of Γ or Γ^{-1}, we will show that

$$
|L_{ij}| \leq \operatorname{poly}(N) \exp\left(-\gamma \, d(i, j)\right),
\tag{4.13}
$$

for a constant γ > 0 and a suitable distance measure d(·, ·): I × I → R.
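Before turning to the proof, here is how (4.13) shows up empirically (a sketch reusing the hypothetical Matérn setup and the reverse_maximin helper from the earlier sketches; the scaled distance dist(x_i, x_j)/ℓ_j is used here as a stand-in for d(i, j)):

```python
import numpy as np

# Empirical look at (4.13): entries of L shrink exponentially in the scaled
# distance. Reuses `reverse_maximin` from the sketch after Definition 7.
rng = np.random.default_rng(8)
x = rng.uniform(size=(300, 2))
order, lengths = reverse_maximin(x)
order, lengths = order[::-1], lengths[::-1]       # forward maximin ordering
r = np.linalg.norm(x[order][:, None] - x[order][None, :], axis=-1)
Gamma = (1 + 5 * r) * np.exp(-5 * r)              # hypothetical Matern kernel
L = np.linalg.cholesky(Gamma + 1e-8 * np.eye(len(x)))

i, j = np.tril_indices_from(L)
scaled = r[i, j] / lengths[j]                     # proxy for d(i, j)
vals = np.abs(L[i, j])
for t in range(0, 8, 2):
    band = (scaled >= t) & (scaled < t + 2)
    print(f"d(i,j) in [{t},{t+2}): max |L_ij| = {vals[band].max():.1e}")
```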
4.3 Algebraic identities and roadmap
We will use the following block-Cholesky decomposition of Γ to obtain (4.13).
Lemma 1. We have Γ = L̄ D L̄^⊤, with L̄ and D defined by

$$
D \coloneqq \begin{pmatrix}
B^{(1),-1} & 0 & \cdots & 0 \\
0 & B^{(2),-1} & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & B^{(q),-1}
\end{pmatrix},
\qquad
\bar{L} \coloneqq \begin{pmatrix}
\mathrm{Id} & 0 & \cdots & 0 \\
B^{(2),-1}A^{(2)}_{2,1} & \mathrm{Id} & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
B^{(q),-1}A^{(q)}_{q,1} & \cdots & B^{(q),-1}A^{(q)}_{q,q-1} & \mathrm{Id}
\end{pmatrix}^{-1}.
\tag{4.14}
$$

In particular, if L̂ is the lower-triangular Cholesky factor of D, then the lower-triangular Cholesky factor L of Γ is given by L = L̄ L̂.
Proof. To obtain Lemma 1, we successively apply Lemma 2 to Γ (see Section .1 for details). Lemma 2 summarizes classical identities satisfied by Schur complements.
Lemma 2 ([258, Chapter 1.1]). Let $\Gamma = \begin{pmatrix} \Gamma_{1,1} & \Gamma_{1,2} \\ \Gamma_{2,1} & \Gamma_{2,2} \end{pmatrix}$ be symmetric positive definite and $A = \begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix}$ its inverse. Then

$$
\Gamma = \begin{pmatrix} \mathrm{Id} & 0 \\ L_{2,1} & \mathrm{Id} \end{pmatrix}
\begin{pmatrix} D_{1,1} & 0 \\ 0 & D_{2,2} \end{pmatrix}
\begin{pmatrix} \mathrm{Id} & L_{2,1}^{\top} \\ 0 & \mathrm{Id} \end{pmatrix}
\tag{4.15}
$$

$$
A = \begin{pmatrix} \mathrm{Id} & -L_{2,1}^{\top} \\ 0 & \mathrm{Id} \end{pmatrix}
\begin{pmatrix} D_{1,1}^{-1} & 0 \\ 0 & D_{2,2}^{-1} \end{pmatrix}
\begin{pmatrix} \mathrm{Id} & 0 \\ -L_{2,1} & \mathrm{Id} \end{pmatrix}
\tag{4.16}
$$

where

$$
L_{2,1} = \Gamma_{2,1}\Gamma_{1,1}^{-1} = -A_{2,2}^{-1}A_{2,1}
\tag{4.17}
$$

$$
D_{1,1} = \Gamma_{1,1} = \left( A_{1,1} - A_{1,2}A_{2,2}^{-1}A_{2,1} \right)^{-1}
\tag{4.18}
$$

$$
D_{2,2} = \Gamma_{2,2} - \Gamma_{2,1}\Gamma_{1,1}^{-1}\Gamma_{1,2} = A_{2,2}^{-1}.
\tag{4.19}
$$
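These identities, too, admit a one-screen numerical check (a random SPD Γ and an arbitrary split; a sketch):

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((7, 7))
Gamma = M @ M.T + 7 * np.eye(7)        # random SPD matrix
A, k = np.linalg.inv(Gamma), 3         # its inverse, coarse block size

G11, G12, G21, G22 = Gamma[:k, :k], Gamma[:k, k:], Gamma[k:, :k], Gamma[k:, k:]
A21, A22 = A[k:, :k], A[k:, k:]

L21 = G21 @ np.linalg.inv(G11)
assert np.allclose(L21, -np.linalg.solve(A22, A21))            # (4.17)
assert np.allclose(np.linalg.inv(G11),
                   A[:k, :k] - A[:k, k:] @ np.linalg.solve(A22, A21))  # (4.18)
D11, D22 = G11, G22 - G21 @ np.linalg.solve(G11, G12)
assert np.allclose(D22, np.linalg.inv(A22))                    # (4.19)
lower = np.block([[np.eye(k), np.zeros((k, 7 - k))], [L21, np.eye(7 - k)]])
block_diag = np.block([[D11, np.zeros((k, 7 - k))],
                       [np.zeros((7 - k, k)), D22]])
assert np.allclose(Gamma, lower @ block_diag @ lower.T)        # (4.15)
assert np.allclose(A, np.linalg.inv(lower).T @ np.linalg.inv(block_diag)
                   @ np.linalg.inv(lower))                     # (4.16)
```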
Based on Lemma 1, (4.13) can be established by ensuring that:

(1) the matrices A^(k) (and hence also B^(k)) decay exponentially according to d(·, ·);

(2) the matrices B^(k) have uniformly bounded condition numbers;

(3) the products of exponentially decaying matrices decay exponentially;

(4) the inverses of well-conditioned exponentially decaying matrices decay exponentially;

(5) the Cholesky factors of the inverses of well-conditioned exponentially decaying matrices decay exponentially; and

(6) if a q × q block lower-triangular matrix L̄ with unit block-diagonal decays exponentially, then so does its inverse.
We will carry out this program in the setting of Examples 1 and 2 and prove that (4.13) holds with

$$
d(i, j) \coloneqq h^{-\min(k,l)} \operatorname{dist}(x_i, x_j), \quad \text{for each } i \in J^{(k)}, \ j \in J^{(l)}.
\tag{4.20}
$$

To prove (1), the matrices Γ^(k), A^(k) (interpreted as coarse-grained versions of G and L), and B^(k) will be identified as the stiffness matrices of the L-adapted wavelets described in Section 3.6.3. This identification rests on the general identities Γ^(k)_{i,j} = [φ_i, Gφ_j] for i, j ∈ I^(k), A^(k) = (Γ^(k))^{-1}, A^(k)_{i,j} = [Lψ^(k)_i, ψ^(k)_j], and B^(k)_{i,j} = [Lχ^(k)_i, χ^(k)_j], where ψ^(k)_i and χ^(k)_i are the gamblets introduced in [190].
4.4 Exponential decay of A^(k)

Our proof of the exponential decay of L will be based on that of A^(k), as expressed in the following condition:
Condition 1. Let γ, C_γ ∈ R_+ be constants such that for 1 ≤ k ≤ q and i, j ∈ I^(k),

$$
\left| A^{(k)}_{ij} \right| \leq C_\gamma \sqrt{ A^{(k)}_{ii} A^{(k)}_{jj} } \, \exp\left( -\gamma \, d(i, j) \right).
\tag{4.21}
$$

The matrices A^(k) are coarse-grained versions of the local operator L and thus inherit some of its locality in the form of exponential decay. Such exponential localization results were first obtained by [163] for the coarse-grained operators obtained from