

In the document Inference, Computation, and Games (Pages 59-66)

Chapter III: Sparse Cholesky factors by screening

3.6 Cholesky factorization, numerical homogenization, and gamblets

sparsity pattern will grow as $\mathcal{O}(N \log(N) \rho^d)$, allowing us to compute the factors in time $\mathcal{O}(N \log^2(N) \rho^{2d})$, as detailed in Chapter 5. As we will discuss in Chapter 4, the approximation error $\epsilon$ will usually decay exponentially as $\log(\epsilon) \lesssim \log(N) - \rho$, allowing us to compute an $\epsilon$-approximation in complexity $\mathcal{O}(N \log^2(N) \log^{2d}(N/\epsilon))$.

We have seen in Section 3.3 that, when factorizing the precision matrix $A = \Theta^{-1}$, sparsity of the $k$-th column is equivalent to independence of the Gaussian process conditional on the degrees of freedom appearing after $k$ in the elimination ordering. Thus we factorize the precision matrices of smooth Gaussian processes by using the reverse maximin ordering as the elimination ordering.

Definition 6 (Reverse maximin ordering). A reverse maximin ordering of a point set $\{x_i\}_{i \in I} \subset \mathbb{R}^d$ with starting set $C$ is obtained by reversing a maximin ordering of $\{x_i\}_{i \in I} \subset \mathbb{R}^d$ with starting set $C$. The associated length scale $\ell_k$ is then obtained as the $(\#I + 1 - k)$-th length scale of the maximin ordering.

As before, the associated sparsity pattern only contains interactions between points within distance $\rho$ times the associated length scale.

Definition 7 (Reverse maximin sparsity pattern). Given a reverse maximin ordering of the point set $\{x_i\}_{i \in I} \subset \mathbb{R}^d$ with length scales $\{\ell_k\}_{1 \le k \le N}$ and a sparsity parameter $\rho \in \mathbb{R}_+$, its sparsity set $S \subset I \times I$ is

$$S \coloneqq \left\{ (i_k, i_l) \ \text{s.t.}\ k > l \ \text{and}\ \operatorname{dist}(x_{i_k}, x_{i_l}) \le \rho \ell_k \right\}.$$

However, as shown in Figure 3.6, there is an important difference. Since each degree of freedom only interacts with others later in the ordering, and the largest length scales now appear last in the ordering, the reverse maximin sparsity pattern has only $\mathcal{O}(N \rho^d)$ entries. As we will see in Chapters 5 and 6, this allows us to compute the Cholesky factors of $A$ in only $\mathcal{O}(N \rho^{2d})$ time.
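Definitions 6 and 7 translate directly into code. The following is a minimal $\mathcal{O}(N^2)$ sketch (our own; the near-linear-time construction of Chapter 5 is more involved, and the function names and the choice of first point are ours). Indices in the returned sparsity set refer to positions in the reverse maximin ordering.

```python
import numpy as np

def reverse_maximin_ordering(x):
    """Greedy maximin ordering of points x (shape (N, d)), reversed.

    Returns the reverse maximin order and the associated length scales
    ell[k] (Definition 6).  O(N^2) for clarity; here the starting set C
    is taken to be empty and the first point is an arbitrary choice.
    """
    N = len(x)
    d2 = np.linalg.norm(x - x.mean(axis=0), axis=1)
    order, ell = [int(np.argmax(d2))], [np.inf]
    dist = np.linalg.norm(x - x[order[0]], axis=1)
    for _ in range(N - 1):
        i = int(np.argmax(dist))        # farthest point from the chosen set
        order.append(i)
        ell.append(float(dist[i]))      # maximin length scale of point i
        dist = np.minimum(dist, np.linalg.norm(x - x[i], axis=1))
    # Reverse: coarse scales last; ell[k] is the (N+1-k)-th maximin scale.
    return order[::-1], ell[::-1]

def reverse_maximin_pattern(x, order, ell, rho):
    """Sparsity set of Definition 7: pairs (k, l) with k > l and
    dist(x_{i_k}, x_{i_l}) <= rho * ell[k], indices in the ordering."""
    xs = x[order]
    S = set()
    for k in range(len(order)):
        for l in range(k):
            if np.linalg.norm(xs[k] - xs[l]) <= rho * ell[k]:
                S.add((k, l))
    return S

rng = np.random.default_rng(0)
pts = rng.uniform(size=(200, 2))
order, ell = reverse_maximin_ordering(pts)
S = reverse_maximin_pattern(pts, order, ell, rho=2.0)
```

In the reverse ordering the length scales increase, so each column of the pattern only reaches out to a roughly constant number of neighbors at its own scale.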


Figure 3.6: Reverse maximin sparsity pattern. Each column of the reverse maximin sparsity pattern includes interactions within a factor $\rho$ of the corresponding length scale $\ell_k$. The number of nonzeros per column is approximately constant.


Figure 3.7: A Haar-type multiresolution basis. We begin the construction by forming averages on different scales. On all but the coarsest scale, we obtain the basis functions as linear combinations of nearby averages on a given scale, chosen to be orthogonal to all averages on the coarser scale. In the figure, we show basis functions on the three coarsest scales.

By computing the orthogonal complement of $V^{(k-1)}$ in $V^{(k)}$ for $1 \le k \le q$, we obtain the orthogonal subspaces $W^{(k)}$ that form an orthogonal multiresolution splitting of $V^{(q)} = \mathbb{R}^N$, in the sense that for all $1 \le k \le q$ we have $V^{(k)} = \bigoplus_{1 \le l \le k} W^{(l)}$.

Based on the above, the maximin ordering can be thought of as a rudimentary multiresolution basis, a generalization of the so-called "lazy wavelets" to multivariate, irregularly sampled data. Instead of using subsampling with density $\approx h^{-dk}$, we could have obtained $V^{(k)}$ by forming averages over regions of size $\approx h^k$ and obtained $W^{(k)}$ as the orthogonal complement of $V^{(k-1)}$ in $V^{(k)}$, resulting in a Haar-type multiresolution basis as illustrated in Figure 3.7.
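In one dimension with $n = 2^q$ equally spaced samples, the averaging construction just described reduces to the classical Haar wavelets. A minimal sketch of this special case (our own illustration, not the multivariate construction used later):

```python
import numpy as np

def haar_wavelets(q):
    """Orthonormal Haar wavelets on an n = 2^q grid: W[k-1] spans the
    orthogonal complement of V^(k-1) (averages over 2^(k-1) blocks)
    inside V^(k) (averages over 2^k blocks)."""
    n = 2 ** q
    W = []
    for k in range(1, q + 1):
        m = 2 ** k                 # number of blocks on scale k
        s = n // m                 # samples per block
        Wk = np.zeros((n, m // 2))
        for j in range(m // 2):
            Wk[2 * j * s:(2 * j + 1) * s, j] = 1.0     # left child block
            Wk[(2 * j + 1) * s:(2 * j + 2) * s, j] = -1.0  # right child block
        W.append(Wk / np.sqrt(2 * s))                  # unit-norm columns
    return W

W = haar_wavelets(3)
# Stacking the global average with all W^(k) gives an orthonormal
# multiresolution basis of R^8.
B = np.hstack([np.ones((8, 1)) / np.sqrt(8)] + W)
```

Each wavelet has zero mean on its parent block, which is exactly the orthogonality to the coarser averages described above.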

Analogs of the screening effect illustrated in Figure 3.3 hold true for a wide range of multiresolution systems, in the sense that after measuring a smooth Gaussian process against the elements of $V^{(k)}$, its conditional correlations decay rapidly on the scale $h^k$ of the measurements. In particular, the Green's and stiffness matrices of elliptic PDEs have almost sparse Cholesky factors when represented in a multiresolution basis, using a coarse-to-fine (Green's matrix) or fine-to-coarse (stiffness matrix) elimination ordering. The connection to multiresolution analysis will also allow us to use tools from numerical homogenization to provide rigorous proofs of the screening effect.

3.6.2 Cholesky, Nyström, and numerical homogenization

Let us now represent the Green's matrix $\Theta$ and its inverse $A$ in a two-scale basis, with the first blocks $\Theta_{1,1}, A_{1,1}$ representing the coarse scale, the second blocks $\Theta_{2,2}, A_{2,2}$ representing the fine scale, and $\Theta_{1,2}, \Theta_{2,1}, A_{1,2}, A_{2,1}$ representing the interactions between the two scales.

Figure 3.8: The hidden multiscale structure of the maximin ordering. The index sets $I^{(k)} \coloneqq \{ i : h^k \le \ell_i / \ell_1 \}$ of the maximin ordering can be interpreted as the scale spaces $V^{(k)}$ of a multiresolution basis. The index sets $J^{(k)} \coloneqq I^{(k)} \setminus I^{(k-1)}$ can then be viewed as the resulting orthogonal multiresolution decomposition. In this figure, from left to right, we display $I^{(1)}$, $I^{(2)}$, and $I^{(3)}$, plotting $J^{(1)}$ in red, $J^{(2)}$ in orange, and $J^{(3)}$ in blue.

An important problem in numerical analysis is to compute a low-rank approximation of $\Theta$ that correctly captures the coarse-scale behavior. Given access to $\Theta$, this problem can be solved by using the Nyström approximation

$$\begin{pmatrix} \Theta_{1,1} & \Theta_{1,2} \\ \Theta_{2,1} & \Theta_{2,2} \end{pmatrix} \approx \begin{pmatrix} \mathrm{Id} \\ \Theta_{2,1}\Theta_{1,1}^{-1} \end{pmatrix} \Theta_{1,1} \begin{pmatrix} \mathrm{Id}, & \Theta_{1,1}^{-1}\Theta_{1,2} \end{pmatrix}. \tag{3.8}$$

A classical method in numerical analysis [23, Chapter 3], the Nyström approximation has recently been used to compress kernel matrices arising in machine learning [87, 222, 250]. Following the discussion in Section 3.3, this approximation amounts to assuming that the fine-scale behavior is deterministic, conditional on the coarse-scale behavior. As described in [199], many popular sparse Gaussian process models arise from refinements of this assumption.
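A minimal numerical illustration of (3.8) (our own sketch; the exponential kernel and the evenly spaced coarse subset are arbitrary choices): the approximation is exact on the blocks involving the coarse points, and the error is confined to the fine-fine block.

```python
import numpy as np

# Nyström approximation (3.8): choose n1 coarse (inducing) points as
# the first block; the residual is the conditional covariance of the
# fine block given the coarse one.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(size=120))
Theta = np.exp(-np.abs(x[:, None] - x[None, :]))  # 1D exponential kernel

n1 = 30
idx = np.linspace(0, len(x) - 1, n1).astype(int)  # evenly spread coarse subset
coarse = np.zeros(len(x), dtype=bool)
coarse[idx] = True
perm = np.r_[np.where(coarse)[0], np.where(~coarse)[0]]
T = Theta[np.ix_(perm, perm)]                     # coarse block first

T11 = T[:n1, :n1]
U = np.vstack([np.eye(n1), T[n1:, :n1] @ np.linalg.inv(T11)])
nystrom = U @ T11 @ U.T          # (Id; Θ21 Θ11^{-1}) Θ11 (Id, Θ11^{-1} Θ12)
err = np.linalg.norm(T - nystrom) / np.linalg.norm(T)
```

Because the top block of the outer factor is the identity, the coarse-coarse and coarse-fine blocks are reproduced exactly; only the fine-fine block carries the (small) Schur-complement error.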

In physical problems described by elliptic PDEs, we typically do not have access to $\Theta$, but rather to its inverse $A$, the stiffness matrix arising from a discretization of the differential operator. The field of numerical homogenization is concerned with computing low-rank approximations of $\Theta$ that capture its coarse-scale behavior from access only to $A$. At the same time, it tries to preserve (some of) the sparsity of the stiffness matrix $A$ that arises from the locality of the partial differential operator.

The seminal work of [163] solves this problem by constructing an operator-adapted set of basis functions that achieve (up to a constant factor) the optimal discretization error while being (up to an exponentially small error) localized in space.

To understand the relationship of numerical homogenization to sparse Cholesky factorization, we recall the linear-algebraic identity (see Lemma 4.15; we give indexing $[\,\cdot\,]_{i,j}$ precedence over inversion $[\,\cdot\,]^{-1}$)

$$
\begin{aligned}
\begin{pmatrix} \Theta_{1,1} & \Theta_{1,2} \\ \Theta_{2,1} & \Theta_{2,2} \end{pmatrix}
&= \left( \begin{pmatrix} \mathrm{Id} & A_{1,2}A_{2,2}^{-1} \\ 0 & \mathrm{Id} \end{pmatrix}
\begin{pmatrix} A_{1,1} - A_{1,2}A_{2,2}^{-1}A_{2,1} & 0 \\ 0 & A_{2,2} \end{pmatrix}
\begin{pmatrix} \mathrm{Id} & 0 \\ A_{2,2}^{-1}A_{2,1} & \mathrm{Id} \end{pmatrix} \right)^{-1} && (3.9) \\
&= \begin{pmatrix} \mathrm{Id} & 0 \\ A_{2,2}^{-1}A_{2,1} & \mathrm{Id} \end{pmatrix}^{-1}
\begin{pmatrix} \left( A_{1,1} - A_{1,2}A_{2,2}^{-1}A_{2,1} \right)^{-1} & 0 \\ 0 & A_{2,2}^{-1} \end{pmatrix}
\begin{pmatrix} \mathrm{Id} & A_{1,2}A_{2,2}^{-1} \\ 0 & \mathrm{Id} \end{pmatrix}^{-1} && (3.10) \\
&= \begin{pmatrix} \mathrm{Id} & 0 \\ -A_{2,2}^{-1}A_{2,1} & \mathrm{Id} \end{pmatrix}
\begin{pmatrix} \left( A_{1,1} - A_{1,2}A_{2,2}^{-1}A_{2,1} \right)^{-1} & 0 \\ 0 & A_{2,2}^{-1} \end{pmatrix}
\begin{pmatrix} \mathrm{Id} & -A_{1,2}A_{2,2}^{-1} \\ 0 & \mathrm{Id} \end{pmatrix}. && (3.11)
\end{aligned}
$$

We can now recover the Nyström approximation as

$$
\begin{aligned}
\begin{pmatrix} \Theta_{1,1} & \Theta_{1,2} \\ \Theta_{2,1} & \Theta_{2,2} \end{pmatrix}
&\approx \begin{pmatrix} \mathrm{Id} \\ \Theta_{2,1}\Theta_{1,1}^{-1} \end{pmatrix} \Theta_{1,1} \begin{pmatrix} \mathrm{Id}, & \Theta_{1,1}^{-1}\Theta_{1,2} \end{pmatrix} && (3.12) \\
&= \begin{pmatrix} \mathrm{Id} \\ -A_{2,2}^{-1}A_{2,1} \end{pmatrix} \left( A_{1,1} - A_{1,2}A_{2,2}^{-1}A_{2,1} \right)^{-1} \begin{pmatrix} \mathrm{Id}, & -A_{1,2}A_{2,2}^{-1} \end{pmatrix} && (3.13) \\
&= \Psi \left( \Psi^\top A \Psi \right)^{-1} \Psi^\top, \quad \text{for } \Psi \coloneqq \begin{pmatrix} \mathrm{Id} \\ -A_{2,2}^{-1}A_{2,1} \end{pmatrix}. && (3.14)
\end{aligned}
$$

If we interpret the columns of $\Psi$ as a new set of operator-adapted finite element functions, we obtain the Nyström approximation by projecting the problem onto the space spanned by these adapted finite element functions, inverting the resulting stiffness matrix, and expressing the resulting solution again in terms of the original set of finite element functions. The authors of [163] show that, for a two-scale splitting similar to the one in Figure 3.7 and for $A$ the stiffness matrix of a second-order elliptic PDE with $L^\infty$-coefficients, the following holds:

1. The matrices $\Psi$ and $\Psi^\top A \Psi$ are sparse up to exponentially small terms (localization).

2. As we decrease the fine scale, and hence increase the dimension of $\Psi$, the resulting Nyström approximation of $\Theta$ improves at the optimal rate (homogenization).
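Identity (3.14) expresses the Nyström approximation using only $A$. This is easy to check numerically (our own sanity check, on a random SPD matrix standing in for $\Theta$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, n1 = 60, 20
M = rng.normal(size=(n, n + 10))
Theta = M @ M.T                        # generic SPD "Green's matrix"
A = np.linalg.inv(Theta)

# Θ-based Nyström approximation (3.12).
T11, T12, T21 = Theta[:n1, :n1], Theta[:n1, n1:], Theta[n1:, :n1]
U = np.vstack([np.eye(n1), T21 @ np.linalg.inv(T11)])
nys_theta = U @ T11 @ U.T

# A-based form (3.14): Psi = (Id; -A22^{-1} A21).
A21, A22 = A[n1:, :n1], A[n1:, n1:]
Psi = np.vstack([np.eye(n1), -np.linalg.solve(A22, A21)])
nys_A = Psi @ np.linalg.inv(Psi.T @ A @ Psi) @ Psi.T

ok = np.allclose(nys_theta, nys_A, rtol=1e-6, atol=1e-6)
```

The agreement follows from $\Theta_{2,1}\Theta_{1,1}^{-1} = -A_{2,2}^{-1}A_{2,1}$ and $\Psi^\top A \Psi = A_{1,1} - A_{1,2}A_{2,2}^{-1}A_{2,1} = \Theta_{1,1}^{-1}$, as derived in (3.9)-(3.13).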

3.6.3 Gamblets as block-Cholesky factorization

The work of [163] provides an efficient, near-optimal coarse-graining of $A$, but it does not provide us with a fast solver. In fact, computing $\Psi$ requires inverting the matrix $A_{2,2}$, which could be almost as difficult as inverting $A$.

Motivated by ideas from game and decision theory, as well as Gaussian process regression, [188] extends the construction of $\Psi$ in the last section to multiple scales. The resulting family of operator-adapted wavelets, called gamblets, can be constructed in near-linear time. Once constructed, they can be used to invert $A$ in near-linear time.

In order to relate gamblets to Cholesky factorization, we assume that we are given an orthogonal multiresolution decomposition $\{W^{(k)}\}_{1 \le k \le q}$ and that, for $1 \le k, l \le q$, the matrix blocks $\Theta_{k,l}, A_{k,l}$ represent the restriction of $\Theta, A$ onto $W^{(k)} \times W^{(l)}$. A multiscale extension of Equation (3.9) can then be obtained (see Lemma 1) by writing

$$\Theta = \Psi B^{-1} \Psi^\top = \Psi \left( \Psi^\top A \Psi \right)^{-1} \Psi^\top \tag{3.15}$$

with

$$
B \coloneqq \begin{pmatrix}
B^{(1)} & 0 & \cdots & 0 \\
0 & B^{(2)} & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & B^{(q)}
\end{pmatrix}, \qquad
\Psi \coloneqq \begin{pmatrix}
\mathrm{Id} & \cdots & \cdots & 0 \\
B^{(2),-1}A^{(2)}_{2,1} & \ddots & & \vdots \\
\vdots & \ddots & \ddots & \vdots \\
B^{(q),-1}A^{(q)}_{q,1} & \cdots & B^{(q),-1}A^{(q)}_{q,q-1} & \mathrm{Id}
\end{pmatrix}^{-1}, \tag{3.16}
$$

where $A^{(k)} \coloneqq \left( \Theta_{1:k,1:k} \right)^{-1}$ and $B^{(k)} \coloneqq A^{(k)}_{k,k}$.

The columns of $\Psi$ represent the gamblet basis functions, and $B$ is the (block-diagonal) stiffness matrix obtained when representing $A$ in the gamblet basis. In the same setting as [163], [188] proves that

1. The matrices $\Psi$ and $\Psi^\top A \Psi$ are sparse up to exponentially small terms (localization).

2. The approximation of $A$ obtained by using only the leading $k$ block-columns of $\Psi$ improves at the optimal rate when increasing $k$ (homogenization).

3. The matrices $B^{(k)}$ have uniformly bounded condition numbers.

Using and extending proof techniques of [142], [189, 190] extend these results to higher-order operators and more general classes of multiresolution splittings $\{W^{(k)}\}_{1 \le k \le q}$. We build upon these results in order to prove the sparsity of the Cholesky factors of $\Theta$. To this end, the main analytic contribution of our work is an extension of the homogenization estimates of gamblets to simplistic multiresolution splittings such as the one implied by the maximin ordering.

Chapter 4

PROVING EXPONENTIAL DECAY OF CHOLESKY FACTORS

4.1 Overview

In Chapter 3, we used the intuition provided by the screening effect to obtain elimination orderings and sparsity patterns that lead to near-linear-complexity solvers based on sparse Cholesky factorization. But the resulting sparse Cholesky factor merely approximates the original $\Theta$ or $A$, with the approximation accuracy depending on $\rho$.

This raises the question: at what rate does the error decay as we increase $\rho$? How strong is the screening effect, and when does it hold? Despite numerous attempts [26, 227, 228], a rigorous understanding has proven largely elusive. Existing results (see [228] for an overview) are asymptotic and do not provide explicit rates.

A central contribution of this thesis is to provide a proof of an exponential screening effect for Gaussian processes with covariance functions given as Green's functions of elliptic partial differential equations. This allows us to prove that, up to exponentially small errors, the Cholesky factors of $\Theta$ and its inverse $A$ can be approximated by sparse matrices with only $\mathcal{O}(N \log(N) \rho^d)$ and $\mathcal{O}(N \rho^d)$ nonzero entries, respectively, for an approximation error $\epsilon$ satisfying $\log(\epsilon) \lesssim \log(N) - \rho$. These results, which are summarized in Section 4.6, can be specialized to:

Theorem 1. Let $\Omega \subset \mathbb{R}^d$ be a bounded Lipschitz domain and let $\mathcal{L}$ be a linear elliptic partial differential operator of order $2s > d$. Let $\{x_i\}_{1 \le i \le N}$ be roughly uniformly distributed in $\Omega$ and ordered in the maximin [reverse maximin] ordering. For $\mathcal{G} = \mathcal{L}^{-1}$ the Dirichlet Green's function, define $\Theta_{ij} \coloneqq \mathcal{G}(x_i, x_j)$ and let $L$ be the Cholesky factor of $\Theta$ [of $A \coloneqq \Theta^{-1}$]. Let $S \subset \{1, \ldots, N\}^2$ be the maximin [reverse maximin] sparsity pattern with starting set $\partial\Omega$ and sparsity parameter $\rho$. Defining $L^S_{ij} \coloneqq 1_{(i,j) \in S} L_{ij}$, we then have

$$\log\left( \left\| L^S L^{S,\top} - L L^\top \right\|_{\mathrm{Fro}} \right) \lesssim \log(N) - \rho. \tag{4.1}$$

A key step for proving this result is the derivation of a novel low-rank approximation result that is interesting in its own right.

Theorem 2. In the setting of Theorem 1, let $L_{:,1:k}$ be the rank-$k$ matrix defined by the first $k$ columns of the Cholesky factor $L$ of $\Theta$ in the maximin ordering. Denoting by $\| \cdot \|$ the operator norm and by $\ell_k$ the length scales of the ordering, we have

$$\left\| \Theta - L_{:,1:k} L_{:,1:k}^\top \right\| \lesssim \|\Theta\| \, k^{-2s/d}. \tag{4.2}$$
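Theorem 2 can likewise be checked empirically: the residual $\Theta - L_{:,1:k} L_{:,1:k}^\top$ is the Schur complement left after eliminating the $k$ coarsest maximin points, and its operator norm shrinks as $k$ grows. A minimal one-dimensional sketch with a Matérn-1/2 kernel (our own illustration; the length scale $0.2$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(size=200))

# Maximin ordering (greedy farthest-point).
order = [int(np.argmin(np.abs(x - 0.5)))]
dist = np.abs(x - x[order[0]])
for _ in range(len(x) - 1):
    i = int(np.argmax(dist))
    order.append(i)
    dist = np.minimum(dist, np.abs(x - x[i]))

xs = x[order]
Theta = np.exp(-np.abs(xs[:, None] - xs[None, :]) / 0.2)  # Matérn-1/2, d = 1
L = np.linalg.cholesky(Theta)

def rank_k_error(k):
    # Residual of the rank-k partial Cholesky factorization; this is
    # the conditional covariance given the k coarsest points.
    Lk = L[:, :k]
    return np.linalg.norm(Theta - Lk @ Lk.T, ord=2)

errs = {k: rank_k_error(k) for k in (5, 10, 20, 40)}
```

The decrease is monotone by construction (each additional column subtracts a positive semi-definite rank-one term); the rate in this toy setting is only indicative of the $k^{-2s/d}$ bound.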

Following the discussion in Section 3.6.2, this low-rank approximation also results in a numerical homogenization result for $\mathcal{L}$.

Theorem 3. In the setting of Theorem 1, let $L_{:,1:k}$ be the rank-$k$ matrix defined by the first $k$ columns of the Cholesky factor $L$ of $A \coloneqq \Theta^{-1}$ in the reverse maximin ordering. We then have

$$\left\| \Theta - \Psi \left( L_{1:k,1:k} L_{1:k,1:k}^\top \right)^{-1} \Psi^\top \right\| \lesssim \|\Theta\| \, k^{-2s/d}, \quad \text{for } \Psi = \begin{pmatrix} \mathrm{Id} \\ -L_{(k+1):N,1:k} L_{1:k,1:k}^{-1} \end{pmatrix}. \tag{4.3}$$

Here, it is important to note that the resulting approximation of $\Theta$ can be applied efficiently to a vector using matrix-vector multiplication with $L$ and substitution in $L_{1:k,1:k}$ (see Algorithm 2), without forming another matrix.

Rigorous forms of the above theorems are stated and proved as Theorems 11, 12, and 13 in Section 4.6.

For ๐‘  โ‰ค ๐‘‘/2, Theorems 2 and 3 are false, and while Theorem 1 seems to hold empirically, a proof of it remains elusive. However, following the discussion in Section3.6.1, we will prove analogues of Theorems1,2, and3where the maximin ordering is replaced by a simple averaging-based multiresolution scheme.

While we present our results in terms of the exact Green's matrix, analogous results for the approximate Green's matrix $\Theta^{\mathrm{discrete}}$, obtained as the inverse of a Galerkin discretization $A^{\mathrm{discrete}}$ of $\mathcal{L}$ using locally supported basis functions, could be obtained by repeating the proof in this setting.

4.2 Setting and notation

4.2.1 The class of elliptic operators

For our rigorous, a priori, complexity-vs.-accuracy estimates, we assume that $\mathcal{G}$ is the Green's function of an elliptic operator $\mathcal{L}$ of order $2s$ ($s, d \in \mathbb{N}$), defined on a bounded Lipschitz domain $\Omega \subset \mathbb{R}^d$, and acting on $H^s_0(\Omega)$, the Sobolev space of (zero boundary value) functions having derivatives of order $s$ in $L^2(\Omega)$. More precisely, writing $H^{-s}(\Omega)$ for the dual space of $H^s_0(\Omega)$ with respect to the $L^2(\Omega)$ scalar product, our rigorous estimates will be stated for an arbitrary linear bijection

$$\mathcal{L} : H^s_0(\Omega) \to H^{-s}(\Omega) \tag{4.4}$$

that is symmetric (i.e. $\int_\Omega u \mathcal{L} v \, \mathrm{d}x = \int_\Omega v \mathcal{L} u \, \mathrm{d}x$), positive (i.e. $\int_\Omega u \mathcal{L} u \, \mathrm{d}x \ge 0$), and local in the sense that

$$\int_\Omega u \mathcal{L} v \, \mathrm{d}x = 0 \quad \text{for all } u, v \in H^s_0(\Omega) \text{ such that } \operatorname{supp}(u) \cap \operatorname{supp}(v) = \emptyset. \tag{4.5}$$

Let $\|\mathcal{L}\| \coloneqq \sup_{u \in H^s_0} \|\mathcal{L}u\|_{H^{-s}} / \|u\|_{H^s_0}$ and $\|\mathcal{L}^{-1}\| \coloneqq \sup_{f \in H^{-s}} \|\mathcal{L}^{-1}f\|_{H^s_0} / \|f\|_{H^{-s}}$ denote the operator norms of $\mathcal{L}$ and $\mathcal{L}^{-1}$. The complexity and accuracy estimates for our algorithm will depend on (and only on) $d$, $s$, $\Omega$, $\|\mathcal{L}\|$, $\|\mathcal{L}^{-1}\|$, and the parameter

$$\delta \coloneqq \frac{\delta_{\min}}{\delta_{\max}} \coloneqq \frac{\min_{i \ne j \in I} \operatorname{dist}\left( x_i, \{x_j\} \cup \partial\Omega \right)}{\max_{x \in \Omega} \operatorname{dist}\left( x, \{x_i\}_{i \in I} \cup \partial\Omega \right)}, \tag{4.6}$$

the geometric meaning of which is illustrated in Figure 4.1.

Figure 4.1: A regularity criterion. We measure the regularity of the distribution of measurement points as the ratio of $\delta_{\min}$, the smallest distance between neighboring points or between points and the boundary, and $\delta_{\max}$, the radius of the largest ball that does not contain any points.

4.2.2 Discretization in the abstract

Before talking about computation, we need to discretize the infinite-dimensional spaces $H^s_0(\Omega)$ and $H^{-s}(\Omega)$ by approximating them with finite-dimensional vector spaces. We first introduce this procedure in the abstract.

For $\mathcal{B}$ a separable Banach space with dual space $\mathcal{B}^*$ (such as $H^s_0(\Omega)$ and $H^{-s}(\Omega)$), we write $[\,\cdot\,, \cdot\,]$ for the duality product between $\mathcal{B}^*$ and $\mathcal{B}$. Let $\mathcal{L} : \mathcal{B} \to \mathcal{B}^*$ be a linear bijection and let $\mathcal{G} \coloneqq \mathcal{L}^{-1}$. Assume $\mathcal{L}$ to be symmetric and positive (i.e. $[\mathcal{L}u, v] = [\mathcal{L}v, u]$ and $[\mathcal{L}u, u] \ge 0$ for $u, v \in \mathcal{B}$). Let $\|\cdot\|$ be the quadratic (energy) norm defined by $\|u\|^2 \coloneqq [\mathcal{L}u, u]$ for $u \in \mathcal{B}$, and let $\|\cdot\|_*$ be its dual norm, defined by

$$\|\phi\|_* \coloneqq \sup_{0 \ne u \in \mathcal{B}} \frac{[\phi, u]}{\|u\|} = \sqrt{[\phi, \mathcal{G}\phi]} \quad \text{for } \phi \in \mathcal{B}^*. \tag{4.7}$$

Let $\{\phi_i\}_{i \in I}$ be linearly independent elements of $\mathcal{B}^*$ (known as measurement functions) and let $\Theta \in \mathbb{R}^{I \times I}$ be the symmetric positive-definite matrix defined by

$$\Theta_{ij} \coloneqq [\phi_i, \mathcal{G}\phi_j] \quad \text{for } i, j \in I. \tag{4.8}$$

We assume that we are given $q \in \mathbb{N}$ and a partition $I = \bigsqcup_{1 \le k \le q} J^{(k)}$ of $I$. We represent $I \times I$ matrices as $q \times q$ block matrices according to this partition. Given an $I \times I$ matrix $M$, we write $M_{k,l}$ for the $(k,l)$-th block of $M$, and $M_{k_1:k_2,\, l_1:l_2}$ for the sub-matrix of $M$ defined by the blocks ranging from $k_1$ to $k_2$ and from $l_1$ to $l_2$. Unless specified otherwise, we write $L$ for the lower-triangular Cholesky factor of $\Theta$ and define

$$\Theta^{(k)} \coloneqq \Theta_{1:k,1:k}, \qquad A^{(k)} \coloneqq \Theta^{(k),-1}, \qquad B^{(k)} \coloneqq A^{(k)}_{k,k} \qquad \text{for } 1 \le k \le q. \tag{4.9}$$

We interpret the $\{J^{(k)}\}_{1 \le k \le q}$ as labelling a hierarchy of scales, with $J^{(1)}$ representing the coarsest and $J^{(q)}$ the finest. We write $I^{(k)}$ for $\bigsqcup_{1 \le k' \le k} J^{(k')}$.

Throughout this section, we assume that the ordering of the index set $I$ is compatible with the partition $I = \bigsqcup_{1 \le k \le q} J^{(k)}$, i.e. that $k < l$, $i \in J^{(k)}$, and $j \in J^{(l)}$ together imply $i \prec j$. We will write $L$ or $\operatorname{chol}(\Theta)$ for the Cholesky factor of $\Theta$ in this ordering.

4.2.3 Discretization of $H^s_0(\Omega)$ and $H^{-s}(\Omega)$

While similar results hold for a wide range of measurements $\{\phi_i\} \subset \mathcal{B}^* = H^{-s}(\Omega)$, we will restrict our attention to two archetypical examples, given by pointwise evaluation and nested averages.

We will assume (without loss of generality, after rescaling) that $\operatorname{diam}(\Omega) \le 1$. As described in Figure 3.8, successive points of the maximin ordering can be gathered into levels so that, after appropriate rescaling of the measurements, the Cholesky factorization in the maximin ordering falls into the setting of Example 1.

Example 1. Let $s > d/2$. For $h, \delta \in (0,1)$, let $\{x_i\}_{i \in I^{(1)}} \subset \{x_i\}_{i \in I^{(2)}} \subset \cdots \subset \{x_i\}_{i \in I^{(q)}}$ be a nested hierarchy of points in $\Omega$ that are homogeneously distributed at each scale, in the sense of the following three inequalities:

(1) $\sup_{x \in \Omega} \min_{i \in I^{(k)}} |x - x_i| \le h^k$,
(2) $\min_{i \in I^{(k)}} \inf_{x \in \partial\Omega} |x - x_i| \ge \delta h^k$, and
(3) $\min_{i,j \in I^{(k)} : i \ne j} |x_i - x_j| \ge \delta h^k$.

Let $J^{(1)} \coloneqq I^{(1)}$ and $J^{(k)} \coloneqq I^{(k)} \setminus I^{(k-1)}$ for $k \in \{2, \ldots, q\}$. Let $\boldsymbol{\delta}$ denote the unit Dirac delta function and choose

$$\phi_i \coloneqq h^{\frac{kd}{2}} \boldsymbol{\delta}(x - x_i) \quad \text{for } i \in J^{(k)} \text{ and } k \in \{1, \ldots, q\}. \tag{4.10}$$

The discretization chosen in Example 1 is not applicable for $s < d/2$, since in this case functions in $H^s_0(\Omega)$ are not defined pointwise and thus $\boldsymbol{\delta} \notin \mathcal{B}^*$. A possible alternative is to replace $\phi_i \coloneqq h^{\frac{kd}{2}} \boldsymbol{\delta}(x - x_i)$ with $\phi_i \coloneqq h^{-\frac{kd}{2}} 1_{B_{h^k}(0)}(x - x_i)$. However, while the exponential decay result of Theorem 4 seems to hold empirically for this choice of measurements, we are unable to prove it for $s < d/2$.

Furthermore, the numerical homogenization result of Theorem 7 is false for this choice of measurements and $s < d/2$. However, our results can still be recovered by choosing measurements obtained as a hierarchy of local averages.

Given subsets $\tilde{I} \subset I$ and $\tilde{J} \subset J$, we extend a matrix $M \in \mathbb{R}^{\tilde{I} \times \tilde{J}}$ to an element of $\mathbb{R}^{I \times J}$ by padding it with zeros.

Example 2. (See Figure 4.2.) For $h, \delta \in (0,1)$, let $(\tau^{(k)}_i)_{i \in I^{(k)}}$ be uniformly Lipschitz convex sets forming a regular nested partition of $\Omega$ in the following sense. For $k \in \{1, \ldots, q\}$, $\Omega = \bigcup_{i \in I^{(k)}} \tau^{(k)}_i$ is a disjoint union, except for the boundaries. The $I^{(k)}$ are nested sets of indices, i.e. $I^{(k)} \subset I^{(k+1)}$ for $k \in \{1, \ldots, q-1\}$. For $k \in \{2, \ldots, q\}$ and $i \in I^{(k-1)}$, there exists a subset $N_i \subset I^{(k)}$ such that $i \in N_i$ and $\tau^{(k-1)}_i = \bigcup_{j \in N_i} \tau^{(k)}_j$. Assume that each $\tau^{(k)}_i$ contains a ball $B_{\delta h^k}(x^{(k)}_i)$ of center $x^{(k)}_i$ and radius $\delta h^k$, and is contained in the ball $B_{h^k}(x^{(k)}_i)$. For $k \in \{2, \ldots, q\}$ and $i \in I^{(k-1)}$, let the submatrices $\mathfrak{w}^{(k),i} \in \mathbb{R}^{(N_i \setminus \{i\}) \times N_i}$ satisfy

$$\sum_{j \in N_i} \mathfrak{w}^{(k),i}_{m,j} \mathfrak{w}^{(k),i}_{n,j} \left| \tau^{(k)}_j \right| = \delta_{mn} \quad \text{and} \quad \sum_{j \in N_i} \mathfrak{w}^{(k),i}_{l,j} \left| \tau^{(k)}_j \right| = 0 \ \text{for each } l \in N_i \setminus \{i\},$$

where $|\tau^{(k)}_i|$ denotes the volume of $\tau^{(k)}_i$. Let $J^{(1)} \coloneqq I^{(1)}$ and $J^{(k)} \coloneqq I^{(k)} \setminus I^{(k-1)}$ for $k \in \{2, \ldots, q\}$. Let $W^{(1)}$ be the $J^{(1)} \times I^{(1)}$ matrix defined by $W^{(1)}_{ij} \coloneqq \delta_{ij}$, and let $W^{(k)}$ be the $J^{(k)} \times I^{(k)}$ matrix defined by $W^{(k)} \coloneqq \sum_{i \in I^{(k-1)}} \mathfrak{w}^{(k),i}$ for $k \ge 2$. We then set

$$\phi_i \coloneqq h^{-kd/2} \sum_{j \in I^{(k)}} W^{(k)}_{i,j} 1_{\tau^{(k)}_j} \quad \text{for each } i \in J^{(k)} \tag{4.11}$$

and define $[\phi_i, u] \coloneqq \int_\Omega \phi_i u \, \mathrm{d}x$. In order to keep track of the distance between the different $\phi_i$ of Example 2, we choose an arbitrary set of points $\{x_i\}_{i \in I} \subset \Omega$ with the property that $x_i \in \operatorname{supp}(\phi_i)$ for each $i \in I$.

Figure 4.2: Hierarchical averaging. We illustrate the construction described in Example 2 in the case $q = 2$. On the left, we see the nested partition of the domain; on the right, we see (the signs of) a possible choice for $\phi_1$, $\phi_5$, and $\phi_6$.

In the above, we have discretized the Green's functions of the elliptic operators, resulting in the Green's matrix $\Theta$ as the fundamental discrete object. The inverse $A$ of $\Theta$ can be interpreted as the stiffness matrix obtained from the Galerkin discretization of $\mathcal{L}$ using the basis given by

$$\psi_i \coloneqq \sum_j A_{ij} \mathcal{G}\phi_j \in \mathcal{B}. \tag{4.12}$$

These types of basis functions are referred to as gamblets in the prior works [188-190] that form the basis for the proofs in this chapter. While exponentially decaying, these basis functions are nonlocal and not known a priori; hence they cannot be used to discretize a partial differential operator with unknown Green's function.

For $\Theta$ the inverse of a Galerkin discretization of $\mathcal{L}$ in a local basis, analogous results can be obtained by repeating the proofs of Theorems 4 and 7 in the discrete setting.

In the setting of Examples 1 and 2, denoting by $L$ the lower-triangular Cholesky factor of $\Theta$ or $\Theta^{-1}$, we will show that

$$|L_{ij}| \le \operatorname{poly}(N) \exp(-\gamma d(i,j)), \tag{4.13}$$

for a constant $\gamma > 0$ and a suitable distance measure $d(\,\cdot\,, \cdot\,) : I \times I \to \mathbb{R}$.

4.3 Algebraic identities and roadmap

We will use the following block-Cholesky decomposition of $\Theta$ to obtain (4.13).

Lemma 1. We have $\Theta = \bar{L} D \bar{L}^\top$, with $\bar{L}$ and $D$ defined by

$$
D \coloneqq \begin{pmatrix}
B^{(1),-1} & 0 & \cdots & 0 \\
0 & B^{(2),-1} & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & B^{(q),-1}
\end{pmatrix}, \qquad
\bar{L} \coloneqq \begin{pmatrix}
\mathrm{Id} & \cdots & \cdots & 0 \\
B^{(2),-1}A^{(2)}_{2,1} & \ddots & & \vdots \\
\vdots & \ddots & \ddots & \vdots \\
B^{(q),-1}A^{(q)}_{q,1} & \cdots & B^{(q),-1}A^{(q)}_{q,q-1} & \mathrm{Id}
\end{pmatrix}^{-1}. \tag{4.14}
$$

In particular, if $\tilde{L}$ is the lower-triangular Cholesky factor of $D$, then the lower-triangular Cholesky factor $L$ of $\Theta$ is given by $L = \bar{L}\tilde{L}$.

Proof. To obtain Lemma 1, we successively apply Lemma 2 to $\Theta$ (see Section .1 for details). Lemma 2 summarizes classical identities satisfied by Schur complements.

Lemma 2 ([258, Chapter 1.1]). Let $\Theta = \begin{pmatrix} \Theta_{1,1} & \Theta_{1,2} \\ \Theta_{2,1} & \Theta_{2,2} \end{pmatrix}$ be symmetric positive definite and $A = \begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix}$ its inverse. Then

$$\Theta = \begin{pmatrix} \mathrm{Id} & 0 \\ L_{2,1} & \mathrm{Id} \end{pmatrix} \begin{pmatrix} D_{1,1} & 0 \\ 0 & D_{2,2} \end{pmatrix} \begin{pmatrix} \mathrm{Id} & L_{2,1}^\top \\ 0 & \mathrm{Id} \end{pmatrix} \tag{4.15}$$

$$A = \begin{pmatrix} \mathrm{Id} & -L_{2,1}^\top \\ 0 & \mathrm{Id} \end{pmatrix} \begin{pmatrix} D_{1,1}^{-1} & 0 \\ 0 & D_{2,2}^{-1} \end{pmatrix} \begin{pmatrix} \mathrm{Id} & 0 \\ -L_{2,1} & \mathrm{Id} \end{pmatrix} \tag{4.16}$$

where

$$L_{2,1} = \Theta_{2,1}\Theta_{1,1}^{-1} = -A_{2,2}^{-1}A_{2,1} \tag{4.17}$$
$$D_{1,1} = \Theta_{1,1} = \left( A_{1,1} - A_{1,2}A_{2,2}^{-1}A_{2,1} \right)^{-1} \tag{4.18}$$
$$D_{2,2} = \Theta_{2,2} - \Theta_{2,1}\Theta_{1,1}^{-1}\Theta_{1,2} = A_{2,2}^{-1}. \tag{4.19}$$
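The identities of Lemma 2 can be verified numerically in a few lines (our own sanity check, on a random SPD matrix):

```python
import numpy as np

rng = np.random.default_rng(5)
n, n1 = 50, 18
M = rng.normal(size=(n, n + 5))
Theta = M @ M.T                  # symmetric positive definite
A = np.linalg.inv(Theta)

T11, T12 = Theta[:n1, :n1], Theta[:n1, n1:]
T21, T22 = Theta[n1:, :n1], Theta[n1:, n1:]
A21, A22 = A[n1:, :n1], A[n1:, n1:]

# (4.17): two equivalent expressions for the block L_{2,1}.
L21_theta = T21 @ np.linalg.inv(T11)
L21_A = -np.linalg.solve(A22, A21)

# (4.18)-(4.19): the two diagonal blocks.
D11 = T11
D22 = T22 - T21 @ np.linalg.inv(T11) @ T12

# (4.15): block LDL^T factorization of Θ.
Lb = np.block([[np.eye(n1), np.zeros((n1, n - n1))],
               [L21_theta, np.eye(n - n1)]])
D = np.block([[D11, np.zeros((n1, n - n1))],
              [np.zeros((n - n1, n1)), D22]])
recon = Lb @ D @ Lb.T
```

Each assertion below corresponds to one of the displayed identities.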

Based on Lemma 1, (4.13) can be established by ensuring that:

(1) the matrices $A^{(k)}$ (and hence also $B^{(k)}$) decay exponentially according to $d(\,\cdot\,, \cdot\,)$;

(2) the matrices $B^{(k)}$ have uniformly bounded condition numbers;

(3) products of exponentially decaying matrices decay exponentially;

(4) inverses of well-conditioned, exponentially decaying matrices decay exponentially;

(5) Cholesky factors of inverses of well-conditioned, exponentially decaying matrices decay exponentially; and

(6) if a $q \times q$ block lower-triangular matrix $\bar{L}$ with unit block-diagonal decays exponentially, then so does its inverse.

We will carry out this program in the setting of Examples 1 and 2 and prove that (4.13) holds with

$$d(i,j) \coloneqq h^{-\min(k,l)} \operatorname{dist}(x_i, x_j), \quad \text{for each } i \in J^{(k)}, j \in J^{(l)}. \tag{4.20}$$

To prove (1), the matrices $\Theta^{(k)}, A^{(k)}$ (interpreted as coarse-grained versions of $\mathcal{G}$ and $\mathcal{L}$) and $B^{(k)}$ will be identified as stiffness matrices of the $\mathcal{L}$-adapted wavelets described in Section 3.6.3. This identification is based on the general identities $\Theta^{(k)}_{i,j} = [\phi_i, \mathcal{G}\phi_j]$ for $i,j \in I^{(k)}$, $A^{(k)} = (\Theta^{(k)})^{-1}$, $A^{(k)}_{i,j} = [\mathcal{L}\psi^{(k)}_i, \psi^{(k)}_j]$, and $B^{(k)}_{i,j} = [\mathcal{L}\chi^{(k)}_i, \chi^{(k)}_j]$, where $\psi^{(k)}_i$ and $\chi^{(k)}_i$ are the gamblets introduced in [190].
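Items (3) and (4) of this program can be observed numerically: for a well-conditioned matrix whose entries decay exponentially in $|i - j|$, the inverse and the square again decay exponentially, typically at a somewhat smaller rate (our own illustration; the decay-rate estimator is a crude least-squares fit):

```python
import numpy as np

n = 80
idx = np.arange(n)
dmat = np.abs(idx[:, None] - idx[None, :])   # distance d(i, j) = |i - j|

# Well-conditioned, exponentially decaying matrix: by Gershgorin's
# theorem, the eigenvalues of M lie in roughly [0.13, 1.87].
M = np.eye(n) + 0.4 * np.exp(-1.0 * dmat)

def decay_rate(X, rs=range(3, 20)):
    """Estimate gamma in |X_ij| ~ C exp(-gamma |i - j|) by a
    least-squares fit of log max_{|i-j|=r} |X_ij| against r."""
    vals = [np.abs(X[dmat == r]).max() for r in rs]
    return -np.polyfit(list(rs), np.log(vals), 1)[0]

g_M = decay_rate(M)                   # ~ 1 by construction
g_inv = decay_rate(np.linalg.inv(M))  # still exponential, smaller rate
g_prod = decay_rate(M @ M)            # likewise
```

This matches the qualitative content of (3) and (4): decay survives products and (well-conditioned) inversion, at the cost of a worse constant and rate, which is exactly why the condition-number bound (2) matters.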

4.4 Exponential decay of $A^{(k)}$

Our proof of the exponential decay of $L$ will be based on that of $A^{(k)}$, as expressed in the following condition:

Condition 1. Let $\gamma, C_\gamma \in \mathbb{R}_+$ be constants such that for $1 \le k \le q$ and $i, j \in I^{(k)}$,

$$\left| A^{(k)}_{ij} \right| \le C_\gamma \sqrt{ A^{(k)}_{ii} A^{(k)}_{jj} } \, \exp(-\gamma d(i,j)). \tag{4.21}$$

The matrices $A^{(k)}$ are coarse-grained versions of the local operator $\mathcal{L}$ and thus inherit some of its locality in the form of exponential decay. Such exponential localization results were first obtained by [163] for the coarse-grained operators obtained from
