In the first part of the thesis, we study multiscale methods for solving challenging PDEs with rough coefficients and high frequency, where standard FEMs often fail. In the second part, we study statistical numerical methods based on Gaussian processes, which lead to dense kernel matrices; in the case of PDE problems, these matrices can also include partial derivatives of the kernel.
LIST OF ILLUSTRATIONS
In the right figure, we study the sparse Cholesky factorization for the reduced kernel matrix $K(\phi^k, \phi^k)^{-1}$. In the left figure, the data points are uniformly distributed in $[0,1]^2$ with different grid sizes; $\rho = 4.0$.
LIST OF TABLES
INTRODUCTION
Multiscale Numerical Methods
- Prototypical Equations
- Solving PDEs as Function Approximation
- Primal Perspective: Constructing Coarse Spaces
- On Exponential Convergence of Accuracy
- Our Contributions: Helmholtz’s Equations and Non-overlapped Domain Decomposition
- Dual Perspective: Selecting Coarse Variables
- Our Contribution: Subsampled Lengthscale in the Coarse Variables
- Summary
Multiscale methods aim to find a better trial space $S$ that captures the coarse-scale behavior of the solution by incorporating the structure of the equation. We show that the exponential decay rate of the localized basis functions exhibits a non-monotonic dependence on $h$.
Statistical Numerical Methods
- Statistical Inference for Numerical Computation
- Methodology: Solving Nonlinear PDEs and IPs with GPs
- Efficiency: Sparse Cholesky Factorization for GP-PDEs
- Adaptivity: Hierarchical Learning and Consistency Analysis
- Additional Topics: Randomized Numerics and Posterior Sampling
This thesis also covers other related and prospective topics pertaining to numerical computation.
- High Dimensional Problems through Randomized Numerics
- Posterior Sampling through Gradient Flows
- Summary
Based on the analysis, they propose a sparse Cholesky factorization algorithm that factorizes the inverse kernel matrix in almost linear time. Indeed, for high-dimensional problems, it makes more sense to aim for a low-rank approximation of the kernel matrix.
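As a minimal sketch of the column-by-column KL-minimization update that underlies such factorizations (assuming a precomputed ordering and sparsity pattern; the function name, dense storage, and NumPy implementation are illustrative, not the thesis code):

```python
import numpy as np

def kl_sparse_factor(Theta, pattern):
    """Sparse factor U with U @ U.T approximating inv(Theta), built column by column.

    Theta   : (n, n) SPD kernel matrix, already reordered (e.g. by a maximin ordering).
    pattern : pattern[i] is the list of row indices allowed in column i
              (it must contain i; restricting it to indices >= i gives a
              lower-triangular factor).
    Each column has the closed form Theta[s, s]^{-1} e_1 / sqrt(e_1^T Theta[s, s]^{-1} e_1),
    so the cost per column is cubic in the local pattern size only.
    """
    n = Theta.shape[0]
    U = np.zeros((n, n))                            # dense container for readability only
    for i in range(n):
        s = np.asarray(pattern[i])
        s = np.concatenate(([i], s[s != i]))        # put the diagonal index first
        A = Theta[np.ix_(s, s)]
        e1 = np.zeros(len(s)); e1[0] = 1.0
        col = np.linalg.solve(A, e1)                # (Theta_{s,s})^{-1} e_1
        U[s, i] = col / np.sqrt(col[0])             # KL-optimal column under the pattern
    return U
```

With pattern sizes of order $\log^d n$ per column, the total work is near-linear in $n$, which is the "almost linear" regime referred to above.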
EXPONENTIALLY CONVERGENT MULTISCALE FINITE ELEMENT METHOD
Introduction
- Literature for Solving Helmholtz Equations
- Main Contributions and Motivations
- Organization
In studying the solution behavior of the Helmholtz equation (2.1.1), we introduce a coarse-fine scale decomposition of its solution space, built on a non-overlapping union of mesh edges.
Preliminaries on Helmholtz’s Equation
- Notations
- Analytic Results
The adjoint problem will play a valuable role when we analyze the convergence properties of our multiscale methods for the Helmholtz equation. We will need the $C_a$ estimates of the solution to establish the theoretical properties of our multiscale methods.
Coarse-Fine Scale Decomposition
- Mesh Structure
- Elements
- Nodes, Edges, and Their Neighbors
- Decomposition of Solution Space
- Local decomposition
- Global decomposition
- Local and Small Bubble Part
- Low Complexity of the Helmholtz-Harmonic Part
- Approximation via Edge Functions
- Localization of Approximation
- Local Approximation via Oversampling
- Low Complexity in Approximation
With the mesh structure defined, we now discuss the coarse-fine scale decomposition of the solution space. For any $T \in \mathcal{T}_H$ and any function $v \in H^1(T)$ that vanishes on one of the edges of $T$, a Poincaré–Friedrichs-type estimate holds.
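A plausible form of the estimate meant here, assuming the standard Poincaré–Friedrichs inequality for functions vanishing on part of the boundary (the constant and scaling are generic, not the thesis's exact statement):
\[
\|v\|_{L^2(T)} \;\le\; C\,H_T\,\|\nabla v\|_{L^2(T)},
\]
where $H_T$ denotes the diameter of the element $T$ and $C$ depends only on the shape regularity of the mesh.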
The Multiscale Methods
- The Multiscale Framework
- The Ritz-Galerkin Method
- The Petrov-Galerkin Method
If $S_{\mathrm{test}} = S$, the method is called the Ritz–Galerkin method; otherwise it is called the Petrov–Galerkin method. Our current theory does not cover the stability of the discrete system or the $H^1(\Omega)$ error estimate for the Petrov–Galerkin method.
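For reference, both methods fit the standard variational template (a sketch; $a(\cdot,\cdot)$ denotes the sesquilinear form of the Helmholtz problem and $f$ the load functional):
\[
\text{find } u_H \in S \quad \text{such that} \quad a(u_H, v) = f(v) \quad \text{for all } v \in S_{\mathrm{test}},
\]
with $S_{\mathrm{test}} = S$ for the Ritz–Galerkin method and $S_{\mathrm{test}} \neq S$ for the Petrov–Galerkin method.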
Numerical Experiments
- Set-up
- Multiscale Algorithms
- Offline Stage
- A High Wavenumber Example: Planar Wave
- A High Contrast Example: Mie Resonances
- A Numerical Example with Mixed Boundary and Rough Field
- Summary
We note that our numerical experiments in the next section suggest that these properties also hold for the Petrov–Galerkin method. Note that here $\{\psi_i\}_{x_i \in \mathcal{N}_H}$ are the same as the basis functions in MsFEM. The construction depends on how the trial and test spaces in the Galerkin method are chosen.
Proofs
- Proof of Proposition 2.2.1
- Proof of Proposition 2.3.7
- Proof of Theorem 2.3.9
- Proof of Theorem 2.3.12
- Geometric Relation: Interior Edges
- Main Idea of the Proof
- Proof of Lemma 2.6.1
- Proof of Lemma 2.6.4
- Proof of Lemma 2.6.5
- Proof of Lemma 2.6.2
- For Edges Connected to the Boundary
- Proof of Proposition 2.3.14
- Proof of Theorem 2.4.3
We will explain the proof for interior edges in detail and comment on the changes needed for edges connected to the boundary. For edges connected to the boundary, we need a different geometric relation, as shown on the right of Figure 2.9. According to the discussion in Remark 2.6.3, the local $C_a$ constant is independent of $k$ for edges connected to the boundary.
Conclusions
ANALYSIS OF SUBSAMPLED LENGTHSCALES IN MULTISCALE METHODS
Introduction
- Problem 1: Numerical Upscaling
- Problem 2: Scattered Data Approximation
- A Common Approach
- Our Goals
- Subsampled Lengthscales
- Basis Functions and Localization
- Our Contributions
- Related Works
- Numerical Upscaling
- Function Approximation
- Organization
The computational costs of the two solutions are different: the former requires solving only for the basis functions, while the latter also requires solving an upscaled equation. In the first part of this work, we consider the finite regime of the subsampled lengthscale, i.e., it is a fixed strictly positive number. A major component in LOD and Gamblets is the localization problem: the ideal multiscale basis functions must be localized for efficient computation.
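As a hedged sketch of the kind of ideal basis functions referred to here (the Gamblet/LOD-type construction; the exact measurement functionals $\phi_j$ used in the thesis may differ):
\[
\psi_i \;=\; \operatorname*{argmin}_{v \in H^1_0(\Omega)} \; \|v\|_a \quad \text{subject to} \quad [\phi_j, v] = \delta_{ij} \quad \text{for all } j,
\]
where $\|\cdot\|_a$ is the energy norm of the elliptic operator. Such ideal basis functions decay exponentially away from the support of $\phi_i$, which is what makes localization both possible and necessary for efficient computation.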
Finite Regime of Subsampled Lengthscales
- Experiments: Ideal Solution
- One Dimensional Example
- Two Dimensional Example
- Analysis: Ideal Solution
- Experiments: Localized Solution
- One Dimensional Example
- Two Dimensional Example
- Analysis: Localized Solution
- Notations
- Analysis
- Proof Strategy
Furthermore, the relative behavior of the three $h/H$ cases is very similar to that of the ideal solution, indicating that the localization error for $l = 4$ can be small compared to the approximation error of the ideal solution. When $d = 1$, both parts of the error remain bounded as $h \to 0$, so the competition is less pronounced; this agrees with what we observed in our 1D experiments, where the reduction effect is not as large as in the 2D case. The above phenomenon also applies to the other errors, i.e., the $L^2$ recovery error $e^0_{h,H,l}(a,u)$ and the Galerkin errors $\tilde{e}^1_{h,H,l}(a,u)$ and $\tilde{e}^0_{h,H,l}(a,u)$.
Small Limit Regime of Subsampled Lengthscales
- Numerical Experiment
- Analysis: Weighted Inequality
It is straightforward to pass from the overall localization error to the energy recovery error via a triangle inequality. By Galerkin orthogonality, the energy recovery error provides an upper bound for the energy Galerkin error. Moreover, the recovered solution is visually smoother; due to the weight function, the effect of the subsampled data propagates to other points in the domain.
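In symbols, the two facts used here are roughly the following (a sketch in generic notation; $u$ is the exact solution, $u^{\mathrm{ideal}}$ and $u^{\mathrm{loc}}$ the ideal and localized recoveries, and $V^{\mathrm{loc}}$ the localized coarse space):
\[
\|u - u^{\mathrm{loc}}\|_a \;\le\; \|u - u^{\mathrm{ideal}}\|_a + \|u^{\mathrm{ideal}} - u^{\mathrm{loc}}\|_a,
\qquad
\|u - u_H^{\mathrm{Gal}}\|_a \;=\; \min_{v \in V^{\mathrm{loc}}} \|u - v\|_a \;\le\; \|u - u^{\mathrm{loc}}\|_a,
\]
the first being the triangle inequality and the second the best-approximation property implied by Galerkin orthogonality for a symmetric coercive problem.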
Proofs
- Proof of Theorem 3.2.3
- Inverse Estimate
- Norm Estimate
- Localization Per Basis Function
- Overall Localization Error
- Overall Galerkin Error
- Proof of Theorem 3.3.2
We start with the first case, i.e., the case involving $\|u\|_{H^1}$.
In the above inequality, we used the bound on the gradient of $\eta$ and the subsampled Poincaré inequality (thanks to the property $[\chi^{h,H}_i, \phi^{h,H}_j] = 0$). Combining the above estimate with the result in the last subsection (note that the cardinality of $I$ is of order $1/H^d$), we obtain the desired bound. The $L^2$ error estimate is obtained by the standard Aubin–Nitsche trick in finite element theory.
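For completeness, the duality argument meant by the Aubin–Nitsche trick runs roughly as follows (a generic sketch for a symmetric coercive form $a$; the constants and regularity assumptions in the thesis may differ). Let $w_g$ solve the adjoint problem $a(v, w_g) = (g, v)_{L^2}$ for all $v$; then Galerkin orthogonality gives
\[
(g, u - u_H)_{L^2} \;=\; a(u - u_H, w_g) \;=\; a(u - u_H, w_g - v_H) \;\le\; \|u - u_H\|_a \inf_{v_H \in S} \|w_g - v_H\|_a,
\]
and choosing $g = u - u_H$ together with an $O(H)$ approximation bound for $w_g$ yields an $L^2$ estimate that gains a factor of $H$ over the energy estimate.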
Conclusions
- Summary
- Discussions
- Conclusions
A better understanding of the trade-off between $h$ and $l$ is needed: how should they be chosen optimally with respect to $f$? It is important to study the joint effect of $h$, $l$, and the order of the operator $\mathcal{L}$ on the recovery and Galerkin errors. How can we take advantage of the PDE model in $\Omega_1$ and the measured data in $\Omega_2$ to recover an accurate solution?
GAUSSIAN PROCESSES FOR SOLVING AND LEARNING PDES AND INVERSE PROBLEMS
Introduction
- Summary of the Proposed Method
- Optimal Recovery
- Finite-Dimensional Representation
- Numerical Framework
- Relevant Literature
- Outline
We first present this result in the case of the nonlinear PDE (4.1.1) and defer a more general version to subsection 4.3.2. We note that this convergence theorem requires the kernel $K$ to be adapted to the solution space of the PDE, so that $u$ belongs to $\mathcal{U}$. Advanced linear solvers for dense kernel matrices can be used for this step.
Conditioning GPs on Nonlinear Observations
- GPs and Banach Spaces Endowed with a Quadratic Norm
A special case of the setting considered here is $\mathcal{U} = H^s_0(\Omega)$ (we write $H^s_0(\Omega)$ for the closure of the set of smooth functions with compact support in $\Omega$ with respect to the Sobolev norm $\|\cdot\|_{H^s(\Omega)}$), with its dual $\mathcal{U}^* = H^{-s}(\Omega)$ defined by the pairing $[\phi, v]$. Proposition 4.2.1 gives an explicit representation of the conditional mean of the GP, which is reminiscent of the representer theorem [204]. It is intuitive that the minimizer of the optimization problem we introduce and solve in this work corresponds to the MAP point for the GP $\xi \sim \mathcal{N}(0, K)$.
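For reference, the representer-type formula behind Proposition 4.2.1 has the familiar form (a sketch in the notation above, with $\phi = (\phi_1,\dots,\phi_N)$ the vector of measurement functionals and $y$ the observed data):
\[
\mathbb{E}\big[\xi \,\big|\, [\phi, \xi] = y\big] \;=\; K(\cdot, \phi)\, K(\phi, \phi)^{-1} y,
\]
so the conditional mean is a finite linear combination of the functions $K(\cdot, \phi_m)$, which is what makes a finite-dimensional implementation possible.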
Solving Nonlinear PDEs
- Problem Setup
- Convergence Theory
- Dealing with the Constraints
- Eliminating the Equality Constraints
- Relaxing the Equality Constraints
- Implementation
- Constructing Θ
- A Gauss–Newton Algorithm
- Computational Bottlenecks
- Numerical Experiments for Nonlinear PDEs
- A Nonlinear Elliptic PDE
- Burgers’ Equation
- Eikonal PDE
The choice of random collocation points was made to emphasize the flexibility of our methodology. Finally, we note that the accuracy of our method is closely tied to this choice. An example of these collocation points, along with contours of the true solution, is shown in Figure 4.4(a).
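As a minimal sketch of the Gauss–Newton loop referred to in the implementation subsection, assuming the equality constraints have already been eliminated so that one minimizes $z(w)^\top K(\phi,\phi)^{-1} z(w)$ over the remaining unknowns $w$ (the function names, the whitening step, and the plain least-squares solve are illustrative choices, not the thesis's exact implementation):

```python
import numpy as np

def gauss_newton(z_fn, jac_fn, K, w0, n_iter=10, nugget=1e-10):
    """Gauss-Newton for min_w z(w)^T K^{-1} z(w).

    z_fn(w)   -> vector of measurement values implied by w (constraints eliminated)
    jac_fn(w) -> Jacobian dz/dw
    K         -> kernel matrix K(phi, phi), assumed symmetric positive definite
    """
    L = np.linalg.cholesky(K + nugget * np.eye(K.shape[0]))   # small nugget for stability
    w = np.array(w0, dtype=float)
    for _ in range(n_iter):
        z, J = z_fn(w), jac_fn(w)
        r = np.linalg.solve(L, z)                  # whiten: objective becomes ||r||^2
        A = np.linalg.solve(L, J)
        step, *_ = np.linalg.lstsq(A, -r, rcond=None)   # linearized subproblem
        w = w + step
    return w
```

For the nonlinear elliptic example $-\Delta u + u^3 = f$, $w$ would hold the interior values of $u$, and $z(w)$ would stack all point values together with the Laplacian values $\Delta u = u^3 - f$ implied by the PDE, matching the elimination strategy of the "Eliminating the Equality Constraints" subsection.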
Solving Inverse Problems
- Problem Setup
- Dealing with the Constraints
- Implementation
- Numerical Experiments for Darcy Flow
In light of Remark 4.2.4, we note that (4.4.4) corresponds to the introduction of a prior measure under which the two unknown fields are a priori independent. Eliminating the constraints as in subsection 4.3.3.1 is a bit more delicate here, but is sometimes possible. Both problems (4.4.11) and (4.4.12) can be solved using the same techniques described in subsection 4.3.4, except that now we have a higher-dimensional solution space.
Conclusions
SPARSE CHOLESKY FACTORIZATION FOR SOLVING PDES VIA GAUSSIAN PROCESSES
Introduction
- The problem
- Contributions
- Related work
- Machine learning PDEs
- Fast solvers for kernel matrices
- Screening effects in spatial statistics
In the case of PDE problems, these matrices may also include partial derivatives of the kernel [43], and fast algorithms for such matrices are less developed than for instances without derivatives. Most existing methods focus on the case where $\Theta$ contains only point values of the kernel function. A fundamental question is how the screening effect behaves when derivative information of the spatial field is included, and how to use it to extend sparse Cholesky factorization methods to kernel matrices containing derivatives of the kernel.
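To make "kernel matrices containing derivatives" concrete, here is a small self-contained sketch for the Gaussian kernel with Laplacian measurements; the kernel choice, lengthscale, and block layout are illustrative only:

```python
import numpy as np

def gaussian_kernel_blocks(X, Y, sigma=0.2):
    """Blocks of a Gaussian-kernel matrix with Laplacian measurements.

    For k(x, y) = exp(-|x - y|^2 / (2 sigma^2)), returns entrywise
      K[i, j]   = k(x_i, y_j)
      DK[i, j]  = (Delta_y k)(x_i, y_j)        (= Delta_x k, by symmetry in |x - y|^2)
      DDK[i, j] = (Delta_x Delta_y k)(x_i, y_j)
    """
    d = X.shape[1]
    r2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    K = np.exp(-r2 / (2.0 * sigma ** 2))
    g = r2 / sigma ** 4 - d / sigma ** 2
    DK = K * g
    DDK = K * (g ** 2 - 4.0 * r2 / sigma ** 6 + 2.0 * d / sigma ** 4)
    return K, DK, DDK

# A kernel matrix containing both point values and Laplacian values
# at the same points (a small random example in 2D):
X = np.random.default_rng(0).random((50, 2))
K, DK, DDK = gaussian_kernel_blocks(X, X)
Theta = np.block([[K, DK], [DK, DDK]])   # symmetric, since DK(X, X) is symmetric here
```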
Solving Nonlinear PDEs via GPs
- The GP framework
- The finite dimensional problem
- The general case
This algorithm is designed for kernel matrices whose entries involve no derivatives and is related to the screening effect in spatial statistics [259, 256]. Here, $\|\cdot\|$ is the norm of the reproducing kernel Hilbert space (RKHS) associated with the kernel/covariance function $K$. As for consistency, when $K$ is sufficiently regular, the above solution converges to the exact PDE solution as $M_\Omega, M_{\partial\Omega} \to \infty$; see Theorem 1.2 in [43].
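For orientation, the optimal recovery problem behind this framework has roughly the following form for a prototypical nonlinear elliptic equation $-\Delta u + \tau(u) = f$ in $\Omega$ with $u = g$ on $\partial\Omega$ (a sketch; the operators and boundary conditions differ across the chapters):
\[
\begin{aligned}
\min_{u \in \mathcal{U}} \;\; & \|u\| \\
\text{s.t.}\;\; & -\Delta u(x_m) + \tau\big(u(x_m)\big) = f(x_m), && 1 \le m \le M_\Omega,\\
& u(x_m) = g(x_m), && M_\Omega < m \le M_\Omega + M_{\partial\Omega},
\end{aligned}
\]
where the first $M_\Omega$ collocation points lie in $\Omega$, the remaining $M_{\partial\Omega}$ on $\partial\Omega$, and $\|\cdot\|$ is the RKHS norm mentioned above.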
The Sparse Cholesky Factorization Algorithm
- The case of derivative-free measurements
- Reordering
- Sparsity pattern
- KL minimization
- The case of derivative measurements
- The nonlinear elliptic PDE case
- General case
We introduce the map $P: I \to I$ to match the ordering of the measurements with the indices of the corresponding points, i.e., $P(q) = i_q$. Now that the ordering has been determined, our next step is to identify the sparsity pattern of the Cholesky factor under this ordering. The complexity is of the same order as in Remark 5.3.8; the hidden constant in the complexity estimate depends on $J$, the maximum order of the derivative measurements.
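A simple (quadratic-cost) sketch of the maximin ordering used for the Dirac measurements, together with the lengthscales that later define the sparsity pattern; the conditioning set, function name, and brute-force distance computation are illustrative only:

```python
import numpy as np

def maximin_ordering(X, X_cond=None):
    """Greedy maximin ordering of the points X (O(N^2) for clarity).

    Each point is chosen to maximize its distance to the conditioning set
    X_cond (e.g. boundary points, if any) and to all previously chosen
    points; lengths[q] records that distance, the 'lengthscale' of the
    q-th chosen point. Points within rho * lengths[q] of it are the
    natural candidates for the q-th column's sparsity pattern.
    """
    N = X.shape[0]
    if X_cond is not None and len(X_cond) > 0:
        d = np.min(np.linalg.norm(X[:, None, :] - X_cond[None, :, :], axis=-1), axis=1)
    else:
        d = np.full(N, np.inf)      # the first point then gets an 'infinite' lengthscale
    order, lengths = [], []
    remaining = np.ones(N, dtype=bool)
    for _ in range(N):
        q = int(np.argmax(np.where(remaining, d, -np.inf)))
        order.append(q); lengths.append(d[q]); remaining[q] = False
        d = np.minimum(d, np.linalg.norm(X - X[q], axis=1))
    return np.array(order), np.array(lengths)
```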
Theoretical Study
- Set-up for rigorous results
- Theory
The presence of these Dirac measurements is the key to obtaining provable guarantees for the algorithm; for details see section 5.4. Recall that for this $P$, we first order the Dirac measurements using the maximin ordering conditioned on $\partial\Omega$ (since there are no boundary points in this set-up); then we append the derivative measurements in an arbitrary order. Our technical innovation is to relate this normalized term to the conditional expectation of the GP, leading to an identity in terms of conditional expectations and variances.
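The Gaussian conditioning identity involved is of the following standard form (a sketch, not necessarily the thesis's exact statement): for a centered Gaussian vector $(X_1,\dots,X_m)$ with covariance $\Theta_{s,s}$ and precision $P = \Theta_{s,s}^{-1}$,
\[
\mathbb{E}\big[X_1 \,\big|\, X_2, \dots, X_m\big] = -\sum_{j=2}^{m} \frac{P_{1j}}{P_{11}}\, X_j,
\qquad
\operatorname{Var}\big[X_1 \,\big|\, X_2, \dots, X_m\big] = \frac{1}{P_{11}},
\]
so the normalized entries of a column of the factor can be read as conditional regression coefficients, and the screening effect (fast decay of conditional correlations) translates into approximate sparsity of the factor.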
Second Order Optimization Methods
- Gauss-Newton iterations
- Sparse Cholesky factorization for the reduced kernel matrices
- General case
Suppose that $\Theta$ is the reordered version of the reduced kernel matrix $K(\phi^k, \phi^k)$; then, similarly to Figures 5.1 and 5.2, we show the magnitude of the corresponding Cholesky factor of $\Theta^{-1} = U^\star U^{\star T}$, i.e., we plot $|U^\star_{ij}|$ for $i \le j$; here $j$ is chosen to correspond to some boundary and interior points. This also implies that the presence of intrinsic Dirac measurements is key to the sparsity of the Cholesky factors of $K(\tilde{\phi}, \tilde{\phi})^{-1}$. On the right of Figure 5.4, we show the KL errors of the resulting factorization with respect to the sparsity parameter.
Numerical Experiments
- Nonlinear elliptic PDEs
- Burgers’ equation
- Monge-Ampère equation
The left figure shows the $L^2$ errors of the solution, while the right figure shows the CPU time. On the right of Figure 5.8, a nearly linear scaling of the CPU time with respect to the number of points is demonstrated. On the right of Figure 5.9, we show the CPU time of our algorithm for different $N_{\mathrm{domain}}$.
Conclusions
Consequently, the total CPU time is longer compared to the previous examples, although the scaling with respect to $N_{\mathrm{domain}}$ remains the same. However, since we do not incorporate singularities into the solution, this example may not correspond to the most challenging setting.
CONSISTENCY OF HIERARCHICAL LEARNING FOR GAUSSIAN PROCESSES
Introduction
- Background and Context
- Gaussian Process Regression
- Two Approaches
- Empirical Bayes Approach
- Approximation Theoretic Approach
- Guiding Observations and Goals
- Our Contributions
- Consistency and Implicit Bias
This quantity measures the discrepancy, in the RKHS norm, between the GPR solution using all of the data $X$ and the GPR solution using only the subsampled data $\pi X$, normalized by the RKHS norm of the former. As explained above, the numerator can be understood as an estimate of the error $\|u^\dagger - u(\cdot,\theta,X)\|_K^2$. The existence of the finite sample formula (6.1.6) is attributed to the choice of the RKHS norm in the comparison of solutions.
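A plausible form of the finite sample quantity described here, assuming a Kernel-Flow-type loss (the precise definition is the thesis's equation (6.1.6)):
\[
\rho(\theta) \;=\; \frac{\big\|u(\cdot,\theta,X) - u(\cdot,\theta,\pi X)\big\|_K^2}{\big\|u(\cdot,\theta,X)\big\|_K^2},
\]
where $u(\cdot,\theta,X)$ denotes the GPR solution with kernel parameter $\theta$ trained on the data indexed by $X$, and $\pi X$ is a subsampled subset.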