In the first part of the thesis, we study multiscale methods for solving challenging PDEs with rough coefficients and high frequency, where standard FEMs often fail. In the second part, we study statistical numerical methods based on Gaussian processes, which lead to dense kernel matrices; in the case of PDE problems, these matrices can also include partial derivatives of the kernel.
LIST OF ILLUSTRATIONS
In the right figure, we study the sparse Cholesky factorization for the reduced kernel matrix $K(\phi^k, \phi^k)^{-1}$. In the left figure, the data points are uniformly distributed in $[0,1]^2$ with different grid sizes; $\rho = 4.0$.
LIST OF TABLES
INTRODUCTION
Multiscale Numerical Methods
- Prototypical Equations
- Solving PDEs as Function Approximation
- Primal Perspective: Constructing Coarse Spaces
- On Exponential Convergence of Accuracy
- Our Contributions: Helmholtz’s Equations and Non-overlapped Domain Decomposition
- Dual Perspective: Selecting Coarse Variables
- Our Contribution: Subsampled Lengthscale in the Coarse Variables
- Summary
Multiscale methods aim to find a better trial space $S$ that captures the coarse-scale behavior of the solution by incorporating the structure of the equation. We show that the exponential decay rate of the localized basis functions exhibits a non-monotonic dependence on $h$.
Statistical Numerical Methods
- Statistical Inference for Numerical Computation
- Methodology: Solving Nonlinear PDEs and IPs with GPs
- Efficiency: Sparse Cholesky Factorization for GP-PDEs
- Adaptivity: Hierarchical Learning and Consistency Analysis
- Additional Topics: Randomized Numerics and Posterior Sampling
This thesis also covers other related and prospective topics pertaining to numerical computation.
- High Dimensional Problems through Randomized Numerics
- Posterior Sampling through Gradient Flows
- Summary
Based on the analysis, they propose a sparse Cholesky factorization algorithm that factorizes the inverse kernel matrix in almost linear time. Indeed, for high-dimensional problems, it makes more sense to aim for a low-rank approximation of the kernel matrix.
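As a minimal sketch of the column-by-column KL-minimization update that underlies such factorizations (assuming a precomputed ordering and sparsity pattern; the function name, dense storage, and NumPy implementation are illustrative, not the thesis code):

```python
import numpy as np

def kl_sparse_factor(Theta, pattern):
    """Sparse factor U with U @ U.T approximating inv(Theta), built column by column.

    Theta   : (n, n) SPD kernel matrix, already reordered (e.g. by a maximin ordering).
    pattern : pattern[i] is the list of row indices allowed in column i
              (it must contain i; restricting it to indices >= i gives a
              lower-triangular factor).
    Each column has the closed form Theta[s, s]^{-1} e_1 / sqrt(e_1^T Theta[s, s]^{-1} e_1),
    so the cost per column is cubic in the local pattern size only.
    """
    n = Theta.shape[0]
    U = np.zeros((n, n))                            # dense container for readability only
    for i in range(n):
        s = np.asarray(pattern[i])
        s = np.concatenate(([i], s[s != i]))        # put the diagonal index first
        A = Theta[np.ix_(s, s)]
        e1 = np.zeros(len(s)); e1[0] = 1.0
        col = np.linalg.solve(A, e1)                # (Theta_{s,s})^{-1} e_1
        U[s, i] = col / np.sqrt(col[0])             # KL-optimal column under the pattern
    return U
```

With pattern sizes of order $\log^d n$ per column, the total work is near-linear in $n$, which is the "almost linear" regime referred to above.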
EXPONENTIALLY CONVERGENT MULTISCALE FINITE ELEMENT METHOD
Introduction
- Literature for Solving Helmholtz Equations
- Main Contributions and Motivations
- Organization
In studying the solution behavior of the Helmholtz equation (2.1.1), we introduce a coarse-fine scale decomposition of its solution space, built on a non-overlapping union of mesh edges.
Preliminaries on Helmholtz’s Equation
- Notations
- Analytic Results
The adjoint problem will play a valuable role when we analyze the convergence properties of our multiscale methods for the Helmholtz equation. We will need the $C_a$ estimates of the solution to establish the theoretical properties of our multiscale methods.
Coarse-Fine Scale Decomposition
- Mesh Structure
- Elements
- Nodes, Edges, and Their Neighbors
- Decomposition of Solution Space
- Local decomposition
- Global decomposition
- Local and Small Bubble Part
- Low Complexity of the Helmholtz-Harmonic Part
- Approximation via Edge Functions
- Localization of Approximation
- Local Approximation via Oversampling
- Low Complexity in Approximation
With the mesh structure defined, we now discuss the coarse-fine scale decomposition of the solution space. For any $T \in \mathcal{T}_H$ and any function $v \in H^1(T)$ that vanishes on one of the edges of $T$, a Poincaré–Friedrichs-type estimate holds.
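A plausible form of the estimate meant here, assuming the standard Poincaré–Friedrichs inequality for functions vanishing on part of the boundary (the constant and scaling are generic, not the thesis's exact statement):
\[
\|v\|_{L^2(T)} \;\le\; C\,H_T\,\|\nabla v\|_{L^2(T)},
\]
where $H_T$ denotes the diameter of the element $T$ and $C$ depends only on the shape regularity of the mesh.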
The Multiscale Methods
- The Multiscale Framework
- The Ritz-Galerkin Method
- The Petrov-Galerkin Method
If $S_{\mathrm{test}} = S$, the method is called the Ritz–Galerkin method; otherwise it is called the Petrov–Galerkin method. Our current theory does not cover the stability of the discrete system or the $H^1(\Omega)$ error estimate for the Petrov–Galerkin method.
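For reference, both methods fit the standard variational template (a sketch; $a(\cdot,\cdot)$ denotes the sesquilinear form of the Helmholtz problem and $f$ the load functional):
\[
\text{find } u_H \in S \quad \text{such that} \quad a(u_H, v) = f(v) \quad \text{for all } v \in S_{\mathrm{test}},
\]
with $S_{\mathrm{test}} = S$ for the Ritz–Galerkin method and $S_{\mathrm{test}} \neq S$ for the Petrov–Galerkin method.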
Numerical Experiments
- Set-up
- Multiscale Algorithms
- Offline Stage
- A High Wavenumber Example: Planar Wave
- A High Contrast Example: Mie Resonances
- A Numerical Example with Mixed Boundary and Rough Field
- Summary
We note that our numerical experiments in the next section suggest that these properties also hold for the Petrov–Galerkin method. Note that here $\{\psi_i\}_{x_i \in \mathcal{N}_H}$ are the same as the basis functions in MsFEM. The construction depends on how the trial and test spaces in the Galerkin method are chosen.
Proofs
- Proof of Proposition 2.2.1
- Proof of Proposition 2.3.7
- Proof of Theorem 2.3.9
- Proof of Theorem 2.3.12
- Geometric Relation: Interior Edges
- Main Idea of the Proof
- Proof of Lemma 2.6.1
- Proof of Lemma 2.6.4
- Proof of Lemma 2.6.5
- Proof of Lemma 2.6.2
- For Edges Connected to the Boundary
- Proof of Proposition 2.3.14
- Proof of Theorem 2.4.3
We will explain the proof for interior edges in detail and comment on the changes needed for edges connected to the boundary. For edges connected to the boundary, we need a different geometric relation, as shown on the right of Figure 2.9. According to the discussion in Remark 2.6.3, the local $C_a$ constant is independent of $k$ for edges connected to the boundary.
Conclusions
ANALYSIS OF SUBSAMPLED LENGTHSCALES IN MULTISCALE METHODS
Introduction
- Problem 1: Numerical Upscaling
- Problem 2: Scattered Data Approximation
- A Common Approach
- Our Goals
- Subsampled Lengthscales
- Basis Functions and Localization
- Our Contributions
- Related Works
- Numerical Upscaling
- Function Approximation
- Organization
The computational costs of the two solutions are different: the former requires solving only for the basis functions, while the latter also requires solving an upscaled equation. In the first part of this work, we consider the finite regime of the subsampled lengthscale, i.e., it is a fixed strictly positive number. A major component in LOD and Gamblets is the localization problem: the ideal multiscale basis functions must be localized for efficient computation.
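As a hedged sketch of the kind of ideal basis functions referred to here (the Gamblet/LOD-type construction; the exact measurement functionals $\phi_j$ used in the thesis may differ):
\[
\psi_i \;=\; \operatorname*{argmin}_{v \in H^1_0(\Omega)} \; \|v\|_a \quad \text{subject to} \quad [\phi_j, v] = \delta_{ij} \quad \text{for all } j,
\]
where $\|\cdot\|_a$ is the energy norm of the elliptic operator. Such ideal basis functions decay exponentially away from the support of $\phi_i$, which is what makes localization both possible and necessary for efficient computation.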
Finite Regime of Subsampled Lengthscales
- Experiments: Ideal Solution
- One Dimensional Example
- Two Dimensional Example
- Analysis: Ideal Solution
- Experiments: Localized Solution
- One Dimensional Example
- Two Dimensional Example
- Analysis: Localized Solution
- Notations
- Analysis
- Proof Strategy
Furthermore, the relative behavior of the three $h/H$ cases is very similar to that of the ideal solution, indicating that the localization error for $l = 4$ can be small compared to the approximation error of the ideal solution. When $d = 1$, both parts of the error remain bounded as $h \to 0$, so the competition is less pronounced; this agrees with what we observed in our 1D experiments, where the reduction effect is not as large as in the 2D case. The above phenomenon also applies to the other errors, i.e., the $L^2$ recovery error $e^0_{h,H,l}(a,u)$ and the Galerkin errors $\tilde{e}^1_{h,H,l}(a,u)$ and $\tilde{e}^0_{h,H,l}(a,u)$.
Small Limit Regime of Subsampled Lengthscales
- Numerical Experiment
- Analysis: Weighted Inequality
It is straightforward to pass from the overall localization error to the energy recovery error via a triangle inequality. By Galerkin orthogonality, the energy recovery error provides an upper bound for the energy Galerkin error. Moreover, the recovered solution is visually smoother; due to the weight function, the effect of the subsampled data propagates to other points in the domain.
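In symbols, the two facts used here are roughly the following (a sketch in generic notation; $u$ is the exact solution, $u^{\mathrm{ideal}}$ and $u^{\mathrm{loc}}$ the ideal and localized recoveries, and $V^{\mathrm{loc}}$ the localized coarse space):
\[
\|u - u^{\mathrm{loc}}\|_a \;\le\; \|u - u^{\mathrm{ideal}}\|_a + \|u^{\mathrm{ideal}} - u^{\mathrm{loc}}\|_a,
\qquad
\|u - u_H^{\mathrm{Gal}}\|_a \;=\; \min_{v \in V^{\mathrm{loc}}} \|u - v\|_a \;\le\; \|u - u^{\mathrm{loc}}\|_a,
\]
the first being the triangle inequality and the second the best-approximation property implied by Galerkin orthogonality for a symmetric coercive problem.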
Proofs
- Proof of Theorem 3.2.3
- Inverse Estimate
- Norm Estimate
- Localization Per Basis Function
- Overall Localization Error
- Overall Galerkin Error
- Proof of Theorem 3.3.2
We start with the first case, i.e., the case involving $\|u\|_{H^1}$.
In the above inequality, we used the bound on the gradient of $\eta$ and the subsampled Poincaré inequality (thanks to the property $[\chi^{h,H}_i, \phi^{h,H}_j] = 0$). Combining the above estimate with the result in the last subsection (note that the cardinality of $I$ is of order $1/H^d$), we obtain the desired bound. The $L^2$ error estimate is obtained by the standard Aubin–Nitsche trick in finite element theory.
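For completeness, the duality argument meant by the Aubin–Nitsche trick runs roughly as follows (a generic sketch for a symmetric coercive form $a$; the constants and regularity assumptions in the thesis may differ). Let $w_g$ solve the adjoint problem $a(v, w_g) = (g, v)_{L^2}$ for all $v$; then Galerkin orthogonality gives
\[
(g, u - u_H)_{L^2} \;=\; a(u - u_H, w_g) \;=\; a(u - u_H, w_g - v_H) \;\le\; \|u - u_H\|_a \inf_{v_H \in S} \|w_g - v_H\|_a,
\]
and choosing $g = u - u_H$ together with an $O(H)$ approximation bound for $w_g$ yields an $L^2$ estimate that gains a factor of $H$ over the energy estimate.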
Conclusions
- Summary
- Discussions
- Conclusions
A better understanding of the trade-off between $h$ and $l$ is needed: how should they be chosen optimally with respect to $f$? It is important to study the joint effect of $h$, $l$, and the order of the operator $\mathcal{L}$ on the recovery and Galerkin errors. How can we take advantage of the PDE model in $\Omega_1$ and the measured data in $\Omega_2$ to recover an accurate solution?
GAUSSIAN PROCESSES FOR SOLVING AND LEARNING PDES AND INVERSE PROBLEMS
Introduction
- Summary of the Proposed Method
- Optimal Recovery
- Finite-Dimensional Representation
- Numerical Framework
- Relevant Literature
- Outline
We first present this result in the case of the nonlinear PDE (4.1.1) and defer a more general version to subsection 4.3.2. We note that this convergence theorem requires the kernel $K$ to be adapted to the solution space of the PDE, so that $u$ belongs to $\mathcal{U}$. Advanced linear solvers for dense kernel matrices can be used for this step.
Conditioning GPs on Nonlinear Observations
- GPs and Banach Spaces Endowed with a Quadratic Norm
A special case of the setting considered here is $\mathcal{U} = H^s_0(\Omega)$ (we write $H^s_0(\Omega)$ for the closure of the set of smooth functions with compact support in $\Omega$ with respect to the Sobolev norm $\|\cdot\|_{H^s(\Omega)}$), with its dual $\mathcal{U}^* = H^{-s}(\Omega)$ defined by the pairing $[\phi, v]$. Proposition 4.2.1 gives an explicit representation of the conditional mean of the GP, which is reminiscent of the representer theorem [204]. It is intuitive that the minimizer of the optimization problem we introduce and solve in this work corresponds to the MAP point for the GP $\xi \sim \mathcal{N}(0, K)$.
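For reference, the representer-type formula behind Proposition 4.2.1 has the familiar form (a sketch in the notation above, with $\phi = (\phi_1,\dots,\phi_N)$ the vector of measurement functionals and $y$ the observed data):
\[
\mathbb{E}\big[\xi \,\big|\, [\phi, \xi] = y\big] \;=\; K(\cdot, \phi)\, K(\phi, \phi)^{-1} y,
\]
so the conditional mean is a finite linear combination of the functions $K(\cdot, \phi_m)$, which is what makes a finite-dimensional implementation possible.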
Solving Nonlinear PDEs
- Problem Setup
- Convergence Theory
- Dealing with the Constraints
- Eliminating the Equality Constraints
- Relaxing the Equality Constraints
- Implementation
- Constructing Θ
- A Gauss–Newton Algorithm
- Computational Bottlenecks
- Numerical Experiments for Nonlinear PDEs
- A Nonlinear Elliptic PDE
- Burgers’ Equation
- Eikonal PDE
The choice of random collocation points was made to emphasize the flexibility of our methodology. Finally, we note that the accuracy of our method is closely tied to this choice. An example of these collocation points, along with contours of the true solution, is shown in Figure 4.4(a).
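As a minimal sketch of the Gauss–Newton loop referred to in the implementation subsection, assuming the equality constraints have already been eliminated so that one minimizes $z(w)^\top K(\phi,\phi)^{-1} z(w)$ over the remaining unknowns $w$ (the function names, the whitening step, and the plain least-squares solve are illustrative choices, not the thesis's exact implementation):

```python
import numpy as np

def gauss_newton(z_fn, jac_fn, K, w0, n_iter=10, nugget=1e-10):
    """Gauss-Newton for min_w z(w)^T K^{-1} z(w).

    z_fn(w)   -> vector of measurement values implied by w (constraints eliminated)
    jac_fn(w) -> Jacobian dz/dw
    K         -> kernel matrix K(phi, phi), assumed symmetric positive definite
    """
    L = np.linalg.cholesky(K + nugget * np.eye(K.shape[0]))   # small nugget for stability
    w = np.array(w0, dtype=float)
    for _ in range(n_iter):
        z, J = z_fn(w), jac_fn(w)
        r = np.linalg.solve(L, z)                  # whiten: objective becomes ||r||^2
        A = np.linalg.solve(L, J)
        step, *_ = np.linalg.lstsq(A, -r, rcond=None)   # linearized subproblem
        w = w + step
    return w
```

For the nonlinear elliptic example $-\Delta u + u^3 = f$, $w$ would hold the interior values of $u$, and $z(w)$ would stack all point values together with the Laplacian values $\Delta u = u^3 - f$ implied by the PDE, matching the elimination strategy of the "Eliminating the Equality Constraints" subsection.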
Solving Inverse Problems
- Problem Setup
- Dealing with the Constraints
- Implementation
- Numerical Experiments for Darcy Flow
In light of Remark 4.2.4, we note that (4.4.4) corresponds to the introduction of a prior measure under which the two unknown fields are a priori independent. Eliminating the constraints as in subsection 4.3.3.1 is a bit more delicate here, but is sometimes possible. Both problems (4.4.11) and (4.4.12) can be solved using the same techniques described in subsection 4.3.4, except that now we have a higher-dimensional solution space.
Conclusions
SPARSE CHOLESKY FACTORIZATION FOR SOLVING PDES VIA GAUSSIAN PROCESSES
Introduction
- The problem
- Contributions
- Related work
- Machine learning PDEs
- Fast solvers for kernel matrices
- Screening effects in spatial statistics
In the case of PDE problems, these matrices may also include partial derivatives of the kernel [43], and fast algorithms for such matrices are less developed than for instances without derivatives. Most existing methods focus on the case where $\Theta$ contains only point values of the kernel function. A fundamental question is how the screening effect behaves when derivative information of the spatial field is included, and how to use it to extend sparse Cholesky factorization methods to kernel matrices containing derivatives of the kernel.
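To make "kernel matrices containing derivatives" concrete, here is a small self-contained sketch for the Gaussian kernel with Laplacian measurements; the kernel choice, lengthscale, and block layout are illustrative only:

```python
import numpy as np

def gaussian_kernel_blocks(X, Y, sigma=0.2):
    """Blocks of a Gaussian-kernel matrix with Laplacian measurements.

    For k(x, y) = exp(-|x - y|^2 / (2 sigma^2)), returns entrywise
      K[i, j]   = k(x_i, y_j)
      DK[i, j]  = (Delta_y k)(x_i, y_j)        (= Delta_x k, by symmetry in |x - y|^2)
      DDK[i, j] = (Delta_x Delta_y k)(x_i, y_j)
    """
    d = X.shape[1]
    r2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    K = np.exp(-r2 / (2.0 * sigma ** 2))
    g = r2 / sigma ** 4 - d / sigma ** 2
    DK = K * g
    DDK = K * (g ** 2 - 4.0 * r2 / sigma ** 6 + 2.0 * d / sigma ** 4)
    return K, DK, DDK

# A kernel matrix containing both point values and Laplacian values
# at the same points (a small random example in 2D):
X = np.random.default_rng(0).random((50, 2))
K, DK, DDK = gaussian_kernel_blocks(X, X)
Theta = np.block([[K, DK], [DK, DDK]])   # symmetric, since DK(X, X) is symmetric here
```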
Solving Nonlinear PDEs via GPs
- The GP framework
- The finite dimensional problem
- The general case
This algorithm is designed for kernel matrices whose entries involve no derivatives and is related to the screening effect in spatial statistics [259, 256]. Here, $\|\cdot\|$ is the norm of the reproducing kernel Hilbert space (RKHS) associated with the kernel/covariance function $K$. As for consistency, when $K$ is sufficiently regular, the above solution converges to the exact PDE solution as $M_\Omega, M_{\partial\Omega} \to \infty$; see Theorem 1.2 in [43].
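For orientation, the optimal recovery problem behind this framework has roughly the following form for a prototypical nonlinear elliptic equation $-\Delta u + \tau(u) = f$ in $\Omega$ with $u = g$ on $\partial\Omega$ (a sketch; the operators and boundary conditions differ across the chapters):
\[
\begin{aligned}
\min_{u \in \mathcal{U}} \;\; & \|u\| \\
\text{s.t.}\;\; & -\Delta u(x_m) + \tau\big(u(x_m)\big) = f(x_m), && 1 \le m \le M_\Omega,\\
& u(x_m) = g(x_m), && M_\Omega < m \le M_\Omega + M_{\partial\Omega},
\end{aligned}
\]
where the first $M_\Omega$ collocation points lie in $\Omega$, the remaining $M_{\partial\Omega}$ on $\partial\Omega$, and $\|\cdot\|$ is the RKHS norm mentioned above.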
The Sparse Cholesky Factorization Algorithm
- The case of derivative-free measurements
- Reordering
- Sparsity pattern
- KL minimization
- The case of derivative measurements
- The nonlinear elliptic PDE case
- General case
We introduce the map $P: I \to I$ to match the ordering of the measurements with the indices of the corresponding points, i.e., $P(q) = i_q$. Now that the ordering has been determined, our next step is to identify the sparsity pattern of the Cholesky factor under this ordering. The complexity is of the same order as in Remark 5.3.8; the hidden constant in the complexity estimate depends on $J$, the maximum order of the derivative measurements.
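A simple (quadratic-cost) sketch of the maximin ordering used for the Dirac measurements, together with the lengthscales that later define the sparsity pattern; the conditioning set, function name, and brute-force distance computation are illustrative only:

```python
import numpy as np

def maximin_ordering(X, X_cond=None):
    """Greedy maximin ordering of the points X (O(N^2) for clarity).

    Each point is chosen to maximize its distance to the conditioning set
    X_cond (e.g. boundary points, if any) and to all previously chosen
    points; lengths[q] records that distance, the 'lengthscale' of the
    q-th chosen point. Points within rho * lengths[q] of it are the
    natural candidates for the q-th column's sparsity pattern.
    """
    N = X.shape[0]
    if X_cond is not None and len(X_cond) > 0:
        d = np.min(np.linalg.norm(X[:, None, :] - X_cond[None, :, :], axis=-1), axis=1)
    else:
        d = np.full(N, np.inf)      # the first point then gets an 'infinite' lengthscale
    order, lengths = [], []
    remaining = np.ones(N, dtype=bool)
    for _ in range(N):
        q = int(np.argmax(np.where(remaining, d, -np.inf)))
        order.append(q); lengths.append(d[q]); remaining[q] = False
        d = np.minimum(d, np.linalg.norm(X - X[q], axis=1))
    return np.array(order), np.array(lengths)
```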
Theoretical Study
- Set-up for rigorous results
- Theory
The presence of these Dirac measurements is the key to obtaining provable guarantees for the algorithm; for details see section 5.4. Recall that for this $P$, we first order the Dirac measurements using the maximin ordering conditioned on $\partial\Omega$ (since there are no boundary points in this set-up); then we append the derivative measurements in an arbitrary order. Our technical innovation is to relate this normalized term to the conditional expectation of the GP, leading to an identity in terms of conditional expectations and variances.
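The Gaussian conditioning identity involved is of the following standard form (a sketch, not necessarily the thesis's exact statement): for a centered Gaussian vector $(X_1,\dots,X_m)$ with covariance $\Theta_{s,s}$ and precision $P = \Theta_{s,s}^{-1}$,
\[
\mathbb{E}\big[X_1 \,\big|\, X_2, \dots, X_m\big] = -\sum_{j=2}^{m} \frac{P_{1j}}{P_{11}}\, X_j,
\qquad
\operatorname{Var}\big[X_1 \,\big|\, X_2, \dots, X_m\big] = \frac{1}{P_{11}},
\]
so the normalized entries of a column of the factor can be read as conditional regression coefficients, and the screening effect (fast decay of conditional correlations) translates into approximate sparsity of the factor.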
Second Order Optimization Methods
- Gauss-Newton iterations
- Sparse Cholesky factorization for the reduced kernel matrices
- General case
Suppose that $\Theta$ is the reordered version of the reduced kernel matrix $K(\phi^k, \phi^k)$; then, similarly to Figures 5.1 and 5.2, we show the magnitude of the corresponding Cholesky factor of $\Theta^{-1} = U^\star U^{\star T}$, i.e., we plot $|U^\star_{ij}|$ for $i \le j$; here $j$ is chosen to correspond to some boundary and interior points. This also implies that the presence of intrinsic Dirac measurements is key to the sparsity of the Cholesky factors of $K(\tilde{\phi}, \tilde{\phi})^{-1}$. On the right of Figure 5.4, we show the KL errors of the resulting factorization with respect to the sparsity parameter.
Numerical Experiments
- Nonlinear elliptic PDEs
- Burgers’ equation
- Monge-Ampère equation
The left figure shows the $L^2$ errors of the solution, while the right figure shows the CPU time. On the right of Figure 5.8, a nearly linear scaling of the CPU time with respect to the number of points is demonstrated. On the right of Figure 5.9, we show the CPU time of our algorithm for different $N_{\mathrm{domain}}$.
Conclusions
Consequently, the total CPU time is longer compared to the previous examples, although the scaling with respect to $N_{\mathrm{domain}}$ remains the same. However, since we do not incorporate singularities into the solution, this example may not correspond to the most challenging setting.
CONSISTENCY OF HIERARCHICAL LEARNING FOR GAUSSIAN PROCESSES
Introduction
- Background and Context
- Gaussian Process Regression
- Two Approaches
- Empirical Bayes Approach
- Approximation Theoretic Approach
- Guiding Observations and Goals
- Our Contributions
- Consistency and Implicit Bias
This quantity measures the discrepancy, in the RKHS norm, between the GPR solution using all of the data $X$ and the GPR solution using only the subsampled data $\pi X$, normalized by the RKHS norm of the former. As explained above, the numerator can be understood as an estimate of the error $\|u^\dagger - u(\cdot,\theta,X)\|_K^2$. The existence of the finite sample formula (6.1.6) is attributed to the choice of the RKHS norm in the comparison of solutions.
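A plausible form of the finite sample quantity described here, assuming a Kernel-Flow-type loss (the precise definition is the thesis's equation (6.1.6)):
\[
\rho(\theta) \;=\; \frac{\big\|u(\cdot,\theta,X) - u(\cdot,\theta,\pi X)\big\|_K^2}{\big\|u(\cdot,\theta,X)\big\|_K^2},
\]
where $u(\cdot,\theta,X)$ denotes the GPR solution with kernel parameter $\theta$ trained on the data indexed by $X$, and $\pi X$ is a subsampled subset.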