
2.2 Smooth Gaussian processes

2.2.1 Gaussian vectors

A Gaussian random vector $X \sim \mathcal{N}(\mu, \Theta)$ with mean $\mu \in \mathbb{R}^N$ and symmetric positive definite $\Theta \in \mathbb{R}^{N \times N}$ is a random element of $\mathbb{R}^N$ that is distributed according to the probability density (with respect to the Lebesgue measure)

$$p_{\mathcal{N}(\mu,\Theta)}(X) \coloneqq \frac{1}{\sqrt{(2\pi)^N \det(\Theta)}} \exp\left(-\frac{1}{2}(X - \mu)^\top \Theta^{-1} (X - \mu)\right). \qquad (2.23)$$

Gaussian vectors are an immensely popular modeling tool for multivariate data.

They can be motivated in a variety of ways, including rotational invariance (with respect to the inner product induced by $\Theta^{-1}$) [132, Chapter 13], the central limit theorem [132, Chapter 5], and even game theory [103, 190]. Beyond these theoretical considerations, they have the computational benefit that most probabilistic operations on Gaussian vectors can be characterized in terms of linear-algebraic operations on $\Theta$ and $\mu$.

1. The mean and covariance of $X \sim \mathcal{N}(\mu, \Theta)$ are given by $\mu$ and $\Theta$. We thus refer to $\Theta$ as the covariance matrix of $\mathcal{N}(\mu, \Theta)$.

2. The marginal log-likelihood of the Gaussian process model given data $y$ is given as

$$-\frac{1}{2}(y-\mu)^\top \Theta^{-1} (y-\mu) - \frac{1}{2}\log\det(\Theta) - \frac{N}{2}\log(2\pi). \qquad (2.24)$$

3. Writing $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \in \mathbb{R}^{N_1+N_2}$, and blocking $\mu$ and $\Theta$ accordingly, the distribution of $X_2$ conditioned on $X_1$ is given as

$$X_2 \mid X_1 \sim \mathcal{N}\!\left(\mu_2 + \Theta_{2,1}\Theta_{1,1}^{-1}(X_1 - \mu_1),\; \Theta_{2,2} - \Theta_{2,1}\Theta_{1,1}^{-1}\Theta_{1,2}\right). \qquad (2.25)$$

4. The conditional correlations of $X$ are encoded in the precision $A \coloneqq \Theta^{-1}$, in that

$$\frac{A_{ij}}{\sqrt{A_{ii}A_{jj}}} = (-1)^{i \neq j}\, \frac{\operatorname{Cov}\left[X_i, X_j \mid X_{\notin\{i,j\}}\right]}{\sqrt{\operatorname{Var}\left[X_i \mid X_{\notin\{i,j\}}\right]\operatorname{Var}\left[X_j \mid X_{\notin\{i,j\}}\right]}}, \qquad (2.26)$$

where $\notin\{i,j\}$ denotes the set $\{1, \ldots, N\} \setminus \{i,j\}$; a numerical check of properties 3 and 4 is sketched below.

2.2.2 Gaussian processes

A common setting in Gaussian process statistics is that we observe data $y_{\mathrm{Tr}} \in \mathbb{R}^{N_{\mathrm{Tr}}}$ at $N_{\mathrm{Tr}}$ training locations in $\mathbb{R}^d$ and choose a covariance matrix $\Theta_{\mathrm{Tr,Tr}}$ that explains $y_{\mathrm{Tr}}$, for instance by maximizing the marginal likelihood of $\mathcal{N}(0, \Theta_{\mathrm{Tr,Tr}})$ given $y_{\mathrm{Tr}}$. If we want to use this data to predict values at a different set of $N_{\mathrm{Pr}}$ prediction locations, we need not only $\Theta_{\mathrm{Tr,Tr}}$ but also the training-prediction covariance $\Theta_{\mathrm{Tr,Pr}}$ (and $\Theta_{\mathrm{Pr,Pr}}$ if we want to perform uncertainty quantification). Without observing data from the prediction locations, there seems to be no way for us to decide which $\Theta_{\mathrm{Tr,Pr}}$, $\Theta_{\mathrm{Pr,Pr}}$ to choose.

In order to define covariances between arbitrary locations, we can model our data $y_{\mathrm{Tr}}$ as measurements of an infinite-dimensional Gaussian vector assigning a value to each point in $\mathbb{R}^d$. This idea is formalized by the notion of a Gaussian field.

Definition 2. Given a separable Banach space $\mathcal{B}$ and its dual $\mathcal{B}^*$, let $\mathcal{L} : \mathcal{B} \longrightarrow \mathcal{B}^*$ and $\mathcal{G} \coloneqq \mathcal{L}^{-1} : \mathcal{B}^* \longrightarrow \mathcal{B}$ be symmetric and bounded linear operators. Let furthermore $\mathcal{H}$ be a Hilbert space of univariate Gaussian random variables, equipped with the $L^2$ inner product. We call a linear map $\xi : \mathcal{B}^* \longrightarrow \mathcal{H}$ a Gaussian field with covariance operator $\mathcal{G}$, precision operator $\mathcal{L}$, and mean $\mu \in \mathcal{B}$ if it is an affine isometry, meaning that for all $\phi \in \mathcal{B}^*$ we have

$$\xi(\phi) \sim \mathcal{N}\left([\phi, \mu],\, [\phi, \mathcal{G}\phi]\right). \qquad (2.27)$$

Following the notation for Gaussian vectors, we then write $\xi \sim \mathcal{N}(\mu, \mathcal{G})$. Here, $[\cdot, \cdot]$ is the duality product of $\mathcal{B}^*$ and $\mathcal{B}$. Abusing notation, we write $[\phi, \xi] \coloneqq \xi(\phi)$.

For finite-dimensional $\mathcal{B}$, $\xi$ can be obtained from a Gaussian vector $X \sim \mathcal{N}(\mu, \mathcal{G})$ as $\xi(\phi) \coloneqq [\phi, X]$. For infinite-dimensional $\mathcal{B}$, a random element $X \in \mathcal{B}$ that realizes the mapping $\xi : \mathcal{B}^* \longrightarrow \mathcal{H}$ usually does not exist. Instead, $\xi$ can be realized by a probability measure on a space larger than $\mathcal{B}$, or by a cylinder measure on $\mathcal{B}$ itself (see [190, Chapter 17] for additional details). Either way, $\xi$ provides us with a way to assign to any finite collection of measurements $\{\phi_i\}_{1 \le i \le N} \subset \mathcal{B}^*$ a joint distribution given by $\mathcal{N}\left(([\phi_i, \mu])_{1 \le i \le N}, \Theta\right)$, where $\Theta_{ij} \coloneqq [\phi_i, \mathcal{G}\phi_j]$. If $\mathcal{B}$ is a subset of the continuous functions on $\mathbb{R}^d$, by choosing the $\{\phi_i\}_{1 \le i \le N} \subset \mathcal{B}^*$ as pointwise evaluations at a set of points $\{x_i\}_{1 \le i \le N}$, we can obtain covariance matrices of the joint distributions of arbitrary combinations of data points. Given training data $y_{\mathrm{Tr}}$ at a set of training locations, we can use the maximum likelihood criterion in order to choose a Gaussian field $\xi \sim \mathcal{N}(0, \mathcal{G})$. Once found, the Gaussian field $\xi$ allows us to perform inference at arbitrary sets of prediction points.
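To make this workflow concrete, the following sketch performs inference at prediction points via the conditioning formula (2.25), applied to the covariance matrix induced by a Matérn-3/2 kernel. The kernel, its parameters, the toy data, and the noise nugget are illustrative assumptions rather than choices made in the text.

```python
import numpy as np

def matern32(X, Y, ell=0.5, sigma2=1.0):
    """Matérn-3/2 covariance between 1-D point sets X and Y (illustrative choice)."""
    s = np.sqrt(3.0) * np.abs(X[:, None] - Y[None, :]) / ell
    return sigma2 * (1.0 + s) * np.exp(-s)

rng = np.random.default_rng(1)
x_tr = rng.uniform(0.0, 1.0, size=20)                            # training locations
y_tr = np.sin(2 * np.pi * x_tr) + 0.1 * rng.standard_normal(20)  # toy observations
x_pr = np.linspace(0.0, 1.0, 100)                                # prediction locations

# Covariance blocks; the nugget 1e-2 * I models observation noise.
K_tt = matern32(x_tr, x_tr) + 1e-2 * np.eye(20)
K_pt = matern32(x_pr, x_tr)
K_pp = matern32(x_pr, x_pr)

# Conditional mean and covariance at the prediction points, Eq. (2.25) with mu = 0.
mean_pr = K_pt @ np.linalg.solve(K_tt, y_tr)
cov_pr = K_pp - K_pt @ np.linalg.solve(K_tt, K_pt.T)
std_pr = np.sqrt(np.diag(cov_pr))  # pointwise uncertainty quantification
```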

2.2.3 Smooth Gaussian processes and elliptic PDE

Even in the setting of Gaussian vectors, we have somewhat brushed over the question of how to choose a covariance model. For Gaussian fields, the space of possible choices is vastly bigger, making it even less obvious how to single out a particular covariance operator for a given task.

The choice of covariance operator $\mathcal{G}$ is a modeling choice whereby we assume structure in our data that we can later use to perform inference. One of the most fundamental assumptions on data is smoothness, meaning that the spatial derivatives of the unknown function $u$ are not too large and therefore $u$ does not vary too rapidly as a function from $\mathbb{R}^d$ to $\mathbb{R}$. Restricting our attention to centered Gaussian processes, we can formally extend Equation (2.23) to the Gaussian field setting by writing

$$p_{\mathcal{N}(0,\mathcal{G})}(u) \propto \exp\left(-\frac{1}{2}[\mathcal{L}u, u]\right). \qquad (2.28)$$

The log-likelihood of a realization $u$ decreases as the quadratic form $[\mathcal{L}u, u]$ increases. This suggests defining Gaussian fields by choosing an $\mathcal{L}$ for which $[\mathcal{L}u, u]$ is a measure of the roughness of the function. The elliptic operators of Section 2.1.5 were chosen as bounded invertible linear operators from $H^s_0(\Omega)$ to $H^{-s}(\Omega)$. Therefore, their associated quadratic form is equivalent to the squared Sobolev norm (see for instance [190, Lemma 2.4]):

$$\|\mathcal{L}^{-1}\|^{-1} \, \|u\|^2_{H^s_0(\Omega)} \le [\mathcal{L}u, u] \le \|\mathcal{L}\| \, \|u\|^2_{H^s_0(\Omega)}. \qquad (2.29)$$

The Sobolev norms, being the sum of the $L^2$ norms of the first $s$ derivatives, provide a natural measure of the roughness of a realization $u$. This makes elliptic operators a natural choice for the precision operators of finitely smooth Gaussian fields. They can be constructed either by discretizing the precision operator using a finite element basis (see [155, 207] for examples), or based on a closed form of the Green's operator $\mathcal{G}$ (the most well-known example being the Matérn family of kernels [167, 248, 249]).
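As an illustration of the first route, the sketch below discretizes the precision operator $\mathcal{L} = \kappa^2 - \Delta$ on a 1-D grid with homogeneous Dirichlet boundary conditions and samples from the resulting Gaussian field. Finite differences stand in here for the finite element constructions of [155, 207], and the grid size and $\kappa$ are illustrative assumptions.

```python
import numpy as np

n, kappa = 200, 1.0  # interior grid points and range parameter (illustrative)
h = 1.0 / (n + 1)    # mesh width on the unit interval

# Tridiagonal finite-difference discretization of L = kappa^2 - Laplacian;
# A plays the role of the (sparse) precision matrix of the discretized field.
A = (
    np.diag((kappa**2 + 2.0 / h**2) * np.ones(n))
    + np.diag((-1.0 / h**2) * np.ones(n - 1), 1)
    + np.diag((-1.0 / h**2) * np.ones(n - 1), -1)
)

# Sample u ~ N(0, A^{-1}): with the Cholesky factorization A = C C^T,
# u = C^{-T} z for standard normal z has covariance (C C^T)^{-1} = A^{-1}.
rng = np.random.default_rng(2)
C = np.linalg.cholesky(A)
u = np.linalg.solve(C.T, rng.standard_normal(n))
```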

We point out that the popular Gaussian kernel is not the Green's function of an elliptic PDE, but of a parabolic PDE, corresponding to infinite-order smoothness, or $s \to \infty$. We also note that many finitely smooth Gaussian process models from the literature, such as fractional-order Matérn covariances or the "Cauchy class" of covariance functions, do not strictly fit into the framework of Section 2.1.5, yet show the same behavior in practice.
