Chapter II: Elliptic partial differential equations and smooth Gaussian processes 13
2.2 Smooth Gaussian processes
2.2.1 Gaussian vectors
A Gaussian random vector $X \sim \mathcal{N}(\mu, \Theta)$ with mean $\mu \in \mathbb{R}^N$ and symmetric positive definite $\Theta \in \mathbb{R}^{N \times N}$ is a random element of $\mathbb{R}^N$ that is distributed according to the probability density (with respect to the Lebesgue measure)
\begin{equation}
    \rho_{\mathcal{N}(\mu, \Theta)}(x) \coloneqq \frac{1}{\sqrt{(2\pi)^N \det(\Theta)}} \exp\left( -\frac{1}{2} (x - \mu)^\top \Theta^{-1} (x - \mu) \right). \tag{2.23}
\end{equation}
Gaussian vectors are an immensely popular modeling tool for multivariate data.
They can be motivated in a variety of ways, including rotational invariance (with respect to the inner product induced by $\Theta^{-1}$) [132, Chapter 13], the central limit theorem [132, Chapter 5], and even game theory [103, 190]. Beyond these theoretical considerations, they have the computational benefit that most probabilistic operations on Gaussian vectors can be characterized in terms of linear algebraic operations on $\Theta$ and $\mu$.
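As a first numerical sketch of such operations, the density formula (2.23) can be evaluated directly and compared against a reference implementation; the mean, covariance, and evaluation point below are arbitrary example values.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary example values for the mean, covariance, and evaluation point.
mu = np.array([1.0, -2.0])
Theta = np.array([[2.0, 0.5],
                  [0.5, 1.0]])  # symmetric positive definite
x = np.array([0.5, -1.0])

# Density formula (2.23), evaluated term by term.
N = len(mu)
diff = x - mu
density = np.exp(-0.5 * diff @ np.linalg.solve(Theta, diff)) / np.sqrt(
    (2 * np.pi) ** N * np.linalg.det(Theta))

# SciPy's reference implementation agrees.
reference = multivariate_normal(mean=mu, cov=Theta).pdf(x)
assert np.isclose(density, reference)
```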
1. The mean and covariance of $X \sim \mathcal{N}(\mu, \Theta)$ are given by $\mu$ and $\Theta$. We thus refer to $\Theta$ as the covariance matrix of $\mathcal{N}(\mu, \Theta)$.
2. The marginal log-likelihood of the Gaussian process model given data $y$ is given as
\begin{equation}
    -\frac{1}{2} (y - \mu)^\top \Theta^{-1} (y - \mu) - \frac{1}{2} \log\det(\Theta) - \frac{N}{2} \log(2\pi). \tag{2.24}
\end{equation}
3. Writing $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \in \mathbb{R}^{N_1 + N_2}$ and blocking $\mu$ and $\Theta$ accordingly, the distribution of $X_2$ conditioned on $X_1$ is given as
\begin{equation}
    X_2 \mid X_1 \sim \mathcal{N}\left( \mu_2 + \Theta_{2,1} \Theta_{1,1}^{-1} (X_1 - \mu_1),\; \Theta_{2,2} - \Theta_{2,1} \Theta_{1,1}^{-1} \Theta_{1,2} \right). \tag{2.25}
\end{equation}
4. The conditional correlations of $X$ are encoded in the precision matrix $A \coloneqq \Theta^{-1}$, in that
\begin{equation}
    \frac{A_{ij}}{\sqrt{A_{ii} A_{jj}}} = (-1)^{i \neq j} \frac{\operatorname{Cov}\left[ X_i, X_j \mid X_{-\{i,j\}} \right]}{\sqrt{\operatorname{Var}\left[ X_i \mid X_{-\{i,j\}} \right] \operatorname{Var}\left[ X_j \mid X_{-\{i,j\}} \right]}}, \tag{2.26}
\end{equation}
where $-\{i,j\}$ denotes the set $\{1, \ldots, N\} \setminus \{i,j\}$.
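Points 3 and 4 can be verified numerically. The following sketch, using an arbitrary randomly generated covariance matrix, computes the conditional covariance via the Schur complement from (2.25) and checks that the normalized precision entry recovers minus the resulting partial correlation, as in (2.26):

```python
import numpy as np

rng = np.random.default_rng(0)
# Random symmetric positive definite covariance (arbitrary example).
B = rng.standard_normal((4, 4))
Theta = B @ B.T + 4 * np.eye(4)
A = np.linalg.inv(Theta)  # precision matrix

# (2.25): conditional covariance of block 2 given block 1 (Schur complement).
# Blocks: indices {0, 1} form block 1, indices {2, 3} form block 2.
T11, T12 = Theta[:2, :2], Theta[:2, 2:]
T21, T22 = Theta[2:, :2], Theta[2:, 2:]
cond_cov = T22 - T21 @ np.linalg.solve(T11, T12)

# (2.26): for i != j, the normalized precision entry equals minus the
# partial correlation of X_i and X_j given the remaining variables.
i, j = 2, 3   # here -{i, j} = {0, 1}, i.e. we condition on block 1
partial_corr = cond_cov[0, 1] / np.sqrt(cond_cov[0, 0] * cond_cov[1, 1])
assert np.isclose(A[i, j] / np.sqrt(A[i, i] * A[j, j]), -partial_corr)
```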
2.2.2 Gaussian processes
A common setting in Gaussian process statistics is that we observe data $y_{\mathrm{Tr}} \in \mathbb{R}^{N_{\mathrm{Tr}}}$ at $N_{\mathrm{Tr}}$ training locations in $\mathbb{R}^d$ and choose a covariance matrix $\Theta_{\mathrm{Tr},\mathrm{Tr}}$ that explains $y_{\mathrm{Tr}}$, for instance by maximizing the marginal likelihood of $\mathcal{N}(0, \Theta_{\mathrm{Tr},\mathrm{Tr}})$ given $y_{\mathrm{Tr}}$. If we want to use this data to predict values at a different set of $N_{\mathrm{Pr}}$ prediction locations, we need not only $\Theta_{\mathrm{Tr},\mathrm{Tr}}$ but also the training--prediction covariance $\Theta_{\mathrm{Tr},\mathrm{Pr}}$ (and $\Theta_{\mathrm{Pr},\mathrm{Pr}}$ if we want to perform uncertainty quantification). Without observing data from the prediction locations, there seems to be no way for us to decide which $\Theta_{\mathrm{Tr},\mathrm{Pr}}$, $\Theta_{\mathrm{Pr},\mathrm{Pr}}$ to choose.
In order to define covariances between arbitrary locations, we can model our data $y_{\mathrm{Tr}}$ as measurements of an infinite-dimensional Gaussian vector assigning a value to each point in $\mathbb{R}^d$. This idea is formalized by the notion of a \emph{Gaussian field}.
Definition 2. Given a separable Banach space $\mathcal{B}$ and its dual $\mathcal{B}^*$, let $\mathcal{L} : \mathcal{B} \longrightarrow \mathcal{B}^*$ and $\mathcal{G} \coloneqq \mathcal{L}^{-1} : \mathcal{B}^* \longrightarrow \mathcal{B}$ be symmetric and bounded linear operators. Let furthermore $H$ be a Hilbert space of univariate Gaussian random variables, equipped with the $L^2$ inner product. We call a linear map $\xi : \mathcal{B}^* \longrightarrow H$ a Gaussian field with covariance operator $\mathcal{G}$, precision operator $\mathcal{L}$, and mean $m \in \mathcal{B}$, if it is an affine isometry, meaning that for all $\phi \in \mathcal{B}^*$ we have
\begin{equation}
    \xi(\phi) \sim \mathcal{N}\left( [\phi, m], [\phi, \mathcal{G} \phi] \right). \tag{2.27}
\end{equation}
Following the notation for Gaussian vectors, we then write $\xi \sim \mathcal{N}(m, \mathcal{G})$. Here, $[\cdot\,, \cdot]$ is the duality product of $\mathcal{B}^*$ and $\mathcal{B}$. Abusing notation, we write $[\phi, \xi] \coloneqq \xi(\phi)$.
For finite-dimensional $\mathcal{B}$, $\xi$ can be obtained from a Gaussian vector $X \sim \mathcal{N}(m, \mathcal{G})$ as $\xi(\phi) \coloneqq [\phi, X]$. For infinite-dimensional $\mathcal{B}$, a random element $X \in \mathcal{B}$ that realizes the mapping $\xi : \mathcal{B}^* \longrightarrow H$ usually does not exist. Instead, $\xi$ can be realized by a probability measure on a space larger than $\mathcal{B}$, or by a cylinder measure on $\mathcal{B}$ itself (see [190, Chapter 17] for additional details). Either way, $\xi$ provides us with a way to assign to any finite collection of measurements $\{\phi_i\}_{1 \leq i \leq k} \subset \mathcal{B}^*$ a joint distribution given by $\mathcal{N}\left( ([\phi_i, m])_{1 \leq i \leq k}, \Theta \right)$, where $\Theta_{ij} \coloneqq [\phi_i, \mathcal{G} \phi_j]$. If $\mathcal{B}$ is a subset of the continuous functions on $\mathbb{R}^d$, by choosing the $\{\phi_i\}_{1 \leq i \leq k} \subset \mathcal{B}^*$ as pointwise evaluations at a set of points $\{x_i\}_{1 \leq i \leq k}$, we can obtain covariance matrices of the joint distributions of arbitrary combinations of data points. Given training data $y_{\mathrm{Tr}}$ at a set of training locations, we can use the maximum likelihood criterion in order to choose a Gaussian field $\xi \sim \mathcal{N}(0, \mathcal{G})$. Once found, the Gaussian field $\xi$ allows us to perform inference at arbitrary sets of prediction points.
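A minimal one-dimensional sketch of this pipeline, using pointwise evaluation functionals $\phi_i = \delta_{x_i}$ and, as a purely illustrative choice of covariance operator, the exponential (Matérn-1/2) kernel; the training data, locations, and length scale are arbitrary example values:

```python
import numpy as np

# Hypothetical kernel choice: exponential (Matern-1/2), k(x, y) = exp(-|x-y|/l).
def k(x, y, length_scale=0.5):
    return np.exp(-np.abs(x[:, None] - y[None, :]) / length_scale)

x_tr = np.array([0.0, 0.3, 0.7, 1.0])      # training locations (example)
y_tr = np.sin(2 * np.pi * x_tr)            # training data (example)
x_pr = np.linspace(0.0, 1.0, 5)            # prediction locations

# Covariance blocks from pointwise evaluations, Theta_ij = k(x_i, x_j).
K_tr = k(x_tr, x_tr) + 1e-10 * np.eye(len(x_tr))   # jitter for stability
K_pr_tr = k(x_pr, x_tr)
K_pr = k(x_pr, x_pr)

# Conditional mean and covariance at the prediction points, as in (2.25).
mean_pr = K_pr_tr @ np.linalg.solve(K_tr, y_tr)
cov_pr = K_pr - K_pr_tr @ np.linalg.solve(K_tr, K_pr_tr.T)

# In the noiseless setting, the conditional mean interpolates the data
# at prediction points that coincide with training points.
assert np.allclose(mean_pr[[0, -1]], y_tr[[0, -1]], atol=1e-6)
```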
2.2.3 Smooth Gaussian processes and elliptic PDE
Even in the setting of Gaussian vectors, we have somewhat brushed over the question of \emph{how} to choose a covariance model. For Gaussian fields, the space of possible choices is vastly bigger, making it even less obvious how to single out a particular covariance operator for a given task.
The choice of covariance operator $\mathcal{G}$ is a modeling choice whereby we assume structure in our data that we can later use to perform inference. One of the most fundamental assumptions on data is \emph{smoothness}, meaning that the spatial derivatives of the unknown function $u$ are not too large and therefore $u$ does not vary too rapidly as a function from $\mathbb{R}^d$ to $\mathbb{R}$. Restricting our attention to centered Gaussian processes, we can \emph{formally} extend the formula in Equation (2.23) to the Gaussian field setting by writing
\begin{equation}
    \rho_{\mathcal{N}(0, \mathcal{G})}(u) \propto \exp\left( -\frac{1}{2} [\mathcal{L} u, u] \right). \tag{2.28}
\end{equation}
The log-likelihood of a realization $u$ decreases as the quadratic form $[\mathcal{L} u, u]$ increases. This suggests defining Gaussian fields by choosing an $\mathcal{L}$ for which $[\mathcal{L} u, u]$ is a measure of the roughness of the function. The elliptic operators of Section 2.1.5 were chosen as bounded invertible linear operators from $H^s_0(\Omega)$ to $H^{-s}(\Omega)$. Therefore, their associated quadratic form is equivalent to the squared Sobolev norm (see for instance [190, Lemma 2.4]):
\begin{equation}
    \|\mathcal{L}^{-1}\|^{-1} \|u\|^2_{H^s_0(\Omega)} \leq [\mathcal{L} u, u] \leq \|\mathcal{L}\| \, \|u\|^2_{H^s_0(\Omega)}. \tag{2.29}
\end{equation}
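A finite-difference sketch of this roughness interpretation, using as an illustrative example the elliptic operator $u \mapsto -u'' + u$ on the unit interval with zero boundary conditions (the grid size and test functions are chosen arbitrarily):

```python
import numpy as np

n = 200
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)  # interior grid points (zero boundary values)

# Finite-difference discretization of L = -d^2/dx^2 + I, an elliptic operator
# mapping H^1_0(0, 1) to H^{-1}(0, 1).
lap = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
L = lap + np.eye(n)

def quad_form(u):
    # Discrete analogue of [Lu, u]; the factor h approximates the integral.
    return h * (u @ L @ u)

u_smooth = np.sin(np.pi * x)       # one oscillation on (0, 1)
u_rough = np.sin(20 * np.pi * x)   # twenty oscillations, same amplitude

# The quadratic form penalizes the rougher realization far more heavily,
# so the rough sample is assigned a much lower (formal) log-likelihood.
assert quad_form(u_rough) > 100 * quad_form(u_smooth)
```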
The Sobolev norms, being the sum of the $L^2$ norms of the first $s$ derivatives, provide a natural measure of the roughness of a realization $u$. This makes elliptic operators a natural choice for the precision operators of finitely smooth Gaussian fields. They can be constructed either by discretizing the precision operator using a finite element basis (see [155, 207] for examples), or based on a closed form of the Green's operator $\mathcal{G}$ (the most well-known example being the Matérn family of kernels [167, 248, 249]).
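For instance, the Matérn kernel with smoothness parameter $\nu = 3/2$ and unit variance has the closed form $k(r) = (1 + \sqrt{3}\,r/\ell)\exp(-\sqrt{3}\,r/\ell)$. The sketch below builds a covariance matrix directly from this formula, with the locations and length scale chosen arbitrarily, and checks that it is a valid covariance:

```python
import numpy as np

def matern32(r, length_scale=1.0):
    # Matern covariance with smoothness nu = 3/2 and unit variance.
    s = np.sqrt(3.0) * np.abs(r) / length_scale
    return (1.0 + s) * np.exp(-s)

# Covariance matrix over an arbitrary set of 1D locations.
x = np.linspace(0.0, 2.0, 50)
Theta = matern32(x[:, None] - x[None, :])

# A valid covariance matrix: unit diagonal, symmetric, positive semidefinite.
assert np.allclose(np.diag(Theta), 1.0)
assert np.allclose(Theta, Theta.T)
assert np.linalg.eigvalsh(Theta).min() > -1e-10
```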
We point out that the popular \emph{Gaussian} kernel is not a Green's function of an elliptic PDE, but of a parabolic PDE corresponding to infinite-order smoothness, or $s \to \infty$. We also note that many finitely smooth Gaussian process models from the literature, such as fractional-order Matérn covariances or the ``Cauchy class'' of covariance functions, do not strictly fit into the framework of Section 2.1.5, yet show the same behavior in practice.