
2.2 Smooth Gaussian processes

2.2.1 Gaussian vectors

A Gaussian random vector $X \sim \mathcal{N}(\mu, \Theta)$ with mean $\mu \in \mathbb{R}^N$ and symmetric positive definite $\Theta \in \mathbb{R}^{N \times N}$ is a random element of $\mathbb{R}^N$ that is distributed according to the probability density (with respect to the Lebesgue measure)

$$p_{\mathcal{N}(\mu,\Theta)}(X) \coloneqq \frac{1}{\sqrt{(2\pi)^N \det(\Theta)}} \exp\left(-\frac{1}{2}(X - \mu)^\top \Theta^{-1} (X - \mu)\right). \qquad (2.23)$$

Gaussian vectors are an immensely popular modeling tool for multivariate data.

They can be motivated in a variety of ways, including rotational invariance (with respect to the inner product induced by $\Theta^{-1}$) [132, Chapter 13], the central limit theorem [132, Chapter 5], and even game theory [103, 190]. Beyond these theoretical considerations, they have the computational benefit that most probabilistic operations on Gaussian vectors can be characterized in terms of linear-algebraic operations on $\Theta$ and $\mu$.

1. The mean and covariance of $X \sim \mathcal{N}(\mu, \Theta)$ are given by $\mu$ and $\Theta$. We thus refer to $\Theta$ as the covariance matrix of $\mathcal{N}(\mu, \Theta)$.

2. The marginal log-likelihood of the Gaussian process model given data $y$ is given as

$$-\frac{1}{2}(y-\mu)^\top \Theta^{-1} (y-\mu) - \frac{1}{2}\log\det(\Theta) - \frac{N}{2}\log(2\pi). \qquad (2.24)$$

3. Writing $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \in \mathbb{R}^{N_1+N_2}$, and blocking $\mu$ and $\Theta$ accordingly, the distribution of $X_2$ conditioned on $X_1$ is given as

$$X_2 \mid X_1 \sim \mathcal{N}\!\left(\mu_2 + \Theta_{2,1}\Theta_{1,1}^{-1}(X_1 - \mu_1),\; \Theta_{2,2} - \Theta_{2,1}\Theta_{1,1}^{-1}\Theta_{1,2}\right). \qquad (2.25)$$

4. The conditional correlations of $X$ are encoded in the precision $A \coloneqq \Theta^{-1}$, in that

$$\frac{A_{ij}}{\sqrt{A_{ii}A_{jj}}} = (-1)^{i \neq j}\, \frac{\operatorname{Cov}\left[X_i, X_j \mid X_{\notin\{i,j\}}\right]}{\sqrt{\operatorname{Var}\left[X_i \mid X_{\notin\{i,j\}}\right]\operatorname{Var}\left[X_j \mid X_{\notin\{i,j\}}\right]}}, \qquad (2.26)$$

where $\notin\{i,j\}$ denotes the set $\{1, \ldots, N\} \setminus \{i,j\}$; a numerical check of properties 3 and 4 is sketched below.

2.2.2 Gaussian processes

A common setting in Gaussian process statistics is that we observe data $y_{\mathrm{Tr}} \in \mathbb{R}^{N_{\mathrm{Tr}}}$ at $N_{\mathrm{Tr}}$ training locations in $\mathbb{R}^d$ and choose a covariance matrix $\Theta_{\mathrm{Tr,Tr}}$ that explains $y_{\mathrm{Tr}}$, for instance by maximizing the marginal likelihood of $\mathcal{N}(0, \Theta_{\mathrm{Tr,Tr}})$ given $y_{\mathrm{Tr}}$. If we want to use this data to predict values at a different set of $N_{\mathrm{Pr}}$ prediction locations, we need not only $\Theta_{\mathrm{Tr,Tr}}$ but also the training-prediction covariance $\Theta_{\mathrm{Tr,Pr}}$ (and $\Theta_{\mathrm{Pr,Pr}}$ if we want to perform uncertainty quantification). Without observing data from the prediction locations, there seems to be no way for us to decide which $\Theta_{\mathrm{Tr,Pr}}$, $\Theta_{\mathrm{Pr,Pr}}$ to choose.

In order to define covariances between arbitrary locations, we can model our data $y_{\mathrm{Tr}}$ as measurements of an infinite-dimensional Gaussian vector assigning a value to each point in $\mathbb{R}^d$. This idea is formalized by the notion of a Gaussian field.

Definition 2. Given a separable Banach space $\mathcal{B}$ and its dual $\mathcal{B}^*$, let $\mathcal{L} : \mathcal{B} \longrightarrow \mathcal{B}^*$ and $\mathcal{G} \coloneqq \mathcal{L}^{-1} : \mathcal{B}^* \longrightarrow \mathcal{B}$ be symmetric and bounded linear operators. Let furthermore $\mathcal{H}$ be a Hilbert space of univariate Gaussian random variables, equipped with the $L^2$ inner product. We call a linear map $\xi : \mathcal{B}^* \longrightarrow \mathcal{H}$ a Gaussian field with covariance operator $\mathcal{G}$, precision operator $\mathcal{L}$, and mean $\mu \in \mathcal{B}$ if it is an affine isometry, meaning that for all $\phi \in \mathcal{B}^*$ we have

$$\xi(\phi) \sim \mathcal{N}\left([\phi, \mu],\, [\phi, \mathcal{G}\phi]\right). \qquad (2.27)$$

Following the notation for Gaussian vectors, we then write $\xi \sim \mathcal{N}(\mu, \mathcal{G})$. Here, $[\cdot, \cdot]$ is the duality product of $\mathcal{B}^*$ and $\mathcal{B}$. Abusing notation, we write $[\phi, \xi] \coloneqq \xi(\phi)$.

For finite-dimensional $\mathcal{B}$, $\xi$ can be obtained from a Gaussian vector $X \sim \mathcal{N}(\mu, \mathcal{G})$ as $\xi(\phi) \coloneqq [\phi, X]$. For infinite-dimensional $\mathcal{B}$, a random element $X \in \mathcal{B}$ that realizes the mapping $\xi : \mathcal{B}^* \longrightarrow \mathcal{H}$ usually does not exist. Instead, $\xi$ can be realized by a probability measure on a space larger than $\mathcal{B}$, or by a cylinder measure on $\mathcal{B}$ itself (see [190, Chapter 17] for additional details). Either way, $\xi$ provides us with a way to assign to any finite collection of measurements $\{\phi_i\}_{1 \le i \le N} \subset \mathcal{B}^*$ a joint distribution given by $\mathcal{N}\left(([\phi_i, \mu])_{1 \le i \le N}, \Theta\right)$, where $\Theta_{ij} \coloneqq [\phi_i, \mathcal{G}\phi_j]$. If $\mathcal{B}$ is a subset of the continuous functions on $\mathbb{R}^d$, by choosing the $\{\phi_i\}_{1 \le i \le N} \subset \mathcal{B}^*$ as pointwise evaluations at a set of points $\{x_i\}_{1 \le i \le N}$, we can obtain covariance matrices of the joint distributions of arbitrary combinations of data points. Given training data $y_{\mathrm{Tr}}$ at a set of training locations, we can use the maximum likelihood criterion in order to choose a Gaussian field $\xi \sim \mathcal{N}(0, \mathcal{G})$. Once found, the Gaussian field $\xi$ allows us to perform inference at arbitrary sets of prediction points.
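To make this workflow concrete, the following sketch performs inference at prediction points via the conditioning formula (2.25), applied to the covariance matrix induced by a Matérn-3/2 kernel. The kernel, its parameters, the toy data, and the noise nugget are illustrative assumptions rather than choices made in the text.

```python
import numpy as np

def matern32(X, Y, ell=0.5, sigma2=1.0):
    """Matérn-3/2 covariance between 1-D point sets X and Y (illustrative choice)."""
    s = np.sqrt(3.0) * np.abs(X[:, None] - Y[None, :]) / ell
    return sigma2 * (1.0 + s) * np.exp(-s)

rng = np.random.default_rng(1)
x_tr = rng.uniform(0.0, 1.0, size=20)                            # training locations
y_tr = np.sin(2 * np.pi * x_tr) + 0.1 * rng.standard_normal(20)  # toy observations
x_pr = np.linspace(0.0, 1.0, 100)                                # prediction locations

# Covariance blocks; the nugget 1e-2 * I models observation noise.
K_tt = matern32(x_tr, x_tr) + 1e-2 * np.eye(20)
K_pt = matern32(x_pr, x_tr)
K_pp = matern32(x_pr, x_pr)

# Conditional mean and covariance at the prediction points, Eq. (2.25) with mu = 0.
mean_pr = K_pt @ np.linalg.solve(K_tt, y_tr)
cov_pr = K_pp - K_pt @ np.linalg.solve(K_tt, K_pt.T)
std_pr = np.sqrt(np.diag(cov_pr))  # pointwise uncertainty quantification
```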

2.2.3 Smooth Gaussian processes and elliptic PDE

Even in the setting of Gaussian vectors, we have somewhat brushed over the question of how to choose a covariance model. For Gaussian fields, the space of possible choices is vastly bigger, making it even less obvious how to single out a particular covariance operator for a given task.

The choice of covariance operator $\mathcal{G}$ is a modeling choice whereby we assume structure in our data that we can later use to perform inference. One of the most fundamental assumptions on data is smoothness, meaning that the spatial derivatives of the unknown function $u$ are not too large and therefore $u$ does not vary too rapidly as a function from $\mathbb{R}^d$ to $\mathbb{R}$. Restricting our attention to centered Gaussian processes, we can formally extend Equation (2.23) to the Gaussian field setting by writing

$$p_{\mathcal{N}(0,\mathcal{G})}(u) \propto \exp\left(-\frac{1}{2}[\mathcal{L}u, u]\right). \qquad (2.28)$$

The log-likelihood of a realization $u$ decreases as the quadratic form $[\mathcal{L}u, u]$ increases. This suggests defining Gaussian fields by choosing an $\mathcal{L}$ for which $[\mathcal{L}u, u]$ is a measure of the roughness of the function. The elliptic operators of Section 2.1.5 were chosen as bounded invertible linear operators from $H^s_0(\Omega)$ to $H^{-s}(\Omega)$. Therefore, their associated quadratic form is equivalent to the squared Sobolev norm (see for instance [190, Lemma 2.4]):

$$\|\mathcal{L}^{-1}\|^{-1} \, \|u\|^2_{H^s_0(\Omega)} \le [\mathcal{L}u, u] \le \|\mathcal{L}\| \, \|u\|^2_{H^s_0(\Omega)}. \qquad (2.29)$$

The Sobolev norms, being the sum of the $L^2$ norms of the first $s$ derivatives, provide a natural measure of the roughness of a realization $u$. This makes elliptic operators a natural choice for the precision operators of finitely smooth Gaussian fields. They can be constructed either by discretizing the precision operator using a finite element basis (see [155, 207] for examples), or based on a closed form of the Green's operator $\mathcal{G}$ (the most well-known example being the Matérn family of kernels [167, 248, 249]).
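As an illustration of the first route, the sketch below discretizes the precision operator $\mathcal{L} = \kappa^2 - \Delta$ on a 1-D grid with homogeneous Dirichlet boundary conditions and samples from the resulting Gaussian field. Finite differences stand in here for the finite element constructions of [155, 207], and the grid size and $\kappa$ are illustrative assumptions.

```python
import numpy as np

n, kappa = 200, 1.0  # interior grid points and range parameter (illustrative)
h = 1.0 / (n + 1)    # mesh width on the unit interval

# Tridiagonal finite-difference discretization of L = kappa^2 - Laplacian;
# A plays the role of the (sparse) precision matrix of the discretized field.
A = (
    np.diag((kappa**2 + 2.0 / h**2) * np.ones(n))
    + np.diag((-1.0 / h**2) * np.ones(n - 1), 1)
    + np.diag((-1.0 / h**2) * np.ones(n - 1), -1)
)

# Sample u ~ N(0, A^{-1}): with the Cholesky factorization A = C C^T,
# u = C^{-T} z for standard normal z has covariance (C C^T)^{-1} = A^{-1}.
rng = np.random.default_rng(2)
C = np.linalg.cholesky(A)
u = np.linalg.solve(C.T, rng.standard_normal(n))
```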

We point out that the popular Gaussian kernel is not the Green's function of an elliptic PDE, but of a parabolic PDE, corresponding to infinite-order smoothness, or $s \to \infty$. We also note that many finitely smooth Gaussian process models from the literature, such as fractional-order Matérn covariances or the "Cauchy class" of covariance functions, do not strictly fit into the framework of Section 2.1.5, yet show the same behavior in practice.
