5.3 Bayesian analysis for model calibration
5.3.1 Formulation
As in the previous section, n observed response values y = (y_1, . . . , y_n) are to be used to make inference about the value of a set of simulation inputs θ. The simulation is again represented by the operator G(θ, s), where the vector of inputs s contains the "scenario-descriptor" inputs, which typically represent boundary conditions, initial conditions, geometry, etc. Kennedy and O'Hagan (2001) term these inputs "variable inputs," because they take on different values for different realizations of the system; in classical analysis, they are often referred to as independent variables or covariates.
A probabilistic relationship between the model output, G(θ, s), and the observed data, y, is next postulated. A simple but powerful model is the same relationship that was introduced in Eq. (5.2), namely
y_i = G(θ, s_i) + ε_i,    (5.11)
where ε_i is a random variable that can encompass both measurement errors on y_i and modeling errors associated with the simulation G(θ, s). The most frequently used assumption for the ε_i is that they are i.i.d. N(0, σ²), which means that the ε_i are independent, zero-mean Gaussian random variables with variance σ². Of course, more complex models may be applied, for instance enforcing a parametric dependence structure among the errors.
Once the probabilistic model is specified, a likelihood function for the unknowns may be developed. As discussed in Chapter II, the likelihood function plays a central role in Bayesian inference. Based on the model defined by Eq. (5.11), the likelihood function for θ is the product of n normal probability density functions:
L(θ) = f(d | θ) = ∏_{i=1}^{n} (1 / (σ√(2π))) exp[ −(y_i − G(θ, s_i))² / (2σ²) ],    (5.12)
where d is used generically to represent the observed data and in this case simply contains the experimental observations y. Bayes' theorem (Eq. (2.1)) is now applied using the likelihood function of Eq. (5.12) along with a prior distribution for θ, π(θ), to obtain a posterior distribution, f(θ | d), which represents the belief about θ in light of the data d:
f(θ | d) ∝ π(θ) L(θ).    (5.13)
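To make the formulation concrete, the following sketch evaluates the unnormalized log-posterior implied by Eqs. (5.11) through (5.13); working on the log scale avoids numerical underflow when n is large. The `simulator` and `log_prior` callables and the argument names are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def log_posterior(theta, y, s, simulator, sigma, log_prior):
    """Unnormalized log of Eq. (5.13) under the error model of Eq. (5.11)
    with i.i.d. N(0, sigma^2) errors, i.e., the likelihood of Eq. (5.12)."""
    # Run the simulator G(theta, s_i) at each scenario-descriptor input
    g = np.array([simulator(theta, s_i) for s_i in s])
    residuals = y - g
    # Gaussian log-likelihood (normalizing constants kept for completeness)
    log_like = (-0.5 * np.sum(residuals**2) / sigma**2
                - len(y) * np.log(sigma * np.sqrt(2.0 * np.pi)))
    return log_prior(theta) + log_like
```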
The posterior distribution for θ represents the complete state of knowledge, and may even include features such as multiple modes, which would represent multiple competing hypotheses about the true (best-fitting) value of θ. Summary information can be extracted from the posterior, including the mean (which is typically taken to be the "best guess" point estimate) and standard deviation (a representation of the amount of residual uncertainty). It is also possible to extract one- or two-dimensional marginal distributions, which simplify visualization of the features of the posterior.
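Once posterior samples are available (for example from the MCMC approach discussed next), these summaries are straightforward to compute; the sketch below uses placeholder draws purely for illustration.

```python
import numpy as np

# Placeholder posterior draws; in practice these would come from MCMC (Section 2.3)
samples = np.random.default_rng(0).normal(loc=[1.0, 0.3],
                                          scale=[0.1, 0.05], size=(5000, 2))

post_mean = samples.mean(axis=0)          # the usual "best guess" point estimate
post_std = samples.std(axis=0, ddof=1)    # a summary of the residual uncertainty
# One-dimensional marginal of the first parameter, as a normalized histogram
density, edges = np.histogram(samples[:, 0], bins=50, density=True)
```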
However, as discussed in Chapter II, the posterior distribution cannot usually be constructed analytically, and this will almost certainly not be possible when a complex simulation model appears inside the likelihood function. One of the more popular numerical techniques for constructing the posterior distribution is Markov chain Monte Carlo (MCMC) simulation, which is discussed in Section 2.3. Unfortunately, though, MCMC simulation requires hundreds of thousands of evaluations of the likelihood function, which in the case of model calibration equates to hundreds of thousands of evaluations of the computer model G(·,·). For most realistic models, this number of evaluations will not be feasible. In such situations, the analyst must usually resort to a less expensive surrogate (a.k.a. response-surface approximation) model. Such a surrogate might involve reduced-order modeling (e.g., a coarser mesh) or data-fit techniques such as Gaussian process (a.k.a. kriging) modeling.
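A minimal random-walk Metropolis sketch (one simple MCMC variant; see Section 2.3) makes the computational burden explicit: every iteration requires a fresh log-posterior evaluation, and hence an evaluation of G(·,·) or of its surrogate. The function and argument names below are illustrative assumptions only.

```python
import numpy as np

def metropolis(log_posterior, theta0, n_samples, step, seed=0):
    """Random-walk Metropolis; `log_posterior` would typically wrap a cheap
    surrogate of G rather than the full simulation."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    current_lp = log_posterior(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.shape)
        proposal_lp = log_posterior(proposal)   # one model/surrogate evaluation
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            theta, current_lp = proposal, proposal_lp
        samples.append(theta.copy())
    return np.array(samples)
```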
This work adopts the approach of using a Gaussian process surrogate for the true simulation.
This particular surrogate modeling approach is attractive here for several reasons:
1. The Gaussian process model is highly flexible, and can be used to fit data associated with virtually any functional form.
2. The Gaussian process model is stochastic, thus providing both an estimated response value and an uncertainty associated with that estimate. Conveniently, the Bayesian framework makes it possible to take account of this uncertainty.
3. With regard to fit accuracy, the Gaussian process model has been shown to be competitive with most other modern data-fit methods, including Bayesian neural networks and multivariate adaptive regression splines (Rasmussen, 1996; Giunta et al., 2006), and it can represent functions with multiple inputs.
For Bayesian model calibration with an expensive simulation, the uncertainty associated with the use of a Gaussian process surrogate can be accounted for through the likelihood func- tion by incorporating the direct variance estimates from the GP. Although a complete Bayesian approach would even treat the parameters governing the GP as objects of Bayesian inference, this approach is rather complicated and is not believed to contribute much to the overall uncer- tainty analysis (Kennedy and O’Hagan, 2001). The approach recommended here is to estimate the parameters governing the GP a priori using the observed simulator outputs, and to treat them as known constants for the remainder of the calibration analysis.
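As an illustration of this recommended workflow, the sketch below fits a GP surrogate once to a set of simulator runs (estimating its hyperparameters by maximum likelihood and then holding them fixed) and returns both a predictive mean and a standard deviation. The toy simulator, the design of training points, and the use of scikit-learn are assumptions for illustration, not part of the original analysis.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Toy stand-in for an expensive simulator G(theta, s); inputs stacked as [theta, s]
def simulator(x):
    theta, s = x
    return np.sin(theta) * s + 0.1 * theta**2

# Training design: simulator runs at a modest number of points in (theta, s) space
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 2.0, size=(30, 2))
y_train = np.array([simulator(x) for x in X_train])

# Fit the GP surrogate; hyperparameters are estimated here and then held fixed
kernel = ConstantKernel(1.0) * RBF(length_scale=[1.0, 1.0])
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)

# The surrogate returns a predictive mean and standard deviation, so its own
# uncertainty can later be folded into the likelihood of Eq. (5.14)
mean, std = gp.predict(np.array([[1.0, 0.5]]), return_std=True)
```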
Through the assumptions used for Gaussian process modeling, the response conditional on a set of observed "training points" follows a multivariate normal distribution. For a discrete set of new inputs, this response is characterized by a mean vector and a covariance matrix (see Eqs. (3.4) through (3.6)). Denote the mean vector and covariance matrix corresponding to the inputs (θ, s_1), . . . , (θ, s_n) as µ_GP and Σ_GP, respectively. It is easy to show that the likelihood function for θ is then given by a multivariate normal probability density function (note that the likelihood function of Eq. (5.12) can also be expressed as a multivariate normal probability density, with Σ diagonal):
L(θ) = (2π)^(−n/2) |Σ|^(−1/2) exp[ −(1/2) (y − µ_GP)^T Σ^(−1) (y − µ_GP) ],    (5.14)
where Σ = σ²I + Σ_GP, so that both µ_GP and Σ depend on θ.
Simply put, since the uncertainty associated with the surrogate model is independent of the modeling and observation uncertainty captured by the ε_i, the covariance of the Gaussian process predictions (Σ_GP) simply adds to the covariance of the error terms (σ²I). (An excellent illustration of the effect on the calibration results of acknowledging the Gaussian process surrogate uncertainty is Figure 6.24 and the accompanying discussion.) As mentioned before, if a more complicated error model is desired (i.e., one in which the errors are not independent of each other), σ²I is replaced by a full covariance matrix.
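A sketch of evaluating the log of Eq. (5.14) is given below, assuming the GP predictive mean vector and covariance matrix at the inputs (θ, s_1), . . . , (θ, s_n) are already available (e.g., via Eqs. (3.4) through (3.6)); the function and argument names are hypothetical, and a Cholesky factorization is used for numerical stability.

```python
import numpy as np

def log_likelihood_gp(y, mu_gp, cov_gp, sigma):
    """Log of the multivariate normal likelihood in Eq. (5.14),
    with Sigma = sigma^2 * I + Sigma_GP."""
    n = len(y)
    cov = sigma**2 * np.eye(n) + cov_gp          # Sigma = sigma^2 I + Sigma_GP
    resid = np.asarray(y) - np.asarray(mu_gp)
    # Cholesky factorization gives a stable log-determinant and linear solve
    L = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(L, resid)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + alpha @ alpha)
```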