II. BACKGROUND
performed prior to and during construction to ensure consistency for accurate experimental results. The test sections were designated with respect to these material properties as shown in TABLE II.1. The aggregate gradation designations are based on the Superpave mix design specifications of the same names. The asphalt content designations refer to ±0.7% from the optimum binder content based on Superpave volumetrics, which includes the aggregate classification. (35)

TABLE II.1: Original Test Section Designations at WesTrack Experiment

                           Aggregate Gradation Designation
                  Fine                  Fine Plus              Coarse
Design Air        Asphalt Content       Asphalt Content        Asphalt Content
Void Content      Low    Opt.   High    Low    Opt.   High     Low    Opt.   High
Low: 4%           04     18     --      12     09/21  --       --     23     25
Medium: 8%        02     01/15  14      22     11/19  13       08     05/24  07
High: 12%         03/16  17     --      10     20     --       --     23     06

II.3 Methods for Surrogate Modeling
A number of surrogate modeling approaches are available, each with advantages and disadvantages that affect its ability to produce an accurate predictive model. The methodologies vary in computational complexity, which in turn influences how closely the surrogate can approximate the true function.
The simplest surrogate model is a regression model, in which an output variable (Y) is described with respect to input parameters (X) and regression coefficients (β). The regression model requires the definition of a model form, such as linear or quadratic, and the regression coefficients are determined through an analysis of the residuals. Typical methods, such as least squares, define the regression coefficients as those that minimize the error between the model predictions and the true output values.
Although regression models are simple to construct, inaccuracy arises when the assumed model form is not close to the true functional form. Additionally, functions with many parameters can become complex, and the computational speed advantage of the model over the true function diminishes.
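As a concrete illustration of the least-squares procedure described above, the sketch below fits a quadratic regression surrogate with NumPy. The "true" function and training points are hypothetical, chosen so that the assumed model form matches the true form exactly:

```python
import numpy as np

# Hypothetical "true" function to be emulated; here it is itself
# quadratic, so the assumed model form is exact (an assumption
# made only for illustration).
def true_function(x):
    return 1.0 + 2.0 * x + 3.0 * x**2

# Training data: a handful of evaluations of the true function.
x_train = np.linspace(-1.0, 1.0, 8)
y_train = true_function(x_train)

# Assume a quadratic model form: Y = b0 + b1*x + b2*x^2.
# Build the design matrix and solve for the coefficients that
# minimize the sum of squared residuals (ordinary least squares).
X = np.column_stack([np.ones_like(x_train), x_train, x_train**2])
beta, *_ = np.linalg.lstsq(X, y_train, rcond=None)

# Evaluate the cheap surrogate at a new point.
x_new = 0.25
y_hat = beta[0] + beta[1] * x_new + beta[2] * x_new**2
```

Because the assumed form matches the true function here, the fitted coefficients recover the true ones; with a mismatched form, the residual error would grow, as noted above.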
More advanced methods, including polynomial chaos (PC) and radial basis functions (RBF), improve on simple regression models. PC models replace the input parameters (x) with Hermite polynomials. An example of a PC model can be expressed as:
Y(x) = \beta_0 + \sum_{i=1}^{n} \beta_i \, \Gamma_p(\xi_i)    (II.13)
The input parameters (x) are expressed in the Hermite terms in the standard normal space and are represented as (ξ). The Hermite polynomial term (Γ) can be of any order (p), for example:
\Gamma_1 = \xi
\Gamma_2 = \xi^2 - 1    (II.14)
The methods of solving for the regression coefficients (β) are similar to those for the simpler regression models, but the inclusion of the Hermite polynomials improves model performance. The PC method is highly effective for second- and third-order models with as many as ten input parameters, but beyond this the computational expense can be prohibitive.
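The Hermite terms of Eq. II.14 can be used directly as regression basis functions, so the coefficients are found by the same least-squares machinery as before. The sketch below is illustrative only; the response values are synthetic, constructed from an assumed set of coefficients:

```python
import numpy as np

# Probabilists' Hermite polynomials of Eq. II.14.
def hermite_1(xi):
    return xi

def hermite_2(xi):
    return xi**2 - 1.0

rng = np.random.default_rng(0)

# Standard-normal input samples (the xi of Eq. II.13) and a
# hypothetical response built from assumed coefficients.
xi = rng.standard_normal(200)
y = 2.0 + 0.5 * hermite_1(xi) - 1.5 * hermite_2(xi)

# Design matrix: a constant column plus the Hermite terms.
# The coefficients beta are found by least squares, exactly as
# for an ordinary regression model.
Psi = np.column_stack([np.ones_like(xi), hermite_1(xi), hermite_2(xi)])
beta, *_ = np.linalg.lstsq(Psi, y, rcond=None)
```

Since the synthetic response lies exactly in the span of the basis, the fit recovers the assumed coefficients; for a real model the residual would measure how well the chosen polynomial order captures the true function.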
Another advanced technique, an RBF model, is expressed mathematically as:
Y(x) = \sum_{i=1}^{n_c} w_i \, \psi\!\left( \lVert x - c^{(i)} \rVert \right)    (II.15)
The model output (Y) in an RBF is a weighted sum of the n_c basis functions evaluated at the Euclidean distances between the input parameter (x) and the centers of the basis functions (c^(i)). Basis functions (ψ) simplify a complex model by dividing it into a family of simpler models. Common applications include multivariable polynomial models and periodic functions such as Fourier models. The benefit of RBF models is that the estimation of the weights (w_i) is computationally cheap, yet the model is capable of emulating highly nonlinear functions.
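A minimal RBF interpolator along the lines of Eq. II.15 might look as follows. The Gaussian basis function, its scale, and the training data are all assumptions made for illustration; estimating the weights reduces to one linear solve, which is the cheap step noted above:

```python
import numpy as np

# Gaussian radial basis function psi, evaluated at a Euclidean
# distance r (the basis choice and scale are assumptions).
def psi(r, scale=0.15):
    return np.exp(-(r / scale) ** 2)

# Hypothetical 1-D training data; the centers c(i) are placed at
# the training inputs themselves, a common choice.
centers = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2 * np.pi * centers)

# Interpolation matrix: psi(|x_j - c_i|) for every pair of points.
Phi = psi(np.abs(centers[:, None] - centers[None, :]))

# Solving this linear system for the weights w_i is the cheap
# estimation step of Eq. II.15.
w = np.linalg.solve(Phi, y_train)

# Surrogate prediction at a new input.
x_new = 0.37
y_hat = np.sum(w * psi(np.abs(x_new - centers)))
```

By construction the interpolant reproduces the training responses exactly at the centers, while remaining capable of following a highly nonlinear function between them.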
The Gaussian process (GP) surrogate model is a special form of RBF model and has been shown to be a very powerful surrogate modeling technique for many engineering applications. GP models are capable of fitting data for high-dimensional problems, on the order of 30-50 input parameters, and are an interpolation method that does not assume a specific functional form. GP models are therefore suitable for approximating any smooth, continuous function, a situation common in engineering applications.
Construction of a GP model requires the selection of a correlation function and a mean function. The squared exponential is a commonly utilized correlation function, which takes the following form:
c(x_j, x_k) = \exp\left[ -\sum_{i=1}^{n} \xi_i \left( x_{ij} - x_{ik} \right)^2 \right]    (II.16)
where ξ_i is a scale factor that must be estimated, x_ij represents the j-th training point at the i-th dimension, and x_ik represents the new prediction point at the i-th dimension. The terms are summed over the number of input dimensions, n. The correlation function is utilized to construct a correlation matrix, R:
R = \begin{bmatrix} c(x_{1j}, x_{1k}) & \cdots & c(x_{1j}, x_{nk}) \\ \vdots & \ddots & \vdots \\ c(x_{nj}, x_{1k}) & \cdots & c(x_{nj}, x_{nk}) \end{bmatrix}    (II.17)
The covariance function, indicating the covariance between the observed model response values of the training data, Y(x_j), and the predicted responses, Y(x_k), is represented as a function of the correlation matrix, R, and the variance, as shown here:
\mathrm{Cov}\left( Y(x_j), Y(x_k) \right) = \sigma^2 R    (II.18)
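Eqs. II.16 and II.18 can be sketched in a few lines; the training inputs, scale factors ξ, and variance σ² below are hypothetical values chosen only for illustration:

```python
import numpy as np

# Squared-exponential correlation of Eq. II.16 for two points,
# with one scale factor xi per input dimension.
def correlation(xj, xk, xi_scales):
    return np.exp(-np.sum(xi_scales * (xj - xk) ** 2))

# Hypothetical 2-D training inputs and assumed hyperparameters.
X_train = np.array([[0.0, 0.0], [0.5, 0.2], [1.0, 0.9]])
xi_scales = np.array([1.0, 2.0])
sigma2 = 0.8  # assumed process variance of Eq. II.18

# Correlation matrix R of Eq. II.17, one entry per pair of
# training points.
n = X_train.shape[0]
R = np.array([[correlation(X_train[j], X_train[k], xi_scales)
               for k in range(n)] for j in range(n)])

# Covariance matrix of the training responses (Eq. II.18).
Cov = sigma2 * R
```

Note that R is symmetric with a unit diagonal, since each point is perfectly correlated with itself; the variance scales this correlation structure into a covariance.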
The variance term in Eq. II.18 is another parameter of the GP model that must be estimated. A mean function is also required for construction of the surrogate model. A common linear mean function form is shown in Eq. II.19.
\mu(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots    (II.19)
The vector β is the final parameter that must be estimated to complete the construction process. Once the model form has been selected, the model parameters (mean μ, variance σ², and correlation length-scale factors ξ) must be estimated. Parameter estimation is commonly performed using a maximum likelihood estimation method, and the procedure takes the form of an optimization problem. To avoid common complications due to ill-conditioned matrices, the optimization problem is often recast as minimization of the negative log-likelihood function, −log[L(·)], of the form:
\underset{\beta,\ \sigma^2,\ \xi}{\text{Minimize}} \quad -\log[L(\beta, \sigma^2, \xi)] = n \log(\sigma^2) + \log(|R|) + \frac{(Y - \mu)^T R^{-1} (Y - \mu)}{\sigma^2}    (II.20)
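A direct transcription of the negative log-likelihood in Eq. II.20 might look like the sketch below. Using a linear solve and a log-determinant in place of an explicit matrix inverse is one common way to mitigate the conditioning issues noted above; the one-dimensional training data at the end are hypothetical:

```python
import numpy as np

# Negative log-likelihood of Eq. II.20 for a GP with constant mean
# mu, process variance sigma2, and squared-exponential correlation
# built from the length-scale factors xi (Eq. II.16).
def neg_log_likelihood(mu, sigma2, xi, X_train, y_train):
    n = len(y_train)
    diff = X_train[:, None, :] - X_train[None, :, :]
    R = np.exp(-np.sum(xi * diff ** 2, axis=-1))  # correlation matrix
    resid = y_train - mu
    # Solve R^-1 (Y - mu) rather than forming the explicit inverse,
    # and use the log-determinant directly for numerical stability.
    alpha = np.linalg.solve(R, resid)
    _, logdet = np.linalg.slogdet(R)
    return n * np.log(sigma2) + logdet + resid @ alpha / sigma2

# Hypothetical training data and trial hyperparameter values; in
# practice an optimizer would search over (mu, sigma2, xi).
X_demo = np.array([[0.0], [0.5], [1.0]])
y_demo = np.array([0.1, 0.4, 0.2])
nll = neg_log_likelihood(mu=0.2, sigma2=0.5,
                         xi=np.array([1.0]),
                         X_train=X_demo, y_train=y_demo)
```

An off-the-shelf minimizer can then be pointed at this function to estimate the parameters by maximum likelihood.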
Model verification is required prior to use in design applications and is often based on prediction testing. The value at a prediction point is calculated as the mean of the predictive distribution:
E\left[ Y(x_k) \mid Y \right] = \mu + r^T R^{-1} (Y - \mu)    (II.21)
where r represents a vector of correlations between the prediction point and each training point:
r = \begin{bmatrix} c(x_{1j}, x_{1k}) \\ \vdots \\ c(x_{nj}, x_{nk}) \end{bmatrix}    (II.22)
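The prediction step of Eqs. II.21 and II.22 can be sketched as follows, using a constant mean and assumed length-scale factors on hypothetical one-dimensional data. Consistent with the interpolation property noted earlier, predicting at a training point reproduces the observed response exactly:

```python
import numpy as np

# Squared-exponential correlation of Eq. II.16 (assumed scales).
def correlation(xj, xk, xi_scales):
    return np.exp(-np.sum(xi_scales * (xj - xk) ** 2))

# Hypothetical training data.
X_train = np.array([[0.0], [0.5], [1.0]])
y_train = np.sin(X_train[:, 0])
mu = y_train.mean()            # simple constant-mean assumption
xi_scales = np.array([5.0])    # assumed length-scale factors

# Correlation matrix R among the training points (Eq. II.17).
n = len(X_train)
R = np.array([[correlation(X_train[j], X_train[k], xi_scales)
               for k in range(n)] for j in range(n)])

def predict(x_new):
    # r: correlations between the new point and each training
    # point (Eq. II.22).
    r = np.array([correlation(X_train[j], x_new, xi_scales)
                  for j in range(n)])
    # Mean of the predictive distribution (Eq. II.21).
    return mu + r @ np.linalg.solve(R, y_train - mu)
```

Prediction testing for model verification then amounts to comparing these predicted means against held-out evaluations of the true function.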