
Nonlinear regression

Regression analysis is the study of the relationship between a dependent response variable and a set of independent (explanatory) variables. Linear regression analysis is restricted to the case in which the dependent variable is expressed as a linear function of the unknown parameters. Nonlinear regression analysis is the extension for which the relationship between the dependent variable and the unknowns may have any functional form. As such, nonlinear regression analysis provides one way of thinking about the calibration of computer simulations, because it allows the simulation to be viewed as a general black-box function.

When nonlinear regression analysis is applied to the calibration of computer simulations, the dependent variable is the simulator output, the independent variables are typically observable experimental conditions, and the unknowns are those internal simulation parameters that are to be estimated. Thus, the relationship between the dependent and independent variables is expressed as

y = G(θ, s),  (5.1)

where y is the dependent variable, θ is a p-dimensional vector of calibration parameters, s is the vector of independent variables, and G(·, ·) represents the computer simulation.

Consider that the calibration parameters are to be estimated using n experimental observations y = (y₁, . . . , yₙ)ᵀ of the dependent variable(s) that correspond to the values of the independent variables s₁, . . . , sₙ. The nonlinear regression model, which relates the predicted and observed values, will be written as

yᵢ = G(θ, sᵢ) + εᵢ.  (5.2)

In the simplest case, the random errors, εᵢ, are taken to be independently and identically distributed as εᵢ ∼ N(0, σ²). This formulation may be appropriate if the assumption of independence is reasonable and all of the yᵢ have the same units. However, when some of the yᵢ represent different quantities (possibly different features of the same response), then the i.i.d. model is no longer appropriate. To take account of observations with different units, as well as dependencies among the observations, an n×n weighting matrix ω can be incorporated into the model, so that the εᵢ have a joint distribution ε ∼ N(0, σ²ω⁻¹). In the case of independent observations, ω is diagonal, and the diagonal elements represent the weights given to each observation.

When the experimental data consist of repeated observations of multiple different response features, the observed variance of each feature can be used to estimate an appropriate weight for that feature. In this case, the value of all diagonal elements of ω corresponding to a particular feature could be set to the inverse of the sample variance observed for that feature.
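As a rough illustration, the following Python sketch builds such a diagonal ω from repeated measurements of two hypothetical response features; the feature names, measured values, units, and the assignment of observations to features are all made up for the example.

```python
import numpy as np

# Repeated measurements of each response feature (hypothetical values and units).
feature_samples = {
    "peak_stress":  np.array([101.2, 98.7, 103.5, 99.9]),   # e.g. MPa
    "displacement": np.array([0.051, 0.049, 0.054]),        # e.g. mm
}

# Feature to which each of the n observations in y belongs.
obs_features = ["peak_stress", "peak_stress", "displacement", "displacement"]

# Weight for each observation: inverse of the sample variance of its feature.
weights = np.array([1.0 / feature_samples[f].var(ddof=1) for f in obs_features])

# Diagonal n x n weighting matrix (independent observations).
omega = np.diag(weights)
print(omega)
```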

The weighted least-squares estimator, θ̂, is that which minimizes the weighted sum of squared errors function¹:

S(θ) = [y − G(θ)]ᵀ ω [y − G(θ)],  (5.3)

where G(θ) = (G(θ, s₁), . . . , G(θ, sₙ))ᵀ. It is easy to see that when the weighting matrix is the identity matrix (equivalently, when the errors are independently and identically distributed), then S(θ) can be written as

S(θ) = Σ_{i=1}^{n} [yᵢ − G(θ, sᵢ)]².  (5.4)

Unlike the case in which the relationship between the outputs and the unknowns is linear, there is no general analytical solution for the value of θ that minimizes the sum of squared errors function for nonlinear relationships. Nonlinear regression analysis thus typically relies on numerical optimization procedures for finding the minimum. In fact, there are a variety of specialized techniques for solving nonlinear least-squares problems (Levenberg-Marquardt methods are particularly widespread; cf. Seber and Wild, 2003).

¹ It is shown in Section 5.5 that the least-squares estimator is in this case also the maximum likelihood estimator.
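As a rough illustration of such a solver in use, the following Python sketch minimizes the weighted objective (5.3) with a Levenberg-Marquardt routine (scipy.optimize.least_squares, method="lm"). The toy simulator G, the synthetic data, the weighting matrix, and the starting values are assumptions of the example, not part of any particular application.

```python
import numpy as np
from scipy.optimize import least_squares

def G(theta, s):
    """Toy stand-in for the computer simulation: y = a * (1 - exp(-b * s))."""
    a, b = theta
    return a * (1.0 - np.exp(-b * s))

# Synthetic experimental conditions and observations (for illustration only).
s_obs = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
rng = np.random.default_rng(0)
y_obs = G([2.0, 0.7], s_obs) + rng.normal(scale=0.05, size=s_obs.size)

# Weighting matrix omega (identity here; see the discussion of weights above).
omega = np.eye(s_obs.size)
L = np.linalg.cholesky(omega)              # omega = L @ L.T

def residuals(theta):
    # With r = L.T @ (y - G), r @ r equals the weighted sum of squares S(theta) in (5.3).
    return L.T @ (y_obs - G(theta, s_obs))

fit = least_squares(residuals, x0=[1.0, 1.0], method="lm")   # Levenberg-Marquardt
theta_hat = fit.x
S_hat = 2.0 * fit.cost                     # least_squares stores 0.5 * sum of squares
print(theta_hat, S_hat)
```

Passing the Cholesky factor of ω through the residuals means the solver's unweighted sum of squared residuals coincides with S(θ), so the same sketch applies to diagonal or full weighting matrices.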

In regression analysis, the amount of uncertainty associated with a particular estimate of the unknown parameters is expressed through confidence intervals. For example, a 95% confidence interval for a parameter θ₀ captures the notion that one has a certain amount of confidence that the true, unknown, value of θ₀ lies within that interval. As the confidence level increases, the size of the interval must increase as well. From a frequentist standpoint, the confidence level has a specific interpretation as a probability: in the long run, 95% of all confidence intervals constructed at the 95% confidence level will contain the true value of the parameter.

When one wants to make inference about multiple parameters, multi-dimensional confidence regions come into play, as opposed to simple intervals. In nonlinear regression analysis, there are a couple of common approaches for constructing approximate confidence regions.

One of the simplest is the “linear approximation confidence region.” First, define the n×p matrix of derivatives of the model outputs with respect to the calibration inputs as

V = [ ∂G(θ, sᵢ)/∂θⱼ ]_{i,j}.  (5.5)

Since V will most likely be a function of θ, let V̂ denote V(θ̂). Then for large n, the 100(1 − α)% linear approximation confidence region for θ consists of all values of θ which satisfy the inequality

(θ − θ̂)ᵀ V̂ᵀ ω V̂ (θ − θ̂) ≤ p s² Fα,p,n−p,  (5.6)

where Fα,p,n−p is the upper α probability point of the F-distribution with (p, n − p) degrees of freedom, and s² is the sample estimate of σ², given by s² = S(θ̂)/(n − p). Notice that (5.6) is a quadratic form in θ, and the resulting confidence region will be an ellipsoid.
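The membership test defined by (5.6) is straightforward to evaluate once V̂ and s² are available. The sketch below does so for an illustrative two-parameter toy model, approximating V by forward differences; the model, the synthetic data, the identity weighting, and the step size are assumptions of the example.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import f as f_dist

def G(theta, s):                                   # toy simulator: y = a * (1 - exp(-b * s))
    a, b = theta
    return a * (1.0 - np.exp(-b * s))

s_obs = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
y_obs = np.array([0.62, 1.03, 1.49, 1.91, 1.98])   # synthetic observations
omega = np.eye(len(y_obs))                         # identity weighting

def S(theta):                                      # weighted sum of squares (5.3)
    r = y_obs - G(theta, s_obs)
    return r @ omega @ r

theta_hat = least_squares(lambda t: y_obs - G(t, s_obs), [1.0, 1.0], method="lm").x
n, p, alpha = len(y_obs), len(theta_hat), 0.05

def jacobian(theta, eps=1e-6):
    """n x p matrix V with entries dG(theta, s_i)/dtheta_j, by forward differences."""
    base = G(theta, s_obs)
    return np.column_stack([(G(theta + eps * np.eye(len(theta))[j], s_obs) - base) / eps
                            for j in range(len(theta))])

V_hat = jacobian(theta_hat)
s2 = S(theta_hat) / (n - p)                        # sample estimate of sigma^2
bound = p * s2 * f_dist.ppf(1.0 - alpha, p, n - p) # p * s^2 * F_{alpha,p,n-p}

def in_linear_region(theta):
    d = np.asarray(theta) - theta_hat
    return float(d @ V_hat.T @ omega @ V_hat @ d) <= bound

print(in_linear_region(theta_hat + np.array([0.05, 0.0])))
```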

Confidence regions based on the linear approximation theory can be particularly inaccurate for small sample sizes and strongly nonlinear models. An alternative approach is to consider contours of S(θ). Since S(θ) measures the “closeness” of the data to the predictions, it seems intuitive to base the confidence region for θ on the contours of this function.

Such a region might have the form (Seber and Wild, 2003)

S(θ) ≤ c S(θ̂),  (5.7)

for some constant c > 1. Regions of this form are often called “exact” confidence regions because they are not based on any approximations. The difficulty, however, is that the particular coverage probabilities corresponding to the various contour levels are not generally known. A common approach is to employ asymptotic theory, which can be used to show that for large enough n, the region based on the contours of the sum of squares function defined by the following inequality has approximately 100(1 − α)% coverage (Seber and Wild, 2003):

S(θ) − S(θ̂) ≤ p s² Fα,p,n−p.  (5.8)

Following Seber and Wild (2003), confidence regions constructed using the above inequality will be referred to as “exact” regions because they are based on contours of the likelihood function (see Section 5.5), even though the particular confidence levels are based on an asymptotic approximation.
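A sketch of the corresponding membership test for the region (5.8), using the same illustrative toy model and synthetic data as in the previous sketch, might look as follows; note that every test point requires a full evaluation of S(θ).

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import f as f_dist

def G(theta, s):                                   # toy simulator: y = a * (1 - exp(-b * s))
    a, b = theta
    return a * (1.0 - np.exp(-b * s))

s_obs = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
y_obs = np.array([0.62, 1.03, 1.49, 1.91, 1.98])   # synthetic observations

def S(theta):                                      # identity weighting for simplicity
    r = y_obs - G(theta, s_obs)
    return r @ r

theta_hat = least_squares(lambda t: y_obs - G(t, s_obs), [1.0, 1.0], method="lm").x
n, p, alpha = len(y_obs), len(theta_hat), 0.05
s2 = S(theta_hat) / (n - p)
bound = p * s2 * f_dist.ppf(1.0 - alpha, p, n - p)

def in_exact_region(theta):
    # Each call to S costs up to n simulator evaluations, which is what makes
    # mapping out this region expensive for slow simulations.
    return S(theta) - S(theta_hat) <= bound

print(in_exact_region(theta_hat + np.array([0.05, 0.0])))
```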

The primary disadvantage of confidence regions of the form (5.8) is the computational expense. Each evaluation of the inequality requires computing S(θ) for a particular θ, which in turn may require up to n evaluations of the computer simulation G(·, ·). As such, it may be very expensive to find or plot a particular confidence region. Donaldson and Schnabel (1987) present a comparison of several approaches to constructing confidence regions that includes the linear approximation and “exact” methods, as well as three different derivative estimation schemes. Using a Monte Carlo coverage probability study, they found the “exact” method to be very reliable, whereas the linear approximation region may be quite inaccurate when the sample size is small or G(·, ·) is highly nonlinear.

One of the limitations of the above confidence regions is that they do not lend themselves to graphical visualization when the dimension, p, is greater than two or three. A couple of options are available for visualization when p > 2. The first option amounts to plotting two-dimensional “slices” of the full confidence region for specific values of the “nuisance” variables (those parameters not represented in the two-dimensional plot). Rawlings et al. (1998) provide several examples of this approach for linear regression. While such an approach is simple, the two-dimensional slices are only applicable to particular values of the nuisance variables, and are particularly difficult to interpret for p ≥ 4.

An alternative is to construct a two-dimensional joint confidence region for two parameters of interest, which ignores the other (p − 2) parameters. This concept is analogous to well-known methods for constructing univariate confidence intervals in linear regression. It is also similar to the idea of marginal distributions, which comes into play in Bayesian inference.

Consider that θ is partitioned as θ = (θ₁ᵀ, θ₂ᵀ)ᵀ, where θ₂ contains the p₂ parameters of interest (for a two-dimensional confidence region, p₂ = 2). The linear approximation confidence region of (5.6) becomes (Seber and Wild, 2003):

(θ₂ − θ̂₂)ᵀ Ĉ₂⁻¹ (θ₂ − θ̂₂) ≤ p₂ Fα,p₂,n−p,  (5.9)

where Ĉ₂ is the p₂ × p₂ matrix containing the corresponding elements of the complete covariance matrix estimated by s²(VᵀV)⁻¹.
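The following sketch evaluates the marginal region (5.9) for an illustrative three-parameter toy model in which the first two parameters are of interest; the model, the synthetic data, the parameter partition, and the finite-difference Jacobian are assumptions of the example.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import f as f_dist

def G(theta, s):                                   # toy simulator: y = a * (1 - exp(-b * s)) + c
    a, b, c = theta
    return a * (1.0 - np.exp(-b * s)) + c

s_obs = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0])
y_obs = np.array([0.72, 1.15, 1.60, 1.82, 1.99, 2.08, 2.12])   # synthetic observations

theta_hat = least_squares(lambda t: y_obs - G(t, s_obs), [1.0, 1.0, 0.1], method="lm").x
n, p, p2, alpha = len(y_obs), 3, 2, 0.05
idx2 = [0, 1]                                      # parameters of interest: (a, b)

def jacobian(theta, eps=1e-6):
    base = G(theta, s_obs)
    return np.column_stack([(G(theta + eps * np.eye(len(theta))[j], s_obs) - base) / eps
                            for j in range(len(theta))])

V_hat = jacobian(theta_hat)
res = y_obs - G(theta_hat, s_obs)
s2 = (res @ res) / (n - p)
C_hat = s2 * np.linalg.inv(V_hat.T @ V_hat)        # estimated covariance s^2 (V^T V)^-1
C2 = C_hat[np.ix_(idx2, idx2)]                     # p2 x p2 block for (a, b)
bound = p2 * f_dist.ppf(1.0 - alpha, p2, n - p)

def in_marginal_region(theta2):
    d = np.asarray(theta2) - theta_hat[idx2]
    return float(d @ np.linalg.inv(C2) @ d) <= bound

print(in_marginal_region(theta_hat[idx2] + np.array([0.05, 0.0])))
```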

Seber and Wild (2003) also discuss an adaptation of the exact regions that can be used for plotting confidence regions for two-dimensional parameter subsets. Let θ₂ denote a p₂-dimensional parameter subset of interest, and let θ₁ denote the remaining (nuisance) parameters. Eq. (5.8) is then adapted as

S[θ̂₁(θ₂), θ₂] − S(θ̂) ≤ p₂ s² Fα,p₂,n−p,  (5.10)

where θ̂₁(θ₂) contains those values of θ₁ that minimize S(θ₁, θ₂) for a given θ₂. Thus, θ̂₁(θ₂) must be computed for each value of θ₂.

Each evaluation of inequality (5.10) involves a least squares optimization problem over the (p − 2) parameters in θ₁. This becomes extremely expensive, because plotting a two-dimensional confidence region may require hundreds of evaluations of the inequality itself, for various values of θ₂.
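A sketch of the profile-based test (5.10) for the same illustrative three-parameter toy model is given below; the inner minimization over the single nuisance parameter uses scipy.optimize.minimize_scalar, and all data are synthetic.

```python
import numpy as np
from scipy.optimize import least_squares, minimize_scalar
from scipy.stats import f as f_dist

def G(theta, s):                                   # toy simulator: y = a * (1 - exp(-b * s)) + c
    a, b, c = theta
    return a * (1.0 - np.exp(-b * s)) + c

s_obs = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0])
y_obs = np.array([0.72, 1.15, 1.60, 1.82, 1.99, 2.08, 2.12])   # synthetic observations

def S(theta):
    r = y_obs - G(theta, s_obs)
    return r @ r

theta_hat = least_squares(lambda t: y_obs - G(t, s_obs), [1.0, 1.0, 0.1], method="lm").x
n, p, p2, alpha = len(y_obs), 3, 2, 0.05
s2 = S(theta_hat) / (n - p)
bound = p2 * s2 * f_dist.ppf(1.0 - alpha, p2, n - p)

def profile_S(theta2):
    """S minimized over the nuisance parameter c, for fixed theta2 = (a, b)."""
    a, b = theta2
    inner = minimize_scalar(lambda c: S(np.array([a, b, c])))
    return inner.fun

def in_profile_region(theta2):
    # Each test point triggers an inner optimization over the nuisance parameters,
    # which is what makes plotting these regions so expensive.
    return profile_S(theta2) - S(theta_hat) <= bound

# Check a small grid of (a, b) values around the estimate.
for da in (-0.1, 0.0, 0.1):
    for db in (-0.1, 0.0, 0.1):
        point = theta_hat[:2] + np.array([da, db])
        print(point, in_profile_region(point))
```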