

5.3 Test selection optimization methodology

5.3.1 Objective formulation

Once the two objectives are both viewed in terms of uncertainty reduction, it is natural to combine them into a single objective function that can be minimized over the feasible set of the number of tests of each type. The set of available testing options typically includes many different possible input conditions for both calibration and validation. The methodology described in Section 5.2 motivates the test selection activities by quantifying the value of each available option. By sampling over the uncertainty in the overall model reliability metric, a different predictive parameter distribution is obtained for each sample. Each distribution is then propagated through the model, along with the aleatory uncertainty in the prediction inputs, to obtain a stochastic prediction.

$Y_i = G(X, \theta_i), \quad \theta_i \sim f_i(\theta)$ (5.5)

where $G$ denotes the model, $X$ contains the aleatory prediction inputs, and $f_i(\theta)$ is the predictive parameter distribution obtained from the $i$-th sample of the overall model reliability metric.
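The procedure just described — sample the reliability metric, form a predictive parameter distribution, and propagate it together with the aleatory inputs — can be sketched as follows. The model, the way reliability shrinks the parameter distribution, and all numerical values are hypothetical stand-ins, not the chapter's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical placeholder model: maps an aleatory input x and a
# parameter theta to a scalar prediction.
def model(x, theta):
    return theta * x + 0.1 * x**2

n_family = 20   # samples of the overall model reliability metric
n_prop = 1000   # propagation samples per family member

predictions = []
for _ in range(n_family):
    # Each reliability sample yields a different predictive parameter
    # distribution; sketched here as a normal whose spread shrinks as
    # the sampled reliability r in [0, 1] grows.
    r = rng.uniform(0.0, 1.0)
    theta = rng.normal(loc=2.0, scale=1.0 - 0.5 * r, size=n_prop)
    x = rng.normal(loc=1.0, scale=0.2, size=n_prop)  # aleatory input
    predictions.append(model(x, theta))

family = np.array(predictions)  # one stochastic prediction per row
print(family.shape)             # (20, 1000)
```

Plotting each row as an empirical CDF would reproduce the kind of family shown in Figure 5.3.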


The set of predictive distributions collectively represents a family of predictions. An example of this family of distributions in CDF form is given in Figure 5.3. The overall goal is to minimize prediction uncertainty within budget constraints. To make decisions, this goal must be written in the form of an objective function; therefore, variance is used to quantify the prediction uncertainty. The variance of a family of predictions can be expressed using the law of total variance [109], which states

$\mathrm{Var}(Y) = E_X[\mathrm{Var}(Y \mid X)] + \mathrm{Var}_X(E[Y \mid X])$ (5.6)

for two general random variables $Y$ and $X$. In the context of the prediction problem, the variance of the prediction is of interest, and each prediction is conditioned on a particular sample of the overall model reliability. Therefore, Eq. (5.6) can be applied to express the prediction variance in terms of the validation result.

$\mathrm{Var}(Y) = E_R[\mathrm{Var}(Y \mid R)] + \mathrm{Var}_R(E[Y \mid R])$ (5.7)

where $Y$ denotes the prediction and $R$ denotes the overall model reliability metric.
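As a sanity check, the law of total variance, Var(Y) = E[Var(Y|X)] + Var(E[Y|X]), can be verified numerically. The three-group mixture below is purely illustrative; with population (ddof = 0) variances and empirical group probabilities, the identity holds exactly for the sample.

```python
import numpy as np

rng = np.random.default_rng(1)

# X picks one of three groups; Y | X is normal with group-dependent
# mean and standard deviation (an arbitrary illustrative mixture).
n = 100_000
x = rng.integers(0, 3, size=n)
y = rng.normal(np.array([0.0, 1.0, 3.0])[x], np.array([1.0, 0.5, 2.0])[x])

p = np.array([np.mean(x == k) for k in range(3)])           # P(X = k)
cond_var = np.array([y[x == k].var() for k in range(3)])    # Var(Y | X = k)
cond_mean = np.array([y[x == k].mean() for k in range(3)])  # E[Y | X = k]

e_of_var = (p * cond_var).sum()                   # E[Var(Y | X)]
var_of_e = (p * (cond_mean - y.mean())**2).sum()  # Var(E[Y | X])
print(abs(y.var() - (e_of_var + var_of_e)) < 1e-8)  # True: exact identity
```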

The importance of this variance decomposition is that these two terms correspond to the effects of calibration and validation respectively. In particular, the goal of calibration, expressed by the first term of Eq. (5.7), is to minimize the uncertainty in a single prediction by reducing parameter uncertainty in the posterior distribution that contributes to the predictive parameter distribution. On the other hand, the goal of validation, expressed by the second term, is to reduce the uncertainty about the prediction by driving a family of uncertain predictions toward a single prediction that is not biased by measurement errors. The total variance should be minimized over the set of decision variables (the numbers of each type of test).


Figure 5.3: Family of CDF predictions

To make the assessment, some synthetic data must be generated to represent expected outcomes of the experiment. In the absence of any prior knowledge about the experiment, the only way that these expected outcomes can be produced is by evaluating the model at the input conditions of the experiment and then adding measurement noise to the data. The distribution of the noise is obtained from the best available information about the instrumentation accuracy. If some historical data is available on closely related experiments, a data-driven model can be created independently of the physics-based model to more accurately estimate the potential outcomes. A data-driven model could also be generated once some tests have been conducted and then improved adaptively. With any of these approaches, the expected outcomes are stochastic, so even at a fixed input condition, many realizations of experimental data can be generated from the model due to the presence of the estimated measurement noise. Therefore, the proposed formulation of the objective function takes an expectation over many realizations of synthetic experimental data.
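A minimal sketch of this synthetic-data generation, assuming the simplest case in the paragraph above: the physics-based model is evaluated at the candidate input conditions and Gaussian measurement noise is added. The model, input conditions, and noise level are all illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical model evaluated at the planned experimental inputs.
def model(x):
    return 2.0 * x + 0.1 * x**2

inputs = np.array([0.5, 1.0, 1.5])  # candidate test input conditions
noise_std = 0.05                    # from assumed instrumentation accuracy

def synthetic_realization():
    # Expected outcome = model output + sampled measurement noise
    return model(inputs) + rng.normal(0.0, noise_std, size=inputs.size)

# Many realizations are possible even at fixed input conditions,
# because the measurement noise is stochastic.
realizations = np.array([synthetic_realization() for _ in range(100)])
print(realizations.shape)  # (100, 3)
```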

The decision variables in the optimization problem are the numbers of tests of each type to conduct. In this chapter, a finite set of testing options is considered. Thus, the decision variables are denoted as vectors $N_c$ (whose length is the number of candidate calibration input conditions) and $N_v$ (whose length is the number of candidate validation input conditions). Once the decision variables are selected, an arbitrary number of data realizations (limited by computational expense) are generated with observation vector lengths equal to the values of the decision variables. Then, for each realization of the data vector $D$, the entire integration procedure described in Section 5.2 is performed, resulting in a family of predictive parameter distributions as in Eq. (5.5). Within this context, the following formulation for the optimization problem is proposed.

$\min_{N_c,\, N_v} \; E_D[\mathrm{Var}(Y)] \quad \text{subject to} \quad c_c N_c^{T} + c_v N_v^{T} \le B$ (5.8)

The constraint function for the decision variables is given in Eq. (5.8), where the total cost is limited by a total testing budget $B$. The row vectors $c_c$ and $c_v$, whose lengths equal the numbers of candidate calibration and validation input conditions respectively, contain the costs of the calibration and validation tests at each available input condition, and the superscript $T$ denotes a vector transpose. The formulation of Eq. (5.8) can be further decomposed by applying Eq. (5.7) and taking advantage of the linearity of the expected value operator.

$\min_{N_c,\, N_v} \; E_D\{E_R[\mathrm{Var}(Y \mid R)]\} + E_D\{\mathrm{Var}_R(E[Y \mid R])\}$ (5.9)

where the outer expectations are taken over realizations of the synthetic data $D$ and $R$ denotes the overall model reliability.

The first term is improved by collecting calibration data, since narrowing the posterior distribution of the model parameters will also tend to reduce the average prediction variance. The second term is improved by collecting validation data, since this data will converge the distribution of the overall model reliability toward a deterministic value in the limit of infinite data. A deterministic value of the reliability implies that the second term, the variance of the conditional expectation, is zero for any realization of the data. These two terms in Eq. (5.9) can be expanded to

$E_D\{E_R[\mathrm{Var}(Y \mid R)]\} = \iint \mathrm{Var}(Y \mid r, D)\, f_R(r \mid D)\, f_D(D)\, dr\, dD$ (5.10)

$E_D\{\mathrm{Var}_R(E[Y \mid R])\} = \int \mathrm{Var}_R\big(E[Y \mid R, D]\big)\, f_D(D)\, dD$ (5.11)

The goal of the optimization problem is to minimize the sum of the two integrals given in Eqs. (5.10) and (5.11). Note that weights could be applied to these two terms if there were reason to prefer one over the other; however, the weighted sum would no longer reflect the overall prediction uncertainty precisely.

Each of these integrals is evaluated by Monte Carlo sampling, since the density function for the overall model reliability is not known analytically, and the data model may not have an analytical form either. Because the integrals are evaluated by sampling, the objective function value is inherently stochastic.
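A nested Monte Carlo evaluation of the two variance terms might look like the following sketch. The predictive distribution, the reliability sampling, and the synthetic-data loop are illustrative placeholders, not the chapter's actual models; the structure — an outer loop over data realizations and an inner loop over reliability samples — is the point.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative reliability-conditioned prediction: the predictive
# spread shrinks as the sampled reliability r grows.
def prediction_samples(r, n=2000):
    theta = rng.normal(2.0, 1.0 - 0.5 * r, size=n)
    x = rng.normal(1.0, 0.2, size=n)  # aleatory input
    return theta * x

def objective_estimate(n_data=10, n_rel=50):
    term1 = []  # E_R[Var(Y|R)] per data realization
    term2 = []  # Var_R(E[Y|R]) per data realization
    for _ in range(n_data):  # outer loop: synthetic data realizations
        rs = rng.uniform(0.0, 1.0, size=n_rel)  # reliability samples
        cond_var, cond_mean = [], []
        for r in rs:  # inner loop: family member per reliability sample
            y = prediction_samples(r)
            cond_var.append(y.var())
            cond_mean.append(y.mean())
        term1.append(np.mean(cond_var))
        term2.append(np.var(cond_mean))
    # Monte Carlo estimate of the total prediction variance objective
    return np.mean(term1) + np.mean(term2)

est = objective_estimate()
print(est > 0.0)  # True: the estimated total variance is positive
```

Rerunning with a different seed gives a different value, which illustrates the stochasticity of the objective noted above.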

In addition, the decision variables are discrete quantities, and a relaxation to the continuous space is not possible since fractional tests are meaningless. These two factors (stochasticity and discreteness) significantly limit the available options for solving the optimization problem, which leads to the solution strategy that follows.
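Because the decision variables are integer test counts drawn from a finite option set, the feasible region under the budget constraint can, for small problems, simply be enumerated. The costs, budget, and per-condition test cap below are illustrative assumptions.

```python
import itertools
import numpy as np

# Illustrative costs per test at each candidate input condition,
# a total budget, and an enumeration cap on tests per condition.
c_cal = np.array([1.0, 1.5])  # calibration test costs
c_val = np.array([2.0])       # validation test costs
budget = 6.0
max_tests = 4

# Enumerate all integer allocations satisfying the linear budget
# constraint c_cal @ n_cal + c_val @ n_val <= budget.
feasible = [
    (n_cal, n_val)
    for n_cal in itertools.product(range(max_tests + 1), repeat=c_cal.size)
    for n_val in itertools.product(range(max_tests + 1), repeat=c_val.size)
    if c_cal @ np.array(n_cal) + c_val @ np.array(n_val) <= budget
]
print(len(feasible))  # 32 feasible allocations for these values
```

Each feasible allocation could then be scored with a Monte Carlo estimate of the objective; for larger option sets, the combinatorial growth of this set is one reason a dedicated solution strategy is needed.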
