
4.1 Model validation

4.1.1 Background

The first section of this chapter, Section 4.1, will emphasize the appropriate use of significance testing for model validation assessment.

The second section of this chapter, Section 4.2, discusses uncertainty propagation. Uncertainty propagation is included here because it is often a prerequisite to validation assessment: when model inputs are uncertain or random variables, model predictions should be compared to observations in light of the model output uncertainty that is implied by the input uncertainty.

Section 4.2 discusses appropriate approaches for estimating the probability distribution of the computer simulation output that is implied by the probability distributions associated with the simulation inputs. In particular, the non-parametric probability density estimation tool known as kernel density estimation is discussed in Section 4.2.2, and one approach to dimensionality reduction, principal component analysis, is presented in Section 4.2.3.
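To make the idea of forward uncertainty propagation concrete, the following Python sketch samples assumed input distributions, pushes them through a placeholder simulator, and then estimates the output probability density with a Gaussian kernel density estimator. The simulate function, the input distributions, and all numerical values are illustrative assumptions, not quantities taken from this chapter.

```python
# Minimal sketch: Monte Carlo uncertainty propagation followed by kernel
# density estimation of the output distribution. All inputs are assumed
# for illustration only.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(seed=0)

def simulate(e_modulus, load):
    """Stand-in for the computer simulation: returns a scalar response."""
    return load / e_modulus

# Assumed probability distributions for the uncertain inputs
n_samples = 5000
e_modulus = rng.normal(loc=200e9, scale=10e9, size=n_samples)   # Pa
load = rng.uniform(low=1e3, high=2e3, size=n_samples)           # N

# Propagate the input samples through the simulator
outputs = simulate(e_modulus, load)

# Non-parametric estimate of the output probability density
kde = gaussian_kde(outputs)
grid = np.linspace(outputs.min(), outputs.max(), 200)
density = kde(grid)
print("Estimated mode of the output distribution:", grid[np.argmax(density)])
```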

Formal guidance on verification and validation (V&V) has been published by several professional organizations, including AIAA (1998); ASME (2006); ANS (1987); ISO (1991). One of the first challenges in V&V was to pinpoint precisely what is meant by verification and validation. Although several formalisms have since been put forth, some of the earliest and most widely regarded definitions are those originally published by Schlesinger (1979) in connection with the Society for Computer Simulation. For example, model validation is defined as

Substantiation that a computerized model within its domain of applicability possesses a satisfactory range of accuracy consistent with the intended application of the model.

Several aspects of this definition are important. First, validation is concerned only with whether or not the model has a satisfactory range of accuracy with regard to an intended application. Thus, the process of validation only has meaning once the intended application is specified. This entails characterizing both the domain of applicability of the model and its intended use. For instance, if the intended use of a model is solely to predict acceleration response, then validation is not concerned with the accuracy of stress predictions.

Further, the phrase “domain of applicability” in the definition is a reminder that validation is not concerned with the predictive capability of the model over all possible boundary, loading, and initial conditions, but that the predictive capability should only be assessed for a particular domain of interest.

Most of the previous work in the V&V field has dealt with outlining general frameworks and methodologies. Consequently, validation can be divided into several sub-fields, such as the design of the validation experiments, the design of the computer experiments, uncertainty quantification, and validation (or comparison) metrics. An in-depth V&V overview is given by Oberkampf and Trucano (2002), which discusses, in addition to code verification, all of the validation steps mentioned above. Other, more conceptual, reviews are given by Sargent (2004) and Balci (1997).

Work dealing with the development and application of generally applicable quantitative validation metrics has been limited. In fact, there has been significant evolution over time in terms of what are viewed as important features of a validation metric. For example, Oberkampf and Trucano (2002) state that “a useful validation metric should only measure the agreement between the computational results and the experimental data.” The underlying philosophy that gave rise to this viewpoint was the emphasis that the accuracy and adequacy of a particular model are strictly separate issues. That is, accuracy is a measure of agreement between predictions and observations. The purpose of the accuracy metric is to support a decision based on the more important adequacy consideration, which reflects whether or not the model is suitable (i.e., adequate) for its intended use. Note that in later work, Oberkampf and Barone (2006) describe several desired features of a validation metric, arguing that such a metric should, among other things, depend on the number of experimental replications of a measurement quantity, in order to reflect a “level of confidence”. Such a metric clearly does not measure just the accuracy of the model, as suggested in the previous work (Oberkampf and Trucano, 2002). This is merely one example of how the philosophy of model validation has evolved over time.

Some have studied model validation from the perspective of statistical hypothesis testing. Although it is not necessarily a popular approach within the validation community, hypothesis testing nevertheless provides a well-established foundation for quantifying considerations such as sample size, inherent variability, and type I/II errors. Recent work dealing with the use of hypothesis testing for validation assessment includes Paez and Urbina (2002); Hills and Leslie (2003); Dowding et al. (2004); Chen et al. (2004). The Bayesian perspective on hypothesis testing has also been explored by Mahadevan and Rebba (2005); Rebba et al. (2006); Rebba and Mahadevan (2007). Jiang and Mahadevan (2007) even discuss a method for incorporating Bayesian hypothesis testing with risk considerations for the purpose of decision making.
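As a concrete illustration of the classical (frequentist) route, the sketch below applies a one-sample t-test to a handful of hypothetical experimental replications, testing whether their mean is consistent with a single deterministic model prediction. The data, the prediction, and the significance level are invented for illustration and are not drawn from the cited studies.

```python
# Minimal sketch of classical hypothesis testing for validation assessment:
# a one-sample t-test of whether replicated experimental observations are
# consistent with a deterministic model prediction. Numbers are illustrative.
import numpy as np
from scipy.stats import ttest_1samp

model_prediction = 12.0                                    # model output for the scenario
observations = np.array([11.4, 12.3, 11.9, 12.6, 11.7])   # experimental replications

# Null hypothesis: the true mean of the observations equals the prediction
t_stat, p_value = ttest_1samp(observations, popmean=model_prediction)

alpha = 0.05  # type I error rate chosen by the analyst
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value < alpha:
    print("Reject H0: the data suggest disagreement with the model at this alpha.")
else:
    print("Fail to reject H0: no significant disagreement detected.")
```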

A good overview of quantitative metrics is given by Rebba (2005), which discusses classical and Bayesian hypothesis testing, as well as other alternatives such as decision-theoretic approaches and model reliability.

An additional challenge when developing validation metrics arises when the response quantity of interest from the simulation is multivariate. This can occur if multiple different responses are of interest, such as stress and temperature, or if one response varies over time or space (although in such cases, the analyst should carefully consider whether or not a scalar “summary statistic” might capture all of the relevant information contained in a high-dimensional response).

It is well understood that statistical metrics that compare multiple responses simultaneously must be carefully developed so that dependencies among the various response measures are accounted for. It appears that Balci and Sargent (1982) were the first to apply multivariate statistical methods for model validation. Rebba and Mahadevan (2006) provide a detailed discussion of multivariate methods, including classical and Bayesian hypothesis testing for both distance and covariance similarity, as well as computational issues such as data transformations.
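One simple way to account for such dependencies is to compare the predicted and observed mean responses using a Mahalanobis distance computed from the sample covariance of the observations, as in the hypothetical sketch below. The data and the chi-square reference distribution are illustrative assumptions and are not the specific metrics of the works cited above.

```python
# Sketch of a multivariate comparison that accounts for correlation among
# response quantities via the Mahalanobis distance. Data are illustrative.
import numpy as np
from scipy.stats import chi2

# Hypothetical experimental observations of two correlated responses
# (e.g., stress and temperature), one row per replication
obs = np.array([
    [101.0, 21.5],
    [ 98.5, 20.9],
    [102.3, 21.8],
    [ 99.1, 21.1],
    [100.7, 21.6],
])

model_prediction = np.array([100.0, 21.0])

mean_obs = obs.mean(axis=0)
cov_obs = np.cov(obs, rowvar=False)        # sample covariance of the responses

# Squared Mahalanobis distance between predicted and observed mean responses
diff = mean_obs - model_prediction
d2 = diff @ np.linalg.inv(cov_obs) @ diff

# Rough chi-square reference (ignores sampling error in the covariance estimate)
p_value = chi2.sf(d2, df=obs.shape[1])
print(f"Squared Mahalanobis distance = {d2:.3f}, approximate p-value = {p_value:.3f}")
```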