Uncertainty Quantification and Integration in Engineering Systems


I thank Professor Sankaran Mahadevan for his professional advice and guidance during my graduate studies at Vanderbilt University. Further, the computational resources at Vanderbilt University's ACCRE (Advanced Computing Center for Research and Education) were valuable for the simulations performed in this research.

Motivation

This dissertation focuses on advancing the state of the art in uncertainty quantification methods in order to facilitate the quantification of margins and uncertainties and to aid risk-informed decision making in engineering systems. This dissertation uses a Bayesian approach and proposes new computational methods to quantify different types of uncertainty, integrate different sources of uncertainty across multiple models, and thereby provide information to facilitate risk-based decision-making during different stages of the life cycle of engineering systems.

Uncertainty Quantification

  • Physical Variability
  • Data Uncertainty
  • Model Uncertainty
  • Goals in Uncertainty Quantification

The most commonly considered type of data uncertainty is measurement error (both at the input and output levels). This is an example of epistemic uncertainty (that is, uncertainty that is reducible in light of new information); if sufficient data are available, the distribution parameters can be estimated accurately.

Uncertainty Integration

Hierarchical System Configurations

The development of the methodology for the integration of uncertainty quantification activities depends on the interaction between the different models in the system hierarchy. For example, a coupon, a beam, and a plate made of the same material form a non-consecutive hierarchy.

Goals in Uncertainty Integration

For example, the rise in the temperature of a wire due to heat conduction leads to a change in the resistance and therefore the current-carrying capacity of the wire. In the literature, feedback coupling and feedforward coupling are also referred to as strong and weak coupling, respectively [2].
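To make the notion of feedback (two-way) coupling concrete, the following is a minimal Python sketch of a fixed-point iteration between a hypothetical thermal model and electrical model of the wire example; all coefficients and model forms are illustrative assumptions, not quantities from the dissertation.

```python
import numpy as np

# Minimal sketch of feedback (two-way) coupling between two "disciplines":
# a thermal model (temperature rise from resistive heating) and an electrical
# model (resistance increases with temperature). All coefficients are
# illustrative, not taken from the dissertation.

V = 12.0          # applied voltage [V]
R0 = 1.0          # resistance at reference temperature [ohm]
alpha = 0.004     # temperature coefficient of resistance [1/K]
h = 0.5           # heat dissipation coefficient [W/K]

def electrical(T):
    """Electrical discipline: resistance and current at temperature rise T."""
    R = R0 * (1.0 + alpha * T)
    I = V / R
    return R, I

def thermal(I, R):
    """Thermal discipline: steady-state temperature rise from Joule heating."""
    return I**2 * R / h

# Fixed-point (Gauss-Seidel type) iteration until the coupled variables converge.
T = 0.0
for k in range(100):
    R, I = electrical(T)
    T_new = thermal(I, R)
    if abs(T_new - T) < 1e-8:
        break
    T = T_new

print(f"Converged after {k} iterations: T = {T_new:.2f} K, I = {I:.3f} A")
```

A feedforward (one-way) coupled system would simply evaluate the two models in sequence, with no iteration required.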

Research Objectives

Uncertainty Quantification

The quantification of uncertainty should also be accompanied by a sensitivity analysis of the various sources of uncertainty. The proposed methods for data uncertainty quantification and model uncertainty determination are applied to fatigue crack growth analysis under uncertainty as a case study.

Uncertainty Integration

Then, the effects of the various sources of uncertainty in uncertainty propagation are combined to quantify the uncertainty in the output. Therefore, a rigorous framework has been developed to study the sensitivity of the calibration parameter to the various sources of uncertainty and the data; this task is referred to as "inverse sensitivity analysis".

Highlights of the Dissertation: What’s New?

Consequently, a new decoupled approach for multidisciplinary analysis is developed to address this challenge. A new "inverse sensitivity analysis" methodology is developed to analyze the sensitivity of model parameters to the other sources of uncertainty and data (Chapter IX).

Organization of the Dissertation

Furthermore, the method of forward global sensitivity analysis is used to analyze the contributions of the different sources of uncertainty to the overall uncertainty in the model prediction. The uncertainty in the calibration parameters is affected by the presence of other sources of uncertainty in the system.

Overview

Finally, it is explained that the above methods for uncertainty propagation and statistical inference require several thousand evaluations of the computational model, and therefore it may be necessary to construct surrogate models to replace them. The Gaussian process surrogate model, which is used to replace computationally expensive models in the rest of this dissertation, is then introduced.

Fundamentals of Probability Theory

In this case, the modern definition of probability is in terms of the cumulative distribution function (CDF), defined as FX(x) = P(X ≤ x). A PDF or CDF is said to be valid if and only if it satisfies all the above properties.
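The properties referred to are not reproduced in this excerpt; for reference, the standard textbook validity conditions for a CDF and PDF are sketched below (standard facts, not quoted from the dissertation).

```latex
% Standard validity conditions for a CDF F_X and PDF f_X of a continuous
% random variable X (generic textbook form).
\begin{align*}
  &F_X(x) = P(X \le x), \qquad f_X(x) = \frac{dF_X(x)}{dx} \ge 0,\\
  &\lim_{x \to -\infty} F_X(x) = 0, \qquad \lim_{x \to +\infty} F_X(x) = 1,\\
  &F_X \text{ is non-decreasing and right-continuous}, \qquad
   \int_{-\infty}^{\infty} f_X(x)\, dx = 1 .
\end{align*}
```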

Interpretations of Probability

Physical Probability

In the context of physical probabilities, the mean of a random variable, sometimes referred to as the population mean, is deterministic. The interpretation of confidence intervals is sometimes confusing and misleading, and the uncertainty in the parameter estimate cannot be used for further uncertainty quantification.

Subjective Probability

On the other hand, the Bayesian methodology can calculate probability distributions for the distribution parameters, which can be easily used in the propagation of uncertainty. Therefore, the Bayesian methodology provides a framework in which epistemic uncertainty can also be addressed using probability theory, as opposed to the frequentist approach.

The Bayesian Methodology

  • Bayes Theorem
  • Bayesian Inference
  • Notes on the Likelihood Function
  • Bayesian Network

In general, these methods can be used when it is desired to generate samples from a PDF known only up to a proportionality constant. The Bayesian network can therefore be used for both forward problems (estimate the PDF of Z) and inverse problems (calibrate parameters X2 and X4 based on data).
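For reference, a minimal statement of Bayes' theorem as used for parameter calibration is given below (generic form; the notation f(θ) for the prior and L(θ) for the likelihood is assumed here). The denominator is the normalization constant that MCMC methods avoid evaluating explicitly.

```latex
% Bayes' theorem for calibrating parameters \theta from data D.
\begin{equation*}
  f(\theta \mid D)
  = \frac{L(\theta)\, f(\theta)}{\int L(\theta)\, f(\theta)\, d\theta}
  \;\propto\; L(\theta)\, f(\theta),
  \qquad L(\theta) = f(D \mid \theta).
\end{equation*}
```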

Fig. 2.1 shows a conceptual Bayesian network that aids in uncertainty quantification across multiple levels of models and observed data.

Methods for Uncertainty Propagation

Monte Carlo Sampling

In this method, several random realizations of X are generated based on CDF inversion, and the corresponding random realizations of Y are calculated. Error estimates for the CDF, in terms of the number of simulations, are available in the literature [27, 28].
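As an illustration of this procedure, the following is a minimal Python sketch of inverse-transform (CDF inversion) Monte Carlo sampling and propagation; the exponential input, the model g, and the threshold are illustrative choices, not quantities from the dissertation.

```python
import numpy as np

# Minimal sketch of Monte Carlo propagation of uncertainty through y = g(x)
# using CDF inversion (inverse-transform sampling).
rng = np.random.default_rng(0)

def g(x):                          # placeholder computational model
    return x**2 + 3.0 * x

lam = 2.0                          # rate of the exponential input X
n = 100_000
u = rng.uniform(size=n)            # U ~ Uniform(0, 1)
x = -np.log(1.0 - u) / lam         # X = F^{-1}(U) for F(x) = 1 - exp(-lam * x)
y = g(x)

# Empirical CDF value and its sampling error, which shrinks as 1/sqrt(n)
p = np.mean(y <= 2.0)
std_err = np.sqrt(p * (1.0 - p) / n)
print(f"P(Y <= 2) ≈ {p:.4f} ± {std_err:.4f}")
```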

Analytical Methods

The MPP and the shortest distance are estimated using the well-known Rackwitz-Fiessler algorithm [31], which is based on an iterative linear approximation of the nonlinear constraint G(x) − yc = 0. This difficulty is overcome by using an inverse FORM method [35], where multiple CDF values are chosen and the corresponding values of yc are calculated.

Global Sensitivity Analysis

Further, the difference between the index of total effects and the index of first-order effects provides an estimate of the contribution of variance due to the interaction between Xi and other variables. Thus, both first-order and total effect indices should be calculated to assess the sensitivity of the variables.
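The sketch below estimates first-order and total-effect Sobol' indices with standard pick-freeze estimators (Saltelli/Jansen form); the three-input test model is purely illustrative, and this is not claimed to be the specific estimator used in the dissertation.

```python
import numpy as np

# Minimal sketch of first-order (S_i) and total-effect (ST_i) Sobol' indices.
rng = np.random.default_rng(1)

def model(x):                       # placeholder model Y = g(X1, X2, X3)
    return x[:, 0] + 2.0 * x[:, 1] ** 2 + x[:, 0] * x[:, 2]

d, n = 3, 50_000
A = rng.uniform(size=(n, d))        # two independent input sample matrices
B = rng.uniform(size=(n, d))
fA, fB = model(A), model(B)
var_y = np.var(np.concatenate([fA, fB]))

for i in range(d):
    ABi = A.copy()
    ABi[:, i] = B[:, i]             # replace column i of A with column i of B
    fABi = model(ABi)
    S_i = np.mean(fB * (fABi - fA)) / var_y            # first-order index
    ST_i = 0.5 * np.mean((fA - fABi) ** 2) / var_y     # total-effect index
    print(f"X{i+1}: S = {S_i:.3f}, ST = {ST_i:.3f}, interaction ≈ {ST_i - S_i:.3f}")
```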

Markov Chain Monte Carlo Sampling

The Metropolis Algorithm

The following steps form the algorithm to generate samples of the underlying PDF. Repeat the following steps; each iteration returns a sample from the underlying PDF. a) Generate a candidate x∗ from the proposal density q(x∗|xi).
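Since only the first step is reproduced in this excerpt, a minimal Python sketch of the full Metropolis algorithm is given below; the unnormalized target density, the proposal width, and the burn-in length are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the Metropolis algorithm: draw samples from a target PDF
# known only up to a proportionality constant.
rng = np.random.default_rng(2)

def unnorm_pdf(x):
    # Unnormalized target, e.g. proportional to likelihood(x) * prior(x)
    return (np.exp(-0.5 * ((x - 1.0) / 0.5) ** 2)
            + 0.3 * np.exp(-0.5 * ((x + 2.0) / 0.8) ** 2))

n_samples, step = 20_000, 1.0
x = 0.0                                        # starting value of the chain
samples = np.empty(n_samples)
for i in range(n_samples):
    x_star = rng.normal(x, step)               # (a) candidate from symmetric proposal q(x*|x_i)
    alpha = unnorm_pdf(x_star) / unnorm_pdf(x) # (b) acceptance ratio
    if rng.uniform() < alpha:                  # (c) accept with probability min(1, alpha)
        x = x_star
    samples[i] = x                             # otherwise keep the current value

samples = samples[5_000:]                      # discard burn-in before using the chain
print(samples.mean(), samples.std())
```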

Slice Sampling

After the Markov chain converges, the samples in X can be used to construct the PDF of X using kernel density estimation. A generalization of this algorithm assumes asymmetric proposal density functions q1(x∗|xi) and q2(xi|x∗); this algorithm is called the Metropolis-Hastings algorithm [40].

MCMC Sampling: Summary

Gaussian Process Surrogate Modeling

In contrast, in the case of the Gaussian process model, both the training points and the hyperparameters are required to make predictions, even though the hyperparameters may have been estimated previously. An important issue in building the Gaussian process model is the selection of training points.
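The following is a minimal Python sketch of Gaussian process prediction with a squared-exponential kernel and fixed hyperparameters (in practice the hyperparameters would be estimated, e.g. by maximizing the marginal likelihood); the training function stands in for an expensive computational model and all settings are illustrative.

```python
import numpy as np

# Minimal sketch of a Gaussian process surrogate (standard GP regression
# equations with a squared-exponential kernel and fixed hyperparameters).
def kernel(X1, X2, length=0.3, sigma_f=1.0):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return sigma_f**2 * np.exp(-0.5 * d2 / length**2)

def expensive_model(x):                 # placeholder for the true model
    return np.sin(3.0 * x) + 0.5 * x

X_train = np.linspace(0.0, 2.0, 8)      # training points (design of experiments)
y_train = expensive_model(X_train)
noise = 1e-8                            # jitter for numerical stability

K = kernel(X_train, X_train) + noise * np.eye(len(X_train))
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))

X_test = np.linspace(0.0, 2.0, 200)
K_s = kernel(X_test, X_train)
mean = K_s @ alpha                                            # predictive mean
v = np.linalg.solve(L, K_s.T)
var = np.diag(kernel(X_test, X_test)) - np.sum(v**2, axis=0)  # predictive variance
print(mean[:3], np.sqrt(np.maximum(var[:3], 0.0)))
```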

Summary

Alternatively, Hombal and Mahadevan [56] developed another training point selection algorithm where the emphasis is on selecting consecutive training points so that the bias error in the surrogate model is minimized. Once the training points are selected and the surrogate model is constructed, it can be used for (1) Monte Carlo simulation; (2) Markov chain Monte Carlo simulation; and (3) global sensitivity analysis.

Introduction

Challenges and Existing Approaches

Researchers have also studied the use of non-probabilistic approaches to treat epistemic uncertainty due to interval data. However, it is not easy to construct probability distributions in the presence of sparse point and/or interval data.

Proposed Approach

Given the choice of distribution type and distribution parameters, the uncertainty in the variable is due to the physical variability; for the sake of simplicity, in the rest of the chapter, it is simply referred to as variability. The uncertainty in the distribution type and the distribution parameters are both estimated (Section 3.6).

Case 1: Known PDF Type (Parametric)

  • Estimation of Distribution Parameters
  • Family of PDFs
  • Unconditional PDF : Predictive Posterior
  • Illustrative Example : Uncertainty Representation
  • Remarks

Therefore, the likelihood of the parameters P can be calculated as being proportional to the PDF fX(x|P) evaluated at the observed data point. A comparison of the family of distributions and the unconditional distribution is shown via PDFs in Fig. 3.1.
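As an illustration of this likelihood construction, the sketch below builds the likelihood of assumed normal distribution parameters from sparse point data and interval data (the interval case discussed earlier in this chapter) and forms the unconditional (predictive) PDF by averaging over the parameter posterior. The data values, flat prior, and grid resolution are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

# Minimal sketch: likelihood of distribution parameters P = (mu, sigma) of an
# assumed normal variable from sparse point and interval data.
points = np.array([4.2, 5.1, 4.8])              # point observations
intervals = [(3.5, 4.5), (4.0, 6.0)]            # interval observations

def likelihood(mu, sigma):
    L = np.prod(stats.norm.pdf(points, mu, sigma))            # PDF at point data
    for a, b in intervals:                                    # CDF difference for intervals
        L *= stats.norm.cdf(b, mu, sigma) - stats.norm.cdf(a, mu, sigma)
    return L

# Posterior of the distribution parameters on a grid (flat prior assumed)
mus = np.linspace(3.0, 7.0, 100)
sigmas = np.linspace(0.1, 2.0, 100)
post = np.array([[likelihood(m, s) for s in sigmas] for m in mus])
post /= trapezoid(trapezoid(post, sigmas, axis=1), mus)       # normalize

# Unconditional (predictive) PDF of X: average f(x | P) over the posterior of P
x = 5.0
fx = trapezoid(trapezoid(stats.norm.pdf(x, mus[:, None], sigmas[None, :]) * post,
                         sigmas, axis=1), mus)
print(f"unconditional PDF at x = {x}: {fx:.4f}")
```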

Figure 3.1: Family of Distributions Vs. Single Distribution: PDF

Variability versus Parameter Uncertainty

  • Need for Assessing Individual Contributions
  • The Auxiliary Variable Concept
  • P1 : Contributions in a Single Variable
  • P2 : Contributions to a Response Function
  • Illustration : Contributions in One Variable
  • Illustration : Contributions to a Response Function
  • Summary

Individual and overall effects of distribution parameter uncertainty for any X (considering the corresponding P alone). The individual and overall effects of variability and distribution parameter uncertainty are calculated from Eq.

Figure 3.5: Variability (UX) and Distribution Parameter Uncertainty (P)

Case 2: Unknown PDF Type (Parametric)

Bayesian Model Averaging Approach

To illustrate, assume that the two competing distribution types are normal (N(µ, σ)) and uniform (U(a, b)). That is, a 'narrow' estimate of µ is sufficient to 'explain' the available data, while explaining the same data requires a 'broad' estimate of b.
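The sketch below illustrates the idea with a simplified point-estimate version of the model weights: each candidate distribution type is scored by its marginal likelihood (evidence), estimated here by Monte Carlo integration over illustrative parameter priors. Note that in the dissertation's approach the weight w is itself treated as a random variable with a PDF (Fig. 3.13); this sketch does not reproduce that refinement.

```python
import numpy as np
from scipy import stats

# Minimal sketch of Bayesian model averaging between two candidate
# distribution types, normal N(mu, sigma) and uniform U(a, b), for sparse data.
rng = np.random.default_rng(3)
data = np.array([4.1, 4.9, 5.3, 4.6])
n_mc = 200_000

# Evidence of the normal model: average the likelihood over a prior on (mu, sigma)
mu = rng.uniform(0.0, 10.0, n_mc)
sigma = rng.uniform(0.05, 5.0, n_mc)
lik_normal = np.prod(stats.norm.pdf(data[:, None], mu, sigma), axis=0)
evidence_normal = lik_normal.mean()

# Evidence of the uniform model: average the likelihood over a prior on (a, b)
a = rng.uniform(0.0, 10.0, n_mc)
b = rng.uniform(0.0, 10.0, n_mc)
a, b = np.minimum(a, b), np.maximum(a, b)
inside = (data[:, None] >= a) & (data[:, None] <= b)
width = np.maximum(b - a, 1e-12)
lik_uniform = np.where(inside.all(axis=0), 1.0 / width ** len(data), 0.0)
evidence_uniform = lik_uniform.mean()

# BMA weight of the normal model (equal prior model probabilities assumed)
w = evidence_normal / (evidence_normal + evidence_uniform)
print(f"P(normal | data) ≈ {w:.3f}, P(uniform | data) ≈ {1 - w:.3f}")
```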

Figure 3.13: PDF of Weight w

Bayesian Hypothesis Testing Approach

Similar to the Bayesian Model Averaging (BMA) procedure, the PDFs of the distribution parameters using the Bayesian Hypothesis Testing (BHT) approach are quantified and plotted. Earlier, in the Bayesian model averaging approach (Section 3.6.1.3), the uncertainty in the distribution type was represented by a continuous random variable (w).

Figure 3.17: PDFs of Distribution Parameters

Uncertainty Propagation through a Model

As previously mentioned, the PDF family approach is computationally expensive for the purpose of uncertainty propagation. Similar to Bayesian model averaging, this unconditional, predictive PDF can be used to propagate uncertainty through the system model Y =g(X).

Case 3: Unknown PDF Type (Non-parametric)

The non-parametric approach is based on the fact that if the PDF values at some points are known, then the entire density function can be constructed based on an interpolation method. Since there are no distribution parameters or interpolation parameters, the method is referred to as non-parametric.

Figure 3.19: Non-parametric Probability Distributions

Sandia Challenge Problem

Problem 1

It may not make sense to represent λ and ξ using parametric distributions, because this approach would lead to distribution parameters of the distribution parameters. This unconditional PDF for b is used to calculate the unconditional PDF for y, which is plotted in Figure 1.

Figure 3.20: PDF of Input a

Problem 2

Also consider the uniform distribution estimated for a; the estimates for the lower and upper bounds are so close (but with high standard deviations) that it is difficult to derive any utility from such results. Then the PDFs of the distribution parameters (µa and σa for normal, and La and Ua for uniform) can also be calculated.

Table 3.9: Bayesian Model Averaging: Results

Discussion of Results

Second, the proposed methodology is non-parametric, thus making the resulting PDF more faithful to the data than an assumed parametric PDF. Fourth, the proposed method provides the entire PDF of the output, which is useful in the context of reliability and risk assessment.

Summary

First, the individual contributions of two types of uncertainty in a single variable were considered. Then the method was extended to the assessment of individual contributions of the two types of uncertainty in multiple input variables to the uncertainty in the output of an underlying computational model.

Introduction

It is essential to quantify the uncertainty in the model in order to calculate the uncertainty in system-level performance. The mathematical equation developed in the first step contains several parameters θ (for example, the damping coefficient in a differential equation governing the plate deflection under dynamic loading).

Figure 4.1: Stages of Model Development

Model Verification

Discretization Error

On the other hand, the method of Richardson extrapolation has been found to come closest to quantifying the actual discretization error. Since the discretization error is a deterministic quantity, it must be corrected for in the context of uncertainty propagation.
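For reference, the standard Richardson-extrapolation estimate of the discretization error has the following textbook form (r is the mesh refinement ratio and p the observed order of convergence; this is the generic formula, not necessarily the exact variant used in the dissertation).

```latex
% Richardson extrapolation: f_h and f_{rh} are the solutions on the fine and
% coarse mesh; the second term estimates the discretization error of f_h.
\begin{equation*}
  f_{\text{exact}} \;\approx\; f_h + \frac{f_h - f_{rh}}{r^{p} - 1},
  \qquad
  \varepsilon_h \;\approx\; \frac{f_h - f_{rh}}{r^{p} - 1}.
\end{equation*}
```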

Surrogate Model Uncertainty

It was emphasized that the choice of training points is important for the construction of the GP model. In Section 2.8, the training points were created based on the input-output data of the expensive computer model.

Model Calibration

The Basic Parameter Estimation Problem

A classic example in structural dynamics is the estimation of the damping coefficient based on the response measurement. Consider the computational model y = G(x; θ), where x is the independent input variable and y is the dependent output variable.

Least Squares Estimation

The true values of the parameters (θ) are assumed to be deterministic, and the least squares estimate may not coincide with the true value. As the number of parameters increases, it is computationally demanding to quantify the uncertainty associated with least squares estimation.
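The following is a minimal Python sketch of nonlinear least squares estimation of a single parameter θ in y = G(x; θ), with an approximate covariance obtained from the residual Jacobian; the exponential-decay model, data, and noise level are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

# Minimal sketch of least squares estimation of theta in y = G(x; theta).
rng = np.random.default_rng(4)

def G(x, theta):                       # placeholder computational model
    return np.exp(-theta * x)

theta_true = 0.8
x_data = np.linspace(0.0, 5.0, 20)
y_data = G(x_data, theta_true) + rng.normal(0.0, 0.02, x_data.size)

# Minimize the sum of squared residuals between model prediction and data
res = least_squares(lambda th: G(x_data, th[0]) - y_data, x0=[0.1])
theta_hat = res.x[0]

# Approximate variance of the estimate from the residual Jacobian
dof = x_data.size - 1
sigma2 = np.sum(res.fun**2) / dof
cov = sigma2 * np.linalg.inv(res.jac.T @ res.jac)
print(f"theta_hat = {theta_hat:.3f} ± {np.sqrt(cov[0, 0]):.3f}")
```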

Figure 4.2: Confidence Bounds in Least Squares Analysis

The Likelihood Method

If there are two parameters, then the uncertainty is represented by a two-dimensional confidence region. Such confidence regions are not probability distributions, and therefore the parameter uncertainty cannot be propagated through another computational model.

Bayesian Inference

As explained earlier in Section 2.7, MCMC algorithms such as Metropolis sampling [38], Metropolis-Hastings sampling [40], Gibbs sampling [41] and slice sampling [39] are commonly used to generate samples from the joint posterior without explicitly evaluating the normalization constant. Such a method is described in this chapter in Section 4.3.7; another method is described later in Chapter IX.

Kennedy O’Hagan Framework

When Bayesian inference is performed, the joint likelihood is constructed for (1) the model parameters θ; (2) the hyperparameters of the model inadequacy function δ(x); and (3) the standard deviation (σm) of the measurement error (εm). However, McFarland [52] reports that the uncertainty due to the hyperparameters of this Gaussian process (the GP that replaces G(x;θ)) is negligible compared to the uncertainty in the model parameters, and therefore it may be easier to estimate the hyperparameters of this GP prior to Bayesian inference.

Regularization

Adaptive Integration for Bayesian Inference

However, by the definition of probability, this can be converted into two nested one-dimensional integrals. These one-dimensional integrals can be evaluated using advanced numerical algorithms such as adaptive recursive Simpson's quadrature [132].
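A minimal Python sketch of adaptive recursive Simpson quadrature is shown below (the classic textbook scheme; the Gaussian integrand is illustrative, and the implementation is not claimed to match reference [132]).

```python
import math

# Minimal sketch of adaptive recursive Simpson quadrature for the kind of
# one-dimensional integrals that arise in the nested formulation.
def simpson(f, a, b):
    c = 0.5 * (a + b)
    return (b - a) / 6.0 * (f(a) + 4.0 * f(c) + f(b))

def adaptive_simpson(f, a, b, tol=1e-10, whole=None):
    if whole is None:
        whole = simpson(f, a, b)
    c = 0.5 * (a + b)
    left, right = simpson(f, a, c), simpson(f, c, b)
    if abs(left + right - whole) <= 15.0 * tol:          # standard error test
        return left + right + (left + right - whole) / 15.0
    return (adaptive_simpson(f, a, c, tol / 2.0, left) +
            adaptive_simpson(f, c, b, tol / 2.0, right))

# Example: integrate a Gaussian-like likelihood over one dimension
value = adaptive_simpson(lambda t: math.exp(-0.5 * t * t), -8.0, 8.0)
print(value, math.sqrt(2.0 * math.pi))                   # should agree closely
```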

Model Calibration under Uncertainty

Additionally, each of the measurements (input and/or output) can be point-valued or an interval. This PDF can be used in uncertainty propagation to calculate the model prediction as a function of the parameter θ using uncertainty propagation methods discussed in Section 2.5.

Estimating θ versus Distribution Parameters of θ

Application: Energy Dissipation in a Lap Joint

In the second step, the PDF of the model prediction is estimated as a function of Fj (j = 1 to 5) and kn. In the third step, the likelihood function is calculated using the methods developed in Section 4.3.8.

Figure 4.3: Composite PDF of Linear Stiffness

Model Calibration: Summary

Model Validation

Bayesian Hypothesis Testing

In the context of model validation, P(H0|D) is the posterior probability that the model is correct. Then the Bayes factor can be used to calculate the posterior probability that the model is correct as in Eq.
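The equation reference is truncated in this excerpt; for reference, the standard relation between the Bayes factor and the posterior probability of the null hypothesis is given below (generic form, stated here rather than quoted from the dissertation).

```latex
% Bayes factor B and the posterior probability that the model (H0) is
% correct, given prior probabilities P(H0) and P(H1) = 1 - P(H0).
\begin{equation*}
  B = \frac{P(D \mid H_0)}{P(D \mid H_1)},
  \qquad
  P(H_0 \mid D) = \frac{B \, P(H_0)}{B \, P(H_0) + P(H_1)},
\end{equation*}
% which reduces to B / (B + 1) when the two hypotheses are equally likely a priori.
```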

Reliability-based Metric

In Eq. 4.49, the concept is to propagate the PDF fθ(θ) through the model and then use the full PDF of the model prediction in the reliability metric. A simple Monte Carlo analysis can be used for this purpose, since this calculation does not require any further evaluations of the model G.
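The following is a minimal Python sketch of this Monte Carlo evaluation of the model reliability metric for a single observed input-output pair; the model, the posterior samples of θ, and the 5% tolerance are illustrative stand-ins (in practice the model predictions may already be available, e.g. from a surrogate).

```python
import numpy as np

# Minimal sketch of the model reliability metric: the probability that the
# model prediction lies within a tolerance of the observation, computed by
# propagating samples of the calibrated parameter theta through the model.
rng = np.random.default_rng(5)

def G(x, theta):                                 # placeholder model prediction
    return theta * x**2

x_obs, y_obs = 2.0, 4.1                          # one observed input-output pair
theta_samples = rng.normal(1.0, 0.05, 100_000)   # stand-in for posterior samples
tol = 0.05 * abs(y_obs)                          # tolerance, e.g. 5% of the observation

y_pred = G(x_obs, theta_samples)
reliability = np.mean(np.abs(y_pred - y_obs) <= tol)
print(f"model reliability metric r = {reliability:.3f}")
```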

Application: Energy Dissipation in a Lap Joint

It should be noted that the value of the Bayes factor and the Bayesian posterior probability depend on the choice of the alternative hypothesis fy(y|H1). Similarly, the model reliability metric is calculated for each ordered input-output pair, by considering a tolerance level equal to 5% of the observed experimental value.

Table 4.3: Correlations Between Model Parameters

Application: Heat Conduction

Note that this PDF is only used in the model reliability approach and not in the Bayesian hypothesis testing approach. Although both methods, Bayesian hypothesis testing and the model reliability approach, are used to calculate "the probability that the model is correct", they are based on different assumptions and may yield different results.

Figure 4.6: PDFs of Model Parameters

Model Validation: Summary

In Bayesian hypothesis testing, the choice of fy(y|H1) is subjective and affects the outcome of model validation; in fact, the Bayesian posterior probability is simply a relative measure of the support from the data for the model versus the alternative hypothesis. On the other hand, the model reliability metric requires no such assumption, and therefore it is likely to provide a more objective metric for model validation.

Summary

Therefore, to quantify the uncertainty in the system-level prediction, it is not only necessary to quantify the uncertainty of the input data, but it is also essential to quantify the uncertainty of the model. Model Calibration: Physics-based models have parameters that need to be calibrated/estimated so that model predictions better match experimental observations.

Introduction

The advanced problem involves using a calibrated model for probabilistic crack growth prediction. Various types of model uncertainty and error, namely crack growth model uncertainty, surrogate model uncertainty, and finite element discretization error, are explicitly discussed.

Crack Growth Modeling

In each cycle, the stress intensity factor can be expressed as a function of the crack size (a), the load (L), and the geometry. This equivalent stress intensity factor is then used in the cycle-by-cycle integration of the crack growth law.
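To make the cycle-by-cycle integration concrete, the sketch below integrates a simple Paris-type crack growth law under constant-amplitude loading; the constants, geometry factor, and loading are illustrative values, not the calibrated quantities used in the dissertation.

```python
import math

# Minimal sketch of cycle-by-cycle integration of a Paris-type crack growth
# law, da/dN = C * (dK)^m, with dK = Y * dS * sqrt(pi * a).
C, m = 1.0e-11, 3.0          # Paris constants (da/dN in m/cycle, dK in MPa*sqrt(m))
Y = 1.12                     # geometry factor
dS = 100.0                   # constant-amplitude stress range [MPa]
a = 1.0e-3                   # initial crack size [m]
a_crit = 25.0e-3             # critical crack size [m]

n_cycles = 0
while a < a_crit:
    dK = Y * dS * math.sqrt(math.pi * a)   # stress intensity factor range
    a += C * dK**m                         # crack increment for this cycle
    n_cycles += 1

print(f"cycles to grow from 1 mm to {a_crit * 1e3:.0f} mm: {n_cycles}")
```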

Figure 5.1: Deterministic Crack Propagation Analysis

Sources of Uncertainty

Physical or Natural Variability

This case study considers physical variability in (i) the loading conditions and (ii) material properties (threshold stress intensity factor ∆Kth and fatigue limit ∆σf). As mentioned earlier, variability in other material properties such as the modulus of elasticity, Poisson's ratio, etc.

Data Uncertainty

In Eq. 5.8, Am is the measured crack size; A is the actual crack size; β0 and β1 are the regression coefficients; and εm represents the measurement error, assumed to be a normal random variable with zero mean and standard deviation σε.

Model Uncertainty and Errors
