
6.2 Model validation challenge problems: structural dynamics application

6.2.2 Input distribution characterization

Since the behavior of the subsystem is characterized using a linear model, the response is fully specified by the modal parameters: three natural frequencies, three damping coefficients, and three mode shapes, for a total of fifteen parameters. Thus, the natural way to characterize the variability associated with the subsystems is to treat the modal parameters as random variables. Once a joint probability distribution for the modal parameters has been specified, the randomness can be propagated through the given models using Monte Carlo simulation (this also applies to the system models, which contain the three-degree-of-freedom subsystem as a component; see Figure 6.9).
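As a concrete illustration of this propagation step, a minimal Monte Carlo sketch is given below. The sampler and the response function are placeholders introduced purely for illustration (the actual joint distribution of the modal parameters is only constructed later in this section), so the block shows the structure of the loop rather than the real models.

```python
# Minimal Monte Carlo propagation sketch.  Both `sample_modal_params`
# and `response_model` are stand-ins for illustration only; the real
# joint distribution is built later via PCA + KDE + MCMC.
import numpy as np

rng = np.random.default_rng(0)

def sample_modal_params(n):
    # Placeholder sampler over the 15 modal parameters
    # (natural frequencies, damping coefficients, mode-shape entries).
    return rng.normal(size=(n, 15))

def response_model(theta):
    # Placeholder for the subsystem (or system) response metric,
    # e.g. a peak response computed from the linear modal model.
    return float(np.sum(theta**2))

n_mc = 1000
thetas = sample_modal_params(n_mc)
responses = np.array([response_model(t) for t in thetas])

print("estimated mean response:", responses.mean())
print("estimated 95th percentile:", np.percentile(responses, 95))
```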

The probability distribution for the modal parameters can be estimated based on available data. In this case, the statistical data for the modal parameters are available for both the random vibration (calibration) and shock (validation) force inputs. For each experiment, 20 nominally identical components were tested at three different input levels: low, medium, and high. Thus, there are 60 data points each for the random vibration and shock inputs, for a total of 120 data points. However, the distributions of the random vibration and shock data differ significantly (perhaps due to the non-linearity in the components). For this reason, and since the intended use of the model pertains to shock excitation, the input distributions will be characterized based only on the shock data.

One of the most widely used and straightforward ways to characterize randomness is the normal distribution. Unfortunately, the assumption of normality is violated for several of the parameters. For example, based on the well-known Shapiro-Wilk test for normality (Shapiro and Wilk, 1965) with n = 60 observations, univariate normality is rejected at the usual 0.05 significance level for all three natural frequencies and for the first two damping coefficients.
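For illustration, this normality screening could be carried out roughly as follows; the file name and the (60, 15) layout of the shock data are assumptions made only for the sketch.

```python
# Sketch of the univariate Shapiro-Wilk screening using scipy.
# `shock_modal_params.txt` is a hypothetical file holding the 60 shock
# realizations of the 15 modal parameters, one row per test.
import numpy as np
from scipy import stats

shock_data = np.loadtxt("shock_modal_params.txt")  # assumed shape (60, 15)

for j in range(shock_data.shape[1]):
    w_stat, p_value = stats.shapiro(shock_data[:, j])
    verdict = "reject" if p_value < 0.05 else "do not reject"
    print(f"parameter {j + 1:2d}: W = {w_stat:.3f}, p = {p_value:.3f} -> {verdict} normality")
```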

In addition to the violations of normality, many of the modal parameters are highly correlated. For example, all three modal frequencies are nearly perfectly correlated with each other, and the frequencies show high negative correlations with the second and third elements of the first mode shape.
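These correlations can be inspected directly from the sample correlation matrix, under the same assumed data layout as in the previous sketch (columns 0-2 taken to hold the three natural frequencies):

```python
import numpy as np

shock_data = np.loadtxt("shock_modal_params.txt")   # assumed shape (60, 15)
corr = np.corrcoef(shock_data, rowvar=False)         # 15 x 15 sample correlation matrix

# Correlations among the three natural frequencies (expected to be near 1).
print(np.round(corr[:3, :3], 3))
```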

Thus, the distribution for the modal parameters is not only multivariate, but also highly correlated and non-normal, meaning that the use of the multivariate normal distribution is not appropriate. For the case of non-normal multivariate distributions, some specialized techniques are available to deal with fully-specified parametric joint distributions (see Section 4.2), but none are appropriate for the present case. This work makes use of the non-parametric density characterization approach known as kernel density estimation (KDE), which is described in Section 4.2.2.

Two challenges arise when attempting to use KDE. First, KDE does not tend to work well in high dimensions because of the "curse of dimensionality": in high dimensions, large regions of the parameter space contain virtually no data. Second, there is no natural way to generate random samples from such a density representation, and random sampling will be needed in order to estimate the distribution of the system response.

The first challenge is handled with principal component analysis (PCA), which is outlined in Section 4.2.3. PCA is appropriate here because it allows a high-dimensional random vector to be re-expressed in terms of a lower-dimensional space. The second challenge, generating random samples from such an arbitrary distribution, is addressed with Markov chain Monte Carlo (MCMC) sampling (Section 2.3).

The first step is to apply PCA to find a reduced set of variables with which to represent the total variation, which involves an eigen-decomposition of the covariance matrix (the correlation matrix is used here because the variables are not measured in the same units). The analysis conducted here is based on the sixty realizations from the subsystem "validation" experiments (which correspond to the shock excitation). The corresponding eigenvalues are plotted in Figure 6.10.

[Plot: variance explained versus principal component index (1-15).]

Figure 6.10: Eigenvalues corresponding to the correlation matrix of the modal parameters. Each eigenvalue represents the amount of variation explained by the corresponding principal component.

It is apparent from Fig. 6.10 that the first principal component explains approximately half of the total variation of all fifteen modal parameters (see Eq. (4.19)), and the first four principal components together explain approximately 96% of the total variation. To balance efficiency and tractability against information loss, the first four principal components are retained to construct the probability density estimate.
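The PCA step itself reduces to a few lines of linear algebra. The sketch below follows the procedure described above (eigen-decomposition of the sample correlation matrix, explained-variance ratios, and projection onto the retained components), again under the assumed (60, 15) shock-data layout:

```python
# PCA via eigen-decomposition of the correlation matrix of the
# 15 modal parameters, keeping the first four principal components.
import numpy as np

shock_data = np.loadtxt("shock_modal_params.txt")        # assumed shape (60, 15)

# PCA on the correlation matrix is PCA on standardized variables.
z = (shock_data - shock_data.mean(axis=0)) / shock_data.std(axis=0, ddof=1)

corr = np.corrcoef(shock_data, rowvar=False)              # 15 x 15
eigvals, eigvecs = np.linalg.eigh(corr)                    # ascending order
order = np.argsort(eigvals)[::-1]                          # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()                        # cf. Eq. (4.19)
print("variance explained by each PC:", np.round(explained, 3))
print("cumulative, first 4 PCs:", round(explained[:4].sum(), 3))

n_keep = 4
pcs = z @ eigvecs[:, :n_keep]                              # (60, 4) PC scores
```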

The density estimate is now constructed using multivariate KDE based on the four (uncorrelated, but not necessarily independent) principal components of the modal parameters. Specifically, the density estimate given by Eq. (4.14) is employed, where the bandwidth is estimated by minimizing the cross-validation score function given by Eq. (4.17), as discussed in Section 4.2.2. Finally, MCMC sampling, which is discussed in Section 2.3, can be used to generate random samples from the resulting density estimate. The resulting samples are of course samples of the principal components, but the original variables can be obtained via the simple reverse transformation of Eq. (4.21). A simple verification of this density estimation and sampling procedure is discussed next.
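A compact sketch of this chain of steps is given below. It is not the implementation used here: the cross-validation score of Eq. (4.17) is approximated by scikit-learn's likelihood-based cross-validation for the bandwidth, and the MCMC step is a plain random-walk Metropolis sampler with burn-in and thinning omitted. The arrays `shock_data`, `pcs`, and `eigvecs` are carried over from the PCA sketch above.

```python
# Sketch: Gaussian KDE on the four PC scores with a cross-validated
# bandwidth, random-walk Metropolis sampling of the KDE density, and
# back-transformation of the samples to the original modal parameters.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(1)

# --- bandwidth selection by (likelihood) cross-validation -------------
grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                    {"bandwidth": np.linspace(0.1, 2.0, 40)}, cv=5)
grid.fit(pcs)                                  # pcs: (60, 4) PC scores
kde = grid.best_estimator_

# --- random-walk Metropolis sampling of the KDE density ---------------
def log_density(x):
    return kde.score_samples(x.reshape(1, -1))[0]

n_samples, step = 1000, 0.5
chain = np.empty((n_samples, pcs.shape[1]))
current = pcs.mean(axis=0)
current_logp = log_density(current)
for i in range(n_samples):
    proposal = current + step * rng.normal(size=current.shape)
    proposal_logp = log_density(proposal)
    if np.log(rng.uniform()) < proposal_logp - current_logp:
        current, current_logp = proposal, proposal_logp
    chain[i] = current

# --- back-transform PC samples to modal parameters (cf. Eq. (4.21)) ---
mean = shock_data.mean(axis=0)
std = shock_data.std(axis=0, ddof=1)
modal_samples = mean + std * (chain @ eigvecs[:, :4].T)    # (1000, 15)
```

In practice one would also monitor the acceptance rate, tune the proposal step size, and discard an initial burn-in portion of the chain before using the samples.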

Verification of density characterization and sampling

The performance of the above procedure in maintaining the distribution structure of the original variables is considered here. The idea is to compare a set of randomly generated modal parameters to the original data. The difficulty is that there is no straightforward way of making such a comparison, and it can become cumbersome because of the high dimensionality of the data.

To simplify the comparison process, two comparisons are presented here, in which 1,000 randomly generated realizations are compared against the original 60 data points:

1. Correlations among the modal parameters: A 15 × 15 sample correlation matrix can be computed for both the original and simulated data. To provide a compact representation of the comparison, the elements of the two matrices are plotted against each other. For perfect agreement, the points would fall on the line y = x.

2. Marginal distributions: The agreement of the marginal distributions can be assessed by comparing the empirical cumulative distribution functions (CDFs). To avoid redundant comparisons, the marginal distributions of the first four principal components are compared rather than those of the 15 original variables. Both comparisons are sketched in the code below.
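Both comparisons can be computed in a few lines, assuming the arrays from the earlier sketches (`shock_data`, `modal_samples`, `pcs`, and `eigvecs`) are available; a two-sample Kolmogorov-Smirnov statistic is added here only as a numerical summary alongside the visual comparisons in Figures 6.11 and 6.12.

```python
# Sketch of the two verification comparisons.
import numpy as np
from scipy import stats

# 1. Pairwise correlations: off-diagonal entries of the two 15 x 15
#    sample correlation matrices (plotted against y = x in Fig. 6.11).
corr_orig = np.corrcoef(shock_data, rowvar=False)
corr_sim = np.corrcoef(modal_samples, rowvar=False)
iu = np.triu_indices(corr_orig.shape[0], k=1)
print("largest correlation discrepancy:",
      round(float(np.max(np.abs(corr_orig[iu] - corr_sim[iu]))), 3))

# 2. Marginal distributions of the first four principal components:
#    project the simulated data onto the retained eigenvectors and
#    compare against the original PC scores (Fig. 6.12).
z_sim = (modal_samples - shock_data.mean(axis=0)) / shock_data.std(axis=0, ddof=1)
pcs_sim = z_sim @ eigvecs[:, :4]
for k in range(4):
    d_stat, p_value = stats.ks_2samp(pcs[:, k], pcs_sim[:, k])
    print(f"PC {k + 1}: K-S statistic = {d_stat:.3f}, p = {p_value:.3f}")
```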

The agreement of the pairwise correlations is plotted in Fig. 6.11. Most of the points fall near the line y = x, indicating that there is very good agreement between the observed and simulated correlation coefficients.

[Plot: correlations of simulated data versus correlations of original data; both axes range from -1 to 1.]

Figure 6.11: Sample correlation coefficients of original modal parameter data versus those of simulated data. The solid line represents perfect agreement.

The empirical CDFs for the first four principal components are compared in Figure 6.12, which indicates excellent agreement, particularly for the first three principal components. There is some discrepancy for the fourth component, but this is to be expected, considering the erratic nature of the empirical CDF for the original data of this component.

Finally, the agreement of the marginal distributions and the correlations is a good check, but it does not necessarily indicate that the full joint distributions match. The difficulty is that, for non-normal data, a joint distribution is not fully specified by its marginals and pairwise correlations. The above results are presented as a sanity check, but it is acknowledged that they are not a complete verification of the proposed sampling methodology.

[Plots: empirical CDF versus principal component value, in four panels: (a) first, (b) second, (c) third, and (d) fourth principal component.]

Figure 6.12: Comparison of observed empirical CDFs (solid lines) and empirical CDFs of simulated data (dashed lines) for the first four principal components.