4.2 Uncertainty propagation
Techniques are available to sample from fully-specified parametric joint distributions (Johnson, 1981), but they will not be applicable to most practical problems. Other sampling techniques that have been developed include approximations based on partially specified joint distributions (Lurie and Goldberg, 1998; Iman and Conover, 1982) and the use of bivariate copulas (Haas, 1999).
As an alternative to parametric techniques, which are only applicable when the data conform to the specified probability model, several non-parametric techniques are available that are more generally applicable. Two such non-parametric techniques are the polynomial chaos expansion (cf. Ghanem and Spanos, 1991; Ghanem and Dham, 1998; Debusschere et al., 2004; Ghanem et al., 2008; Ghanem, 1999) and kernel density estimation. Polynomial chaos expansion is a method whereby a random variable is expanded in a set of orthogonal basis functions of independent, standardized random variables. Kernel density estimation is discussed below in Section 4.2.2.
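To illustrate the general form of such an expansion (the notation here is generic and not drawn from the cited works), a random quantity $u$ with finite variance is approximated by a truncated series
\[
u \approx \sum_{i=0}^{P} u_i \, \Psi_i(\boldsymbol{\xi}),
\]
where the $\Psi_i$ are orthogonal polynomials in the independent, standardized random variables $\boldsymbol{\xi}$ (for instance, Hermite polynomials when the components of $\boldsymbol{\xi}$ are standard Gaussian) and the deterministic coefficients $u_i$ carry the probabilistic information.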
When dealing with a large number of dependent random variables, uncertainty propagation can often be simplified by finding a more compact representation of the high-dimensional random vector. If the original high-dimensional set can be well approximated by a lower-dimensional set, then the problem of density estimation need only be concerned with the lower-dimensional set. In probability and statistics, the canonical approach to finding a compact representation of a high-dimensional random vector is principal component analysis; this approach is outlined below in Section 4.2.3.
4.2.2 Kernel density estimation

Kernel density estimation constructs an estimate of the underlying probability density directly from the observed data: the estimate is formed by adding up a set of “kernel densities” centered at each of the observations. The kernel densities serve to smooth the estimate so that the overall trends of the underlying density can be captured.
Formally, the kernel density estimate $\hat{f}(x)$ is expressed as
\[
\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right), \tag{4.10}
\]
where $n$ is the number of observations, $X_i$ represents the $i$th observation, $K(\cdot)$ is the kernel function, and $h$ is a smoothing parameter known as the window width, or bandwidth. The value of $h$ determines the width of the kernel that is placed on each observation. Large values of $h$ will smooth out the features of the density estimate, while small values of $h$ will capture the fine structure of the observations.
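As a minimal sketch of Eq. (4.10), assuming a Gaussian kernel and a NumPy-based implementation (the function names here are illustrative only), the estimate can be evaluated directly:

\begin{verbatim}
import numpy as np

def gaussian_kernel(u):
    """Standard normal PDF used as the kernel K(.)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde_1d(x, obs, h):
    """Evaluate the kernel density estimate of Eq. (4.10) at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    obs = np.asarray(obs, dtype=float)
    # Center one kernel at each observation X_i and scale by 1/(n h).
    u = (x[:, None] - obs[None, :]) / h          # shape (len(x), n)
    return gaussian_kernel(u).sum(axis=1) / (len(obs) * h)

# Example: evaluate the estimate for a synthetic sample
rng = np.random.default_rng(0)
sample = rng.normal(size=200)
print(kde_1d([-1.0, 0.0, 1.0], sample, h=0.3))
\end{verbatim}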
Before constructing a kernel density estimate, the analyst must choose both a form for the kernel function, $K(\cdot)$, and a value for the window width. The estimator tends not to be very sensitive to the choice of the kernel function (Simonoff, 1996), so the Gaussian kernel is commonly chosen for ease of computation and smoothness. In this case, $K(\cdot)$ is simply the standard normal PDF, and $h$ is sometimes referred to as the standard deviation of the kernel.
The density estimate is known to be much more sensitive to the choice of the window width. Ideally, one would like to estimate $h$ by minimizing an error measure against the true underlying density, such as the mean integrated squared error (MISE), but doing so would require knowledge of the true density. Some rules of thumb are available that provide quick estimates of $h$ based on the observed data. For example, the “Gaussian reference rule,” which is based on a Gaussian assumption for the underlying density, is given by the simple formula
\[
h = 1.059\,\sigma\, n^{-1/5}, \tag{4.11}
\]
where $\sigma$ is the standard deviation of the observations.
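As a small sketch of Eq. (4.11), assuming $\sigma$ is estimated by the sample standard deviation (the function name below is hypothetical):

\begin{verbatim}
import numpy as np

def gaussian_reference_bandwidth(obs):
    """Rule-of-thumb window width of Eq. (4.11)."""
    sigma = np.std(obs, ddof=1)              # sample standard deviation
    return 1.059 * sigma * len(obs) ** (-1.0 / 5.0)
\end{verbatim}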
A more rigorous approach is to estimate $h$ by finding the value that minimizes the cross-validation score function (Silverman, 1986)
\[
M_1(h) = \frac{1}{n^2 h} \sum_{i=1}^{n} \sum_{j=1}^{n} K^{*}\!\left(\frac{X_i - X_j}{h}\right) + \frac{2}{nh}\, K(0), \tag{4.12}
\]
where the function $K^{*}(\cdot)$ is defined as
\[
K^{*}(t) = K^{(2)}(t) - 2K(t), \tag{4.13}
\]
and $K^{(2)}(t)$ is the convolution of $K(\cdot)$ with itself, which in the case of a standard Gaussian kernel is a Gaussian density with variance 2.
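The following sketch evaluates the score of Eqs. (4.12)–(4.13) for a Gaussian kernel and selects $h$ by a simple grid search; the helper names and the grid-search strategy are illustrative choices, not part of Silverman's presentation:

\begin{verbatim}
import numpy as np

def norm_pdf(u, var=1.0):
    """Gaussian density with mean 0 and the given variance."""
    return np.exp(-0.5 * u**2 / var) / np.sqrt(2.0 * np.pi * var)

def cv_score(h, obs):
    """Least-squares cross-validation score M1(h) of Eq. (4.12)."""
    n = len(obs)
    t = (obs[:, None] - obs[None, :]) / h     # pairwise (X_i - X_j)/h
    # K*(t) = K^(2)(t) - 2 K(t); K^(2) is a Gaussian with variance 2.
    k_star = norm_pdf(t, var=2.0) - 2.0 * norm_pdf(t)
    return k_star.sum() / (n**2 * h) + 2.0 * norm_pdf(0.0) / (n * h)

# Choose h by minimizing the score over a grid of candidate values
rng = np.random.default_rng(0)
obs = rng.normal(size=200)
grid = np.linspace(0.05, 1.0, 50)
h_cv = grid[np.argmin([cv_score(h, obs) for h in grid])]
\end{verbatim}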
Now consider the multivariate case, in which one wishes to construct a joint density estimate based on an $n \times d$ random sample $X$. Several extensions of the density estimate given by Eq. (4.10) are available for multivariate estimation. One of the most straightforward is given by
\[
\hat{f}(x) = \frac{1}{n h^{d}} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right), \tag{4.14}
\]
in which the window width is the same in each dimension. Note that $K(x)$ is now a multivariate kernel, and one common choice is the standard multivariate normal density (refer to Silverman, 1986, and Simonoff, 1996, for more possibilities).
Because the density estimate of Eq. (4.14) uses the same window width for each dimension, it is usually necessary to first standardize the data in some manner so that the units of measure do not have an adverse effect. One possibility is to first “pre-whiten” or “sphere” the data to have unit covariance. This can be accomplished using the linear transformation given by
\[
x' = S^{-1/2} x, \tag{4.15}
\]
where $S$ is the sample covariance matrix of the observations. The resulting kernel density estimate of the sphered data is given by
\[
\hat{f}(x) = \frac{|S|^{-1/2}}{n h^{d}} \sum_{i=1}^{n} K\!\left(\frac{S^{-1/2}(x - X_i)}{h}\right). \tag{4.16}
\]
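A sketch of Eqs. (4.15)–(4.16) with a standard multivariate normal kernel might look as follows; computing $S^{-1/2}$ via an eigendecomposition and the variable names are implementation choices, not prescribed by the text:

\begin{verbatim}
import numpy as np

def multivariate_kde(x, X, h):
    """Evaluate the sphered kernel density estimate of Eq. (4.16).

    x : (m, d) array of evaluation points
    X : (n, d) array of observations
    h : common window width for the sphered data
    """
    n, d = X.shape
    S = np.cov(X, rowvar=False)                         # sample covariance
    vals, vecs = np.linalg.eigh(S)                      # S = V diag(vals) V^T
    S_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T  # S^{-1/2} of Eq. (4.15)

    # Scaled differences S^{-1/2}(x - X_i) / h, shape (m, n, d)
    diff = (x[:, None, :] - X[None, :, :]) @ S_inv_sqrt.T / h
    sq = np.sum(diff**2, axis=-1)
    K = np.exp(-0.5 * sq) / (2.0 * np.pi) ** (d / 2.0)  # standard normal kernel
    return np.linalg.det(S) ** -0.5 * K.sum(axis=1) / (n * h**d)

# Example: evaluate the estimate at the first two sample points
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), np.diag([1.0, 4.0, 9.0]), size=500)
print(multivariate_kde(X[:2], X, h=0.5))
\end{verbatim}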
In analogy to Eq. (4.12), $h$ can be chosen to minimize the least-squares cross-validation score
\[
M_1(h) = \frac{1}{n^2 h^{d}} \sum_{i=1}^{n} \sum_{j=1}^{n} K^{*}\!\left(\frac{X_i' - X_j'}{h}\right) + \frac{2}{n h^{d}}\, K(0), \tag{4.17}
\]
where $K^{*}(t) = K^{(2)}(t) - 2K(t)$, and the convolution of the standard multivariate normal kernel with itself, $K^{(2)}(t)$, is the multivariate normal kernel with covariance $2I$.
One important note regarding multivariate kernel density estimation is that the estimators suffer from the “curse of dimensionality,” meaning that as the dimension increases, progressively larger sample sizes will be needed to achieve comparable accuracy. This is because in high dimensions there will almost surely be large regions of the parameter space that do not have any data in them. Simonoff (1996) suggests that “the ‘empty space phenomenon’ in higher dimensions \ldots\ argues against very effective direct density estimation in more than four or five dimensions.” One possible solution to this problem is to construct the density estimate based on a lower-dimensional representation of the data; principal component analysis provides one framework for obtaining such a representation, and it is discussed next.