3.4 Case 1: Known PDF Type (Parametric)
3.4.1 Estimation of Distribution Parameters
This section considers the case where the distribution type of a particular quantity X is known. Let the corresponding PDF be denoted by fX(x|P), where P denotes the distribution parameters which need to be estimated based on the available data.
First, the concept of likelihood is reviewed for point data and then extended to the case of interval data.
Assume that m point-valued data x_i (i = 1 to m) are available for the estimation of P. The likelihood function of the parameters, denoted by L(P), is defined as being proportional to the probability of observing the given data (x_i; i = 1 to m) conditioned on the parameters P [19, 22]. Note that, by definition, the likelihood is meaningful only up to a proportionality constant [19].
Given P, X follows a continuous PDF f_X(x|P). For a continuous density function, the probability of any single point value x_i is theoretically zero. Hence, Pawitan [22] states, “A slight technical issue arises when dealing with continuous outcomes, since theoretically the probability of any point value x_i is zero. We can resolve this problem by admitting that in real life, there is only a finite precision: observing x_i is short for observing X ∈ (x_i − ε/2, x_i + ε/2), where ε is the precision limit.”
If ε is small enough, then on observing x_i, the likelihood for P is:
$$
\begin{aligned}
L(P) &\propto P\left(X \in \left(x_i - \tfrac{\epsilon}{2},\, x_i + \tfrac{\epsilon}{2}\right) \,\Big|\, P\right) \\
&= \int_{x_i - \epsilon/2}^{x_i + \epsilon/2} f_X(x|P)\, dx \\
&= \epsilon\, f_X(x_i|P) \quad \text{(by the mean value theorem)} \\
&\propto f_X(x_i|P)
\end{aligned}
\tag{3.1}
$$
Hence, the likelihood of the parameters P can be calculated as being proportional to the PDF f_X(x|P) evaluated at the observed data point. Note that this density function is conditioned on the parameters P. If there are several data points (x_i; i = 1 to m) that are independent of each other, then the combined likelihood of the parameters P can be calculated as:
$$
L(P) \propto \prod_{i=1}^{m} f_X(x_i|P) \tag{3.2}
$$
(The word “independent” above implies that the sources of data, i.e. different experiments or different experts from which the data originate, are considered to be statistically independent. In other words, the outcome of one experiment (or the information from one expert) does not affect the outcome of another experiment (or the information from another expert).)
The parameters P can be inferred by maximizing the expression in Eq. 3.2. This estimate is popularly known as the maximum likelihood estimate.
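As an illustration, this maximization can be carried out numerically. The sketch below is a minimal example, assuming a normal distribution type for X and hypothetical point data (the values are not from the text); it maximizes the logarithm of the likelihood in Eq. 3.2 using SciPy:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical point data x_i, assumed drawn from a normal distribution
data = np.array([4.8, 5.1, 5.3, 4.9, 5.0, 5.2])

def neg_log_likelihood(params):
    """Negative log of Eq. 3.2: -sum log f_X(x_i | mu, sigma)."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf  # standard deviation must be positive
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Maximize the likelihood by minimizing its negative logarithm
result = minimize(neg_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead")
mu_hat, sigma_hat = result.x
```

For the normal case the numerical optimum should recover the well-known closed-form estimates (sample mean and the maximum likelihood standard deviation).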
Note that the above derivation of likelihood considers an infinitesimally small interval around the data point x_i. It is therefore straightforward to extend this definition to any general interval [a, b]. The expression for the likelihood of the parameters P, for a single interval [a, b], is:
$$
\begin{aligned}
L(P) &\propto P(\text{data}\,|\,P) \\
&= P(X \in [a, b]\,|\,P) \\
&= P(a \le X \le b\,|\,P)
\end{aligned}
\tag{3.3}
$$
The expression in Eq. 3.3 is evaluated from the cumulative distribution function (CDF) FX(x|P), i.e. the integral of the PDF. Therefore,
$$
L(P) \propto \int_{a}^{b} f_X(x|P)\, dx = F_X(b|P) - F_X(a|P) \tag{3.4}
$$
The expression on the right hand side of Eq. 3.4 represents the area under the PDF, and hence its value is always less than or equal to unity.
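The quantity in Eq. 3.4 is readily evaluated from a library CDF. The sketch below is a minimal illustration, with an assumed interval and assumed normal parameters (both hypothetical), using SciPy:

```python
from scipy.stats import norm

# Hypothetical interval [a, b] and assumed normal parameters for X
a, b = 4.0, 6.0
mu, sigma = 5.0, 1.0

# Eq. 3.4: L(P) proportional to F_X(b|P) - F_X(a|P)
interval_likelihood = norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma)

# This is the area under the PDF over [a, b], so it lies in [0, 1]
```

With these values the interval covers one standard deviation on either side of the mean, so the likelihood is approximately 0.683.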
In principle, the PDF f_X(x|P) can assume any arbitrary shape, the only restrictions being that it must be positive and the area under the curve must be equal to unity. Consider the optimization problem of maximizing the expression on the right hand side of Eq. 3.4. The maximum value of this expression is unity, and it is attained by any arbitrary density function f_X(x) whose support is contained in [a, b] (i.e. f_X(x) = 0 for x < a and x > b, and f_X(x) takes arbitrary positive values in a ≤ x ≤ b such that the area under the curve is equal to unity). Hence, this optimization problem (i.e. inferring the distribution parameters P from a single interval) has a non-unique solution, and the likelihood-based approach cannot be used to represent a single interval using a probability distribution.
If there are several intervals (given by [ai, bi], i = 1 to n) for the description of X and these intervals are assumed to be independent (similar to calculation of likelihood for point data), then the combined likelihood of the parameters P of the PDF fX(x|P) can be calculated similar to Eq. 3.2 as:
$$
L(P) \propto \prod_{i=1}^{n} \int_{a_i}^{b_i} f_X(x|P)\, dx \tag{3.5}
$$
If the available data is a combination of both point values and intervals, the likelihood function of the parameters P can be calculated as follows. Suppose there is a combination of point data (m data points, xi, i = 1 to m) and interval data (n intervals, [ai, bi], i = 1 to n) for the description of X. The right hand sides of Eqs. 3.2 and 3.5 are quantities that are proportional to probabilities that can in turn be multiplied to calculate the combined likelihood. The multiplication is justified by the assumption that the sources of these data are independent, as:
$$
L(P) \propto \left[\prod_{i=1}^{m} f_X(x_i|P)\right]\left[\prod_{j=1}^{n} \int_{a_j}^{b_j} f_X(x|P)\, dx\right] \tag{3.6}
$$
The maximum likelihood estimate of the parameters P can be calculated by maximizing the expression in Eq. 3.6 when both point data and interval data are available.
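The combined maximization of Eq. 3.6 can be sketched numerically as follows. The example assumes a normal distribution type for X and uses hypothetical point and interval data (none of the values are from the text); it maximizes the combined log-likelihood with SciPy:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical point observations and interval observations of X
points = np.array([4.8, 5.1, 5.3])
intervals = np.array([[4.5, 5.5], [4.0, 6.0], [4.9, 5.6]])

def neg_log_likelihood(params):
    """Negative log of Eq. 3.6: point term plus interval term."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    # Point-data term: sum of log f_X(x_i | P)
    ll = np.sum(norm.logpdf(points, loc=mu, scale=sigma))
    # Interval-data term: sum of log [F_X(b_j|P) - F_X(a_j|P)]
    probs = norm.cdf(intervals[:, 1], mu, sigma) - norm.cdf(intervals[:, 0], mu, sigma)
    ll += np.sum(np.log(probs))
    return -ll

result = minimize(neg_log_likelihood, x0=[5.0, 1.0], method="Nelder-Mead")
mu_hat, sigma_hat = result.x
```

Because the point data and all three intervals are concentrated around 5, the estimate of the mean should land near that value.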
Instead of maximizing the likelihood, this dissertation will use a full likelihood approach in which the entire likelihood function is used to construct the PDF of the distribution parameters P, as in Eq. 3.7. The idea that the entire likelihood function, rather than merely its maximizer, should be used for inference was emphasized by Barnard et al. [78].
Let f(P) denote the joint PDF of the distribution parameters P. Applying Bayes' theorem with a uniform prior density (f′(P) = h), the joint PDF is calculated as:
$$
f(P) = \frac{h\,L(P)}{\int h\,L(P)\, dP} = \frac{L(P)}{\int L(P)\, dP} \tag{3.7}
$$
Note: The uniform prior density function can be defined over the entire admissible range of the parameters P. For example, the mean of a normal distribution can vary in (−∞, ∞), while the standard deviation can vary in (0, ∞) because the standard deviation is always greater than zero. Both these prior distributions are improper, because a constant density over an unbounded range cannot be normalized to integrate to unity.
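In practice, Eq. 3.7 can be evaluated numerically on a bounded grid of parameter values (which also sidesteps the impropriety of the prior). The sketch below assumes a normal distribution type and hypothetical interval data, and normalizes the interval likelihood of Eq. 3.5 over a (μ, σ) grid:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical interval data describing X
intervals = np.array([[4.5, 5.5], [4.0, 6.0], [4.9, 5.6]])

# Bounded grid over the parameters P = (mu, sigma); uniform prior on this grid
mu_grid = np.linspace(3.0, 7.0, 201)
sigma_grid = np.linspace(0.05, 3.0, 201)
MU, SIGMA = np.meshgrid(mu_grid, sigma_grid)

# L(P) from Eq. 3.5: product over intervals of F_X(b|P) - F_X(a|P)
L = np.ones_like(MU)
for a, b in intervals:
    L *= norm.cdf(b, MU, SIGMA) - norm.cdf(a, MU, SIGMA)

# Eq. 3.7: with a uniform prior, f(P) = L(P) / integral of L(P) dP
dmu = mu_grid[1] - mu_grid[0]
dsig = sigma_grid[1] - sigma_grid[0]
f_P = L / (L.sum() * dmu * dsig)

# The resulting joint PDF integrates to one over the grid
total = f_P.sum() * dmu * dsig
```

The normalized array f_P is a discrete representation of the joint PDF of the parameters, which can then be sampled or integrated for uncertainty propagation.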
The construction of the likelihood function in the case of interval data has been considered in the literature (e.g., [79]). The contribution of this chapter is to use this idea to construct PDFs for such interval data, and then to use these PDFs for uncertainty propagation.