Exchangeability and Indifference
same as the chance of observing a 1), Bernoulli proved that the relative frequency with which a 1 occurs approximates $\theta$. Stated formally, if $S_m$ is the total number of 1s that occur in a large number, say $m$, of Bernoulli trials with parameter $\theta$, then according to the weak law of large numbers (for Bernoulli trials),
$$P\left(\left|\frac{S_m}{m} - \theta\right| < \epsilon\right) \to 1 \text{ as } m \to \infty, \text{ for } \epsilon > 0 \text{ arbitrarily small but fixed.}$$
Recall that, even to Bernoulli, a relative frequency was not a probability. Thus both $\theta$ and the probability statement of the theorem were regarded by Bernoulli as 'ease of happening', not relative frequencies. The strong law of large numbers (first stated by Cantelli in 1917) asserts that, with probability 1, $S_m/m - \theta$ becomes small and remains small.
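Bernoulli's theorem is easy to check numerically. Below is a minimal sketch in Python; the values $\theta = 0.3$ and $m = 100\,000$ are arbitrary illustrative choices:

```python
import random

# Simulate m Bernoulli trials with parameter theta and compare the
# relative frequency S_m/m of 1s against theta (weak law of large numbers).
# theta = 0.3 and m = 100_000 are arbitrary illustrative choices.
random.seed(0)
theta, m = 0.3, 100_000
S_m = sum(random.random() < theta for _ in range(m))  # total number of 1s
print(abs(S_m / m - theta) < 0.01)  # relative frequency is within 0.01 of theta
```

For $m$ this large the standard deviation of $S_m/m$ is about $\sqrt{\theta(1-\theta)/m} \approx 0.0014$, so the check succeeds with overwhelming probability.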
Relative Frequency of Exchangeable Bernoulli Sequences
After proving his theorem for exchangeable sequences, de Finetti went on to prove a second result which is of great importance. In what follows, I shall adhere to the notation used in the infinite version of de Finetti's theorem. He proved that, for exchangeable sequences, the (personal) probability that $\lim_{m \to \infty} k/m$ exists is 1 and, if this limit is denoted by $\theta$, then the (personal) probability distribution of $\theta$ is $F$. If $F$ is such that all its mass is concentrated on a single value, then the weak law of large numbers follows as a special case of de Finetti's theorem.
It is important to note that $\theta$ again is not to be interpreted as a probability, because to a subjectivist a probability is the amount you are willing to bet on the occurrence of an uncertain outcome. Lindley and Phillips (1976) view $\theta$ as a property of the real world and refer to it as a propensity or a chance, and the quantity $\theta^k(1-\theta)^{n-k}$ as a chance distribution. Some authors on subjective probability, such as Kyburg and Smokler (1964, pp. 13–14), have suggested that the above result of de Finetti bridges the gap between personal probability and physical probability.
However, de Finetti (1976) disagrees, and Diaconis and Freedman (1979) also have trouble with this type of synthesis. Hill (1993) attempts to clarify this misunderstanding by saying that even though there does not pre-exist a 'true probability', one could implicitly act as though there were one. Similar things could also have been said about the law of large numbers; however, it is likely that Bernoulli, too, would have disagreed.
3.2 DE FINETTI-STYLE THEOREMS FOR INFINITE SEQUENCES OF NON-BINARY RANDOM QUANTITIES
(cf. Diaconis and Freedman, 1979). All the same, the general approach is still in the style of a de Finetti theorem; that is, to characterize models in terms of an 'invariance' property. By invariance, I mean 'equiprobable' or 'indifference', just as in the case of exchangeable zero-one sequences. The idea is to begin with observables, postulate symmetries or summary statistics (section 3.2.1), and then find a simple description of all models with the given symmetries.
The aim of this section is to explore the above symmetries and to produce versions of (2.5) other than that of the mixture of Bernoulli sequences. Before we do this, we must first gain additional insight into what we have already done with zero-one exchangeable sequences, particularly regarding sufficiency and invariance. Informally speaking, by 'sufficiency' I mean a summarization of information in a manner that essentially preserves some needed characteristics.
3.2.1 Sufficiency and Indifference in Zero-one Exchangeable Sequences
We begin by obtaining an equivalent formulation of an exchangeable, infinitely extendible, zero-one sequence $X_1, X_2, \ldots$. Define the partial sum $S_n = X_1 + X_2 + \cdots + X_n$, and let $S_n = t$. Then, given $t$, exchangeability implies that the sequence $X_1, \ldots, X_n$ is uniformly distributed over the $\binom{n}{t}$ sequences having $t$ ones and $n-t$ zeros. That is, each of the $\binom{n}{t}$ sequences has probability $1/\binom{n}{t}$. But infinitely exchangeable zero-one sequences are, by de Finetti's theorem, mixtures of Bernoulli sequences. Thus we have the following:
Indifference Principle for Zero-one Sequences
An infinite sequence of zero-one variables can be represented as a mixture of Bernoulli sequences if and only if, for each $n$, given $S_n = t$, the sequence $X_1, \ldots, X_n$ is 'uniformly' distributed over the $\binom{n}{t}$ sequences having $t$ ones and $n-t$ zeros.
This equivalent formulation for the exchangeability of zero-one sequences of infinite length paves the way for questions about other kinds of sequences (section 3.3.2), but it also brings to the surface two points: the relevance of the sufficiency of the partial sums $S_n$, and the uniform distribution of the sequence $X_1, \ldots, X_n$. In other words, if zero-one sequences are judged invariant (i.e. uniformly distributed) under permutation given the sufficient statistic $S_n$, then they are also exchangeable, and exchangeability, in turn, gives us the mixture representation.
The ‘indifference principle’ refers to the act of judging invariance (under permutation in the present case).
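The indifference principle can be verified exactly on a toy example: under any mixture of Bernoulli sequences, the probability of a particular zero-one string depends only on its number of ones, so all arrangements with the same $S_n = t$ are equally likely. A minimal sketch, using an arbitrary two-point mixture (the weights and parameter values are illustrative choices, not from the text):

```python
from itertools import product
from math import isclose

# A two-point mixture of Bernoulli sequences; the weights and thetas
# below are arbitrary illustrative choices.
weights, thetas = [0.4, 0.6], [0.2, 0.7]
n = 4

def seq_prob(x):
    # P(X_1=x_1, ..., X_n=x_n) = sum_j w_j * theta_j^t * (1-theta_j)^(n-t),
    # which depends on x only through t = sum(x).
    t = sum(x)
    return sum(w * th**t * (1 - th)**(n - t) for w, th in zip(weights, thetas))

# For each t, all C(n, t) arrangements with t ones carry equal probability,
# so the conditional distribution given S_n = t is uniform.
for t in range(n + 1):
    probs = [seq_prob(x) for x in product([0, 1], repeat=n) if sum(x) == t]
    assert all(isclose(p, probs[0]) for p in probs)
print("given S_n = t, the arrangements are equiprobable")
```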
3.2.2 Invariance Conditions Leading to Mixtures of Other Distributions
Prompted by the observation that mixtures of Bernoulli sequences arise when we assume invariance under permutation, we look for conditions (i.e. judgments about observables) that lead to mixtures of other well-known chance distributions that commonly arise in practice. This may enable us to better interpret specific forms of (2.5) that are used. In what follows, some of the classical as well as newer results on this topic are summarized. They pertain to mixtures of both discrete and continuous distributions. We start off with one of the best-known distributions, the Gaussian (also known as the 'normal' distribution).
Scale Mixtures of Gaussian Distributions
When can a sequence of real-valued random quantities $X_i$, $1 \le i < \infty$, be represented as a scale mixture of Gaussian$(0, \sigma^2)$ variables (or chance distributions)? That is, when is
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \int_0^\infty \prod_{i=1}^n \Phi\left(\frac{x_i}{\sigma}\right) F(d\sigma), \qquad (3.3)$$
where $\Phi(x) = P(X_1 \le x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du$ is the Gaussian(0, 1) distribution function, and $F$ a unique probability distribution on $(0, \infty)$?
Freedman (1962) (see also Schoenberg (1938) and Kingman (1972)) has shown that a necessary and sufficient condition for the above to hold is that, for each $n$, the joint distribution of $X_1, \ldots, X_n$ be rotationally symmetric. Note that an $n \times 1$ vector of random quantities, say $\mathbf{X} = (X_1, \ldots, X_n)$, is rotationally symmetric (or spherically symmetric, or orthogonally invariant) if the joint distribution of $\mathbf{X}$ is identical to that of $M\mathbf{X}$ for all $n \times n$ orthogonal matrices $M$.
An equivalent characterization based on sufficiency and invariance (cf. Diaconis and Freedman, 1981) is that, for every $n$, given the sufficient statistic $\left(\sum_{i=1}^n X_i^2\right)^{1/2} = t$, the joint distribution of $X_1, \ldots, X_n$ should be 'uniform' on the $(n-1)$-sphere of radius $t$ in $\mathbb{R}^n$. Note that an $n \times 1$ vector of random quantities $\mathbf{X}$ has a uniform distribution on the unit $n$-sphere if the $i$-th element of the vector is defined as
$$X_i = \frac{U_i}{\left(U_1^2 + \cdots + U_n^2\right)^{1/2}}, \quad i = 1, \ldots, n,$$
where the $U_i$'s are independent and identically distributed as the Gaussian(0, 1) distribution.
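The construction just described is straightforward to sketch; the dimension $n = 3$ and the seed below are arbitrary choices:

```python
import math
import random

# Normalize n i.i.d. Gaussian(0, 1) draws to obtain a point that is
# uniformly distributed on the unit sphere in R^n (n = 3 is arbitrary).
random.seed(1)
n = 3
U = [random.gauss(0.0, 1.0) for _ in range(n)]
norm = math.sqrt(sum(u * u for u in U))
X = [u / norm for u in U]  # uniform on the unit sphere in R^3
print(math.isclose(sum(x * x for x in X), 1.0))  # the point lies on the sphere
```

Rotational symmetry of the Gaussian vector $U$ is what makes the normalized point uniform: an orthogonal rotation leaves the distribution of $U$ unchanged, hence also that of $X$.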
Location Mixtures of Gaussian Distributions
When can a sequence of real-valued random quantities $X_i$, $1 \le i < \infty$, be represented as a location mixture of Gaussian$(\mu, \sigma^2)$ variables with $\sigma^2$ known? That is, when is
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \int_{-\infty}^{\infty} \prod_{i=1}^n \Phi_\mu(x_i)\, F(d\mu), \qquad (3.4)$$
where $\Phi_\mu(x) = P(X \le x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(u-\mu)^2/(2\sigma^2)}\, du$ is the Gaussian$(\mu, \sigma^2)$ distribution function?
The necessary and sufficient conditions (cf. Diaconis and Freedman, 1981) for the above to hold are the following:
(i) $X_1, X_2, \ldots$ is an exchangeable sequence, and
(ii) Given $X_1 + \cdots + X_n = t$, the joint distribution of $X_1, \ldots, X_n$ is a multivariate Gaussian distribution (Anderson, 1958, pp. 11–19) with mean vector $(t/n, \ldots, t/n)$ and covariance matrix with all diagonal terms $\sigma^2$ and off-diagonal terms $-\sigma^2/(n-1)$.
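One consistency check on condition (ii): since $X_1 + \cdots + X_n$ is held fixed at $t$, the conditional covariance matrix must be singular, with every row summing to zero. A minimal sketch, reading the covariance as having diagonal entries $\sigma^2$ and off-diagonal entries $-\sigma^2/(n-1)$; the values of $\sigma^2$ and $n$ are arbitrary choices:

```python
from math import isclose

# Build the conditional covariance matrix of condition (ii): diagonal
# sigma^2, off-diagonal -sigma^2/(n-1); sigma2 and n are arbitrary choices.
sigma2, n = 2.5, 5
cov = [[sigma2 if i == j else -sigma2 / (n - 1) for j in range(n)]
       for i in range(n)]
# Var(X_1 + ... + X_n | sum fixed) = 0 forces Cov * (1, ..., 1)' = 0,
# i.e. each row of the covariance matrix must sum to zero.
row_sums = [sum(row) for row in cov]
print(all(isclose(s, 0.0, abs_tol=1e-12) for s in row_sums))
```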
Location and Scale Mixtures of Gaussian Distributions
When can a sequence of real-valued random quantities $X_i$, $1 \le i < \infty$, be represented as a mixture of Gaussian$(\mu, \sigma^2)$ variables? That is, when is
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \int_0^\infty \int_{-\infty}^{\infty} \prod_{i=1}^n \Phi_{\mu,\sigma}(x_i)\, F(d\mu\, d\sigma), \qquad (3.5)$$
where $\Phi_{\mu,\sigma}(x) = P(X \le x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(u-\mu)^2/(2\sigma^2)}\, du$ is the Gaussian$(\mu, \sigma^2)$ distribution function?
Smith (1981) has shown that a necessary and sufficient condition for the above to hold is that, for every $n$, given the two sufficient statistics $U_n = X_1 + \cdots + X_n$ and $V_n = \left(X_1^2 + \cdots + X_n^2\right)^{1/2}$, the joint distribution of $X_1, \ldots, X_n$ is 'uniform' over the $(n-2)$-sphere in $\mathbb{R}^n$ with center at $U_n$ and radius $V_n$.
Problems involving location and scale mixtures of Gaussian distributions are often encountered in many applications of Bayesian statistics, though not necessarily in reliability and survival analysis. The result given here is important because it shows that, whenever we take location and scale mixtures of a Gaussian distribution, we are de facto making a judgment of indifference of the type shown above.
Mixtures of Uniform Distributions
When is $X_1, X_2, \ldots$ a mixture over $\theta$ of sequences of independent uniform$(0, \theta)$ variables? That is, when is
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \int_0^\infty \prod_{i=1}^n G_\theta(x_i)\, F(d\theta), \qquad (3.6)$$
where $G_\theta(x) = \min(x/\theta, 1)$ is the uniform$(0, \theta)$ distribution function?
The necessary and sufficient condition is that, for every $n$, given $M_n = \max(X_1, \ldots, X_n)$, the $X_i$'s are independent and 'uniform' over $(0, M_n)$ (Diaconis and Freedman, 1981).
This elementary result provides an opportunity to illustrate the practical value of the de Finetti-style theorems we are discussing here. To see this, suppose that the $X_i$'s represent the lifetimes of items, and we are to be given only $M_n$, the largest of $n$ lifetimes. If (upon receiving this information) our judgment about the other $n-1$ lifetimes was that each could be anywhere between 0 and $M_n$ with equal probability, and if knowledge about any other lifetime did not change this judgment, then a mixture over $\theta$ of uniform variables would be a suitable model for the $n$ lifetimes.
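A small simulation of this judgment; the mixing distribution for $\theta$ (uniform on $(0, 10)$) and the sample size are arbitrary stand-ins, not choices made in the text:

```python
import random

# Draw theta from a mixing distribution (here uniform(0, 10), an arbitrary
# stand-in for F), then n i.i.d. uniform(0, theta) lifetimes; given
# M_n = max lifetime, the remaining n-1 lifetimes all lie in (0, M_n).
random.seed(2)
theta = random.uniform(0.0, 10.0)
n = 6
lifetimes = [random.uniform(0.0, theta) for _ in range(n)]
M_n = max(lifetimes)
others = [x for x in lifetimes if x != M_n]  # the other n-1 lifetimes
print(all(0.0 <= x <= M_n for x in others))  # each lies between 0 and M_n
```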
Mixtures of Poisson Distributions
Let $X_i$, $1 \le i < \infty$, take nonnegative integer values. Freedman (1962) has shown that a necessary and sufficient condition for the $X_i$'s to have a representation as a mixture of Poisson$(\theta)$ variables, i.e. as
$$P(X_1 = x_1, \ldots, X_n = x_n) = \int_0^\infty \prod_{i=1}^n \frac{e^{-\theta}\theta^{x_i}}{x_i!}\, F(d\theta), \qquad (3.7)$$
is that, for every $n$, the joint distribution of $X_1, \ldots, X_n$, given $S_n = \sum_{i=1}^n X_i$, is a multinomial on $n$-tuples of nonnegative integers whose sum is $S_n$, with 'uniform' probabilities $(1/n, \ldots, 1/n)$. That is, for any integer-valued $a_i$, $i = 1, 2, \ldots, n$, such that $S_n = \sum_{i=1}^n a_i$,
$$P(X_1 = a_1, \ldots, X_n = a_n \mid S_n) = \frac{S_n!}{a_1! \times \cdots \times a_n!} \prod_{i=1}^n \left(\frac{1}{n}\right)^{a_i}. \qquad (3.8)$$
The distribution (3.8) given above is that of $S_n$ balls dropped at random into $n$ boxes, and is also known as the 'Maxwell–Boltzmann' distribution.
Much of the more recent work in survival analysis (cf. Andersen et al., 1993) and in reliability (cf. Ascher and Feingold, 1984) deals with count data. Modeling counts by point-process models, like the Poisson, has turned out to be quite useful. The Bayesian analysis of a Poisson process model would involve expressions like (3.7). For example, Campodónico and Singpurwalla (1995) analyze the number of fatigue defects in railroad tracks and encounter a version of (3.7). The result above says that if we are to use the mixture model (3.7), we must be prepared to make the judgment implied by (3.8). Alternatively, we may start off in the true spirit of a de Finetti-style theorem and first elicit expert judgment on counts; if this happens to be of the form (3.8), we must use the mixture model (3.7).
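The judgment implied by (3.8) can be checked exactly for a particular tuple: for i.i.d. Poisson counts, conditioning on the total makes the parameter cancel, leaving a multinomial with cell probabilities $1/n$. A sketch, with $\theta = 1.7$ and the tuple below as arbitrary illustrative choices:

```python
from math import exp, factorial, isclose

# i.i.d. Poisson(theta) counts conditioned on their sum follow the
# multinomial (3.8) with cell probabilities 1/n, whatever theta is.
# theta = 1.7 and the tuple a are arbitrary illustrative choices.
theta, a = 1.7, (2, 0, 3, 1)
n, S = len(a), sum(a)

joint = 1.0
for ai in a:  # P(X_1 = a_1, ..., X_n = a_n) for i.i.d. Poisson(theta)
    joint *= exp(-theta) * theta**ai / factorial(ai)
p_sum = exp(-n * theta) * (n * theta)**S / factorial(S)  # S_n ~ Poisson(n*theta)

conditional = joint / p_sum
multinomial = factorial(S) * (1.0 / n)**S
for ai in a:
    multinomial /= factorial(ai)
print(isclose(conditional, multinomial))  # the two agree
```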
Mixtures of Geometric Distributions
Let $X_i$, $1 \le i < \infty$, take nonnegative integer values. Diaconis and Freedman (1981) have shown that a necessary and sufficient condition for the $X_i$'s to have a representation as a mixture of geometric$(\theta)$ variables, i.e.
$$P(X_1 = x_1, \ldots, X_n = x_n) = \int_0^1 \prod_{i=1}^n \theta(1-\theta)^{x_i}\, F(d\theta), \qquad (3.9)$$
is that, for every $n$, the joint distribution of $X_1, \ldots, X_n$, given $S_n = \sum_{i=1}^n X_i$, is a 'uniform' distribution over all nonnegative $n$-tuples of integers whose sum is $S_n$. That is,
$$P(X_1 = x_1, \ldots, X_n = x_n \mid S_n) = \frac{1}{k}, \qquad (3.10)$$
where $k$ is the total number of $n$-tuples whose sum is $S_n$. For example, if $S_n = 1$, then the total number of $n$-tuples whose sum is 1 is $n$; i.e. $k = n$. The distribution (3.10) is called the 'Bose–Einstein' distribution.
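For small cases the count $k$ in (3.10) can be enumerated directly; a stars-and-bars argument gives $k = \binom{S_n + n - 1}{n - 1}$ in general. A quick check, with $n = 3$ and $S_n = 4$ as arbitrary choices:

```python
from itertools import product
from math import comb

# Enumerate the nonnegative n-tuples of integers summing to S and compare
# the count against the stars-and-bars formula C(S + n - 1, n - 1).
# n = 3 and S = 4 are arbitrary illustrative choices.
n, S = 3, 4
tuples = [t for t in product(range(S + 1), repeat=n) if sum(t) == S]
k = len(tuples)
print(k == comb(S + n - 1, n - 1))  # k = C(6, 2) = 15 here
```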
Mixtures of Exponential Distributions
When is $X_1, \ldots, X_n$ a mixture over $\theta$ of sequences of independent exponential chance variables with parameter $\theta$? That is, when is
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \int_0^\infty \prod_{i=1}^n \left(1 - e^{-x_i/\theta}\right) F(d\theta), \qquad (3.11)$$
where $P(X \le x) = 1 - e^{-x/\theta}$, an exponential distribution function?
A necessary and sufficient condition (cf. Diaconis and Freedman, 1987) for the above to hold is that, for each $n$, given the sufficient statistic $S_n = \sum_{i=1}^n X_i$, the joint distribution of $X_1, \ldots, X_n$ is uniform on the simplex $\{x_i \ge 0,\ \sum_{i=1}^n x_i = S_n\}$.
The exponential chance distribution is one of the most frequently used failure models in reliability and survival analysis. Its popularity is attributed to its simplicity, and also to the fact that it characterizes non-aging or lack of wear; that is, its failure rate (section 4.3) is a constant function of time. In the mathematical theory of reliability, its importance stems from the fact that the exponential distribution function provides bounds on the distribution function of a large family of failure models (cf. Barlow and Proschan, 1965). Because of this central role, many papers dealing with a Bayesian analysis of the exponential failure model have been written.
Much of this work focuses on suitable choices for $F(d\theta)$. Martz and Waller (1982) is a convenient source of reference. The starting point for all such analyses is a specific version of (3.11).
The de Finetti-style result given above says that underlying the use of (3.11) is a judgment of indifference, namely that, were we given only the sum of the $n$ lifetimes, we would judge all the lifetimes to be equiprobable over a region defined by a simplex. Alternatively, we could have (in the spirit of de Finetti) started with the judgment of indifference, presumably provided by an expert's opinion, and then been led to (3.11). This would be a different motivation for using the exponential failure model.
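The simplex judgment can be probed by simulation: for i.i.d. exponential lifetimes, the proportions $X_i/S_n$ are uniformly distributed on the simplex, so each proportion has mean $1/n$. A sketch; $\theta$, $n$ and the number of replications are arbitrary choices:

```python
import random

# For i.i.d. exponential lifetimes with mean theta, the proportions
# X_i / S_n are uniform on the simplex, so E[X_1 / S_n] = 1/n.
# theta = 2.0, n = 4 and reps = 20_000 are arbitrary illustrative choices.
random.seed(3)
theta, n, reps = 2.0, 4, 20_000
mean_first = 0.0
for _ in range(reps):
    xs = [random.expovariate(1.0 / theta) for _ in range(n)]
    mean_first += (xs[0] / sum(xs)) / reps
print(abs(mean_first - 1.0 / n) < 0.01)  # sample mean is close to 1/n
```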
Mixtures of Gamma and Weibull Distributions
Barlow and Mendel (1992) describe conditions under which $X_1, X_2, \ldots$ would be a mixture of gamma chance distributions. Starting with a finite population of $N$ items with lifetimes $X_i$, $i = 1, \ldots, N$, and guided by the view that the easiest way to make probability judgments is via the principle of 'insufficient reason', they assume indifference of the $W_i$'s (the transformed values of the $X_i$'s) on the simplex $\{X_i \ge 0,\ \sum_{i=1}^N X_i\}$, with $\sum_{i=1}^N X_i$ known. In the special case when $W_i = X_i$ and $\lim_{N \to \infty} \sum_{i=1}^N X_i / N = \theta$, they obtain a mixture of gamma distributions with shape $\alpha$ and scale $\alpha/\theta$. The density function at $x$ of a gamma-distributed variable $X$ having scale $\lambda$ and shape $\alpha$ is given as $\lambda e^{-\lambda x} (\lambda x)^{\alpha - 1} / \Gamma(\alpha)$, $x \ge 0$, where $\Gamma(\bullet)$ is the gamma function.
Thus, parameters of failure models are functions of observable lifetimes. Mixtures of Weibull chance distributions with shape $\beta$ and scale $\lambda$ arise when all of the above hold except that the $\sum_{i=1}^N X_i$ in the simplex is replaced by $\sum_{i=1}^N X_i^\beta$. A random quantity $X$ is said to have a Weibull distribution with shape $\beta$ and scale $\lambda$ if $P(X \ge x) = \exp(-\lambda x^\beta)$, for $x \ge 0$.
Results that are analogous to the above, but pertaining to mixtures of the inverse binomial and the binomial, are given by Freedman (1962). I think these results are important to subjective Bayesians working on practical problems, because they provide a foundation for the starting point of their work, namely their choice of a model; see also Spizzichino (1988). All of the above results explain the consequences of a judgment of indifference on observables, given a statistic. In practice, since it is much easier to assess and to agree upon a 'uniform' distribution than upon any other distribution, indifference plays a key role. For this reason, I have placed the word 'uniform' within quotes.
3.3 ERROR BOUNDS ON DE FINETTI-STYLE RESULTS FOR FINITE SEQUENCES OF RANDOM QUANTITIES