Chapter 3
Exchangeability and Indifference
3.1 INTRODUCTION TO EXCHANGEABILITY: DE FINETTI’S THEOREM
3.1.1 Motivation for the Judgment of Exchangeability
A need for the judgment of exchangeability is motivated by restricting attention to a collection of random events. Consider the following scenario: a subject is administered a drug today; at the end of the month, the subject is to be observed for a ‘response’ or a ‘non-response’. The event ‘response’ is denoted by 1, and the event ‘non-response’ by 0. Suppose that 10 such subjects, all judged to be similar to each other, are administered the drug, and that our interest today focuses on the $2^{10} = 1024$ possible outcomes that can occur at the end of the month. What can we say about the probability of occurrence of each of these 1024 outcomes? Our scenario is generic, and applies equally well to many other situations ranging from coin tossing to acceptance sampling and life-testing.
From a personalistic point of view, we should be able to think hard about the subjects and the drug and coherently assign probabilities to each of the 1024 outcomes. There are no restrictions on how we make our assessments, the only requirement being that the assigned probabilities must add up to one. Even with so much latitude, it is a difficult task to come up with a sensible assignment of so many probabilities. de Finetti suggested exchangeability as a way to simplify this situation. Specifically, his idea was that all sequences of length 10, comprising 1s and 0s, be assigned the same probability if they have the same total number of 1s. That is, all sequences that have a total of five 1s should be assigned the same probability; similarly, all sequences that have a total of eight 1s. This simply means that the probability assignment is ‘symmetric’, or ‘invariant’, or ‘indifferent’ under changes in order. Thus, as far as the probability assignment is concerned, what matters in a sequence is the total number of 1s and not their position and order in the sequence.
If exchangeability is to be believed, then the total number of probabilities to be assigned reduces from 1024 to 11, one for each possible total number of 1s from 0 to 10. This simplification has been possible because we have made a judgment. Our judgment is that all ten subjects are similar or indistinguishable, so that it does not matter to us which particular subject produces a response; all that matters to us is how many of the ten subjects responded to the drug. Thus exchangeability is a subjective judgment that we have made in order to simplify our assignment of probabilities. A precursor to exchangeability is Laplace’s principle of ‘insufficient reason’. In actuality, judgments of exchangeability should be supported by the science of the problem, and there could be situations where exchangeability cannot be assumed. To alleviate concerns such as these, de Finetti (1938) generalized his idea of exchangeability and introduced the notion of partial exchangeability. This is a topic that I do not plan to address, save for the material of section 6.2.1 wherein the simulation of exchangeable sequences is discussed. We are now ready to define exchangeability.
Exchangeability
Random quantities $X_1, X_2, \ldots, X_n$, discrete or continuous, are said to be exchangeable if their $n!$ permutations, say $X_{k_1}, \ldots, X_{k_n}$, have the same $n$-dimensional probability distribution $P(X_1, \ldots, X_n)$. The elements of an infinite sequence $X_1, X_2, \ldots$ are exchangeable if $X_1, X_2, \ldots, X_m$ are exchangeable for each $m$. The $n$-dimensional probability distribution $P(X_1, \ldots, X_n)$ is said to be exchangeable if it is invariant under a permutation of the coordinates.
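The counting behind the drug-testing scenario of section 3.1.1 can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original development; the variable names are illustrative only. It lists all zero-one sequences of length 10 and groups them by their total number of 1s, which is the only feature that an exchangeable assignment can distinguish.

```python
from itertools import product
from math import comb

n = 10  # number of subjects in the scenario

# Every possible record of responses (1) and non-responses (0).
outcomes = list(product((0, 1), repeat=n))

# Under exchangeability, outcomes with the same number of 1s share one
# probability, so only the totals 0, 1, ..., n need to be distinguished.
totals = sorted({sum(seq) for seq in outcomes})

# Number of distinct sequences falling in each class.
class_sizes = {r: comb(n, r) for r in totals}

print(len(outcomes))   # 1024
print(len(totals))     # 11
print(class_sizes[5])  # 252
```

If $p_r$ denotes the common probability given to each sequence with $r$ ones, the requirement that all 1024 probabilities add up to one becomes $\sum_{r=0}^{10} \binom{10}{r} p_r = 1$.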
3.1.2 Relationship between Independence and Exchangeability
It has been pointed out that exchangeability was introduced by de Finetti as a way of simplifying the assignment of probabilities. Are there other attributes of exchangeability that make it attractive for use? Is there a relationship between independence and exchangeability? Does independence imply exchangeability or is it the other way around? These are some of the questions that arise, and the aim of this section is to attempt to answer them. We start off by considering the following situation which commonly occurs in life-testing and industrial quality control.
Several identically manufactured items are tested for success or failure, with the uncertain event $X_i$ taking the value one if the $i$-th item tests successfully, and zero otherwise. Given the results of the test, interest generally centers around either accepting or rejecting a large lot from which these items have been selected at random.
A frequentist approach to a problem of this type generally starts with the assumption that the zero-one events $X_1, X_2, \ldots$ are independent and have a common Bernoulli distribution with the parameter $\theta$; i.e. $P(X_i = 1) = \theta$, $i = 1, 2, \ldots$, and $P(X_i = 0) = 1 - \theta$. Experiments which lead to such events are called Bernoulli trials, and the sequence of $X_i$s a Bernoulli sequence. Once this assumption is made, standard machinery can be applied, provided that the ‘stopping rule’ (cf. Lindley and Phillips, 1976) is known. For example, we can obtain point estimates and interval estimates of $\theta$, test hypotheses about $\theta$, and so on. Based on the outcome of such procedures, we can then decide either to accept or to reject the lot. In fact, procedures like this are so common that they have been codified for use by the United States government as MIL-STD-105D (MIL-STD-105D, 1963), and have become the standard by which many organizations, all over the world, do business.
The subjective Bayesian approach, however, proceeds differently. To a subjective Bayesian, when any of the $X_i$s, say $X_1$, reveals itself, opinion about the uncertainty of the remaining $X_i$s, namely $X_2, X_3, \ldots$, changes; similarly, when $X_2$ reveals itself, opinion about the remaining $X_i$s changes, and so on. By contrast, the assumption of independence of the $X_i$s, which implies that observing $X_1$ has no impact on $X_2, X_3, \ldots$, is unjustifiably severe. The starting point for a Bayesian analysis would therefore be something which reflects the dependence between the $X_i$s, and one such vehicle is the assumption that the zero-one sequence $X_1, X_2, \ldots$ is exchangeable.
But why would an exchangeable sequence reflect dependence? The answer was provided by de Finetti who, after introducing the notion of exchangeability, went on to prove a famous theorem that bears his name. This theorem formalizes what was said at the end of section 2.6.2, namely, that the defining representation for an exchangeable sequence is an equation of the type (2.4).
That is, an exchangeable sequence can be generated by a mixture of conditionally independent identically distributed sequences.
Thus a Bayesian can proceed along one of two lines of reasoning, both leading to the same formulation. The first would be along the lines outlined in sections 2.6.1 and 2.6.2, and the second would be based on the judgment of exchangeability. With the first, the driving premise was the law of total probability, and the assumption that the $X_i$s are conditionally independent, conditioned on knowing $\theta$. Given the parameter $\theta$, the $X_i$s may then be assumed to have a common probability model, which in this case is a Bernoulli with $0 \le \theta \le 1$. Since $\theta$ is an unknown quantity, the laws of probability require that for making statements of uncertainty about the $X_i$s we average over all values of $\theta$. This line of development would bring us to the following special versions of (2.5) and (2.6); namely that, for $x = 1$ or $0$,
$$P(X_{n+1} = x \mid X^{(n)}, \mathcal{H}) = \int_0^1 \theta^{x} (1-\theta)^{1-x} \, P(\theta \mid X^{(n)}, \mathcal{H}) \, d\theta, \tag{3.1}$$
where
$$P(\theta \mid X^{(n)}, \mathcal{H}) \propto \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i} \, P(\theta \mid \mathcal{H}), \tag{3.2}$$
and $P(\theta \mid \mathcal{H})$ is our prior density at $\theta$. Since $\theta$, with $0 < \theta < 1$, is continuous, the prior at $\theta$ is typically chosen to be a beta density on (0,1); i.e. $P(\theta \mid \mathcal{H}) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \theta^{\alpha-1} (1-\theta)^{\beta-1}$, where $\alpha$ and $\beta$ are parameters of this density and, for $\alpha > 0$, $\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x} \, dx$ is the gamma function. The notion of a density will be clarified later in section 4.2.
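With a beta prior, the updating in (3.2) and the averaging in (3.1) have a well-known closed form: the posterior is again a beta density, and the predictive probability of a 1 is the posterior mean. The following is a minimal sketch under that assumption; the function names and the data are hypothetical, and integer prior parameters are assumed only so that exact fractions can be used.

```python
from fractions import Fraction

def posterior_params(alpha, beta, xs):
    """Beta(alpha, beta) prior updated by zero-one data xs, as in (3.2)."""
    ones = sum(xs)
    return alpha + ones, beta + len(xs) - ones

def predictive_prob_one(alpha, beta, xs):
    """P(X_{n+1} = 1 | data): the posterior mean, i.e. (3.1) with x = 1."""
    a, b = posterior_params(alpha, beta, xs)
    return Fraction(a, a + b)

# Hypothetical data: 7 responses among 10 subjects, uniform prior (alpha = beta = 1).
xs = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
print(posterior_params(1, 1, xs))     # (8, 4)
print(predictive_prob_one(1, 1, xs))  # 2/3
```

Only the number of 1s in the data enters the calculation, not their order, which is what a judgment of exchangeability would lead one to expect.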
With the second line of reasoning, a Bayesian would start with the judgment of exchangeability, and then appeal to de Finetti’s theorem (given below) to arrive at (3.1) and (3.2). Thus exchangeability and the theorem can be viewed as providing a foundation for the Bayesian development. It is important to emphasize that, like all probability judgments, the judgment of exchangeability is made before the sequence is observed. Once all this is done, the decision to accept or to reject the lot would be based on the assessed utilities and the principle of maximization of expected utilities; more details on how this can be done are given in Chapter 5.
It is easy to see, from the definition of exchangeability, that a sequence of independent and identically distributed random quantities is also an exchangeable sequence. However, exchangeability is a weaker condition than independence, in the sense that exchangeable random quantities are not necessarily independent.
To verify this (details left as an exercise), consider an urn containing three balls, two of which are marked ‘1’ and the remaining ball marked ‘0’. If these balls were to be drawn from the urn, one at a time without replacement, and if $X_k$ were to denote the digit on the $k$-th ball drawn, then the sequence of random quantities $X_1, X_2, X_3$ would be exchangeable, but not independent.
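One way to carry out the urn exercise is by direct enumeration. The sketch below is only an illustration, with variable names of my own choosing: it builds the joint distribution of the three draws, checks that it is invariant under every permutation of the coordinates, and then compares $P(X_1 = 1, X_2 = 1)$ with $P(X_1 = 1) P(X_2 = 1)$.

```python
from fractions import Fraction
from itertools import permutations

balls = (1, 1, 0)  # two balls marked '1', one ball marked '0'

# Each of the 3! orderings of the three physical balls is equally likely.
orders = list(permutations(range(3)))
joint = {}
for order in orders:
    seq = tuple(balls[i] for i in order)
    joint[seq] = joint.get(seq, 0) + Fraction(1, len(orders))

# Exchangeability: permuting the coordinates leaves each probability unchanged.
exchangeable = all(
    joint.get(tuple(seq[i] for i in perm), 0) == p
    for seq, p in joint.items()
    for perm in permutations(range(3))
)

# Independence would require P(X1=1, X2=1) = P(X1=1) * P(X2=1).
p_x1 = sum(p for seq, p in joint.items() if seq[0] == 1)
p_x2 = sum(p for seq, p in joint.items() if seq[1] == 1)
p_both = sum(p for seq, p in joint.items() if seq[0] == 1 and seq[1] == 1)

print(exchangeable)          # True
print(p_both, p_x1 * p_x2)   # 1/3 4/9
```

The three sequences (1,1,0), (1,0,1) and (0,1,1) each receive probability 1/3, so the assignment is exchangeable, while 1/3 differs from 4/9, so the draws are not independent.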
3.1.3 de Finetti’s Representation Theorem for Zero-one Exchangeable Sequences
We have seen before that the theory of probability does not concern itself with the interpretation of probability, nor does it concern itself with how the probabilities of the elementary events are assigned. The aim of the theory is to develop methods by which the probability of complicated events can be obtained from the probabilities of the elementary events. Thus, as a result in probability theory, de Finetti’s theorem is striking. Controversy about it arises when, in a particular application, the judgment of exchangeability is made and the theorem is then invoked to justify a Bayesian analysis. Since exchangeability as a judgment is personal, it follows that any application of the theorem is liable to be the subject of debate. However, we have seen that exchangeability as a judgment is more realistic than independence, at least for zero-one sequences. The following version of the theorem and the proof has been taken from Heath and Sudderth (1976).
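Theorem 3.1 (finite form), de Finetti (1937). Let $X_1, \ldots, X_m$ be a sequence of exchangeable random quantities taking values 0 or 1. Then,
$$P(X_1 = 1, \ldots, X_k = 1, X_{k+1} = 0, \ldots, X_n = 0) = \sum_{r=0}^{m} \frac{(r)_k \, (m-r)_{n-k}}{(m)_n} \, q_r, \quad \text{for } 0 \le k \le n \le m,$$
where $q_r = P\left(\sum_{j=1}^{m} X_j = r\right)$, and $(x)_k = \prod_{j=0}^{k-1} (x - j)$.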
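Proof. From the exchangeability of the sequence $X_1, \ldots, X_m$ it follows that, given $\sum_{j=1}^{m} X_j = r$, all possible arrangements of the $r$ ones among the $m$ places are equally likely. The situation is analogous to drawing from an urn containing $r$ ones and $m - r$ zeros. Thus
$$P\left(X_1 = 1, \ldots, X_k = 1, X_{k+1} = 0, \ldots, X_n = 0 \,\Big|\, \sum_{j=1}^{m} X_j = r\right) = \frac{\binom{r}{k} \binom{m-r}{n-k}}{\binom{m}{n}} \div \binom{n}{k},$$
and so
$$P(X_1 = 1, \ldots, X_k = 1, X_{k+1} = 0, \ldots, X_n = 0) = \sum_{r=0}^{m} \frac{\binom{r}{k} \binom{m-r}{n-k}}{\binom{m}{n}} \div \binom{n}{k} \; q_r,$$
which, when simplified, reduces to the statement of the theorem. Note that division by $\binom{n}{k}$ is necessary because we are looking for a particular pattern of ones and zeros.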
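The simplification at the end of the proof can be checked numerically. The following sketch is an illustration only, with function names of my own choosing; it evaluates the right-hand side of Theorem 3.1 using falling factorials, evaluates the urn expression from the proof using binomial coefficients, and applies both to the three-ball urn of section 3.1.2, for which $m = 3$ and $q_2 = 1$.

```python
from fractions import Fraction
from math import comb

def falling(x, k):
    """Falling factorial (x)_k = x (x-1) ... (x-k+1), with (x)_0 = 1."""
    out = 1
    for j in range(k):
        out *= x - j
    return out

def pattern_prob(k, n, q):
    """Theorem 3.1: P(first k are 1, next n-k are 0); q[r] = P(sum of X_j = r)."""
    m = len(q) - 1
    return sum(
        Fraction(falling(r, k) * falling(m - r, n - k), falling(m, n)) * q[r]
        for r in range(m + 1)
    )

def pattern_prob_urn(k, n, q):
    """The same probability written as in the proof (urn argument)."""
    m = len(q) - 1
    return sum(
        Fraction(comb(r, k) * comb(m - r, n - k), comb(m, n) * comb(n, k)) * q[r]
        for r in range(m + 1)
    )

q = [0, 0, 1, 0]  # three-ball urn: the total number of 1s is 2 with certainty
print(pattern_prob(2, 3, q))      # 1/3, the probability of the sequence (1, 1, 0)
print(pattern_prob(1, 1, q))      # 2/3, the probability that the first ball is a 1
print(pattern_prob_urn(2, 3, q))  # 1/3
```

Both functions agree because $\binom{r}{k}\binom{m-r}{n-k} \big/ \left[\binom{m}{n}\binom{n}{k}\right] = (r)_k (m-r)_{n-k} / (m)_n$ once the factorials $k!$, $(n-k)!$ and $n!$ cancel.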
The finite form of the theorem holds for all exchangeable sequences of length $m$, including those that cannot be ‘extended’ to a length larger than $m$. Thus, for example, if we are to sample without replacement from an urn with $m$ balls in it, we cannot have $m+1$ trials. A similar situation arises in sample surveys where we have to draw a sample from a finite population. However, in the drug testing scenario considered by us, the number of subjects to whom we can administer the drug can, in principle, be extended to a very large number. The same could also be true of the item testing scenario; if the lot size is very large, the number of items that we can test can, in principle, be very large.
For such situations, the infinite form of de Finetti’s theorem, given below, would apply.
The infinite form of de Finetti’s theorem is simply the limiting case, as $m \to \infty$ and $r/m \to \theta$, of the above theorem. The proof involves convergence of distribution functions and approximations by integrals (Heath and Sudderth, 1976); it is therefore omitted.
Theorem 3.2 (infinite form), de Finetti (1937). For every exchangeable probability assignment that can be extended to a probability assignment on an infinite sequence of zeros and ones, there corresponds a unique probability distribution function $F$ concentrated on [0,1] such that, for all $n$ and $0 \le k \le n$,
$$P(X_1 = 1, \ldots, X_k = 1, X_{k+1} = 0, \ldots, X_n = 0) = \int_0^1 \theta^{k} (1-\theta)^{n-k} \, F(d\theta),$$
where $F(d\theta) = f(\theta) \, d\theta$, and $f(\theta) = dF(\theta)/d\theta$ is the derivative of $F$ with respect to $\theta$; $F$ is the distribution function of $\theta$.
Comments on the Theorem
The infinite form of de Finetti’s theorem suggests that exchangeable probability assignments of ones and zeros which can be extended have the special form of mixtures of (the independent and identically distributed) Bernoulli trials. The mixing distribution $F$ may be regarded as the prior for the unknown parameter $\theta$. Note the similarity between the right-hand side of the statement of the theorem and the right-hand side of (3.1). The theorem also implies that for many problems, specifying a probability assignment on the number of ones and zeros in any sequence of size $n$ is equivalent to specifying a prior probability distribution on (0,1).
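To make the mixture form concrete, the integral in Theorem 3.2 can be evaluated in closed form when the mixing distribution is a beta distribution, since $\int_0^1 \theta^{a-1}(1-\theta)^{b-1} d\theta = B(a, b)$. The sketch below assumes such a beta mixing distribution with integer parameters (an assumption made purely for illustration, so that factorials and exact fractions can be used); the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def beta_fn(a, b):
    """Beta function B(a, b) for positive integers a and b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def pattern_prob(k, n, alpha=1, beta=1):
    """Integral of theta^k (1 - theta)^(n - k) against a Beta(alpha, beta) mixing F."""
    return beta_fn(alpha + k, beta + n - k) / beta_fn(alpha, beta)

n = 10
# Probability of one particular sequence with k ones; it depends on k only.
per_sequence = [pattern_prob(k, n) for k in range(n + 1)]
# Implied probability assignment on the number of ones in n trials.
count_probs = [comb(n, k) * p for k, p in enumerate(per_sequence)]

print(per_sequence[5])   # 1/2772
print(sum(count_probs))  # 1
```

With the uniform mixing distribution (alpha = beta = 1), every entry of `count_probs` equals 1/11: each possible number of ones among the ten trials is given the same probability.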
It is important to note that the infinite form of the theorem can fail to hold for finite sequences that cannot be extended. To see this, consider the case of two exchangeable random variables $X_1$ and $X_2$, with $P(X_1 = 1, X_2 = 0) = P(X_1 = 0, X_2 = 1) = 1/2$. Thus $P(X_1 = 1, X_2 = 1) = P(X_1 = 0, X_2 = 0) = 0$. Now, if there exists a probability distribution $F$ such that $0 = P(X_1 = 1, X_2 = 1) = \int_0^1 \theta^2 \, F(d\theta)$, then $F$ must put mass 1 at 0. But then $\int_0^1 (1-\theta)^2 \, F(d\theta) = 1$, which makes $P(X_1 = 0, X_2 = 0) = 0$ impossible. Thus there cannot exist a probability distribution $F$ concentrated on [0,1] that will satisfy the statement of the theorem.
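A complementary check, which is not the argument used in the text, is through the covariance. For any mixture of independent Bernoulli trials with mixing distribution $F$, $\mathrm{Cov}(X_1, X_2) = E(\theta^2) - [E(\theta)]^2 = \mathrm{Var}(\theta) \ge 0$. The sketch below computes the covariance for the two-variable assignment above; a negative value rules out any such $F$.

```python
from fractions import Fraction

# The two-variable exchangeable assignment from the text.
joint = {
    (1, 0): Fraction(1, 2),
    (0, 1): Fraction(1, 2),
    (1, 1): Fraction(0),
    (0, 0): Fraction(0),
}

e1 = sum(p * x1 for (x1, x2), p in joint.items())        # E(X1)
e2 = sum(p * x2 for (x1, x2), p in joint.items())        # E(X2)
e12 = sum(p * x1 * x2 for (x1, x2), p in joint.items())  # E(X1 X2)
cov = e12 - e1 * e2

print(e1, e2)  # 1/2 1/2
print(cov)     # -1/4
```

The same assignment is what one obtains by drawing twice without replacement from an urn holding one ball marked ‘1’ and one marked ‘0’, a sequence that cannot be extended beyond two trials.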
Finally, de Finetti’s result generalizes so that every sequence of exchangeable random variables, not just the zero or one variables, is a mixture of sequences of independent, identically distributed variables. Thus equation (2.5), which is quite general and which was developed via a direct argument, can also be motivated via the judgment of exchangeability. However, as we shall see in section 3.2, a practical implementation of this idea for getting specific versions of (2.5) is difficult, and conditions that are more restrictive than exchangeability are sought to cut the problem to a manageable size.
3.1.4 Exchangeable Sequences and the Law of Large Numbers
The law of large numbers was mentioned in section 2.3.1 as Bernoulli’s great theorem which bound up fair price, belief and frequency. Specifically, in the context of a large sequence of Bernoulli trials (that is, random quantities whose outcome is either a 1 or a 0 and all having the same $\theta$ as the chance of observing a 1), Bernoulli proved that the relative frequency with which a 1 occurs approximates $\theta$. Stated formally, if $S_m$ is the total number of 1s that occur in a large number, say $m$, of Bernoulli trials with a parameter $\theta$, then according to the weak law of large numbers (for Bernoulli trials),
$$P\left( \left| \frac{S_m}{m} - \theta \right| < \varepsilon \right) \to 1 \quad \text{as } m \to \infty,$$
for $\varepsilon > 0$ arbitrarily small but fixed.
Recall that, even to Bernoulli, a relative frequency was not a probability. Thus both $\theta$ and the probability statement of the theorem were regarded by Bernoulli as ‘ease of happening’, not relative frequencies. The strong law of large numbers (first stated by Cantelli in 1917) asserts that, with probability 1, $|S_m/m - \theta|$ becomes small and remains small.
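The probability appearing in the weak law can be computed exactly for moderate $m$, since $S_m$ has a binomial distribution. The following sketch uses illustrative values of $\theta$ and $\varepsilon$ that are not taken from the text; the window condition is tested with exact fractions to avoid rounding at its boundary, while the binomial probabilities themselves are ordinary floating-point numbers.

```python
from fractions import Fraction
from math import comb

def prob_within(m, theta, eps):
    """P(|S_m/m - theta| < eps) for S_m ~ Binomial(m, theta)."""
    t = float(theta)
    return sum(
        comb(m, k) * t**k * (1 - t) ** (m - k)
        for k in range(m + 1)
        if abs(Fraction(k, m) - theta) < eps  # exact comparison
    )

theta, eps = Fraction(3, 10), Fraction(1, 20)  # illustrative values only
probs = [prob_within(m, theta, eps) for m in (10, 100, 1000)]
for m, p in zip((10, 100, 1000), probs):
    print(m, round(p, 4))
```

For these values the probability rises from roughly 0.27 at $m = 10$ to above 0.99 at $m = 1000$, in line with the statement of the theorem.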
Relative Frequency of Exchangeable Bernoulli Sequences
After proving his theorem for exchangeable sequences, de Finetti went on to prove a second result which is of great importance. In what follows, I shall adhere to the notation used in the infinite version of de Finetti’s theorem. He proved that, for exchangeable sequences, the (personal) probability that $\lim_{m \to \infty} k/m$ (the limiting relative frequency of 1s in the first $m$ trials) exists is 1 and, if this limit were to be denoted by $\theta$, then the (personal) probability distribution for $\theta$ is $F$. If $F$ is such that all its mass is concentrated on a single value, then the weak law of large numbers follows as a special case of de Finetti’s theorem.
It is important to note that, again, $\theta$ is not to be interpreted as a probability, because to a subjectivist a probability is the amount you are willing to bet on the occurrence of an uncertain outcome. Lindley and Phillips (1976) view $\theta$ as a property of the real world and refer to it as a propensity or a chance, and the quantity $\theta^{k} (1-\theta)^{n-k}$ as a chance distribution. Some authors on subjective probability, such as Kyburg and Smokler (1964, pp. 13–14), have suggested that the above result of de Finetti bridges the gap between personal probability and physical probability.
However, de Finetti (1976) disagrees, and Diaconis and Freedman (1979) also have trouble with this type of synthesis. Hill (1993) attempts to clarify this misunderstanding by saying that even though there does not pre-exist a ‘true probability’, one could implicitly act as though there were one. Similar things could have also been said about the law of large numbers; however, it is likely that Bernoulli too would have disagreed.
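De Finetti’s second result can be illustrated by simulation. The sketch below is an illustration only and assumes a Beta(2, 2) mixing distribution $F$, a choice that does not come from the text. It generates an exchangeable zero-one sequence by first drawing $\theta$ from $F$ and then drawing conditionally independent Bernoulli trials, and it compares the relative frequency of 1s with the drawn $\theta$.

```python
import random

def simulate_exchangeable(m, alpha, beta, rng):
    """Draw theta from Beta(alpha, beta), then m Bernoulli(theta) trials."""
    theta = rng.betavariate(alpha, beta)
    xs = [1 if rng.random() < theta else 0 for _ in range(m)]
    return theta, xs

rng = random.Random(0)  # fixed seed, for reproducibility only
for _ in range(3):
    theta, xs = simulate_exchangeable(20000, 2.0, 2.0, rng)
    # Each run has its own theta; the relative frequency settles near it.
    print(round(theta, 3), round(sum(xs) / len(xs), 3))
```

Across runs the relative frequencies differ, because each run has its own $\theta$; it is the distribution of these limiting frequencies over runs that corresponds to $F$.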
3.2 DE FINETTI-STYLE THEOREMS FOR INFINITE SEQUENCES OF