Exchangeability and Indifference
same as the chance of observing a 1), Bernoulli proved that the relative frequency with which a 1 occurs approximates $\theta$. Stated formally, if $S_m$ is the total number of 1s that occur in a large number, say $m$, of Bernoulli trials with parameter $\theta$, then according to the weak law of large numbers (for Bernoulli trials),
$$P\left(\left|\frac{S_m}{m} - \theta\right| < \epsilon\right) \to 1 \text{ as } m \to \infty, \text{ for } \epsilon > 0 \text{ arbitrarily small but fixed.}$$
Recall that, even to Bernoulli, a relative frequency was not a probability. Thus both $\theta$ and the probability statement of the theorem were regarded by Bernoulli as 'ease of happening', not relative frequencies. The strong law of large numbers (first stated by Cantelli in 1917) asserts that, with probability 1, $S_m/m - \theta$ becomes small and remains small.
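Bernoulli's theorem is easy to check numerically. Below is a minimal sketch in Python; the values $\theta = 0.3$ and $m = 100\,000$ are arbitrary illustrative choices:

```python
import random

# Simulate m Bernoulli trials with parameter theta and compare the
# relative frequency S_m/m of 1s against theta (weak law of large numbers).
# theta = 0.3 and m = 100_000 are arbitrary illustrative choices.
random.seed(0)
theta, m = 0.3, 100_000
S_m = sum(random.random() < theta for _ in range(m))  # total number of 1s
print(abs(S_m / m - theta) < 0.01)  # relative frequency is within 0.01 of theta
```

For $m$ this large the standard deviation of $S_m/m$ is about $\sqrt{\theta(1-\theta)/m} \approx 0.0014$, so the check succeeds with overwhelming probability.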
Relative Frequency of Exchangeable Bernoulli Sequences
After proving his theorem for exchangeable sequences, de Finetti went on to prove a second result which is of great importance. In what follows, I shall adhere to the notation used in the infinite version of de Finetti's theorem. He proved that, for exchangeable sequences, the (personal) probability that $\lim_{m \to \infty} k/m$ exists is 1 and, if this limit is denoted by $\theta$, then the (personal) probability distribution of $\theta$ is $F$. If $F$ is such that all its mass is concentrated on a single value, then the weak law of large numbers follows as a special case of de Finetti's theorem.
It is important to note that $\theta$ again is not to be interpreted as a probability, because to a subjectivist a probability is the amount you are willing to bet on the occurrence of an uncertain outcome. Lindley and Phillips (1976) view $\theta$ as a property of the real world and refer to it as a propensity or a chance, and the quantity $\theta^k(1-\theta)^{n-k}$ as a chance distribution. Some authors on subjective probability, such as Kyburg and Smokler (1964, pp. 13–14), have suggested that the above result of de Finetti bridges the gap between personal probability and physical probability.
However, de Finetti (1976) disagrees, and Diaconis and Freedman (1979) also have trouble with this type of synthesis. Hill (1993) attempts to clarify this misunderstanding by saying that even though there does not pre-exist a 'true probability', one could implicitly act as though there were one. Similar things could also have been said about the law of large numbers; however, it is likely that Bernoulli, too, would have disagreed.
3.2 DE FINETTI-STYLE THEOREMS FOR INFINITE SEQUENCES OF NON-BINARY RANDOM QUANTITIES
(cf. Diaconis and Freedman, 1979). All the same, the general approach is still in the style of a de Finetti theorem; that is, to characterize models in terms of an 'invariance' property. By invariance, I mean 'equiprobable' or 'indifference', just as in the case of exchangeable zero-one sequences. The idea is to begin with observables, postulate symmetries or summary statistics (section 3.2.1), and then find a simple description of all models with the given symmetries.
The aim of this section is to explore the above symmetries and to produce versions of (2.5) other than that of the mixture of Bernoulli sequences. Before we do this, we must first gain additional insight into what we have already done with zero-one exchangeable sequences, particularly regarding sufficiency and invariance. Informally speaking, by 'sufficiency' I mean a summarization of information in a manner that essentially preserves some needed characteristics.
3.2.1 Sufficiency and Indifference in Zero-one Exchangeable Sequences
We begin by obtaining an equivalent formulation of an exchangeable, infinitely extendible, zero-one sequence $X_1, X_2, \ldots$. Define the partial sum $S_n = X_1 + X_2 + \cdots + X_n$, and let $S_n = t$. Then, given $t$, exchangeability implies that the sequence $X_1, \ldots, X_n$ is uniformly distributed over the $\binom{n}{t}$ sequences having $t$ ones and $n-t$ zeros. That is, each of the $\binom{n}{t}$ sequences has probability $1/\binom{n}{t}$. But infinitely exchangeable zero-one sequences are, by de Finetti's theorem, mixtures of Bernoulli sequences. Thus we have the following:
Indifference Principle for Zero-one Sequences
An infinite sequence of zero-one variables can be represented as a mixture of Bernoulli sequences if and only if, for each $n$, given $S_n = t$, the sequence $X_1, \ldots, X_n$ is 'uniformly' distributed over the $\binom{n}{t}$ sequences having $t$ ones and $n-t$ zeros.
This equivalent formulation for the exchangeability of zero-one sequences of infinite length paves the way for questions about other kinds of sequences (section 3.3.2), but it also brings to the surface two points: the relevance of the sufficiency of the partial sums $S_n$, and the uniform distribution of the sequence $X_1, \ldots, X_n$. In other words, if zero-one sequences are judged invariant (i.e. uniformly distributed) under permutation given the sufficient statistic $S_n$, then they are also exchangeable, and exchangeability, in turn, gives us the mixture representation.
The ‘indifference principle’ refers to the act of judging invariance (under permutation in the present case).
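The indifference principle can be verified exactly on a toy example: under any mixture of Bernoulli sequences, the probability of a particular zero-one string depends only on its number of ones, so all arrangements with the same $S_n = t$ are equally likely. A minimal sketch, using an arbitrary two-point mixture (the weights and parameter values are illustrative choices, not from the text):

```python
from itertools import product
from math import isclose

# A two-point mixture of Bernoulli sequences; the weights and thetas
# below are arbitrary illustrative choices.
weights, thetas = [0.4, 0.6], [0.2, 0.7]
n = 4

def seq_prob(x):
    # P(X_1=x_1, ..., X_n=x_n) = sum_j w_j * theta_j^t * (1-theta_j)^(n-t),
    # which depends on x only through t = sum(x).
    t = sum(x)
    return sum(w * th**t * (1 - th)**(n - t) for w, th in zip(weights, thetas))

# For each t, all C(n, t) arrangements with t ones carry equal probability,
# so the conditional distribution given S_n = t is uniform.
for t in range(n + 1):
    probs = [seq_prob(x) for x in product([0, 1], repeat=n) if sum(x) == t]
    assert all(isclose(p, probs[0]) for p in probs)
print("given S_n = t, the arrangements are equiprobable")
```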
3.2.2 Invariance Conditions Leading to Mixtures of Other Distributions
Prompted by the observation that mixtures of Bernoulli sequences arise when we assume invariance under permutation, we look for conditions (i.e. judgments about observables) that lead to mixtures of other well-known chance distributions that commonly arise in practice. This may enable us to better interpret specific forms of (2.5) that are used. In what follows, some of the classical as well as newer results on this topic are summarized. They pertain to mixtures of both discrete and continuous distributions. We start off with one of the best-known distributions, the Gaussian (also known as the 'normal' distribution).
Scale Mixtures of Gaussian Distributions
When can a sequence of real-valued random quantities $X_i$, $1 \le i < \infty$, be represented as a scale mixture of Gaussian$(0, \sigma^2)$ variables (or chance distributions)? That is, when is
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \int_0^\infty \prod_{i=1}^n \Phi\left(\frac{x_i}{\sigma}\right) F(d\sigma), \qquad (3.3)$$
where $\Phi(x) = P(X_1 \le x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du$ is the Gaussian(0, 1) distribution function, and $F$ a unique probability distribution on $(0, \infty)$?
Freedman (1962) (see also Schoenberg (1938) and Kingman (1972)) has shown that a necessary and sufficient condition for the above to hold is that, for each $n$, the joint distribution of $X_1, \ldots, X_n$ be rotationally symmetric. Note that an $n \times 1$ vector of random quantities, say $\mathbf{X} = (X_1, \ldots, X_n)$, is rotationally symmetric (or spherically symmetric, or orthogonally invariant) if the joint distribution of $\mathbf{X}$ is identical to that of $M\mathbf{X}$ for all $n \times n$ orthogonal matrices $M$.
An equivalent characterization based on sufficiency and invariance (cf. Diaconis and Freedman, 1981) is that, for every $n$, given the sufficient statistic $\left(\sum_{i=1}^n X_i^2\right)^{1/2} = t$, the joint distribution of $X_1, \ldots, X_n$ should be 'uniform' on the $(n-1)$-sphere of radius $t$ in $\mathbb{R}^n$. Note that an $n \times 1$ vector of random quantities $\mathbf{X}$ has a uniform distribution on the unit $n$-sphere if the $i$-th element of the vector is defined as
$$X_i = \frac{U_i}{\left(U_1^2 + \cdots + U_n^2\right)^{1/2}}, \quad i = 1, \ldots, n,$$
where the $U_i$'s are independent and identically distributed as the Gaussian(0, 1) distribution.
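The construction just described is straightforward to sketch; the dimension $n = 3$ and the seed below are arbitrary choices:

```python
import math
import random

# Normalize n i.i.d. Gaussian(0, 1) draws to obtain a point that is
# uniformly distributed on the unit sphere in R^n (n = 3 is arbitrary).
random.seed(1)
n = 3
U = [random.gauss(0.0, 1.0) for _ in range(n)]
norm = math.sqrt(sum(u * u for u in U))
X = [u / norm for u in U]  # uniform on the unit sphere in R^3
print(math.isclose(sum(x * x for x in X), 1.0))  # the point lies on the sphere
```

Rotational symmetry of the Gaussian vector $U$ is what makes the normalized point uniform: an orthogonal rotation leaves the distribution of $U$ unchanged, hence also that of $X$.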
Location Mixtures of Gaussian Distributions
When can a sequence of real-valued random quantities $X_i$, $1 \le i < \infty$, be represented as a location mixture of Gaussian$(\mu, \sigma^2)$ variables with $\sigma^2$ known? That is, when is
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \int_{-\infty}^{\infty} \prod_{i=1}^n \Phi_\mu(x_i)\, F(d\mu), \qquad (3.4)$$
where $\Phi_\mu(x) = P(X \le x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(u-\mu)^2/(2\sigma^2)}\, du$ is the Gaussian$(\mu, \sigma^2)$ distribution function?
The necessary and sufficient conditions (cf. Diaconis and Freedman, 1981) for the above to hold are the following:
(i) $X_1, X_2, \ldots$ is an exchangeable sequence, and
(ii) Given $X_1 + \cdots + X_n = t$, the joint distribution of $X_1, \ldots, X_n$ is a multivariate Gaussian distribution (Anderson, 1958, pp. 11–19) with mean vector $(t/n, \ldots, t/n)$ and covariance matrix with all diagonal terms $\sigma^2$ and off-diagonal terms $-\sigma^2/(n-1)$.
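One consistency check on condition (ii): since $X_1 + \cdots + X_n$ is held fixed at $t$, the conditional covariance matrix must be singular, with every row summing to zero. A minimal sketch, reading the covariance as having diagonal entries $\sigma^2$ and off-diagonal entries $-\sigma^2/(n-1)$; the values of $\sigma^2$ and $n$ are arbitrary choices:

```python
from math import isclose

# Build the conditional covariance matrix of condition (ii): diagonal
# sigma^2, off-diagonal -sigma^2/(n-1); sigma2 and n are arbitrary choices.
sigma2, n = 2.5, 5
cov = [[sigma2 if i == j else -sigma2 / (n - 1) for j in range(n)]
       for i in range(n)]
# Var(X_1 + ... + X_n | sum fixed) = 0 forces Cov * (1, ..., 1)' = 0,
# i.e. each row of the covariance matrix must sum to zero.
row_sums = [sum(row) for row in cov]
print(all(isclose(s, 0.0, abs_tol=1e-12) for s in row_sums))
```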
Location and Scale Mixtures of Gaussian Distributions
When can a sequence of real-valued random quantities $X_i$, $1 \le i < \infty$, be represented as a mixture of Gaussian$(\mu, \sigma^2)$ variables? That is, when is
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \int_0^\infty \int_{-\infty}^{\infty} \prod_{i=1}^n \Phi_{\mu,\sigma}(x_i)\, F(d\mu\, d\sigma), \qquad (3.5)$$
where $\Phi_{\mu,\sigma}(x) = P(X \le x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(u-\mu)^2/(2\sigma^2)}\, du$ is the Gaussian$(\mu, \sigma^2)$ distribution function?
Smith (1981) has shown that a necessary and sufficient condition for the above to hold is that, for every $n$, given the two sufficient statistics $U_n = X_1 + \cdots + X_n$ and $V_n = \left(X_1^2 + \cdots + X_n^2\right)^{1/2}$, the joint distribution of $X_1, \ldots, X_n$ is 'uniform' over the $(n-2)$-sphere in $\mathbb{R}^n$ with center at $U_n$ and radius $V_n$.
Problems involving location and scale mixtures of Gaussian distributions are often encountered in many applications of Bayesian statistics, though not necessarily in reliability and survival analysis. The result given here is important because it shows that, whenever we take location and scale mixtures of a Gaussian distribution, we are de facto making a judgment of indifference of the type shown above.
Mixtures of Uniform Distributions
When is $X_1, X_2, \ldots$ a mixture over $\theta$ of sequences of independent uniform$(0, \theta)$ variables? That is, when is
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \int_0^\infty \prod_{i=1}^n G_\theta(x_i)\, F(d\theta), \qquad (3.6)$$
where $G_\theta(x) = \min(x/\theta, 1)$ is the uniform$(0, \theta)$ distribution function?
The necessary and sufficient condition is that, for every $n$, given $M_n = \max(X_1, \ldots, X_n)$, the $X_i$'s are independent and 'uniform' over $(0, M_n)$ (Diaconis and Freedman, 1981).
This elementary result provides an opportunity to illustrate the practical value of the de Finetti-style theorems we are discussing here. To see this, suppose that the $X_i$'s represent the lifetimes of items, and we are to be given only $M_n$, the largest of $n$ lifetimes. If (upon receiving this information) our judgment about the other $n-1$ lifetimes was that each could be anywhere between 0 and $M_n$ with equal probability, and if knowledge about any other lifetime did not change this judgment, then a mixture over $\theta$ of uniform variables would be a suitable model for the $n$ lifetimes.
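A small simulation of this judgment; the mixing distribution for $\theta$ (uniform on $(0, 10)$) and the sample size are arbitrary stand-ins, not choices made in the text:

```python
import random

# Draw theta from a mixing distribution (here uniform(0, 10), an arbitrary
# stand-in for F), then n i.i.d. uniform(0, theta) lifetimes; given
# M_n = max lifetime, the remaining n-1 lifetimes all lie in (0, M_n).
random.seed(2)
theta = random.uniform(0.0, 10.0)
n = 6
lifetimes = [random.uniform(0.0, theta) for _ in range(n)]
M_n = max(lifetimes)
others = [x for x in lifetimes if x != M_n]  # the other n-1 lifetimes
print(all(0.0 <= x <= M_n for x in others))  # each lies between 0 and M_n
```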
Mixtures of Poisson Distributions
Let $X_i$, $1 \le i < \infty$, take nonnegative integer values. Freedman (1962) has shown that a necessary and sufficient condition for the $X_i$'s to have a representation as a mixture of Poisson$(\theta)$ variables, i.e. as
$$P(X_1 = x_1, \ldots, X_n = x_n) = \int_0^\infty \prod_{i=1}^n \frac{e^{-\theta}\theta^{x_i}}{x_i!}\, F(d\theta), \qquad (3.7)$$
is that, for every $n$, the joint distribution of $X_1, \ldots, X_n$, given $S_n = \sum_{i=1}^n X_i$, is a multinomial on $n$-tuples of nonnegative integers whose sum is $S_n$, with 'uniform' probabilities $(1/n, \ldots, 1/n)$. That is, for any integer-valued $a_i$, $i = 1, 2, \ldots, n$, such that $S_n = \sum_{i=1}^n a_i$,
$$P(X_1 = a_1, \ldots, X_n = a_n \mid S_n) = \frac{S_n!}{a_1! \times \cdots \times a_n!} \prod_{i=1}^n \left(\frac{1}{n}\right)^{a_i}. \qquad (3.8)$$
The distribution (3.8) given above is that of $S_n$ balls dropped at random into $n$ boxes, and is also known as the 'Maxwell–Boltzmann' distribution.
Much of the more recent work in survival analysis (cf. Andersen et al., 1993) and in reliability (cf. Ascher and Feingold, 1984) deals with count data. Modeling counts by point-process models, like the Poisson, has turned out to be quite useful. The Bayesian analysis of a Poisson process model would involve expressions like (3.7). For example, Campodónico and Singpurwalla (1995) analyze the number of fatigue defects in railroad tracks and encounter a version of (3.7). The result above says that if we are to use the mixture model (3.7), we must be prepared to make the judgment implied by (3.8). Alternatively, we may start off in the true spirit of a de Finetti-style theorem and first elicit expert judgment on counts; if this happens to be of the form (3.8), we must use the mixture model (3.7).
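The judgment implied by (3.8) can be checked exactly for a particular tuple: for i.i.d. Poisson counts, conditioning on the total makes the parameter cancel, leaving a multinomial with cell probabilities $1/n$. A sketch, with $\theta = 1.7$ and the tuple below as arbitrary illustrative choices:

```python
from math import exp, factorial, isclose

# i.i.d. Poisson(theta) counts conditioned on their sum follow the
# multinomial (3.8) with cell probabilities 1/n, whatever theta is.
# theta = 1.7 and the tuple a are arbitrary illustrative choices.
theta, a = 1.7, (2, 0, 3, 1)
n, S = len(a), sum(a)

joint = 1.0
for ai in a:  # P(X_1 = a_1, ..., X_n = a_n) for i.i.d. Poisson(theta)
    joint *= exp(-theta) * theta**ai / factorial(ai)
p_sum = exp(-n * theta) * (n * theta)**S / factorial(S)  # S_n ~ Poisson(n*theta)

conditional = joint / p_sum
multinomial = factorial(S) * (1.0 / n)**S
for ai in a:
    multinomial /= factorial(ai)
print(isclose(conditional, multinomial))  # the two agree
```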
Mixtures of Geometric Distributions
Let $X_i$, $1 \le i < \infty$, take nonnegative integer values. Diaconis and Freedman (1981) have shown that a necessary and sufficient condition for the $X_i$'s to have a representation as a mixture of geometric$(\theta)$ variables, i.e.
$$P(X_1 = x_1, \ldots, X_n = x_n) = \int_0^1 \prod_{i=1}^n \theta(1-\theta)^{x_i}\, F(d\theta), \qquad (3.9)$$
is that, for every $n$, the joint distribution of $X_1, \ldots, X_n$, given $S_n = \sum_{i=1}^n X_i$, is a 'uniform' distribution over all nonnegative $n$-tuples of integers whose sum is $S_n$. That is,
$$P(X_1 = x_1, \ldots, X_n = x_n \mid S_n) = \frac{1}{k}, \qquad (3.10)$$
where $k$ is the total number of $n$-tuples whose sum is $S_n$. For example, if $S_n = 1$, then the total number of $n$-tuples whose sum is 1 is $n$; i.e. $k = n$. The distribution (3.10) is called the 'Bose–Einstein' distribution.
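For small cases the count $k$ in (3.10) can be enumerated directly; a stars-and-bars argument gives $k = \binom{S_n + n - 1}{n - 1}$ in general. A quick check, with $n = 3$ and $S_n = 4$ as arbitrary choices:

```python
from itertools import product
from math import comb

# Enumerate the nonnegative n-tuples of integers summing to S and compare
# the count against the stars-and-bars formula C(S + n - 1, n - 1).
# n = 3 and S = 4 are arbitrary illustrative choices.
n, S = 3, 4
tuples = [t for t in product(range(S + 1), repeat=n) if sum(t) == S]
k = len(tuples)
print(k == comb(S + n - 1, n - 1))  # k = C(6, 2) = 15 here
```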
Mixtures of Exponential Distributions
When is $X_1, \ldots, X_n$ a mixture over $\theta$ of sequences of independent exponential chance variables with parameter $\theta$? That is, when is
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \int_0^\infty \prod_{i=1}^n \left(1 - e^{-x_i/\theta}\right) F(d\theta), \qquad (3.11)$$
where $P(X \le x) = 1 - e^{-x/\theta}$, an exponential distribution function?
A necessary and sufficient condition (cf. Diaconis and Freedman, 1987) for the above to hold is that, for each $n$, given the sufficient statistic $S_n = \sum_{i=1}^n X_i$, the joint distribution of $X_1, \ldots, X_n$ is uniform on the simplex $\{x_i \ge 0,\ \sum_{i=1}^n x_i = S_n\}$.
The exponential chance distribution is one of the most frequently used failure models in reliability and survival analysis. Its popularity is attributed to its simplicity, and also to the fact that it characterizes non-aging or lack of wear; that is, its failure rate (section 4.3) is a constant function of time. In the mathematical theory of reliability, its importance stems from the fact that the exponential distribution function provides bounds on the distribution function of a large family of failure models (cf. Barlow and Proschan, 1965). Because of this central role, many papers dealing with a Bayesian analysis of the exponential failure model have been written.
Much of this work focuses on suitable choices for $F(d\theta)$. Martz and Waller (1982) is a convenient source of reference. The starting point for all such analyses is a specific version of (3.11).
The de Finetti-style result given above says that underlying the use of (3.11) is a judgment of indifference, namely that, were we given only the sum of the $n$ lifetimes, we would judge all the lifetimes to be equiprobable over a region defined by a simplex. Alternatively, we could have (in the spirit of de Finetti) started with the judgment of indifference, presumably provided by an expert's opinion, and then been led to (3.11). This would be a different motivation for using the exponential failure model.
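The simplex judgment can be probed by simulation: for i.i.d. exponential lifetimes, the proportions $X_i/S_n$ are uniformly distributed on the simplex, so each proportion has mean $1/n$. A sketch; $\theta$, $n$ and the number of replications are arbitrary choices:

```python
import random

# For i.i.d. exponential lifetimes with mean theta, the proportions
# X_i / S_n are uniform on the simplex, so E[X_1 / S_n] = 1/n.
# theta = 2.0, n = 4 and reps = 20_000 are arbitrary illustrative choices.
random.seed(3)
theta, n, reps = 2.0, 4, 20_000
mean_first = 0.0
for _ in range(reps):
    xs = [random.expovariate(1.0 / theta) for _ in range(n)]
    mean_first += (xs[0] / sum(xs)) / reps
print(abs(mean_first - 1.0 / n) < 0.01)  # sample mean is close to 1/n
```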
Mixtures of Gamma and Weibull Distributions
Barlow and Mendel (1992) describe conditions under which $X_1, X_2, \ldots$ would be a mixture of gamma chance distributions. Starting with a finite population of $N$ items with lifetimes $X_i$, $i = 1, \ldots, N$, and guided by the view that the easiest way to make probability judgments is via the principle of 'insufficient reason', they assume indifference of the $W_i$'s (the transformed values of the $X_i$'s) on the simplex $\{X_i \ge 0,\ \sum_{i=1}^N X_i\}$, with $\sum_{i=1}^N X_i$ known. In the special case when $W_i = X_i$ and $\lim_{N \to \infty} \sum_{i=1}^N X_i / N = \theta$, they obtain a mixture of gamma distributions with shape $\alpha$ and scale $\alpha/\theta$. The density function at $x$ of a gamma-distributed variable $X$ having scale $\lambda$ and shape $\alpha$ is given as $\lambda e^{-\lambda x} (\lambda x)^{\alpha - 1} / \Gamma(\alpha)$, $x \ge 0$, where $\Gamma(\bullet)$ is the gamma function.
Thus, parameters of failure models are functions of observable lifetimes. Mixtures of Weibull chance distributions with shape $\beta$ and scale $\lambda$ arise when all of the above hold except that the $\sum_{i=1}^N X_i$ in the simplex is replaced by $\sum_{i=1}^N X_i^\beta$. A random quantity $X$ is said to have a Weibull distribution with shape $\beta$ and scale $\lambda$ if $P(X \ge x) = \exp(-\lambda x^\beta)$, for $x \ge 0$.
Results that are analogous to the above, but pertaining to mixtures of the inverse binomial and the binomial, are given by Freedman (1962). I think these results are important to subjective Bayesians working on practical problems, because they provide a foundation for the starting point of their work, namely their choice of a model; see also Spizzichino (1988). All of the above results explain the consequences of a judgment of indifference on observables, given a statistic. In practice, since it is much easier to assess and to agree upon a 'uniform' distribution than upon any other distribution, indifference plays a key role. For this reason, I have placed the word 'uniform' within quotes.
3.3 ERROR BOUNDS ON DE FINETTI-STYLE RESULTS FOR FINITE SEQUENCES OF RANDOM QUANTITIES