Using Objective (or Default) Priors - PRIOR DISTRIBUTIONS IN CHANCE DISTRIBUTIONS

Parametric Failure Data Analysis

5.3 PRIOR DISTRIBUTIONS IN CHANCE DISTRIBUTIONS

5.3.2 Using Objective (or Default) Priors

When the prior distribution is personal to an investigator, that is, when the entire distribution is elicited (as was done in section 5.3.1) or when the parameters of a suitably chosen distribution are elicited (as was done in sections 5.2.1 and 5.2.2), the priors are said to be subjective. By contrast, non-subjective or objective priors are those in which the personal views of any and all investigators have no role to play. Objective priors are chosen by convention or through structural rules and are to be viewed as a standard of reference. According to Dawid (1997), ‘no theory which incorporates non-subjective priors can truly be called Bayesian’.

The class of objective priors includes those that claim to be non-informative priors and those that are labeled ‘reference priors’, though some, like Bernardo (1997), maintain that there is no such thing as a non-informative prior; he claims that ‘any prior reflects some form of knowledge’. Objective priors are also referred to as ‘default priors’ or ‘neutral priors’. Such priors were introduced in section 2.7.3 in the context of testing a sharp null hypothesis about the mean of an exponential distribution, and in section 5.2.3 in the context of Bernoulli trials. Objective priors are often improper, depend on the probability model that is assumed, and can result in posteriors that are also improper (cf. Casella, 1996). The purpose of this section is to overview the nature of such priors and to discuss arguments used to rationalize some of their unattractive features. In the sequel, I also present the kind of improper priors that have been proposed for some of the chance distributions of section 5.2.

Historical Background

Non-subjective priors were used by default in the works of Bayes (1763) and Laplace (1812). Bayes used a uniform prior for the Bernoulli $\theta$ in a binomial setting, and Laplace used an improper uniform prior for the mean of a Gaussian distribution. During those times no one considered using priors that were different from the uniform. The rationale was that any prior other than the uniform would reflect specific knowledge, and in doing so would violate the principle of insufficient reason. However, in the early 1920s it was felt that the universal use of uniform priors did not make much sense. For one, a uniform prior on the Bernoulli $\theta$ does not imply a uniform prior on $\theta^2$, and this is tantamount to claiming indifference between all values of $\theta$ but a specific knowledge about values of $\theta^2$. Thus the uniform distribution on $\theta$ is not invariant vis-à-vis the principle of insufficient reason. Since most scientists of that day were reluctant to use personal priors in their scientific work, methods alternate to Bayes’ and Laplace’s were developed, and were labeled as ‘objective methods of scientific indifference’; frequentist statistics is one such method. A revival of the Bayes–Laplace paradigm came about in the 1940s with the publication of Jeffreys (1946), who produced an alternative to the uniform as a non-subjective prior; this prior is now known as Jeffreys’ prior.
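The lack of invariance of the uniform prior can be checked with a change of variables. The short derivation below is an illustrative sketch; the symbol $\phi = \theta^2$ is introduced here only for this purpose.

```latex
% Assumption for illustration: theta is uniform on (0, 1), and phi = theta^2.
\[
\pi(\theta) = 1, \quad 0 < \theta < 1,
\qquad \phi = \theta^{2} \;\Rightarrow\; \theta = \phi^{1/2},
\quad \left|\frac{d\theta}{d\phi}\right| = \frac{1}{2\sqrt{\phi}} .
\]
\[
\pi(\phi) = \pi(\theta)\left|\frac{d\theta}{d\phi}\right|
          = \frac{1}{2\sqrt{\phi}}, \quad 0 < \phi < 1 .
\]
% The induced density on phi is not constant: it piles up mass near phi = 0,
% so indifference about theta is not indifference about theta^2.
```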

Jeffreys’ General Rule and Jeffreys’ Priors

Jeffreys’ work started with a collection of rules for several scenarios, each scenario treated separately. The simplest is the case of a finite parameter space; i.e. $\theta$ takes values $\theta_1, \theta_2, \ldots, \theta_n$, for $n < \infty$. He adhered to the principle of insufficient reason and assigned a probability $1/n$ to each value of $\theta$. Jeffreys then considered the case of bounded intervals, so that $\theta \in (a, b)$ for constants $a < b$, $a \geq 0$, $b < \infty$. The prior density in this case was taken to be the uniform over $(a, b)$. When the parameter space was the interval $(-\infty, +\infty)$, as is the case for a Gaussian mean, Jeffreys advocated that the prior density be constant over $(-\infty, +\infty)$. The above priors were in keeping with the Bayes–Laplace tradition of insufficient reason. Note that the constant prior over $(-\infty, +\infty)$ is improper, a matter that did not appear to have been of concern to Jeffreys. When the parameter space is the interval $(0, \infty)$, as is the case with the scale parameter of some well-known chance distributions like the exponential, the Gaussian and the Weibull, Jeffreys proposed the prior $\pi(\theta) = 1/\theta$. His justification for this choice was invariance under power transformation of the scale parameter. That is, if $\phi = \theta^2$, then by a change of variables technique, it is verified that $\pi(\phi)$ is of a similar form, namely $\pi(\phi) \propto 1/\phi$; see Kass and Wasserman’s (1996) comprehensive review of objective priors. Motivated by considerations of invariance, Jeffreys’ 1946 paper proposes a ‘general rule’ for obtaining priors. For a probability model entailing a single parameter $\theta$, the prior on $\theta$ is of the form

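$$\pi(\theta) \propto \left\{ E_X\!\left[ -\frac{\partial^2}{\partial\theta^2} \log p(X \mid \theta) \right] \right\}^{1/2}, \tag{5.24}$$

where the expectation is over $X$, with respect to the probability model $p(X \mid \theta)$.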

Jeffreys’ general rule, which yields Jeffreys’ priors, is not based on intervals in which the parameter lies, and could conflict with the rules based on intervals. For example, when $p(X \mid \theta)$ is a binomial, the general rule results in $\pi(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2}$, whereas the rule based on intervals has $\pi(\theta) = 1$. Interestingly, Jeffreys adhered to the principle of insufficient reason for priors on bounded intervals and used $\pi(\theta) = 1$ for the Bernoulli $\theta$ (cf. Geisser, 1984). When $p(X \mid \theta)$ is the negative binomial, i.e. when $X$ is the number of trials to the first success in Bernoulli outcomes, so that $p(X = x \mid \theta) = \theta(1-\theta)^{x-1}$, Jeffreys’ general rule yields $\pi(\theta) \propto \theta^{-1}(1-\theta)^{-1/2}$ as a prior for $\theta$; this prior is improper. The Bernoulli scenario illustrates a feature of Jeffreys’ priors, namely the dependence of the prior on the model: $\theta^{-1/2}(1-\theta)^{-1/2}$ for the binomial, and $\theta^{-1}(1-\theta)^{-1/2}$ for the negative binomial. Since $\theta$, as the limit of observable zero-one sequences, has a physical connotation, namely that $\theta$ is a chance (section 3.1.3), dependence of the prior for $\theta$ on the model for $X$ makes little sense.
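The model dependence of the general rule can be checked numerically. The sketch below is illustrative only: it assumes $\theta$ denotes the chance of success on each trial, approximates the second derivative by a central finite difference, and truncates the infinite support of the number of trials to the first success. The function names and the values of `theta` and `n` are hypothetical choices, not part of the original text.

```python
import math


def fisher_information(log_pmf, support, theta, h=1e-4):
    """Approximate E_X[-d^2/dtheta^2 log p(X | theta)] by summing over `support`."""
    total = 0.0
    for x in support:
        p = math.exp(log_pmf(x, theta))
        # Central finite difference for the second derivative in theta.
        d2 = (log_pmf(x, theta + h) - 2.0 * log_pmf(x, theta)
              + log_pmf(x, theta - h)) / h ** 2
        total += p * (-d2)
    return total


def make_log_binomial(n):
    """Log pmf of the number of successes in n Bernoulli trials."""
    def log_pmf(x, theta):
        return (math.log(math.comb(n, x))
                + x * math.log(theta)
                + (n - x) * math.log(1.0 - theta))
    return log_pmf


def log_first_success(x, theta):
    """Log pmf of the number of trials to the first success, x = 1, 2, ..."""
    return math.log(theta) + (x - 1) * math.log(1.0 - theta)


theta = 0.3   # illustrative value of the Bernoulli chance
n = 10        # illustrative number of binomial trials

info_binomial = fisher_information(make_log_binomial(n), range(n + 1), theta)
# The support is truncated at 2000 trials; the neglected tail mass is negligible here.
info_first_success = fisher_information(log_first_success, range(1, 2000), theta)

# Unnormalized Jeffreys priors evaluated at theta (square roots of the information).
jeffreys_binomial = math.sqrt(info_binomial)
jeffreys_first_success = math.sqrt(info_first_success)

print(info_binomial, n / (theta * (1.0 - theta)))
print(info_first_success, 1.0 / (theta ** 2 * (1.0 - theta)))
```

Under these assumptions the two printed pairs should agree closely, which matches the closed forms $n/[\theta(1-\theta)]$ for the binomial and $1/[\theta^{2}(1-\theta)]$ for the number of trials to the first success. The same chance $\theta$ therefore receives different priors under the two sampling schemes.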

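An extension of Jeffreys’ general rule, equation (5.24), when $\theta = (\theta_1, \theta_2)$ is a vector, takes the form

$$\pi(\theta) \propto \det\bigl(I(\theta)\bigr)^{1/2},$$

where $\det(\bullet)$ denotes a determinant, and $I(\theta)$ is the matrix with $(i, j)$th element

$$I(\theta)_{ij} = E_X\!\left[ -\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \log p(X \mid \theta) \right], \quad i, j = 1, 2. \tag{5.25}$$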

Jeffreys observed that the results of this rule could also conflict with the results provided by the rule based on intervals. For example, with $p(X \mid \theta_1, \theta_2)$ a Gaussian with mean $\theta_1$ and variance $\theta_2^2$, the general rule gives $\pi(\theta_1, \theta_2) \propto 1/\theta_2^2$, whereas the rule based on intervals gives $\pi(\theta_1, \theta_2) \propto 1/\theta_2$; similarly for the lognormal with parameters $\theta_1$ and $\theta_2^2$. Jeffreys addressed this problem by stating that $\theta_1$ and $\theta_2$ should be judged independent a priori and the general rule applied for $\theta_1$ given $\theta_2$ fixed and for $\theta_2$ given $\theta_1$ fixed. Thus with $\theta_2$ fixed, the general rule gives a uniform prior for $\theta_1$, and with $\theta_1$ fixed the general rule gives $\pi(\theta_2) = 1/\theta_2$. It is because of the above that many view Jeffreys’ priors as a collection of ad hoc rules.
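The Gaussian case can be worked through explicitly. The derivation below is a sketch under the parametrization used above (mean $\theta_1$, standard deviation $\theta_2$); it shows where the joint and the one-at-a-time applications of the general rule part ways.

```latex
% Gaussian with mean theta_1 and variance theta_2^2 (additive constants dropped).
\[
\log p(x \mid \theta_1, \theta_2)
  = -\log\theta_2 - \frac{(x-\theta_1)^2}{2\theta_2^{2}} .
\]
% Negative second derivatives, followed by expectations using
% E(X - theta_1) = 0 and E(X - theta_1)^2 = theta_2^2.
\[
E\!\left[-\frac{\partial^2 \log p}{\partial\theta_1^{2}}\right] = \frac{1}{\theta_2^{2}},
\qquad
E\!\left[-\frac{\partial^2 \log p}{\partial\theta_1\,\partial\theta_2}\right]
  = E\!\left[\frac{2(X-\theta_1)}{\theta_2^{3}}\right] = 0,
\qquad
E\!\left[-\frac{\partial^2 \log p}{\partial\theta_2^{2}}\right]
  = -\frac{1}{\theta_2^{2}} + \frac{3\,\theta_2^{2}}{\theta_2^{4}}
  = \frac{2}{\theta_2^{2}} .
\]
% Joint rule: square root of the determinant of the information matrix.
\[
\det I(\theta) = \frac{2}{\theta_2^{4}}
\;\Rightarrow\;
\pi(\theta_1,\theta_2) \propto \frac{1}{\theta_2^{2}} .
\]
% One parameter at a time: theta_2 fixed gives a constant in theta_1;
% theta_1 fixed gives (2/theta_2^2)^{1/2}, proportional to 1/theta_2.
\[
\pi(\theta_1) \propto 1, \qquad \pi(\theta_2) \propto \frac{1}{\theta_2} .
\]
```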

Reference Priors: The Berger–Bernardo Method

Prompted by the above concern, and motivated by some work of Good (1966) and Lindley (1956), Bernardo (1979a) proposed a method for constructing objective priors that he labeled ‘reference priors’. Like Jeffreys’ priors, the reference priors also depend on the underlying probability model. However, the rationale for this dependence is more transparent in the case of reference priors than in the case of Jeffreys’ priors. To see why, we first note that Bernardo’s method is based on the notion of ‘missing information’, a notion that is characterized by the Kullback–Leibler distance between the posterior and the prior densities. Specifically, for a prior $\pi(\theta)$ and its posterior $\pi(\theta \mid x^n)$, the Kullback–Leibler distance is defined as

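$$K_n\bigl(\pi(\theta \mid x^n), \pi(\theta)\bigr) = \int \pi(\theta \mid x^n) \log \frac{\pi(\theta \mid x^n)}{\pi(\theta)} \, d\theta,$$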

where $x^n$ is a realization of $X^n = (X_1, \ldots, X_n)$, and the $X_i$s are independent and identically distributed with $p(X \mid \theta)$ as a probability model. The quantity $K_n(\bullet)$ is to be interpreted as the gain in information about $\theta$ provided by the experiment that yields $x^n$, and $K_n^{\pi}(\bullet) = E\bigl(K_n(\bullet)\bigr)$ is the expected gain in information. The expectation is with respect to the predictive distribution of $X^n$; i.e.

$$p(X^n) = \int p(X^n \mid \theta) \, \pi(\theta) \, d\theta;$$

more about this is said later, in section 5.5.1. Bernardo’s idea was to find that $\pi$ for which $K^{\pi}(\bullet) = \lim_{n \to \infty} K_n^{\pi}(\bullet)$ is a maximum; the rationale here being that $K^{\pi}(\bullet)$ is a measure of the missing information in the experiment. However, there is a problem with this scheme, in the sense that $K^{\pi}$ could be infinite. To circumvent this difficulty, one finds the prior $\pi_n$ which maximizes $K_n^{\pi}(\bullet)$, and computes a posterior based on this $\pi_n$. One then finds the limit of the sequence of posteriors based on $\pi_n$; denote this limit by $\pi(\theta \mid x^n)$. Then the reference prior is that prior which by a direct application of Bayes’ law would yield $\pi(\theta \mid x^n)$ (Berger and Bernardo, 1992).
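To make the notion of expected gain in information concrete, the sketch below computes it for Bernoulli trials with the prior discretized on a grid. This is an illustration only and not the Berger–Bernardo limiting construction: the grid, its size `m`, the sample sizes and the function name are all assumptions introduced here. Because the number of successes is sufficient, the expectation over $x^n$ is taken over the number of successes.

```python
import math


def expected_information_gain(prior, thetas, n):
    """Expected Kullback-Leibler distance between posterior and prior.

    prior  : probabilities on the grid `thetas` (assumed positive, summing to 1)
    thetas : grid of values of the Bernoulli chance, strictly inside (0, 1)
    n      : number of Bernoulli trials

    The expectation is taken with respect to the predictive distribution of
    the number of successes s, p(s) = sum over theta of p(s | theta) * prior(theta).
    """
    gain = 0.0
    for s in range(n + 1):
        lik = [math.comb(n, s) * t ** s * (1.0 - t) ** (n - s) for t in thetas]
        predictive = sum(l * p for l, p in zip(lik, prior))
        for l, p in zip(lik, prior):
            joint = l * p  # joint probability of (s, theta)
            if joint > 0.0:
                # posterior / prior = likelihood / predictive
                gain += joint * math.log(l / predictive)
    return gain


m = 200                                     # hypothetical grid size
thetas = [(i + 0.5) / m for i in range(m)]  # midpoints, avoiding 0 and 1

uniform = [1.0 / m] * m
weights = [t ** -0.5 * (1.0 - t) ** -0.5 for t in thetas]
total = sum(weights)
jeffreys_shaped = [w / total for w in weights]  # discretized theta^(-1/2) (1-theta)^(-1/2)

gain_uniform_5 = expected_information_gain(uniform, thetas, 5)
gain_uniform_20 = expected_information_gain(uniform, thetas, 20)
gain_jeffreys_20 = expected_information_gain(jeffreys_shaped, thetas, 20)

print(gain_uniform_5, gain_uniform_20, gain_jeffreys_20)
```

The expected gain grows with the number of trials, since a larger experiment is expected to reveal more about $\theta$. On a finite grid it is bounded by $\log m$, so the difficulty of an infinite limit noted above does not show up in this discretized illustration.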

Under certain regularity conditions, the reference prior turns out to be that given by (5.24) and (5.25) for continuous parameter spaces, and a uniform prior for finite parameter spaces. Thus the aforementioned approach, now known as the Berger–Bernardo method (cf. Kass and Wasserman, 1996), provides, at least under some regularity conditions, a rationale for the construction of Jeffreys’ priors. Consequently, the reference prior for the Bernoulli $\theta$ under binomial sampling is $\theta^{-1/2}(1-\theta)^{-1/2}$, and it is $\theta^{-1}(1-\theta)^{-1/2}$ under negative binomial sampling. For the uniform on $(0, \theta)$, the reference prior turns out to be $\pi(\theta) \propto \theta^{-1}$, and the reference posterior, i.e. the posterior distribution resulting from a reference prior, is a Pareto distribution (Bernardo and Smith, 1994, p. 438).
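The Pareto form of the reference posterior in the uniform case follows from a direct application of Bayes’ law. In the sketch below, $x_{(n)}$ denotes the largest of the $n$ observations; this symbol is introduced here for illustration.

```latex
% X_1, ..., X_n independent and uniform on (0, theta); prior proportional to 1/theta.
\[
p(x^n \mid \theta) = \theta^{-n}, \quad \theta \ge x_{(n)} = \max_i x_i ,
\qquad \pi(\theta) \propto \theta^{-1} .
\]
\[
\pi(\theta \mid x^n) \propto \theta^{-(n+1)}, \quad \theta \ge x_{(n)},
\qquad
\int_{x_{(n)}}^{\infty} \theta^{-(n+1)}\, d\theta = \frac{x_{(n)}^{-n}}{n} .
\]
% Normalizing gives a Pareto density with shape n and scale x_(n),
% which is proper for every n >= 1 even though the prior is improper.
\[
\pi(\theta \mid x^n) = \frac{n\, x_{(n)}^{\,n}}{\theta^{\,n+1}}, \quad \theta \ge x_{(n)} .
\]
```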

When the chance distribution is a Weibull with scale parameter $\alpha$ and shape parameter $\beta$, the reference prior turns out to be $\pi(\alpha, \beta) = (\alpha\beta)^{-1}$, whereas Jeffreys’ rules would lead to the prior $1/\alpha$ (cf. Berger and Pericchi, 1996). With the exponential as a chance distribution, the parameter $\beta$ is one, so that $\pi(\alpha) = 1/\alpha$ is both the reference and the Jeffreys’ prior.

To complete our discussion of the Weibull distribution, I cite here the work of Sun (1997) and that of Green et al. (1994), who considered the case of a three-parameter Weibull chance distribution with a threshold parameter $\gamma$. Specifically,

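$$P(T > t \mid \alpha, \beta, \gamma) = \exp\left\{ -\left( \frac{t - \gamma}{\alpha} \right)^{\beta} \right\}, \quad \text{for } \gamma > 0 \text{ and } t \geq \gamma.$$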

This distribution is deemed appropriate where $T$ has to be logically strictly greater than zero, as is the case with diameters of tree trunks. In the context of reliability and survival analysis, a use of the three-parameter Weibull precludes the possibility of an item’s failure at conception. The authors cited above, and also Sinha and Sloan (1988), assume that the three parameters are a priori independent, and assign $1/\alpha$ and $1/\beta$ as priors for $\alpha$ and $\beta$, respectively. The prior for $\gamma$ is assumed to be a constant over $(0, \infty)$. Whereas the choice $1/\alpha$ can be justified on the grounds that it is Jeffreys’ prior, the choice $1/\beta$ for the prior on $\beta$ is unconventional. All the same, the novelty of Green et al. is the consideration of a Bayesian analysis of threshold parameters, $\gamma$ in this particular case.

When the chance distribution is Poisson, then reference priors for the mean value function of the underlying Poisson process can be induced from the reference priors for the parameters of the distribution of the time to occurrence of the first event in the Poisson process. Thus, for example, if the underlying Poisson process is homogeneous with a mean value function $\Lambda(t) = t/\theta$, then $P(T > t) = \exp(-t/\theta)$, and the reference prior on $\Lambda(t)$ induced from the reference prior on $\theta$, namely $1/\theta$, is $\pi(\Lambda(t)) = 1/\Lambda(t)$. Similarly, with $\Lambda(t) = \exp(-t/\theta)$, though the calculation of $\pi(\Lambda(t))$ in this case would be more cumbersome.
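The induced prior in the homogeneous case is a one-line change of variables. In the sketch below, $t$ is held fixed and $\lambda$ is shorthand, introduced here, for the value of the mean value function at $t$.

```latex
% Fixed t > 0; lambda = t / theta is a one-to-one function of theta.
\[
\lambda = \frac{t}{\theta}
\;\Rightarrow\;
\theta = \frac{t}{\lambda},
\qquad
\left|\frac{d\theta}{d\lambda}\right| = \frac{t}{\lambda^{2}} .
\]
% Prior on theta proportional to 1/theta, evaluated at theta = t / lambda.
\[
\pi(\lambda) \propto \frac{1}{\theta}\left|\frac{d\theta}{d\lambda}\right|
            = \frac{\lambda}{t}\cdot\frac{t}{\lambda^{2}}
            = \frac{1}{\lambda} .
\]
```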

Reference priors have several attractive features: they are invariant under one-to-one transformations, they enable the data to play a dominant role in the construction of the posterior distribution, and they generally produce posterior distributions that are proper. More important, under some regularity conditions, the reference priors coincide with Jeffreys’ priors, the latter having been justified from many different viewpoints (question 20 of Bernardo, 1997). Their main disadvantages are impropriety and dependence on the underlying probability model. The latter is tantamount to a violation of the likelihood principle. On the matter of impropriety, Bernardo’s claim is that non-subjective priors should not be viewed as probability distributions. Rather, they should be viewed as positive functions which serve to produce non-subjective posteriors via a formal use of Bayes’ law. The non-subjective posteriors, namely the reference posteriors, are to be used as a benchmark or a standard for performing a sensitivity analysis against other, possibly subjective priors; see answers to questions 7 and 23 in Bernardo (1997). It is crucial that reference posteriors are proper; their main purpose is to describe the inferential content of the data in scientific communication. Reference posteriors need not be used in betting or other scenarios of personal decision making because such posteriors may not be an honest reflection of ‘personal beliefs’.

Other Methods for Constructing Objective Priors

Kass and Wasserman (1996) describe several other methods for constructing non-subjective priors. Noteworthy among these are the maximum entropy priors of Jaynes reviewed by Zellner (1991) and by Press (1996), Zellner and Min’s (1993) maximal data information prior, priors of Chernoff (1954) and Good (1969) based on decision-theoretic arguments, priors based on game theoretic arguments of Kashyap (1971), the prior by Rissanen (1983) based on coding theory arguments, and a class of improper priors by Novick and Hall (1965) called ‘indifference priors’.

For the case of the Bernoulli $\theta$, Novick and Hall’s indifference prior turns out to be $[\theta(1-\theta)]^{-1}$, which is improper but which induces a proper posterior as soon as at least one success and one failure have been observed. Several of the above methods boil down to being the methods of Jeffreys.

5.4 PREDICTIVE DISTRIBUTIONS INCORPORATING FAILURE DATA

From the document Reliability and Risk (A Bayesian Perspective) (pages 160-163)