The Quantification of Uncertainty
2.7 TESTING HYPOTHESES: POSTERIOR ODDS AND BAYES FACTORS

The statistical testing of a hypothesis is usually done to verify a theory or a claim, be it in science, engineering, law or medicine, in light of evidence or data. We have said that reliability and risk analysis pertains to decision making under uncertainty. Then why should we be interested in the topic of testing hypotheses?
There could be many answers to the above question, but the one that immediately comes to mind pertains to the fact that associated with each action of a decision tree are consequences, and a consequence could have been entertained as the result of verifying a claim. For example, in the cholesterol drug problem of Chapter 1, we consider administering a drug because we believe that the drug has the potential of avoiding a heart attack. How did we arrive at this belief? Most likely, an experiment was conducted on several individuals, some of whom received the drug and some did not, and the results of this experiment provided evidence to certify the claim that the drug was effective in avoiding a heart attack. The certification of this claim could have been based on the test of a hypothesis. Similarly, the claim that the drug has minimal side effects could have been based on the test of suitable hypotheses. Other such examples of hypothesis testing in reliability and risk analyses are claims about the improved performance of an engine in a new design, claims that emergency diesel generators in nuclear power plants have a reliability that exceeds requirements (cf. Chen and Singpurwalla, 1996), claims that a piece of computer software is free of bugs (cf. Singpurwalla and Wilson, 1999), and so on. To summarize, the testing of statistical hypotheses is a part of decision making under uncertainty and as such plays an important role in reliability, risk and survival analysis. Indeed, one of the most visible exports of statistical reliability theory is the Military Standard 781-C, which is used worldwide for life-testing and acceptance sampling. Its theoretical basis is the testing of hypotheses about the mean of an exponential distribution (cf. Montagne and Singpurwalla, 1985).
Because the testing of hypothesis is an important scientific problem, much has been written about it, starting from the days of Laplace. Lehmann (1950) gives an authoritative account of the frequentist treatment of this topic. However, Berger and Berry (1988) are critical of frequentist methods for testing hypotheses, their criticism centering upon a violation of the likelihood principle. A review article by Berger and Delampady (1987) proposes Bayesian alternatives. The book by Lee (1989), from which much of the material that follows is taken, provides a nice account of Bayesian hypothesis testing.
We begin with the following setup. Consider an unknown quantity X, discrete or continuous, and, ignoring technicalities, suppose that P(X | θ) is a suitable probability model for X, where the parameter θ belongs to the set Θ; i.e. θ ∈ Θ. Let P(θ) be our prior distribution for θ, assuming that θ is discrete. Suppose that Θ = Θ₀ ∪ Θ₁, with Θ₀ ∩ Θ₁ = ∅, where ∪, ∩ and ∅ denote union, intersection and the empty (null) set, respectively. That is, the parameter set Θ is partitioned into two non-overlapping sets Θ₀ and Θ₁. Suppose now that X has revealed itself as x. The problem that we wish to entertain is, given x and our background information, does θ ∈ Θ₀ or does θ ∈ Θ₁? In the context of any particular application, θ ∈ Θ₀ could correspond to the validity of a claim, and θ ∈ Θ₁ to its negation, i.e. the falsity of the claim.
The premise behind a Bayesian approach to testing a hypothesis is that it is unreasonable, except in rare situations, that x gives us conclusive evidence about the disposition of θ. However, x could give us evidence that enhances our prior opinion about H₀, the null hypothesis that θ ∈ Θ₀, or about H₁, the alternate hypothesis that θ ∈ Θ₁. Bayes' law enables us to incorporate the evidence provided by x. To see how, let π₀ = P(θ ∈ Θ₀) and π₁ = 1 − π₀ be our prior probabilities, and consider the quantity π₀/π₁, which is our prior odds on H₀ against H₁; the prior probabilities are obtained via our prior distribution P(θ). The notion of odds is useful because if the prior odds are close to one, we regard H₀ and H₁ as equally likely, whereas if the ratio is large, then we regard H₀ to be more likely than H₁; vice versa if the ratio is small.
Given x, Bayes' law is used to compute the posterior probabilities p₀ = P(θ ∈ Θ₀ | x) ∝ L(θ ∈ Θ₀; x) π₀ and p₁ = P(θ ∈ Θ₁ | x) ∝ L(θ ∈ Θ₁; x) π₁, where L(θ ∈ Θ₀; x) is the likelihood that θ ∈ Θ₀, in light of x. The posterior odds p₀/p₁ are interpreted analogously to the prior odds. It is easy to verify that

\[
\frac{p_0}{p_1} = \frac{L(\theta \in \Theta_0;\, x)}{L(\theta \in \Theta_1;\, x)} \cdot \frac{\pi_0}{\pi_1}.
\]
The above development was initiated by Jeffreys in the 1920s (Jeffreys, 1961) as an approach to testing hypotheses.
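As a concrete illustration of the above, the sketch below computes the prior odds, the likelihood ratio and the posterior odds for a discrete parameter set; the Poisson model, the particular parameter values, the prior weights and the observed x are all illustrative assumptions, not taken from the text.

```python
# Illustrative sketch (assumptions, not from the text): posterior odds for a discrete theta.
# X is taken to be Poisson with rate theta; Theta_0 = {0.5, 1.0} and Theta_1 = {2.0, 4.0}.
from scipy.stats import poisson

theta_0 = [0.5, 1.0]                                   # parameter values under H0
theta_1 = [2.0, 4.0]                                   # parameter values under H1
prior = {0.5: 0.25, 1.0: 0.25, 2.0: 0.25, 4.0: 0.25}   # assumed prior P(theta)
x = 3                                                  # hypothetical observed value of X

pi_0 = sum(prior[t] for t in theta_0)                  # prior probability of H0
pi_1 = sum(prior[t] for t in theta_1)                  # prior probability of H1

# Unnormalized posterior masses: likelihood times prior, summed over each hypothesis.
m_0 = sum(poisson.pmf(x, t) * prior[t] for t in theta_0)
m_1 = sum(poisson.pmf(x, t) * prior[t] for t in theta_1)

print("prior odds on H0 against H1:", pi_0 / pi_1)
print("posterior odds on H0 against H1:", m_0 / m_1)
```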
2.7.1 Bayes Factors: Weight of Evidence and Change in Odds

The absence of evidence is not evidence of absence.
Simple Hypotheses
Suppose that the two hypotheses are simple; i.e. Θ₀ = {θ₀} and Θ₁ = {θ₁}, for some singletons θ₀ ≠ θ₁. Then the posterior odds on H₀ against H₁ will be of the form

\[
\frac{p_0}{p_1} = \frac{L(\theta_0;\, x)}{L(\theta_1;\, x)} \cdot \frac{\pi_0}{\pi_1},
\]

implying that the posterior odds are the prior odds multiplied by the likelihood ratio L(θ₀; x)/L(θ₁; x), which is called the Bayes factor, denoted B, in favor of H₀ against H₁. Thus B is simply the ratio of the likelihoods under H₀ and H₁. Alternatively,
\[
B = \frac{p_0/p_1}{\pi_0/\pi_1} = \frac{\text{posterior odds (on } H_0 \text{ against } H_1)}{\text{prior odds (on } H_0 \text{ against } H_1)};
\]
this terminology is due to Good (1950), who attributed the method to Turing, in addition to and independently of Jeffreys.
If π₀ = π₁ = 1/2, then π₀/π₁ = 1, and now B = p₀/p₁, the posterior odds on H₀ against H₁. Since B is the ratio of likelihoods (when the hypotheses are simple), the Bayes factor gives the odds on H₀ against H₁ in light of the data x.
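A minimal sketch of this calculation is given below for two simple hypotheses about the mean of an exponential life distribution, a setting in the spirit of the MIL-STD-781C discussion above; the hypothesized means and the lifetimes are hypothetical values chosen only for illustration.

```python
# Illustrative sketch (assumptions, not from the text): Bayes factor for two simple
# hypotheses about the mean of an exponential life distribution.
# H0: mean life theta_0 = 1000 hours; H1: mean life theta_1 = 500 hours.
import numpy as np
from scipy.stats import expon

theta_0, theta_1 = 1000.0, 500.0               # hypothesized mean times to failure
x = np.array([830.0, 1210.0, 640.0, 975.0])    # hypothetical observed lifetimes

# The likelihood of each simple hypothesis is the joint density of the data.
like_0 = np.prod(expon.pdf(x, scale=theta_0))
like_1 = np.prod(expon.pdf(x, scale=theta_1))

B = like_0 / like_1                            # Bayes factor in favor of H0 against H1
print("Bayes factor B =", B)
# With prior odds pi_0/pi_1 = 1 (i.e. pi_0 = pi_1 = 1/2), B is also the posterior odds.
```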
Composite Hypotheses
When θ is continuous, one (or both) of the two hypotheses H₀ and H₁ will be composite.
Composite hypotheses are of interest in reliability and life-testing whenever concerns center on items satisfying requirements, for example that the mean time to failure should exceed a specified number, or that the failure rate should not exceed a specified number, and so on. Indeed, MIL-STD-781C, mentioned before, pertains to the testing of composite hypotheses.
Suppose that θ is continuous and let f(θ) denote its probability density at θ. Then the prior probabilities π₀ and π₁, mentioned before, are:

\[
\pi_0 = \int_{\theta \in \Theta_0} f(\theta)\, d\theta \quad \text{and} \quad \pi_1 = \int_{\theta \in \Theta_1} f(\theta)\, d\theta.
\]
Let g₀(θ) and g₁(θ) denote the restriction of f(θ) to Θ₀ and Θ₁, respectively, re-normalized so that they are probability density functions. That is,

\[
g_0(\theta) = \frac{f(\theta)}{\pi_0} \ \text{ for } \theta \in \Theta_0, \quad \text{and} \quad g_1(\theta) = \frac{f(\theta)}{\pi_1} \ \text{ for } \theta \in \Theta_1.
\]

Given x, the posterior probability of H₀ is
\[
p_0 = P(\theta \in \Theta_0 \mid x) \propto \int_{\theta \in \Theta_0} L(\theta;\, x)\, f(\theta)\, d\theta = \pi_0 \int_{\theta \in \Theta_0} L(\theta;\, x)\, g_0(\theta)\, d\theta.
\]
Similarly,
\[
p_1 \propto \pi_1 \int_{\theta \in \Theta_1} L(\theta;\, x)\, g_1(\theta)\, d\theta,
\]

so that the posterior odds on H₀ against H₁ are

\[
\frac{p_0}{p_1} = \frac{\pi_0}{\pi_1} \cdot \frac{\int_{\theta \in \Theta_0} L(\theta;\, x)\, g_0(\theta)\, d\theta}{\int_{\theta \in \Theta_1} L(\theta;\, x)\, g_1(\theta)\, d\theta}.
\]
Thus, in the case of composite hypotheses, the Bayes factor B in favor of H₀ against H₁ is

\[
B = \frac{p_0/p_1}{\pi_0/\pi_1} = \frac{\int_{\theta \in \Theta_0} L(\theta;\, x)\, g_0(\theta)\, d\theta}{\int_{\theta \in \Theta_1} L(\theta;\, x)\, g_1(\theta)\, d\theta}.
\]
The above suggests that, in the case of composite hypotheses, the Bayes factor is the ratio of weighted likelihoods, with weights g₀ and g₁. This is in contrast to the structure of the Bayes factor when the hypotheses are simple; there, B was solely determined by the data (and the assumed model), irrespective of the prior. In the composite case, the prior enters into the construction of the Bayes factor via the weights g₀ and g₁; it is not solely determined by the data.
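A minimal numerical sketch of this calculation is given below. The exponential life model, the requirement defining Θ₀ (mean time to failure of at least 1000 hours), the assumed prior f(θ) and the data are all illustrative choices, not from the text.

```python
# Illustrative sketch (assumptions, not from the text): Bayes factor for composite
# hypotheses about an exponential mean theta, H0: theta >= 1000 vs H1: theta < 1000,
# with an assumed prior f(theta): exponential with mean 800 on theta > 0.
import numpy as np
from scipy.integrate import quad
from scipy.stats import expon

x = np.array([830.0, 1210.0, 640.0, 975.0])     # hypothetical observed lifetimes
cutoff = 1000.0                                  # requirement on the mean time to failure

def likelihood(theta):
    # Joint density of the data when the mean life is theta.
    return np.prod(expon.pdf(x, scale=theta))

def prior(theta):
    # Assumed prior density f(theta).
    return expon.pdf(theta, scale=800.0)

pi_0, _ = quad(prior, cutoff, np.inf)            # prior probability of H0
pi_1, _ = quad(prior, 0.0, cutoff)               # prior probability of H1

# Weighted likelihoods, with weights g_i(theta) = f(theta)/pi_i restricted to Theta_i.
num, _ = quad(lambda t: likelihood(t) * prior(t) / pi_0, cutoff, np.inf)
den, _ = quad(lambda t: likelihood(t) * prior(t) / pi_1, 0.0, cutoff)

B = num / den                                    # Bayes factor in favor of H0 against H1
print("pi_0 =", pi_0, " B =", B, " posterior odds =", B * pi_0 / pi_1)
```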
Point (or Sharp) Null Hypotheses
Point null hypotheses are particularly useful when one wishes to test a theory or a claim, such as 'aspirin cures headaches'. Other examples were given at the beginning of this section. Point null hypotheses are characterized by the fact that even though θ can take continuous values, Θ₀ is simple (say Θ₀ = {θ₀}), and Θ₁ is the complement of Θ₀. The prior distribution of θ, f(θ), is such that a point mass π₀ ≥ 0 is assigned to θ₀ and the rest, π₁ = 1 − π₀, is spread over the remaining values θ ≠ θ₀ according to a probability density function π₁ g₁(θ), where g₁(θ) integrates to one. It is usual to choose π₀ = 1/2, and g₁(θ) to be uniform, so that all the values of θ, save θ₀, receive equal prior probability (cf. Lindley, 1957a).
Given x, the Bayesian approach for testing H₀: θ = θ₀ versus the alternative H₁: θ ≠ θ₀ proceeds along the lines outlined before. However, in this case the Bayesian conclusions often differ quite substantially from those obtained by frequentist methods. This disparity has sometimes been referred to as the 'Jeffreys Paradox' (cf. Jeffreys, 1961) or as 'Lindley's Paradox' (cf. Shafer, 1982a). The discussion by Hill (1982) in Shafer (1982a) provides insights about the nature of this paradox.
Verify that the posterior probabilities for H₀ and H₁ are:

\[
p_0 = \frac{L(\theta_0;\, x)\, \pi_0}{L(\theta_0;\, x)\, \pi_0 + \pi_1 L_1(x)} = \frac{L(\theta_0;\, x)\, \pi_0}{L(x)} \quad \text{and} \quad p_1 = \frac{\pi_1 L_1(x)}{L(x)},
\]

where

\[
L_1(x) = \int_{\theta \neq \theta_0} L(\theta;\, x)\, g_1(\theta)\, d\theta \quad \text{and} \quad L(x) = L(\theta_0;\, x)\, \pi_0 + \pi_1 L_1(x).
\]

Thus, for the case of a sharp null hypothesis, the Bayes factor B is of the form

\[
B = \frac{p_0/p_1}{\pi_0/\pi_1} = \frac{L(\theta_0;\, x)}{L_1(x)}.
\]
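The sketch below illustrates the sharp-null calculation. To keep L₁(x) in closed form it uses a normal probability model and a normal g₁ rather than the uniform g₁ mentioned above; the observation, the prior spread τ and the choice π₀ = 1/2 are all assumed values.

```python
# Illustrative sketch (assumptions, not from the text): Bayes factor for a sharp null
# about a normal mean, H0: theta = theta_0 vs H1: theta != theta_0, with x | theta ~ N(theta, 1)
# and, under H1, theta spread out as g_1 = N(theta_0, tau^2) with tau = 2 (an assumed choice).
from scipy.stats import norm

x = 1.8        # hypothetical observation
theta_0 = 0.0  # the point null
tau = 2.0      # assumed spread of the prior g_1 under H1

like_0 = norm.pdf(x, loc=theta_0, scale=1.0)              # L(theta_0; x)
# For a normal g_1, the weighted likelihood L_1(x) is available in closed form:
# marginally under H1, x ~ N(theta_0, 1 + tau^2).
L_1 = norm.pdf(x, loc=theta_0, scale=(1.0 + tau**2) ** 0.5)

B = like_0 / L_1                                          # Bayes factor in favor of H0
pi_0 = 0.5                                                # conventional point mass at theta_0
p_0 = B * pi_0 / (B * pi_0 + (1 - pi_0))                  # posterior probability of H0
print("B =", B, " P(H0 | x) =", p_0)
```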
2.7.2 Uses of the Bayes Factor
Good (1950) refers to the logarithm of the Bayes factor as the weight of evidence. His motivation for considering logarithms is that, if we have several experiments pertaining to the testing of two simple hypotheses, then the Bayes factors multiply whereas the weights of evidence add. To see how, consider the two simple hypotheses of section 2.7.1, and suppose that the observed data is x₁. Then the posterior odds corresponding to x₁ are given by

\[
\frac{p_0}{p_1} = B(x_1)\, \frac{\pi_0}{\pi_1},
\]

where B(x₁) is the Bayes factor based on x₁. Now suppose that, subsequent to observing x₁, we observe x₂. The posterior odds corresponding to x₂ will be B(x₂)(p₀/p₁) = B(x₂)B(x₁)(π₀/π₁). The Bayes factors multiply, but by taking logarithms of the prior and posterior odds, the weights of evidence add.
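A short sketch of this multiplicative (and, on the log scale, additive) behavior, using the same hypothetical exponential setting as before:

```python
# Illustrative sketch (assumptions, not from the text): Bayes factors from two
# successive experiments multiply, while the log Bayes factors (weights of evidence) add.
import math
from scipy.stats import expon

theta_0, theta_1 = 1000.0, 500.0     # the two simple hypotheses (assumed mean lives)
x1 = [830.0, 1210.0]                 # hypothetical data from the first experiment
x2 = [640.0, 975.0]                  # hypothetical data from the second experiment

def bayes_factor(data):
    like_0 = math.prod(expon.pdf(x, scale=theta_0) for x in data)
    like_1 = math.prod(expon.pdf(x, scale=theta_1) for x in data)
    return like_0 / like_1

B1, B2 = bayes_factor(x1), bayes_factor(x2)
B12 = bayes_factor(x1 + x2)          # Bayes factor based on all the data

print(B12, "=", B1 * B2)                                        # Bayes factors multiply
print(math.log10(B12), "=", math.log10(B1) + math.log10(B2))    # weights of evidence add
```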
From the material of sections 2.7.1 and 2.7.2, we have seen that: posterior odds = Bayes factor × prior odds.
The above feature has motivated some to claim that the Bayes factor is a summary of the evidence provided by the data in favor of one scientific theory (represented by a statistical model) as opposed to another (cf. Kass and Raftery, 1995). Indeed, Jeffreys suggests using log₁₀ B⁻¹ as evidence against H₀ according to the following guidelines: (0 to 0.5) ⇒ not worth more than a bare mention; (0.5 to 1) ⇒ substantial; (1 to 2) ⇒ strong; and > 2 ⇒ decisive. In view of such guidelines, some Bayesians have declined to specify prior odds, and have chosen to use only Bayes factors as an alternative to the frequentist's significance probabilities. Whereas this strategy may be appropriate in the case of simple hypotheses (with π₀ = π₁ = 1/2), it is not so in the case of composite hypotheses because, here, the Bayes factor also depends on how the prior mass is spread over the two hypotheses (see section 2.7.1). In the latter case, the Bayes factor cannot be interpreted as a summary of the evidence provided by the data alone. Rather, the Bayes factor measures the change in the odds in favor of the hypothesis when going from the prior to the posterior; see Lindley (1997a), and Lavine and Schervish (1999), for a discussion of this and related matters. Bayes factors also play a role in the general area of model selection, model comparison and model monitoring; see the lecture notes by Chipman, George, and McCulloch (2001). Model selection and model comparison are commonly discussed issues in reliability and failure data analyses.
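Purely as an illustration of the guidelines quoted above, the hypothetical helper below maps a Bayes factor B (in favor of H₀) to Jeffreys' verbal grades via log₁₀ B⁻¹.

```python
# Illustrative helper (not from the text): grade the evidence against H0 carried by a
# Bayes factor B in favor of H0, using Jeffreys' guidelines on log10(1/B).
import math

def jeffreys_grade(B):
    e = math.log10(1.0 / B)          # log10 of the Bayes factor against H0
    if e <= 0:
        return "evidence supports H0"
    if e <= 0.5:
        return "not worth more than a bare mention"
    if e <= 1:
        return "substantial"
    if e <= 2:
        return "strong"
    return "decisive"

print(jeffreys_grade(0.004))         # log10(1/0.004) is about 2.4, hence "decisive"
```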
2.7.3 Alternatives to Bayes Factors
The foregoing discussion has assumed that the prior distributions used are proper, i.e. they integrate to one. When using Bayes factors for model choice or for tests of hypotheses, it is sometimes the case that the prior distributions used are improper. Improper distributions come into play when one wishes to adopt an impartial stance or when one claims to have little knowledge of the parameters. A consequence is that the Bayes factor contains an arbitrary ratio, say c₀/c₁, where the prior is π_i(θ) = c_i h_i(θ), i = 0, 1, for a known h_i but an arbitrary, positive multiplier c_i. Thus, when calculating the Bayes factor, the c_i's do not cancel, so that c₀/c₁ appears.
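To make the difficulty explicit, substituting π_i(θ) = c_i h_i(θ) into the Bayes factor for composite hypotheses (a sketch in the notation of this section) gives

\[
B = \frac{\int_{\Theta_0} L(\theta;\, x)\, c_0 h_0(\theta)\, d\theta}{\int_{\Theta_1} L(\theta;\, x)\, c_1 h_1(\theta)\, d\theta} = \frac{c_0}{c_1} \cdot \frac{\int_{\Theta_0} L(\theta;\, x)\, h_0(\theta)\, d\theta}{\int_{\Theta_1} L(\theta;\, x)\, h_1(\theta)\, d\theta},
\]

so the arbitrary constants survive only through the ratio c₀/c₁.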
Ways of overcoming this difficulty have been proposed by O’Hagan (1995) via his fractional Bayes factors, and by Berger and Pericchi (1996) via their intrinsic Bayes factors (IBF). A recent paper by Ghosh and Samanta (2002) provides a unified derivation of the fractional and intrinsic Bayes factors and concludes that these factors are close to each other and to certain Bayes factors based on proper priors.
It is of interest to note that when the probability model P(X | θ) has a density at x of the exponential form, namely θ⁻¹ exp(−x/θ), and we wish to test a point (or sharp) null hypothesis at θ₀, the methodology of producing an IBF yields a proper prior, namely f(θ) = θ₀/(θ₀ + θ)² [personal communication with James Berger]. The fact that this prior depends on θ₀, the point at which the null hypothesis is positioned, makes θ₀/(θ₀ + θ)² non-subjective, and therefore impartial to all those whose interest centers around θ₀. In the context of estimation and prediction (section 5.3.2), this prior cannot be called objective, because using it involves anchoring it around a specific value of θ, namely θ₀.
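As a quick check (not given in the text), the prior f(θ) = θ₀/(θ₀ + θ)² is indeed proper on θ > 0:

\[
\int_0^{\infty} \frac{\theta_0}{(\theta_0 + \theta)^2}\, d\theta = \left[ -\frac{\theta_0}{\theta_0 + \theta} \right]_0^{\infty} = 0 - (-1) = 1.
\]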
2.8 UTILITY AS PROBABILITY AND MAXIMIZATION OF EXPECTED