5.5 INFORMATION FROM LIFE-TESTS: LEARNING FROM DATA

5.5.1 Preliminaries: Entropy and Information

By the term ‘information’, we mean anything that changes our probability distribution about an unknown quantity, say $\theta$, which for convenience is assumed to be a parameter of some chance distribution. A consequence of a change in the probability distribution of $\theta$ could be a change in some action that is taken based on an appreciation of $\theta$. Thus one way to measure the change in the probability distribution of $\theta$ is via a utility function $u(d(T,n), \theta)$, where $d(T,n)$ is some action or decision that we may take based on data $T$ obtained via $n$ observations. In the context of life-testing, the total time on test (section 5.4.3) best encapsulates $T$. Recall that the notion of a utility was introduced in section 1.4, and utilities were discussed in section 2.8.1, where the notation $u(C_{ij})$ was used to denote the utility of an action $j$ when the state of nature was $i$. In $u(d(T,n), \theta)$, $\theta$ is to be identified with $i$ and $d(T,n)$ with $j$. In what follows, $d(T,n)$ will be denoted by just $d$.

Suppose now that $F(\theta)$ is our prior distribution of $\theta$, and $F(\theta \mid T, n)$ our posterior for $\theta$ were we to take $n$ observations and observe $T$ as data. Then, by the principle of maximization of expected utility (section 2.8.2), the expected gain in utility due to $(T, n)$ is measured as

\[
g(n) = E_T\left\{ \max_d \int u(d, \theta)\, F(d\theta \mid T, n) \right\} - \left\{ \max_d \int u(d, \theta)\, F(d\theta) \right\}. \tag{5.47}
\]

It can be seen that $g(n) \geq 0$. The expectation above is with respect to the marginal distribution of $T$, and $n$, a decision variable, is assumed to be specified. Determining an optimum $n$ is discussed later, in section 5.6. The quantity $g(n)$ is also known as the expected information about $\theta$ that would be provided by the data $T$ were $n$ observations to be taken. It is important to bear in mind that $g(n)$ is in the subjunctive mood; i.e. $g(n)$ is based on the contemplative assumption of taking $n$ observations and obtaining $T$ as data. It is for this reason that the expectation is taken with respect to the marginal distribution of $T$. In Bayesian statistics, such contemplative analyses are known as preposterior analyses, because they precede the obtaining of any actual posterior.
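By way of illustration, the sketch below approximates $g(n)$ of (5.47) by Monte Carlo for a hypothetical two-action decision problem of the accept/reject kind discussed next. The exponential life-test model, the Gamma prior, the utility $u(\text{accept}, \theta) = 1 - c\theta$, and all numerical settings are assumptions made for this example only.

```python
# A Monte Carlo sketch of g(n) in (5.47) for a hypothetical accept/reject
# decision. The exponential life-test, the Gamma prior, the utilities and
# all numerical settings are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 4.0   # assumed prior: failure rate theta ~ Gamma(a, rate b)
n = 5             # number of items placed on test
c = 3.0           # assumed cost coefficient in the 'accept' utility

def best_expected_utility(mean_theta):
    """max over d of expected utility: u(accept, theta) = 1 - c*theta
    versus u(reject, theta) = 0; both are linear in theta, so only
    the mean of theta matters."""
    return np.maximum(1.0 - c * mean_theta, 0.0)

# Second term of (5.47): the best we can do acting under the prior alone.
prior_term = best_expected_utility(a / b)

# First term: expectation over the marginal of T, the total time on test.
# Draw theta from the prior, then T | theta ~ Gamma(n, rate theta); by
# conjugacy the posterior is theta | (T, n) ~ Gamma(a + n, rate b + T).
m = 200_000
theta = rng.gamma(shape=a, scale=1.0 / b, size=m)
T = rng.gamma(shape=n, scale=1.0 / theta)
posterior_term = best_expected_utility((a + n) / (b + T)).mean()

print(f"g({n}) ~= {posterior_term - prior_term:.4f}  (non-negative)")
```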

What are the possible choices for $d(T,n)$, and what possible forms can $u(d, \theta)$ take? The answer depends on the practicalities of the situation at hand. The simplest scenario is one wherein the decision is a tangible course of action, such as accepting or rejecting a batch of items. In this case, with $d$ as the decision to accept and $\theta$ a measure of quality, it is reasonable to let $u(d, \theta)$ be an increasing function of $\theta$. More about scenarios involving tangible courses of action is said later, in sections 5.5.3 and 5.6. For now we shall consider a more subtle case, wherein the decision is an enhanced appreciation of $\theta$.

Choice of Utility Functions for Inference (or Enhanced Appreciation)

Since a probability density best encapsulates our appreciation of uncertainty about $\theta$, our choice of $d$ should be some density, say $p(\bullet)$. Thus $u(d, \theta)$ will be of the form $u(p(\bullet), \theta)$, which is the utility of declaring the density $p(\bullet)$ when the parameter takes the value $\theta$. But it is the posterior distribution of $\theta$ that accords well with our belief about $\theta$. Thus the question arises as to what functional form of $u(p(\bullet), \theta)$ will result in the posterior distribution of $\theta$ being the optimum choice of $p(\bullet)$. Bernardo (1979b), in perhaps one of the most striking results in Bayesian decision theory, has shown that the utility function has to be of the logarithmic form; that is, $u(p(\bullet), \theta) = \log p(\theta)$. Consequently, if this form of the utility function is invoked in (5.47), then

\[
g(n) = E_T\left\{ \int \log F(d\theta \mid T, n)\, F(d\theta \mid T, n) \right\} - \left\{ \int \log F(d\theta)\, F(d\theta) \right\}. \tag{5.48}
\]
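The following is a small numerical check of (5.48), under the assumption of a two-point parameter space so that all integrals reduce to sums; the prior, the sample size and the binomial chance distribution are illustrative choices, not the text's.

```python
# A numerical check of (5.48) under Bernardo's logarithmic utility,
# assuming a two-point parameter space so that all integrals become sums.
import numpy as np
from scipy.stats import binom

thetas = np.array([0.1, 0.9])    # hypothetical failure probabilities
prior = np.array([0.7, 0.3])     # assumed prior mass on each value
n = 4                            # number of contemplated observations

def shannon_information(p):
    """I = sum of log(p) * p; its negative is the entropy."""
    p = p[p > 0]
    return np.sum(p * np.log(p))

g_n = -shannon_information(prior)            # minus the second term of (5.48)
for t in range(n + 1):                       # T = number of observed failures
    lik = binom.pmf(t, n, thetas)            # chance distribution of T given theta
    marg = np.sum(lik * prior)               # marginal probability of T = t
    post = lik * prior / marg                # F(dtheta | T, n), by Bayes' law
    g_n += marg * shannon_information(post)  # builds E_T of the first term

print(f"g({n}) = {g_n:.4f}  (the expected information; always >= 0)")
```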

In order to connect the above notions from Bayesian decision theory to those used in communication theory, we remark that the Shannon information about $\theta$, whose uncertainty is given by $F(d\theta)$, is

\[
I(\theta) = \int \log F(d\theta)\, F(d\theta). \tag{5.49}
\]


$-I(\theta)$ is known as the entropy of $F(d\theta)$. Similarly, with $F(d\theta \mid T, n)$ we have the entities $I(\theta; T, n)$ and $-I(\theta; T, n)$. Consequently, $g(n)$ is $E_T\left[I(\theta; T, n)\right] - I(\theta)$.
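As a quick check of (5.49), the sketch below computes $I(\theta)$ by numerical integration for the Gamma prior assumed in the earlier sketch, and compares the resulting entropy $-I(\theta)$ against scipy's closed-form value.

```python
# I(theta) for the Gamma(2, rate 4) prior assumed earlier, computed by
# numerical integration of log f(theta) * f(theta) and compared with
# scipy's closed-form entropy. The upper limit 50 is effectively infinity.
import numpy as np
from scipy.stats import gamma
from scipy.integrate import quad

a, b = 2.0, 4.0
f = gamma(a, scale=1.0 / b).pdf

I_theta, _ = quad(lambda th: np.log(f(th)) * f(th), 1e-12, 50.0)
print(f"I(theta) = {I_theta:.4f}; entropy -I(theta) = {-I_theta:.4f}")
print(f"scipy closed form:  {gamma(a, scale=1.0 / b).entropy():.4f}")
```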

If the distribution function of $T$ is denoted by $F(t)$ – recall that $T$ has not as yet been observed – then another way of writing $g(n)$ is

\[
g(n) = \int_T \int_\theta \log\left[\frac{F(d\theta, dt)}{F(d\theta)\, F(dt)}\right] F(d\theta, dt), \tag{5.50}
\]

where $F(\theta, t)$ is the joint distribution function of $\theta$ and $T$. To see why, we note that the first term of (5.48) is

\[
\int_T \left\{ \int_\theta \log F(d\theta \mid T, n)\, F(d\theta \mid T, n) \right\} F(dt) = \int_T \int_\theta \log F(d\theta \mid T, n)\, F(d\theta, dt),
\]

and that the second term is

\[
\int_T \int_\theta \log F(d\theta)\, F(d\theta, dt).
\]

Consequently,

\[
g(n) = E\left\{ \log \frac{F(d\theta, dt)}{F(d\theta)\, F(dt)} \right\}, \tag{5.51}
\]

where the expectation is with respect to $F(\theta, t)$.

When $g(n)$ is written as above, it is interpreted as the mutual information between $\theta$ and $T$, or as the Kullback–Leibler distance for discrimination between $F(d\theta \mid T, n)$ and $F(d\theta)$.

Soofi (2000) provides a recent account of these and related matters. Mutual information has the property that $g(n) \geq 0$, with equality if and only if $\theta$ and $T$ are independent. In the latter case, knowledge of $T$ gives us no information about $\theta$. In life-testing, one endeavors to collect that data which results in a $T$ for which $g(n)$ is a maximum. In communication theory (Shannon (1948) and Gallager (1968)), the maximum value of $g(n)$ is known as the channel capacity; here $\theta$ is interpreted as the message to be sent, and $T$ as the message received. The connection between information theory and the prior-to-posterior transformation in Bayesian inference has been articulated by Lindley (1956), in perhaps one of his several landmark papers. Other references relating the above ideas to the design of (life-testing) experiments are Verdinelli, Polson and Singpurwalla (1993), and Polson (1992).
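The sketch below verifies numerically, for the assumed two-point model used earlier, that the mutual-information form (5.51) coincides with the expected Kullback–Leibler distance between $F(d\theta \mid T, n)$ and $F(d\theta)$.

```python
# Verifies, for the assumed two-point model above, that the mutual
# information of (5.51) equals the expected Kullback-Leibler distance
# between the posterior F(dtheta | T, n) and the prior F(dtheta).
import numpy as np
from scipy.stats import binom

thetas = np.array([0.1, 0.9])
prior = np.array([0.7, 0.3])
n = 4

mutual_info = 0.0
expected_kl = 0.0
for t in range(n + 1):
    joint = binom.pmf(t, n, thetas) * prior   # F(dtheta, dt)
    marg_t = joint.sum()                      # F(dt), the marginal of T
    post = joint / marg_t                     # F(dtheta | T, n)
    # (5.51): expectation over the joint of log[joint / (prior * marginal)]
    mutual_info += np.sum(joint * np.log(joint / (prior * marg_t)))
    # E_T of the KL distance between posterior and prior
    expected_kl += marg_t * np.sum(post * np.log(post / prior))

print(f"mutual information   = {mutual_info:.6f}")
print(f"expected KL distance = {expected_kl:.6f}  (the same quantity)")
```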

Our discussion thus far has been conducted on the premise that the $n$ observations have yet to be taken and that $T$ remains to be observed; i.e. what we have been doing is a preposterior analysis. Thus $g(n)$ represents the expected gain in utility, or the expected information. Suppose now that the $n$ observations are actually taken and that the data $T$ has indeed been obtained.

Then, by analogy with (5.47), the observed information, or the observed change in utility, is

\[
g(T, n) = \max_d \int u(d, \theta)\, F(d\theta \mid T, n) - \max_d \int u(d, \theta)\, F(d\theta).
\]

An unpalatable feature of $g(T, n)$ is that, unlike $g(n)$, which is always non-negative, $g(T, n)$ could be negative. A negative value of $g(T, n)$ signals the fact that data could result in negative information. This, though contrary to popular belief, is still reasonable, because in actuality one could be surprised by the data one sees, and as a consequence be less sure about $\theta$ than one was prior to observing the data.
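A hypothetical numerical illustration: under the logarithmic utility, $g(T, n)$ reduces to $I(\theta; T, n) - I(\theta)$, the prior entropy minus the posterior entropy, and a surprising observation can make it negative. The two-point prior below is an assumption chosen to make the effect visible.

```python
# An assumed illustration that g(T, n) can be negative. Under the
# logarithmic utility, g(T, n) = I(theta; T, n) - I(theta): the prior
# entropy minus the posterior entropy. A surprising observation can
# spread the posterior out, making the difference negative.
import numpy as np

thetas = np.array([0.1, 0.9])
prior = np.array([0.95, 0.05])   # assumed prior, strongly favoring 0.1

def shannon_information(p):
    return np.sum(p * np.log(p))

# One Bernoulli trial (n = 1) ends in a failure, an outcome nine times
# more probable under theta = 0.9; the posterior moves toward even odds.
lik = thetas                      # P(failure | theta) = theta
post = lik * prior / np.sum(lik * prior)

g_obs = shannon_information(post) - shannon_information(prior)
print(f"posterior = {np.round(post, 3)}")
print(f"g(T, 1)   = {g_obs:.4f}  (negative: we are less sure about theta)")
```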

One attempt at avoiding the possible negativity of $g(T, n)$ is to introduce $g'(T, n)$, the conditional value of sample information (cf. DeGroot, 1984). Here, let $d_0$ denote that value of $d$ which maximizes

\[
U(F, d) \overset{\text{def}}{=} \int u(d, \theta)\, F(d\theta);
\]

$d_0$ is known as the Bayes’ decision with respect to the prior $F(d\theta)$. We assume that $d_0$ exists and is unique. Then

\[
g'(T, n) \overset{\text{def}}{=} \max_d U\!\left(F(\bullet \mid T, n), d\right) - U\!\left(F(\bullet \mid T, n), d_0\right),
\]

the difference between the expected utility from the Bayes’ decision using the data $(T, n)$ and the expected utility using $d_0$, the decision that would have been chosen had the data not been available. By definition, $g'(T, n) \geq 0$ for every $T$, and its expected value with respect to $F(t)$, the marginal distribution of $T$, is indeed $g(n)$.
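Under the same assumed two-point model, the sketch below computes $g'(T, n)$ for a hypothetical two-action problem; the utility table is invented for illustration. Note that $g'(T, n)$ is zero whenever the data do not change the Bayes’ decision, and positive otherwise.

```python
# A sketch of the conditional value of sample information g'(T, n) for a
# hypothetical two-action problem (accept/reject), reusing the two-point
# model above; the utility table is an invented assumption.
import numpy as np

thetas = np.array([0.1, 0.9])
prior = np.array([0.95, 0.05])
# Assumed utilities u(d, theta); rows index the decisions.
u = np.array([[1.0, -4.0],    # accept: good if theta = 0.1, costly if 0.9
              [0.0,  0.0]])   # reject: a safe zero either way

d0 = np.argmax(u @ prior)     # the Bayes decision under the prior F(dtheta)

# Posterior after one observed failure, as in the previous snippet.
post = thetas * prior / np.sum(thetas * prior)
post_utils = u @ post         # U(F(. | T, n), d) for each decision d

g_prime = np.max(post_utils) - post_utils[d0]
print(f"g'(T, 1) = {g_prime:.4f}  (non-negative for every T)")
```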
