Bivariate Probabilities - Statistical Independence


3.4 Bivariate Probabilities

In this section we introduce a class of problems that involve two distinct sets of events, which we label A₁, A₂, . . . , A_H and B₁, B₂, . . . , B_K. These problems have broad application in business and economics. They can be studied by constructing two-way tables that develop intuition for problem solutions. The events A_i and B_j are mutually exclusive and collectively exhaustive within their sets, but intersections (A_i ∩ B_j) can occur between all events from the two sets. These intersections can be regarded as basic outcomes of a random experiment. Two sets of events, considered jointly in this way, are called bivariate, and the probabilities are called bivariate probabilities. It is possible to apply the methods of this section to trivariate and higher-level probabilities, but with added complexity.

We also consider situations where it is difficult to obtain desired conditional probabilities, but where alternative conditional probabilities are available. It may be difficult to obtain probabilities because the costs of enumeration are high or because some critical, ethical, or legal restriction prevents direct collection of probabilities.

Table 3.6 illustrates the outcomes of bivariate events labeled A₁, A₂, . . . , A_H and B₁, B₂, . . . , B_K. If probabilities can be attached to all intersections (A_i ∩ B_j), then the whole probability structure of the random experiment is known, and other probabilities of interest can be calculated.

Table 3.6 Outcomes for Bivariate Events

        B₁             B₂             . . .   B_K
A₁      P(A₁ ∩ B₁)     P(A₁ ∩ B₂)     . . .   P(A₁ ∩ B_K)
A₂      P(A₂ ∩ B₁)     P(A₂ ∩ B₂)     . . .   P(A₂ ∩ B_K)
.       .              .              .       .
A_H     P(A_H ∩ B₁)    P(A_H ∩ B₂)    . . .   P(A_H ∩ B_K)

As a discussion example, consider a potential advertiser who wants to know both income and other relevant characteristics of the audience for a particular television show. Families may be categorized, using A_i, as to whether they regularly, occasionally, or never watch a particular series. In addition, they can be categorized, using B_j, according to high-, middle-, and low-income subgroups. Then the nine possible cross-classifications can be set out in the form of Table 3.7, with H = 3 and K = 3. The subsetting of the population can also be displayed using a tree diagram, as shown in Figure 3.8. Beginning at the left, we have the entire population of families. This population is separated into three branches, depending on their television-viewing frequency. In turn, each of these branches is separated into three subbranches, according to the family income level. As a result, there are nine subbranches corresponding to all combinations of viewing frequency and income level.

Table 3.7 Probabilities for Television Viewing and Income Example

Viewing Frequency   High Income   Middle Income   Low Income   Totals
Regular             0.04          0.13            0.04         0.21
Occasional          0.10          0.11            0.06         0.27
Never               0.13          0.17            0.22         0.52
Totals              0.27          0.41            0.32         1.00

Figure 3.8 Tree Diagram for Television Viewing and Income Example

Whole population
    Regularly watch series
        High income
        Middle income
        Low income
    Occasionally watch series
        High income
        Middle income
        Low income
    Never watch series
        High income
        Middle income
        Low income
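The branching in Figure 3.8 can be sketched in a few lines of Python. This is only an illustration: the list names `viewing` and `income` are assumptions introduced here, and the labels are the category names used in the example.

```python
from itertools import product

# Category labels from the television example; the variable names are illustrative.
viewing = ["Regular", "Occasional", "Never"]  # A_1, A_2, A_3 (H = 3)
income = ["High", "Middle", "Low"]            # B_1, B_2, B_3 (K = 3)

# Each leaf of the tree is one (viewing, income) pair, i.e. one intersection A_i ∩ B_j.
leaves = list(product(viewing, income))

for view, inc in leaves:
    print(f"Whole population -> {view} -> {inc} income")

print(len(leaves))  # 9
```

Every leaf corresponds to exactly one cell of the two-way table, which is why the tree and the table carry the same information.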

Joint and Marginal Probabilities

In the context of bivariate probabilities the intersection probabilities, P(A_i ∩ B_j), are called joint probabilities. The probabilities for individual events, P(A_i) or P(B_j), are called marginal probabilities. Marginal probabilities are at the margin of a table such as Table 3.7 and can be computed by summing the corresponding row or column.

Now it is necessary to obtain the probabilities for each of the event intersections. These probabilities, as obtained from viewer surveys, are all presented in Table 3.7. For example, 10% of the families have high incomes and occasionally watch the series. These probabilities are developed using the relative frequency concept of probability, assuming that the survey is sufficiently large so that proportions can be approximated as probabilities. On this basis, the probability that a family chosen at random from the population has a high income and occasionally watches the show is 0.10.

124 Chapter 3 Elements of Chance: Probability Methods

To obtain the marginal probabilities for an event, we merely sum the corresponding mutually exclusive joint probabilities:

$$P(A_i) = P(A_i \cap B_1) + P(A_i \cap B_2) + \cdots + P(A_i \cap B_K)$$

Note that this would be equivalent to summing the probabilities for a particular row in Table 3.7. An analogous argument shows that the probabilities for B_j are the column totals.

Continuing with the example, define the television-watching subgroups as A₁, “regular”; A₂, “occasional”; and A₃, “never.” Similarly define the income subgroups as B₁, “high”; B₂, “middle”; and B₃, “low.” Then the probability that a family is an occasional viewer is as follows:

$$P(A_2) = P(A_2 \cap B_1) + P(A_2 \cap B_2) + P(A_2 \cap B_3) = 0.10 + 0.11 + 0.06 = 0.27$$

Similarly, we can add the other rows in Table 3.7 to obtain P(A₁) = 0.21 and P(A₃) = 0.52.

We can also add the columns in Table 3.7 to obtain

$$P(B_1) = 0.27, \quad P(B_2) = 0.41, \quad \text{and} \quad P(B_3) = 0.32$$
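The row and column sums above can be reproduced with a short Python sketch. The nested-dictionary layout and the names `joint`, `p_viewing`, and `p_income` are assumptions made for illustration; the numbers are the joint probabilities of Table 3.7.

```python
# Joint probabilities P(A_i ∩ B_j) from Table 3.7.
# Outer keys: viewing frequency (rows); inner keys: income level (columns).
joint = {
    "Regular":    {"High": 0.04, "Middle": 0.13, "Low": 0.04},
    "Occasional": {"High": 0.10, "Middle": 0.11, "Low": 0.06},
    "Never":      {"High": 0.13, "Middle": 0.17, "Low": 0.22},
}
incomes = ["High", "Middle", "Low"]

# Row totals give the marginal probabilities P(A_i).
# round(..., 2) only removes floating-point noise from the sums.
p_viewing = {view: round(sum(row.values()), 2) for view, row in joint.items()}

# Column totals give the marginal probabilities P(B_j).
p_income = {inc: round(sum(joint[view][inc] for view in joint), 2) for inc in incomes}

print(p_viewing)  # {'Regular': 0.21, 'Occasional': 0.27, 'Never': 0.52}
print(p_income)   # {'High': 0.27, 'Middle': 0.41, 'Low': 0.32}
```

Each set of marginal probabilities sums to 1, matching the Totals row and column of Table 3.7.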

Marginal probabilities can also be obtained from tree diagrams like Figure 3.9, which has the same branches as Figure 3.8. The right-hand side contains all of the joint probabilities, and the marginal probabilities for the three viewing-frequency events are entered on the main branches by adding the probabilities on the corresponding subbranches. The tree-branch model is particularly useful when there are more than two sets of events of interest. In this case, for example, the advertiser might also be interested in the age of the head of household or the number of children. The marginal probabilities for the various events sum to 1 because those events are mutually exclusive and collectively exhaustive.

Figure 3.9 Tree Diagram for the Television Viewing–Income Example, Showing Joint and Marginal Probabilities

P(S) = 1
    P(A₁) = .21
        P(A₁ ∩ B₁) = .04
        P(A₁ ∩ B₂) = .13
        P(A₁ ∩ B₃) = .04
    P(A₂) = .27
        P(A₂ ∩ B₁) = .10
        P(A₂ ∩ B₂) = .11
        P(A₂ ∩ B₃) = .06
    P(A₃) = .52
        P(A₃ ∩ B₁) = .13
        P(A₃ ∩ B₂) = .17
        P(A₃ ∩ B₃) = .22

A₁: Regularly watch; A₂: Occasionally watch; A₃: Never watch; B₁: High income; B₂: Middle income; B₃: Low income; S: Sample space

In many applications we find that the conditional probabilities are of more interest than the marginal probabilities. An advertiser may be more concerned about the probability that a high-income family is watching than the probability of any family watching.

The conditional probability can be obtained easily from the table because we have all the joint probabilities and the marginal probabilities. For example, the probability of a high-income family regularly watching the show is as follows:

$$P(A_1 \mid B_1) = \frac{P(A_1 \cap B_1)}{P(B_1)} = \frac{0.04}{0.27} = 0.15$$

Table 3.8 shows the probability of the viewer groups conditional on income levels. Note that the conditional probabilities with respect to a particular income group always add up to 1, as seen for the three columns in Table 3.8. This will always be the case, as seen by the following:

$$\sum_{i=1}^{H} P(A_i \mid B_j) = \sum_{i=1}^{H} \frac{P(A_i \cap B_j)}{P(B_j)} = \frac{P(B_j)}{P(B_j)} = 1$$
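As a concrete instance of this identity, take j = 1 (high income) and H = 3, using the joint probabilities in the High Income column of Table 3.7. The numbers are from the table; the worked line itself is added here only as an illustration.

```latex
\sum_{i=1}^{3} P(A_i \mid B_1)
  = \frac{0.04}{0.27} + \frac{0.10}{0.27} + \frac{0.13}{0.27}
  = \frac{0.27}{0.27}
  = 1
```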

The conditional probabilities for the income groups, given viewing frequencies, can also be computed, as shown in Table 3.9, using the definition for conditional probability and the joint and marginal probabilities.

To obtain the conditional probabilities of income given viewing frequency, we divide each of the joint probabilities in a row of Table 3.7 by the marginal probability in the right-hand column. For example,

$$P(\text{low income} \mid \text{occasional viewer}) = \frac{0.06}{0.27} = 0.22$$

Table 3.8 Conditional Probabilities of Viewing Frequencies, Given Income Levels

Viewing Frequency   High Income   Middle Income   Low Income
Regular             0.15          0.32            0.12
Occasional          0.37          0.27            0.19
Never               0.48          0.41            0.69

Table 3.9 Conditional Probabilities of Income Levels, Given Viewing Frequencies

Viewing Frequency   High Income   Middle Income   Low Income
Regular             0.19          0.62            0.19
Occasional          0.37          0.41            0.22
Never               0.25          0.33            0.42
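Both conditional tables follow mechanically from the joint probabilities, which the Python sketch below illustrates. The dictionary layout and the names `view_given_income` and `income_given_view` are assumptions chosen for this example; only the joint probabilities come from Table 3.7.

```python
# Joint probabilities P(A_i ∩ B_j) from Table 3.7.
joint = {
    "Regular":    {"High": 0.04, "Middle": 0.13, "Low": 0.04},
    "Occasional": {"High": 0.10, "Middle": 0.11, "Low": 0.06},
    "Never":      {"High": 0.13, "Middle": 0.17, "Low": 0.22},
}
incomes = ["High", "Middle", "Low"]

# Marginal probabilities: row totals P(A_i) and column totals P(B_j).
p_view = {view: sum(row.values()) for view, row in joint.items()}
p_inc = {inc: sum(joint[view][inc] for view in joint) for inc in incomes}

# Table 3.8: P(A_i | B_j) = P(A_i ∩ B_j) / P(B_j), dividing by the column total.
view_given_income = {
    view: {inc: joint[view][inc] / p_inc[inc] for inc in incomes} for view in joint
}

# Table 3.9: P(B_j | A_i) = P(A_i ∩ B_j) / P(A_i), dividing by the row total.
income_given_view = {
    view: {inc: joint[view][inc] / p_view[view] for inc in incomes} for view in joint
}

print("P(viewing | income)")
for view in joint:
    print(f"{view:<12}", "  ".join(f"{view_given_income[view][inc]:.3f}" for inc in incomes))

print("P(income | viewing)")
for view in joint:
    print(f"{view:<12}", "  ".join(f"{income_given_view[view][inc]:.3f}" for inc in incomes))
```

The values are printed to three decimals so that rounding is visible. For example, P(regular | low income) = 0.04/0.32 = 0.125, which Table 3.8 reports as 0.12; with that choice the Low Income column still sums to 1.00.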

We can also check, by using a two-way table, whether or not paired events are statistically independent. Recall that events A_i and B_j are independent if and only if their joint probability is the product of their marginal probabilities; that is, if

$$P(A_i \cap B_j) = P(A_i)\,P(B_j)$$

In Table 3.7 the joint event of A₂ (“occasionally watch”) and B₁ (“high income”) has a probability of P(A₂ ∩ B₁) = 0.10, and the corresponding marginal probabilities are P(A₂) = 0.27 and P(B₁) = 0.27.

The product of these marginal probabilities is 0.0729 and, thus, not equal to the joint probability of 0.10. Hence, events A₂ and B₁ are not statistically independent.

Independent Events

Let A and B be a pair of events, each broken into mutually exclusive and collectively exhaustive event categories denoted by labels A₁, A₂, . . . , A_H and B₁, B₂, . . . , B_K. If every event A_i is statistically independent of every event B_j, then A and B are independent events.


Since A₂ and B₁ are not statistically independent, it follows that the events “viewing frequency” and “income” are not independent.
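The cell-by-cell comparison can be sketched in Python as follows. This is an exact arithmetic check on the table values, not a statistical test; the function name `independence_check` and the tolerance `tol`, which only guards against floating-point noise, are assumptions introduced for this illustration.

```python
import math

# Joint probabilities P(A_i ∩ B_j) from Table 3.7.
joint = {
    "Regular":    {"High": 0.04, "Middle": 0.13, "Low": 0.04},
    "Occasional": {"High": 0.10, "Middle": 0.11, "Low": 0.06},
    "Never":      {"High": 0.13, "Middle": 0.17, "Low": 0.22},
}
incomes = ["High", "Middle", "Low"]

# Marginal probabilities: row totals P(A_i) and column totals P(B_j).
p_view = {view: sum(row.values()) for view, row in joint.items()}
p_inc = {inc: sum(joint[view][inc] for view in joint) for inc in incomes}


def independence_check(joint, p_view, p_inc, tol=1e-9):
    """Return {(A_i, B_j): True/False}, True when P(A_i ∩ B_j) equals P(A_i)P(B_j)."""
    return {
        (view, inc): math.isclose(joint[view][inc], p_view[view] * p_inc[inc], abs_tol=tol)
        for view in joint
        for inc in p_inc
    }


# The pair discussed in the text: A_2 (occasional) and B_1 (high income).
product_a2_b1 = p_view["Occasional"] * p_inc["High"]
print(f"{product_a2_b1:.4f}")        # 0.0729
print(joint["Occasional"]["High"])   # 0.1

# A and B are independent only if every pair (A_i, B_j) passes the check.
checks = independence_check(joint, p_view, p_inc)
print(all(checks.values()))          # False
```

A single failing pair is enough to conclude that viewing frequency and income are not independent, which is the argument made in the text for A₂ and B₁.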

In many practical applications the joint probabilities will not be known precisely. A sample from a population is obtained, and estimates of the joint probabilities are made from the sample data. We want to know, based on this sample evidence, if these events are independent of one another. We will develop a procedure for conducting such a test later in the book.

From the document Statistics for Business and Economics (pages 123-127)