Independence - Probability and Statistics for Computer Science

Part II ProbabilityProbability

3.3 Independence

Worked example 3.13 (Shared Birthdays) You bet there is someone else in a room of 30 people who has the same birthday that you do. Assuming you know nothing about the other 29 people, what is the probability of winning?

Solution The easy way to do this is

P.fwinningg/D1 P.flosingg/:

Now you will lose if everyone has a birthday different from you. You can think of the birthdays of the others in the room as a list of 29 days of the year. If your birthday is on the list, you win; if it’s not, you lose. The number of losing lists is the number of lists of 29 days of the year such that your birthday is not in the list. This number is easy to get.

We have 364 days of the year to choose from for each of 29 locations in the list. The total number of lists is the number of lists of 29 days of the year. Each list has the same probability. So

P.flosingg/D 364²⁹ 365²⁹ and

P.fwinningg/0:0765:

There is a wide variety of problems like this; if you’re so inclined, you can make a small but quite reliable profit off people’s inability to estimate probabilities for this kind of problem correctly (Examples3.12and3.13are reliably profitable;

you could probably do quite well out of Examples3.45and3.46).

The ruleP.A B/DP.A/ P.A\B/is also occasionally useful for computing probabilities on its own; more commonly, you need other reasoning as well.

Worked example 3.14 (Dice) You flip two fair six-sided dice, and add the number of spots. What is the probability of getting a number divisible by 2, but not by 5?

Solution There is an interesting way to work the problem. WriteD_nfor the event the number is divisible byn. Now P.D₂/D1=2(count the cases; or, more elegantly, notice that each die has the same number of odd and even faces, and work from there). NowP.D₂ D₅/DP.D₂/ P.D₂\D₅/. ButD₂\D₅contains only three outcomes (6; 4,5; 5and 4; 6), soP.D₂ D₅/D18=36 3=36D5=12

Sometimes it is easier to reason about unions than to count outcomes directly.

Worked example 3.15 (Two Fair Dice) I roll two fair six-sided dice. What is the probability that the result is divisible by either 2 or 5, or both?

Solution WriteD_nfor the event the number is divisible byn. We wantP.D₂[D₅/DP.D₂/CP.D₅/ P.D₂\D₅/.

From Example3.14, we knowP.D₂/ D 1=2andP.D₂\D₅/D 3=36. By counting outcomes,P.D₅/ D 7=36. So P.D₂[D₅/D.18C7 3/=36D22=36.

62 3 Basic Ideas in Probability

A

B

A

B

Independent Events Dependent Events

Ω Ω

Fig. 3.2 On theleft,Âand^Bare independent.Âspans1=4of, andÂ\^Bspans1=4of^B. This means that knowing whether an outcome is in Aor not doesn’t affect the probability that it is in^B.1=4of the outcomes oflie inÂ, and1=4of theoutcomes in^Blie inÂ\^B. On theright, they are not. Very few of the outcomes in^Blie in^B\Â, so that observing^Bmeans thatÂbecomes less likely, because very few of the outcomes in^Balso lie inÂ\^B

interrelated in an important way. If IknowthatB has occurred, I also know thatAhas occurred—I don’t need to check separately, becauseBimpliesA.

Here is an example of a weaker interaction that results in events not being independent. WriteCfor the event that the die comes up with an odd number of spots, and writeDfor the event that the number of spots is larger than3. These events are interrelated. The probability of each event separately is 1/2. If IknowthatChas occurred, then I know that the die shows either 1, 3, or 5 spots. One of these outcomes belongs toD, and two do not. This means that knowing thatChas occurred tells you something about whetherDhas occurred. Independent events donothave this property. This means that the probability that they occur together has an important property, given in the box below.

Definition 3.3 (Independent Events) Two eventsAandBareindependentif and only if P.A\B/DP.A/P.B/

The “size” analogy helps motivate this expression. We think ofP.A/as the “size” ofArelative to, and so on. Now P.A\B/measures the “size” ofA\B—that is, the part ofAthat lies insideB. But ifAandBare independent, then the

“size” ofA\Brelative toBshould be the same as the “size” ofArelative to(Fig.3.2). Otherwise,BaffectsA, because Ais more (or less) likely whenBhas occurred.

So forAandBto be independent, we must have

“Size” ofAD “Size” of piece ofAinB

“Size” ofB ; or, equivalently,

P.A/D P.A\B/ P.B/ which yields our expression.

Worked example 3.16 (Fair Dice) The space of outcomes for a fair six-sided die is f1; 2; 3; 4; 5; 6g:

The die is fair, so each outcome has the same probability. Now we toss two fair six-sided dice. The outcome for each die is independent of that for the other. With what probability do we get two threes?

(continued)

Solution

P .first die yields 3\second die yields 3/

DP.first die yields 3/

P.second die yields 3/

D.1=6/.1=6/

D1=36

Worked example 3.17 (Find the Lady, Twice) Recall the setup of Worked example3.1. Assume that the card that is chosen is chosen fairly—that is, each card is chosen with the same probability. The game is played twice, and the cards are reshuffled between games. What is the probability of turning up a Queen and then a Queen again?

Solution The events are independent, so1=9.

You can use Definition3.3(i.e.AandBare independent if and only ifP.A\B/D P.A/P.B/) to tell whether events are independent or not. Quite small changes to a problem affect whether events are independent, as in the worked example below.

Worked example 3.18 (Cards and Independence) We shuffle a standard deck of 52 cards and draw one card. The eventAis “the card is a red suit” and the eventBis “the card is a 10”. (1): AreAandBindependent?

Now we take a standard deck of cards, and remove the ten of hearts. We shuffle this deck, and draw one card. The eventCis “the card drawn from the modified deck is a red suit” and the eventDis “the card drawn from the modified deck is a 10”. (2): AreCandDindependent?

Solution (1):P.A/D1=2,P.B/D1=13and in Example3.44we determinedP.A\B/D2=52. But2=52D1=26D P.A/P.B/, so they are independent.

(2): These are not independent becauseP.C/ D 25=51,P.D/ D 3=51andP.C\D/ D 1=51 ¤ P.C/P.D/ D 75=.51²/

The probability of a sequence of independent events can become very small very quickly, and this often misleads people.

Worked example 3.19 (Accidental DNA Matches) I search a DNA database with a sample. Each time I attempt to match this sample to an entry in the database, there is a probability of an accidental chance match of1e 4. Chance matches are independent. There are 20,000 people in the database. What is the probability I get at least one match, purely by chance?

Solution This is1 P(no chance matches). ButP.no chance matches/is much smaller than you think. We have P.no chance matches/

DP 0 B B

no chance match to record 1\ no chance match to record 2\

: : :\

no chance match to record 20,000 1 C C A

(continued)

64 3 Basic Ideas in Probability

DP.no chance match to a record/^20;000 D.1 1e 4/^20;000

0:14

so the probability is about0:86that you get at least one match by chance. If you’re surprised, look at the exponent.

Notice that if the database gets bigger, the probability grows; so at 40,000 the probability of one match by chance is0:98.

People quite often reason poorly about independent events. The most common problem is known as thegambler’s fallacy.

This occurs when you reason that the probability of an independent event has been changed by previous outcomes. For example, imagine I toss a coin that is known to be fair 20 times and get 20 heads. The probability that the next toss will result in a head has not changed at all—it is still 0.5—but many people will believe that it has changed. At time of writing, Wikipedia has some fascinating stories about the gambler’s fallacy which suggest that it’s quite a common mistake. People may interpret, say, a run of 20 heads as evidence that either the coin isn’t fair, or the tosses aren’t independent.

Remember this:Independence can mislead your intuition. There are two common problems. The first happens because the probability of a set of independent events can become very small very quickly, so that modelling events that aren’t independent as independent can lead to trouble (as in Worked example3.19). The second happens because most people want to believe that the universe keeps track of independent events to ensure that probability calculations work (the gambler’s fallacy).

3.3.1 Example: Airline Overbooking

We can now quite easily study airline overbooking. Airlines generally sell more tickets for a flight than there are seats on the aircraft, because some passengers don’t turn up on time, usually for random reasons. If the airline only sold one ticket per seat, their planes would likely have empty seats—which are lost profit—on each flight. If too many passengers turn up for a flight, the airline hopes that someone will accept a reasonable sum of money to take the next flight. Overbooking is sensible, efficient behavior and good for passengersifsensibly administered by the airline. This is because ticket prices should be at their lowest when each plane is just full, and there is quite likely some passenger who will take money to fly at some other time.

To choose the number of extra tickets sold, the airline needs to think about the probability of having to pay out (which we compute below) and the amount of money they will need to pay. We don’t have the tools to discuss how much the airline may need to pay, which depends quite a lot on passenger behavior, details of the schedule for the next flight, and so on. On occasion, the strategy can get expensive for the airline. While I was revising this text for publication, an airline managed to hit headlines by having airport security drag a passenger off a flight. Details of the resulting settlement were not publicised, but it can’t have been cheap for the airline.

Worked example 3.20 (Overbooking—1) An airline has a regular flight with six seats. It always sells seven tickets.

Passengers turn up for the flight with probabilityp, and do so independent of other passengers. What is the probability that the flight is overbooked?

Solution This is like a coin-flip problem; think of each passenger as a biased coin. With probabilityp, the biased coin comes upT (forturn up) and with probability.1 p/, it turns upH(for no-show). This coin is flipped seven times, and we are interested in the probability that there are sevenT’s. This isp⁷, because the flips are independent.

Worked example 3.21 (Overbooking—2) An airline has a regular flight with six seats. It always sells eight tickets.

Passengers turn up for the flight with probabilityp, and do so independent of other passengers. What is the probability that the flight is overbooked?

Solution Now we flip the coin eight times, and are interested in the probability of getting more than sixT’s. This is the union of two disjoint events (sevenT’s and eightT’s). For the case of sevenT’s, one flip must beH; there are eight choices for this flip. For the case of eightT’s, all eight flips must beT, and there is only one way to achieve this. So the probability the flight is overbooked is

P.overbooked/DP.7T’s[8T’s/

DP.7T’s/CP.8T’s/

D8p⁷.1 p/Cp⁸

Worked example 3.22 (Overbooking—3) An airline has a regular flight with six seats. It always sells eight tickets.

Passengers turn up for the flight with probabilityp, and do so independent of other passengers. What is the probability that six passengers arrive? (i.e. the flight is not overbooked or underbooked).

Solution Now we flip the coin eight times, and are interested in the probability of getting exactly six T’s. The probability that a particular set of six passengers arrives is given by the probability of getting any given string of sixT’s and twoH’s. This must have probabilityp⁶.1 p/². But there are a total of _2Š6Š^8Š distinct such strings. So the probability that six passengers arrive is

8Š

2Š6Šp⁶.1 p/² D28p⁶.1 p/²:

Worked example 3.23 (Overbooking—4) An airline has a regular flight with s seats. It always sells t tickets.

Passengers turn up for the flight with probabilityp, and do so independent of other passengers. What is the probability thatupassengers turn up?

Solution Now we flip the cointtimes, and are interested in the probability of gettingu T’s. There are tŠ

uŠ.t u/Š

disjoint outcomes withu T’s andt u H’s. Each such outcome is independent, and has probabilityp^u.1 p/^{t u}. So P.upassengers turn up/D tŠ

uŠ.t u/Šp^u.1 p/^{t u}

66 3 Basic Ideas in Probability

Worked example 3.24 (Overbooking—5) An airline has a regular flight with s seats. It always sells t tickets.

Passengers turn up for the flight with probabilityp, and do so independent of other passengers. What is the probability that the flight is oversold?

Solution We needP.fsC1turn upg [ fsC2turn upg [: : :[ ftturn upg/. But the eventsfiturn upgandfjturn upg are disjoint ifi¤j. So we can exploit Example3.23, and write

P.overbooked/DP.fsC1turn upg/ CP.fsC2turn upg/C : : :P.ftturn upg/

iDsC1

P.fiturn upg/

iDsC1

tŠ

iŠ.t i/Špⁱ.1 p/^{t i}

Dalam dokumen Probability and Statistics for Computer Science (Halaman 78-83)