7.7 Waiting Times

7.7.3 Sampling from the Exponential and Poisson Distributions

We would like some way to generate random numbers from the exponential or Poisson distributions. MATLAB's Statistics and Machine Learning Toolbox has built-in functions to do this (exprnd and poissrnd), as does statistical software such as R. If we want to use MATLAB and do not have access to the toolbox, it is easy enough to write our own code.

It turns out that if we are able to compute the inverse cumulative distribution function, $F^{-1}$, then we may easily sample from the distribution by applying $F^{-1}$ to a uniformly distributed random variable. This is stated in the following proposition.

Proposition 7.7.5. If $F(y)$ is a given cumulative distribution function that is strictly increasing when $0 < F(y) < 1$ and if $U$ is a random variable with uniform distribution on $[0,1]$, then $Y = F^{-1}(U)$ has the cumulative distribution function $F(y)$.
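The proposition follows from a one-line calculation. The sketch below assumes, as in the proposition, that $F$ is strictly increasing (so $F^{-1}$ exists and preserves order), and it uses the fact that $P(U \le u) = u$ for $0 \le u \le 1$:

```latex
\[
P(Y \le y) = P\bigl(F^{-1}(U) \le y\bigr) = P\bigl(U \le F(y)\bigr) = F(y).
\]
```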

Recall that the exponential distribution has the probability density function $f(x) = \lambda e^{-\lambda x}$ and the cumulative distribution function $F(x) = 1 - e^{-\lambda x}$. To help understand how Proposition 7.7.5 works, imagine a point $(x, y)$ on the graph of $F$, so that $F(x) = P(X \le x) = y$, as shown in Figure 7.3. For this point $(x, y)$, we have $x = F^{-1}(y)$. To find $F^{-1}$, we set $F(x) = y$ and solve for $x$:

\[
y = 1 - e^{-\lambda x} \implies e^{-\lambda x} = 1 - y \implies -\lambda x = \ln(1 - y) \implies x = -\frac{1}{\lambda}\ln(1 - y).
\]

To draw a random value $x$ from this distribution, we simply take a uniformly distributed random number $y \in [0,1]$. The value $x = F^{-1}(y)$ is then a draw from the distribution, and its cumulative probability is $P(X \le x) = y$.

[Figure 7.3: Cumulative distribution function, $F(x) = 1 - e^{-\lambda x}$, $\lambda = 1$, with the point $(x, y)$ marked on the curve.]

For example, suppose that $\lambda = 1$ and we randomly select 0.7 from the interval $[0,1]$. Then we want to find the $x$ such that $P(X \le x) = 0.7$. From above, $x = -\frac{1}{1}\ln(1 - 0.7) \approx 1.20$. Hence, $P(X \le 1.20) \approx 0.7$, as desired.
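As a quick numerical check of this example, the following MATLAB sketch evaluates the inverse cumulative distribution function at $y = 0.7$ and then plugs the result back into $F$. The variable names are illustrative only.

```matlab
lambda = 1;                    % rate parameter from the example
y = 0.7;                       % the selected value in [0,1]
x = -log(1 - y)/lambda;        % inverse cdf: x = F^{-1}(y)
Fx = 1 - exp(-lambda*x);       % plug x back into the cdf F
fprintf('x = %.4f, F(x) = %.4f\n', x, Fx)
```

The value of F(x) should come back as 0.7 up to rounding, confirming that x is the 70th percentile of the distribution.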

To summarize, if $Y \sim U(0,1)$, then

\[
X = -\frac{1}{\lambda}\ln(1 - Y) \tag{7.17}
\]

is exponentially distributed with probability density function $f(x) = \lambda e^{-\lambda x}$.

We may apply this trick to simulate any continuous random variable whose cumulative distribution function $F$ is invertible.
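To illustrate with a distribution other than the exponential, consider a hypothetical example (not from the text): a random variable on $[0,1]$ with cumulative distribution function $F(x) = x^2$, so that $F^{-1}(y) = \sqrt{y}$. A minimal sketch of sampling from it, with an arbitrary sample size and illustrative variable names:

```matlab
n = 100000;                        % number of simulated values (arbitrary choice)
u = rand(n,1);                     % uniform random numbers on [0,1]
x = sqrt(u);                       % inverse cdf of F(x) = x^2 applied to u
propBelowHalf = mean(x <= 0.5);    % simulated estimate of P(X <= 0.5)
fprintf('simulated: %.4f, theoretical F(0.5) = %.4f\n', propBelowHalf, 0.5^2)
```

The simulated proportion should be close to $F(0.5) = 0.25$.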

Example 7.7.6

Suppose the time, $T$, that a light bulb lasts before burning out is exponentially distributed with a mean of 100 hours. Write MATLAB code to simulate randomly selecting light bulbs and seeing how long they will last. In this case, $\lambda = 1/100$ and the probability density function is $f(t) = \lambda e^{-\lambda t} = 0.01e^{-0.01t}$ for $t \ge 0$.

To generate 200 light bulb durations from the exponential distribution and create a histogram, we could use the MATLAB code

n = 200;                 % number of simulated values
duration = zeros(n,1);   % initialize vector of random values
lambda = .01;            % rate: burnouts per hour (mean lifetime is 1/lambda = 100 hours)
for i=1:n
    duration(i) = -log(1-rand)/lambda;   % draw random value with inverse cdf
end
hist(duration)           % frequency histogram of the simulated durations
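Because rand(n,1) returns a whole column of uniform random numbers and log acts elementwise, the loop above can also be written without a loop. The following is a sketch of an equivalent vectorized version. As a sanity check, it also compares the sample mean with the theoretical mean $1/\lambda = 100$ hours.

```matlab
n = 200;                                 % number of simulated values
lambda = 0.01;                           % burnout rate per hour (mean lifetime 100 hours)
duration = -log(1 - rand(n,1))/lambda;   % inverse cdf applied to n uniform draws at once
fprintf('sample mean = %.1f hours (theoretical mean = %.1f)\n', mean(duration), 1/lambda)
hist(duration)                           % frequency histogram, as before
```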

[Figure 7.4: Density histogram of light bulb longevity with density function. Horizontal axis: Duration (hours).]

Previously, we have been using frequency histograms to represent data. Suppose now we want to make a probability histogram, meaning that the height of each bar corresponds to the proportion of elements lying in the interval. We first use [x y]=hist(z,n), where z is the data and n is the number of bins. Called with output arguments, hist returns the bin data instead of drawing a plot. The variables x and y will be vectors of length n, with x containing the bin counts and y containing the midpoints of the bins. To convert the counts to proportions, we divide by the sum of all counts, using x/sum(x). Finally, we use bar(y,x/sum(x),1) to create a bar graph where y gives the centers of the bars and x/sum(x) gives their heights. Setting the third parameter to 1 ensures that the bars will be adjacent, instead of having a gap. For our example, we should add the following lines to the end of our code to generate a probability histogram with 10 bins.

[x y]=hist(duration, 10);

bar(y,x/sum(x),1);

To compare a histogram to a probability density function, we need to scale the bar heights so that the areas of the rectangles sum to 1. The result is called a density histogram. To do this in MATLAB, we simply divide the proportions by the bin width. We could use the line binwidth = y(2)-y(1) to compute the bin width, then plot the histogram using bar(y,x/sum(x)/binwidth,1).

We may want to plot the density function atop the density histogram to make sure that the code is doing what it is supposed to be doing.

For our example, we would add the following code after the end of the for loop. This will generate a plot like the one shown in Figure 7.4.

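[x y] = hist(duration, 10);            % bin counts x and bin centers y
binwidth = y(2)-y(1);                  % width of each bin
bar(y, x/sum(x)/binwidth, 1)           % density histogram: bar areas sum to 1
hold on                                % keep the histogram when adding the curve
T = linspace(0, max(y), 100);          % grid of times for the density curve
plot(T, lambda*exp(-T*lambda), 'k')    % theoretical density in black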
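In newer MATLAB releases, the histogram function can do this normalization itself through its 'Normalization' option. The following sketch assumes a release that provides histogram (hist is the older interface). It regenerates the data so that it runs on its own.

```matlab
n = 200;                                  % number of simulated values
lambda = 0.01;                            % burnout rate per hour
duration = -log(1 - rand(n,1))/lambda;    % exponential draws via the inverse cdf
h = histogram(duration, 10, 'Normalization', 'pdf');   % heights scaled so bar areas sum to 1
hold on
T = linspace(0, max(duration), 100);      % grid of times for the density curve
plot(T, lambda*exp(-lambda*T), 'k')       % theoretical density f(t)
hold off
totalArea = sum(h.Values .* diff(h.BinEdges));   % total area of the bars
```

Here totalArea should equal 1 up to rounding, which is the defining property of a density histogram.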

We may want to use our simulation to estimate a probability. For example, if we want to know what proportion of bulbs lasted more than 100 hours, we could add the line

mean(duration>100)

This should be close to the theoretical value obtained from the density function, $P(T > 100) = e^{-0.01 \cdot 100} = e^{-1} \approx 0.368$.
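With only 200 simulated bulbs, the estimate has a standard error of roughly 0.03, so it can easily be off by a few percentage points. As a sketch, here is the same comparison with a larger sample (the sample size is an arbitrary choice):

```matlab
n = 100000;                               % larger sample for a tighter estimate
lambda = 0.01;                            % burnout rate per hour
duration = -log(1 - rand(n,1))/lambda;    % exponential draws via the inverse cdf
estimate = mean(duration > 100);          % simulated proportion lasting over 100 hours
exact = exp(-lambda*100);                 % theoretical P(T > 100) = e^{-1}
fprintf('simulated: %.4f, exact: %.4f\n', estimate, exact)
```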

Next, suppose we want to take a random sample from the Poisson distribution. We want to find the number of events that occur in some unit of time, assuming there are, on average, $\lambda$ events during this unit of time.

We will simply use (7.17) to generate a random waiting time from the exponential distribution, then repeat until the randomly generated times add up to more than one unit. The Poisson random variable is then the number of events that occurred before the unit of time ran out.

To generate a Poisson(λ) random variable:

numEvents := 0, time := 0
while time < 1
    Choose a random number y ∈ [0,1]
    time := time − (1/λ)·ln(1−y)
    numEvents := numEvents + 1
end
return numEvents − 1
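In symbols, writing $T_1, T_2, \ldots$ for the independent exponential waiting times and $S_k$ for the arrival time of the $k$th event (notation introduced here for illustration, with $S_0 = 0$), the algorithm returns the number of events whose arrival times fall within the unit of time:

```latex
\[
S_k = T_1 + T_2 + \cdots + T_k, \qquad
N = \max\{k \ge 0 : S_k \le 1\} = \min\{k \ge 1 : S_k > 1\} - 1 .
\]
```

The subtraction of 1 accounts for the final event generated by the loop, which is the first one to arrive after the unit of time has ended.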

Example 7.7.7

Continuing from Example 7.7.6, suppose a light bulb has a mean longevity of 100 hours, and we want to know how many light bulbs we will go through during a 1000-hour period of time (assume that as they die out they are instantaneously replaced). The number of bulbs used, $X$, is Poisson with rate $\lambda = 1000/100 = 10$ (on average we expect to use 10 bulbs during the 1000 hours). To get a sense of the distribution, we would take the code above and put it inside a simulation loop. We could use the following MATLAB code to simulate this 200 times:

n=200;                     % number of simulations
numEvents = zeros(n,1);    % initialize vector of lightbulbs
lambda = 10;               % lightbulbs per 1000 hr time unit
for i=1:n
    time = 0;              % initialize wait time
    while time<1           % wait time is less than one 1000-hr time unit
        randExp = - (1/lambda)*log(1-rand);   % draw random value for next event
        time = time + randExp;                % update total wait time
        numEvents(i) = numEvents(i) + 1;      % update number of events
    end
    numEvents(i)=numEvents(i) - 1;            % correct for last completion of while loop
end
hist(numEvents)

The resulting histogram should be relatively symmetric and centered at the mean of 10.
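One way to check the simulation is to use the fact that a Poisson(λ) random variable has mean and variance both equal to $\lambda$. The sketch below repeats the simulation with a larger number of runs (an arbitrary choice) and compares the sample mean and variance with $\lambda = 10$. As a variation on the code above, it draws each arrival time before testing it, so only events inside the unit of time are counted and no final subtraction is needed.

```matlab
n = 20000;                     % number of simulations (arbitrary choice)
lambda = 10;                   % expected number of events per unit of time
numEvents = zeros(n,1);        % simulated Poisson counts
for i = 1:n
    time = -log(1 - rand)/lambda;              % arrival time of the first event
    while time < 1                             % event falls inside the unit of time
        numEvents(i) = numEvents(i) + 1;       % count it
        time = time - log(1 - rand)/lambda;    % arrival time of the next event
    end
end
fprintf('mean = %.3f, variance = %.3f, lambda = %d\n', mean(numEvents), var(numEvents), lambda)
```

Both printed statistics should be close to 10 if the sampler is working as intended.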

Refer to Activity A.0.64 to practice sampling from the Poisson and exponential distributions.