For a discrete random variable $X$, $E(X)$ was obtained by summing $x \cdot p(x)$ over possible $X$ values. Here we replace summation by integration and the pmf by the pdf to get a continuous weighted average.

The expected or mean value of a continuous rv $X$ with pdf $f(x)$ is
$$\mu_X = E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$$
[Figure 4.11: The pdf and cdf for Example 4.9] ■

The pdf of weekly gravel sales $X$ was
$$f(x) = \begin{cases} \dfrac{3}{2}(1 - x^2) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
so
$$E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\,dx = \int_0^1 x \cdot \frac{3}{2}(1 - x^2)\,dx = \frac{3}{2}\int_0^1 (x - x^3)\,dx = \frac{3}{2}\left(\frac{x^2}{2} - \frac{x^4}{4}\right)\Bigg|_{x=0}^{x=1} = \frac{3}{8}$$
■

When the pdf $f(x)$ specifies a model for the distribution of values in a numerical population, then $\mu$ is the population mean, which is the most frequently used measure of population location or center.

Often we wish to compute the expected value of some function $h(X)$ of the rv $X$. If we think of $h(X)$ as a new rv $Y$, techniques from mathematical statistics can be used to derive the pdf of $Y$, and $E(Y)$ can then be computed from the definition. Fortunately, as in the discrete case, there is an easier way to compute $E[h(X)]$.

PROPOSITION If $X$ is a continuous rv with pdf $f(x)$ and $h(X)$ is any function of $X$, then
$$E[h(X)] = \mu_{h(X)} = \int_{-\infty}^{\infty} h(x) \cdot f(x)\,dx$$
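The integral for $E[h(X)]$ can also be approximated numerically, which is a convenient check on hand calculations. The following is a minimal sketch, not part of the text: the helper names `gravel_pdf` and `expected_value` are hypothetical, and composite Simpson's rule is just one reasonable choice of quadrature. It uses the weekly gravel sales pdf, for which $E(X)$ should come out close to $3/8 = .375$.

```python
from typing import Callable


def gravel_pdf(x: float) -> float:
    """pdf of weekly gravel sales: (3/2)(1 - x^2) on [0, 1], 0 otherwise."""
    return 1.5 * (1.0 - x * x) if 0.0 <= x <= 1.0 else 0.0


def expected_value(
    h: Callable[[float], float],
    pdf: Callable[[float], float],
    lower: float,
    upper: float,
    n_intervals: int = 1_000,
) -> float:
    """Approximate E[h(X)] = integral of h(x) * f(x) dx by composite Simpson's rule.

    `lower` and `upper` are assumed to bracket the region where the pdf is nonzero.
    """
    if n_intervals % 2:  # Simpson's rule needs an even number of subintervals
        n_intervals += 1
    width = (upper - lower) / n_intervals
    total = h(lower) * pdf(lower) + h(upper) * pdf(upper)
    for i in range(1, n_intervals):
        x = lower + i * width
        weight = 4 if i % 2 else 2  # odd interior points get 4, even get 2
        total += weight * h(x) * pdf(x)
    return total * width / 3.0


# h(x) = x gives the mean; h(x) = x^2 gives the second moment.
mean_sales = expected_value(lambda x: x, gravel_pdf, 0.0, 1.0)
second_moment = expected_value(lambda x: x * x, gravel_pdf, 0.0, 1.0)
print(f"E(X)   ~ {mean_sales:.6f}")
print(f"E(X^2) ~ {second_moment:.6f}")
```

Passing `h` as a function mirrors the proposition: the pdf of $h(X)$ is never derived, only the integrand $h(x) \cdot f(x)$ changes.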
Example 4.11
Two species are competing in a region for control of a limited amount of a certain resource. Let $X$ = the proportion of the resource controlled by species 1 and suppose $X$ has pdf
$$f(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
which is a uniform distribution on [0, 1]. (In her book Ecological Diversity, E. C. Pielou calls this the “broken-stick” model for resource allocation, since it is analogous to breaking a stick at a randomly chosen point.) Then the species that controls the majority of this resource controls the amount
$$h(X) = \max(X, 1 - X) = \begin{cases} 1 - X & \text{if } 0 \le X < \frac{1}{2} \\ X & \text{if } \frac{1}{2} \le X \le 1 \end{cases}$$
The expected amount controlled by the species having majority control is then
$$E[h(X)] = \int_{-\infty}^{\infty} \max(x, 1 - x) \cdot f(x)\,dx = \int_0^1 \max(x, 1 - x) \cdot 1\,dx = \int_0^{1/2} (1 - x) \cdot 1\,dx + \int_{1/2}^{1} x \cdot 1\,dx = \frac{3}{4}$$
■

For $h(X)$, a linear function, $E[h(X)] = E(aX + b) = aE(X) + b$.

In the discrete case, the variance of $X$ was defined as the expected squared deviation from $\mu$ and was calculated by summation. Here again integration replaces summation.
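As a numerical aside on Example 4.11, the value $3/4$ for the expected majority share can be checked by simulating the broken-stick model. This is only an illustrative sketch: the function names, the sample size, and the seed are arbitrary assumptions rather than anything prescribed by the text.

```python
import random


def majority_share(x: float) -> float:
    """h(x) = max(x, 1 - x): the share held by whichever species has the majority."""
    return max(x, 1.0 - x)


def simulate_expected_majority(n_samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[h(X)] for X uniform on [0, 1]."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(n_samples):
        total += majority_share(rng.random())
    return total / n_samples


def exact_expected_majority() -> float:
    """Evaluate the two pieces of the integral using the antiderivatives."""
    left = 0.5 - 0.5 ** 2 / 2              # integral of (1 - x) over [0, 1/2]
    right = 1.0 ** 2 / 2 - 0.5 ** 2 / 2    # integral of x over [1/2, 1]
    return left + right


estimate = simulate_expected_majority()
exact = exact_expected_majority()
print(f"simulated E[h(X)] ~ {estimate:.4f}, piecewise integral = {exact:.4f}")
```

The simulated average fluctuates from seed to seed, so it should be read as agreeing with the integral only up to sampling error.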
DEFINITION The variance of a continuous random variable $X$ with pdf $f(x)$ and mean value $\mu$ is
$$\sigma_X^2 = V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\,dx = E[(X - \mu)^2]$$
The standard deviation (SD) of $X$ is $\sigma_X = \sqrt{V(X)}$.

The variance and standard deviation give quantitative measures of how much spread there is in the distribution or population of $x$ values. Again $\sigma$ is roughly the size of a typical deviation from $\mu$. Computation of $\sigma^2$ is facilitated by using the same shortcut formula employed in the discrete case.

PROPOSITION
$$V(X) = E(X^2) - [E(X)]^2$$
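For a polynomial pdf, the defining integral and the shortcut formula can be compared in exact arithmetic. The sketch below is an illustration under stated assumptions: it represents the weekly gravel sales pdf $f(x) = \frac{3}{2}(1 - x^2)$ on $[0, 1]$ as a dictionary of coefficients, and the helper names (`poly_mul`, `integrate_unit_interval`, `moment`) are hypothetical.

```python
from fractions import Fraction
from math import sqrt

# f(x) = 3/2 - (3/2) x^2 on [0, 1], stored as {power: coefficient}.
GRAVEL_PDF = {0: Fraction(3, 2), 2: Fraction(-3, 2)}


def poly_mul(p: dict, q: dict) -> dict:
    """Multiply two polynomials given as {power: coefficient} dictionaries."""
    product: dict = {}
    for i, a in p.items():
        for j, b in q.items():
            product[i + j] = product.get(i + j, Fraction(0)) + a * b
    return product


def integrate_unit_interval(p: dict) -> Fraction:
    """Exact integral over [0, 1]: each term c * x^k contributes c / (k + 1)."""
    return sum((c / (k + 1) for k, c in p.items()), Fraction(0))


def moment(n: int) -> Fraction:
    """E(X^n) = integral of x^n * f(x) over [0, 1]."""
    return integrate_unit_interval(poly_mul({n: Fraction(1)}, GRAVEL_PDF))


mean = moment(1)
variance_shortcut = moment(2) - mean ** 2

# Definition: integrate (x - mu)^2 * f(x), with (x - mu)^2 = x^2 - 2*mu*x + mu^2.
squared_deviation = {2: Fraction(1), 1: -2 * mean, 0: mean ** 2}
variance_definition = integrate_unit_interval(poly_mul(squared_deviation, GRAVEL_PDF))

print("E(X) =", mean)
print("V(X) by definition =", variance_definition)
print("V(X) by shortcut   =", variance_shortcut)
print("SD ~", round(sqrt(variance_shortcut), 3))
```

Using `Fraction` avoids rounding, so the two routes to $V(X)$ can be compared for exact equality rather than approximate agreement.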
Example 4.12 (Example 4.10 continued)
For weekly gravel sales, we computed $E(X) = \frac{3}{8}$. Since
$$E(X^2) = \int_{-\infty}^{\infty} x^2 \cdot f(x)\,dx = \int_0^1 x^2 \cdot \frac{3}{2}(1 - x^2)\,dx = \int_0^1 \frac{3}{2}(x^2 - x^4)\,dx = \frac{1}{5}$$
$$V(X) = \frac{1}{5} - \left(\frac{3}{8}\right)^2 = \frac{19}{320} = .059 \quad \text{and} \quad \sigma_X = .244$$
■

When $h(X) = aX + b$, the expected value and variance of $h(X)$ satisfy the same properties as in the discrete case: $E[h(X)] = a\mu + b$ and $V[h(X)] = a^2 \cdot \sigma^2$.

EXERCISES Section 4.2 (11–27)

11. Let $X$ denote the amount of time a book on two-hour reserve is actually checked out, and suppose the cdf is
$$F(x) = \begin{cases} 0 & x < 0 \\ \dfrac{x^2}{4} & 0 \le x < 2 \\ 1 & 2 \le x \end{cases}$$
Use the cdf to obtain the following:
a. $P(X \le 1)$
b. $P(.5 \le X \le 1)$
c. $P(X > 1.5)$
d. The median checkout duration $\tilde{\mu}$ [solve $.5 = F(\tilde{\mu})$]
e. $F'(x)$ to obtain the density function $f(x)$
f. $E(X)$
g. $V(X)$ and $\sigma_X$
h. If the borrower is charged an amount $h(X) = X^2$ when checkout duration is $X$, compute the expected charge $E[h(X)]$.

12. The cdf for $X$ (= measurement error) of Exercise 3 is
$$F(x) = \begin{cases} 0 & x < -2 \\ \dfrac{1}{2} + \dfrac{3}{32}\left(4x - \dfrac{x^3}{3}\right) & -2 \le x < 2 \\ 1 & 2 \le x \end{cases}$$
a. Compute $P(X < 0)$.
b. Compute $P(-1 < X < 1)$.
c. Compute $P(.5 < X)$.
d. Verify that $f(x)$ is as given in Exercise 3 by obtaining $F'(x)$.
e. Verify that $\tilde{\mu} = 0$.

13. Example 4.5 introduced the concept of time headway in traffic flow and proposed a particular distribution for $X$ = the headway between two randomly selected consecutive cars (sec). Suppose that in a different traffic environment, the distribution of time headway has the form
$$f(x) = \begin{cases} \dfrac{k}{x^4} & x > 1 \\ 0 & x \le 1 \end{cases}$$
a. Determine the value of $k$ for which $f(x)$ is a legitimate pdf.
b. Obtain the cumulative distribution function.
c. Use the cdf from (b) to determine the probability that headway exceeds 2 sec and also the probability that headway is between 2 and 3 sec.
d. Obtain the mean value of headway and the standard deviation of headway.
e. What is the probability that headway is within 1 standard deviation of the mean value?

14. The article “Modeling Sediment and Water Column Interactions for Hydrophobic Pollutants” (Water Research, 1984: 1169–1174) suggests the uniform distribution on the interval (7.5, 20) as a model for depth (cm) of the bioturbation layer in sediment in a certain region.
a. What are the mean and variance of depth?
b. What is the cdf of depth?
c. What is the probability that observed depth is at most 10? Between 10 and 15?
d. What is the probability that the observed depth is within 1 standard deviation of the mean value? Within 2 standard deviations?

15. Let $X$ denote the amount of space occupied by an article placed in a 1-ft$^3$ packing container. The pdf of $X$ is
$$f(x) = \begin{cases} 90x^8(1 - x) & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$
a. Graph the pdf. Then obtain the cdf of $X$ and graph it.
b. What is $P(X \le .5)$ [i.e., $F(.5)$]?
c. Using the cdf from (a), what is $P(.25 < X \le .5)$? What is $P(.25 \le X \le .5)$?
d. What is the 75th percentile of the distribution?
e. Compute $E(X)$ and $\sigma_X$.
f. What is the probability that $X$ is more than 1 standard deviation from its mean value?

16. Answer parts (a)–(f) of Exercise 15 with $X$ = lecture time past the hour given in Exercise 5.

17. Let $X$ have a uniform distribution on the interval $[A, B]$.
a. Obtain an expression for the $(100p)$th percentile.
b. Compute $E(X)$, $V(X)$, and $\sigma_X$.
c. For $n$, a positive integer, compute $E(X^n)$.

18. Let $X$ denote the voltage at the output of a microphone, and suppose that $X$ has a uniform distribution on the interval from $-1$ to $1$. The voltage is processed by a “hard limiter” with cutoff values $-.5$ and $.5$, so the limiter output is a random variable $Y$ related to $X$ by $Y = X$ if $|X| \le .5$, $Y = .5$ if $X > .5$, and $Y = -.5$ if $X < -.5$.
a. What is $P(Y = .5)$?
b. Obtain the cumulative distribution function of $Y$ and graph it.
19. Let $X$ be a continuous rv with cdf
$$F(x) = \begin{cases} 0 & x \le 0 \\ \dfrac{x}{4}\left[1 + \ln\left(\dfrac{4}{x}\right)\right] & 0 < x \le 4 \\ 1 & x > 4 \end{cases}$$
[This type of cdf is suggested in the article “Variability in Measured Bedload-Transport Rates” (Water Resources Bull., 1985: 39–48) as a model for a certain hydrologic variable.] What is
a. $P(X \le 1)$?
b. $P(1 \le X \le 3)$?
c. The pdf of $X$?

20. Consider the pdf for total waiting time $Y$ for two buses
$$f(y) = \begin{cases} \dfrac{1}{25}y & 0 \le y < 5 \\ \dfrac{2}{5} - \dfrac{1}{25}y & 5 \le y \le 10 \\ 0 & \text{otherwise} \end{cases}$$
introduced in Exercise 8.
a. Compute and sketch the cdf of $Y$. [Hint: Consider separately $0 \le y < 5$ and $5 \le y \le 10$ in computing $F(y)$. A graph of the pdf should be helpful.]
b. Obtain an expression for the $(100p)$th percentile. [Hint: Consider separately $0 < p < .5$ and $.5 < p < 1$.]
c. Compute $E(Y)$ and $V(Y)$. How do these compare with the expected waiting time and variance for a single bus when the time is uniformly distributed on [0, 5]?

21. An ecologist wishes to mark off a circular sampling region having radius 10 m. However, the radius of the resulting region is actually a random variable $R$ with pdf
$$f(r) = \begin{cases} \dfrac{3}{4}\left[1 - (10 - r)^2\right] & 9 \le r \le 11 \\ 0 & \text{otherwise} \end{cases}$$
What is the expected area of the resulting circular region?

22. The weekly demand for propane gas (in 1000s of gallons) from a particular facility is an rv $X$ with pdf
$$f(x) = \begin{cases} 2\left(1 - \dfrac{1}{x^2}\right) & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$
a. Compute the cdf of $X$.
b. Obtain an expression for the $(100p)$th percentile. What is the value of $\tilde{\mu}$?
c. Compute $E(X)$ and $V(X)$.
d. If 1.5 thousand gallons are in stock at the beginning of the week and no new supply is due in during the week, how much of the 1.5 thousand gallons is expected to be left at the end of the week? [Hint: Let $h(x)$ = amount left when demand = $x$.]
23. If the temperature at which a certain compound melts is a random variable with mean value $120^\circ$C and standard deviation $2^\circ$C, what are the mean temperature and standard deviation measured in $^\circ$F? [Hint: $^\circ\text{F} = 1.8\,^\circ\text{C} + 32$.]

24. Let $X$ have the Pareto pdf
$$f(x; k, \theta) = \begin{cases} \dfrac{k \cdot \theta^k}{x^{k+1}} & x \ge \theta \\ 0 & x < \theta \end{cases}$$
introduced in Exercise 10.
a. If $k > 1$, compute $E(X)$.
b. What can you say about $E(X)$ if $k = 1$?
c. If $k > 2$, show that $V(X) = k\theta^2(k - 1)^{-2}(k - 2)^{-1}$.
d. If $k = 2$, what can you say about $V(X)$?
e. What conditions on $k$ are necessary to ensure that $E(X^n)$ is finite?

25. Let $X$ be the temperature in $^\circ$C at which a certain chemical reaction takes place, and let $Y$ be the temperature in $^\circ$F (so $Y = 1.8X + 32$).
a. If the median of the $X$ distribution is $\tilde{\mu}$, show that $1.8\tilde{\mu} + 32$ is the median of the $Y$ distribution.
b. How is the 90th percentile of the $Y$ distribution related to the 90th percentile of the $X$ distribution? Verify your conjecture.
c. More generally, if $Y = aX + b$, how is any particular percentile of the $Y$ distribution related to the corresponding percentile of the $X$ distribution?

26. Let $X$ be the total medical expenses (in 1000s of dollars) incurred by a particular individual during a given year. Although $X$ is a discrete random variable, suppose its distribution is quite well approximated by a continuous distribution with pdf $f(x) = k(1 + x/2.5)^{-7}$ for $x \ge 0$.
a. What is the value of $k$?
b. Graph the pdf of $X$.
c. What are the expected value and standard deviation of total medical expenses?
d. This individual is covered by an insurance plan that entails a \$500 deductible provision (so the first \$500 worth of expenses are paid by the individual). Then the plan will pay 80% of any additional expenses exceeding \$500, and the maximum payment by the individual (including the deductible amount) is \$2500. Let $Y$ denote the amount of this individual’s medical expenses paid by the insurance company. What is the expected value of $Y$?
[Hint: First figure out what value of $X$ corresponds to the maximum out-of-pocket expense of \$2500. Then write an expression for $Y$ as a function of $X$ (which involves several different pieces) and calculate the expected value of this function.]

27. When a dart is thrown at a circular target, consider the location of the landing point relative to the bull’s eye. Let $X$ be the angle in degrees measured from the horizontal, and assume that $X$ is uniformly distributed on [0, 360]. Define $Y$ to be the transformed variable $Y = h(X) = (2\pi/360)X - \pi$, so $Y$ is the angle measured in radians and $Y$ is between $-\pi$ and $\pi$. Obtain $E(Y)$ and $\sigma_Y$ by first obtaining $E(X)$ and $\sigma_X$, and then using the fact that $h(X)$ is a linear function of $X$.
4.3 The Normal Distribution
The normal distribution is the most important one in all of probability and statistics. Many numerical populations have distributions that can be fit very closely by an appropriate normal curve. Examples include heights, weights, and other physical characteristics (the famous 1903 Biometrika article “On the Laws of Inheritance in Man” discussed many examples of this sort), measurement errors in scientific experiments, anthropometric measurements on fossils, reaction times in psychological experiments, measurements of intelligence and aptitude, scores on various tests, and numerous economic measures and indicators. In addition, even when individual variables themselves are not normally distributed, sums and averages of the variables will under suitable conditions have approximately a normal distribution; this is the content of the Central Limit Theorem discussed in the next chapter.
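DEFINITION A continuous rv $X$ is said to have a normal distribution with parameters $\mu$ and $\sigma$ (or $\mu$ and $\sigma^2$), where $-\infty < \mu < \infty$ and $0 < \sigma$, if the pdf of $X$ is
$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2/(2\sigma^2)} \qquad -\infty < x < \infty \tag{4.3}$$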
Again $e$ denotes the base of the natural logarithm system and equals approximately 2.71828, and $\pi$ represents the familiar mathematical constant with approximate value 3.14159. The statement that $X$ is normally distributed with parameters $\mu$ and $\sigma^2$ is often abbreviated $X \sim N(\mu, \sigma^2)$.

[Figure 4.13: (a) Two different normal density curves, one with $\mu = 80$, $\sigma = 15$ and one with $\mu = 100$, $\sigma = 5$, plotted as $f(x)$ against $x$; (b) Visualizing $\mu$ and $\sigma$ for a normal distribution]

Clearly $f(x; \mu, \sigma) \ge 0$, but a somewhat complicated calculus argument must be used to verify that $\int_{-\infty}^{\infty} f(x; \mu, \sigma)\,dx = 1$. It can be shown that $E(X) = \mu$ and $V(X) = \sigma^2$, so the parameters are the mean and the standard deviation of $X$. Figure 4.13 presents graphs of $f(x; \mu, \sigma)$ for several different $(\mu, \sigma)$ pairs. Each density curve is symmetric about $\mu$ and bell-shaped, so the center of the bell (point of symmetry) is both the mean of the distribution and the median. The value of $\sigma$ is the distance from $\mu$ to the inflection points of the curve (the points at which the curve changes from turning downward to turning upward). Large values of $\sigma$ yield graphs that are quite spread out about $\mu$, whereas small values of $\sigma$ yield graphs with a high peak above $\mu$ and most of the area under the graph quite close to $\mu$. Thus a large $\sigma$ implies that a value of $X$ far from $\mu$ may well be observed, whereas such a value is quite unlikely when $\sigma$ is small.
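The claims in this paragraph can be probed numerically for any particular $(\mu, \sigma)$ pair. The sketch below is an illustration only, not a proof: it uses $\mu = 80$, $\sigma = 15$ (one of the curves in Figure 4.13), truncates the integrals to $\mu \pm 10\sigma$ as an assumption, and the helper names `normal_pdf`, `simpson`, and `second_difference` are hypothetical.

```python
import math
from typing import Callable


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density from Equation (4.3); requires sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def simpson(g: Callable[[float], float], lower: float, upper: float,
            n_intervals: int = 20_000) -> float:
    """Composite Simpson's rule; n_intervals is assumed to be even."""
    width = (upper - lower) / n_intervals
    total = g(lower) + g(upper)
    for i in range(1, n_intervals):
        total += (4 if i % 2 else 2) * g(lower + i * width)
    return total * width / 3.0


def second_difference(g: Callable[[float], float], x: float, step: float) -> float:
    """Central-difference estimate of g''(x); its sign shows which way the curve bends."""
    return (g(x + step) - 2.0 * g(x) + g(x - step)) / (step * step)


mu, sigma = 80.0, 15.0
lower, upper = mu - 10 * sigma, mu + 10 * sigma  # tails beyond this are negligible


def density(x: float) -> float:
    return normal_pdf(x, mu, sigma)


area = simpson(density, lower, upper)
mean = simpson(lambda x: x * density(x), lower, upper)
variance = simpson(lambda x: (x - mu) ** 2 * density(x), lower, upper)

# Curvature just inside and just outside mu + sigma: the sign should flip there.
curvature_inside = second_difference(density, mu + 0.9 * sigma, 0.01 * sigma)
curvature_outside = second_difference(density, mu + 1.1 * sigma, 0.01 * sigma)

print(f"area ~ {area:.6f}, mean ~ {mean:.4f}, variance ~ {variance:.4f}")
print(f"curvature inside: {curvature_inside:.3e}, outside: {curvature_outside:.3e}")
```

A negative second difference inside $\mu + \sigma$ and a positive one outside is consistent with the inflection point sitting one standard deviation from the mean; by symmetry the same check applies at $\mu - \sigma$.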