(1)

Topic 3: The Expectation of a Random Variable
Rohini Somanathan

Course 003, 2016

(2)

Definition: The expected value of a discrete random variable X, when it exists, is given by

E(X) = \sum_{x \in R(X)} x f(x)

• The expectation is simply a weighted average of possible outcomes, with the weights being assigned by f(x).

• In general, E(X) \notin R(X). Consider the experiment of rolling a die where the random variable is the number of dots on the die. The density function is given by f(x) = \frac{1}{6} I_{\{1,2,\ldots,6\}}(x), and the expectation is

\sum_{x=1}^{6} \frac{x}{6} I_{\{1,2,\ldots,6\}}(x) = 3.5

(see the sketch at the end of this slide)

• We can think of the expectation as a point of balance: if weights are placed on a weightless rod, where should a fulcrum be placed so that the rod balances?

• If X can take only a finite number of different values, this expectation always exists

• With an infinite sequence of possible values, it may or may not exist: high values would have to receive small enough probabilities for the sum to converge.
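A minimal numerical sketch of the die example above (plain Python, not part of the original slides), computing the expectation as the weighted average \sum_x x f(x):

```python
# Expectation of a fair die as a weighted average of outcomes.
outcomes = [1, 2, 3, 4, 5, 6]
f = {x: 1 / 6 for x in outcomes}            # PMF: f(x) = 1/6 on {1,...,6}
EX = sum(x * f[x] for x in outcomes)        # E(X) = sum_x x f(x)
print(EX)                                   # 3.5, which is not in R(X)
```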

(3)

Expectations: some useful results

• Result 1: If X and Y are r.v.’s with the same distribution, then E(X) = E(Y)

(if either side exists)

The proof follows immediately from the definition of E(X). Note that the converse of the above statement is NOT true. This is a common mistake.

• Result 2: For any r.v. X and a constant c

E(cX) = cE(X)

• Result 3 (linearity of expectation): For any r.v.s X and Y, E(X+Y) = E(X) + E(Y). Note: we do not require independence of these two r.v.s.
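A minimal sketch (not part of the original slides) checking linearity exactly for two dependent r.v.s: X is the first of two fair dice and Y is the maximum of both, so Y clearly depends on X.

```python
# E(X + Y) computed from the joint distribution equals E(X) + E(Y),
# even though X and Y are dependent.
from fractions import Fraction
from itertools import product

p = Fraction(1, 36)                                    # each pair equally likely
pairs = list(product(range(1, 7), repeat=2))           # 36 outcomes (a, b)
E_sum = sum(p * (a + max(a, b)) for a, b in pairs)     # E(X + Y) from the joint
E_X = sum(p * a for a, b in pairs)                     # E(X) = 7/2
E_Y = sum(p * max(a, b) for a, b in pairs)             # E(Y) = 161/36
print(E_sum, E_X + E_Y)                                # both equal 287/36
```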

(4)

The expectations of Bernoulli and Binomial r.v.s

Bernoulli: Using the definition of the expectation, this is 1 \cdot p + 0 \cdot q = p

Binomial: A binomial r.v. can be expressed as the sum of n Bernoulli r.v.s:

X = I_1 + I_2 + \cdots + I_n.

By linearity,

E(X) = E(I_1) + E(I_2) + \cdots + E(I_n) = np.
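A minimal simulation sketch (assuming numpy is available; not part of the original slides) of a Binomial(n, p) r.v. as a sum of n Bernoulli indicators, with sample mean close to np:

```python
# Binomial as a sum of Bernoulli indicators: E(X) = np.
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 0.3
indicators = rng.random((1_000_000, n)) < p    # each row: n Bernoulli(p) trials
X = indicators.sum(axis=1)                     # X = I_1 + ... + I_n
print(X.mean(), n * p)                         # both close to 3.0
```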

(5)

The Geometric distribution

Definition (Geometric distribution): Consider a sequence of independent Bernoulli trials, all with success probability p ∈(0, 1). Let X be the number of failures before the first successful trial. Then X∼ Geom(p)

The PMF is given by

P(X = k) = q^k p

for k = 0, 1, 2, \ldots where q = 1 - p. This is a valid PMF since p \sum_{k=0}^{\infty} q^k = \frac{p}{1-q} = 1

The expectation of a geometric r.v. is defined as

E(X) = \sum_{k=0}^{\infty} k q^k p

We know that \sum_{k=0}^{\infty} q^k = \frac{1}{1-q}. Differentiating both sides w.r.t. q, we get \sum_{k=0}^{\infty} k q^{k-1} = \frac{1}{(1-q)^2}. Now multiplying both sides by pq, we get

E(X) = pq \cdot \frac{1}{(1-q)^2} = \frac{q}{p}
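A minimal simulation sketch (assuming numpy is available; not part of the original slides). Note that numpy's geometric() counts the trial on which the first success occurs, so subtracting 1 gives the number of failures used in the definition above:

```python
# Sample mean of the number of failures before the first success vs q/p.
import numpy as np

rng = np.random.default_rng(0)
p = 0.25
X = rng.geometric(p, size=1_000_000) - 1       # failures before the first success
print(X.mean(), (1 - p) / p)                   # both close to 3.0
```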

(6)

The Negative Binomial distribution

Definition (Negative Binomial distribution): Consider a sequence of independent Bernoulli trials, all with success probability p ∈ (0, 1). Let X be the number of failures before the rth success. Then X ∼ NBin(r,p)

The PMF is given by

P(X = n) = \binom{n+r-1}{r-1} p^r q^n

for n = 0, 1, 2, \ldots where q = 1 - p.

We can write the Negative Binomial as a sum of r Geom(p) r.v.s: X = X_1 + X_2 + \cdots + X_r. The expectation of a Negative Binomial r.v. is therefore

E(X) = \frac{rq}{p}
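A minimal simulation sketch (assuming numpy is available; not part of the original slides): numpy's negative_binomial() already counts failures before the r-th success, so its sample mean can be compared directly with rq/p:

```python
# Sample mean of NBin(r, p) vs r*q/p.
import numpy as np

rng = np.random.default_rng(0)
r, p = 5, 0.25
X = rng.negative_binomial(r, p, size=1_000_000)
print(X.mean(), r * (1 - p) / p)               # both close to 15.0
```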

(7)

Expectation of a function of X

Result: If X is a discrete random variable and g is a function from R to R, then

E(g(X)) = \sum_{x \in R(X)} g(x) f(x)

Intuition: Whenever X = x, g(X) = g(x), so we can assign f(x) to g(x).

This is a very useful result, because it tells us we don’t need the PMF of g(X) to find its expected value, and we are often interested in functions of random variables:

• expected revenues from the distribution of yields

• health outcomes as a function of food availability

Think of other examples.
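A minimal sketch (plain Python, not part of the original slides) using the result above: E(g(X)) for g(x) = x^2 and a fair die, computed directly from the PMF of X without deriving the PMF of g(X):

```python
# E(g(X)) = sum_x g(x) f(x) for a fair die and g(x) = x**2.
outcomes = [1, 2, 3, 4, 5, 6]
f = {x: 1 / 6 for x in outcomes}

def g(x):
    return x ** 2

E_gX = sum(g(x) * f[x] for x in outcomes)
print(E_gX)                                    # 91/6 ≈ 15.1667
```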

(8)

Variance of a random variable

Definition (variance and standard deviation): The variance of an r.v. X is Var(X) = E[(X - EX)^2]

The square root of the variance is called the standard deviation of X.

SD(X) = \sqrt{Var(X)}

Notice that by linearity, E(X - EX) = 0 always, since positive and negative deviations cancel. By squaring them we ensure all deviations contribute to the overall variability. However, variance is in squared units, so to get back to the original units we take the square root.

We can expand the above expression to rewrite the variance in a form that is often more convenient:

Var(X) = E(X^2) - (EX)^2
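Filling in the expansion behind this identity (using linearity and the fact that EX is a constant):

Var(X) = E[(X - EX)^2] = E[X^2 - 2X \cdot EX + (EX)^2] = E(X^2) - 2\,EX \cdot EX + (EX)^2 = E(X^2) - (EX)^2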

(9)

Variance properties

• Result 1: Var(X)≥ 0 with equality if and only if P(X =a) = 1 for some constant a.

• Result 2: For any constants a and b, Var(aX + b) = a^2 Var(X). It follows that Var(X) = Var(-X).

Proof: Var(aX + b) = E[(aX + b - (aEX + b))^2] = E[(a(X - EX))^2] = a^2 E[(X - EX)^2] = a^2 Var(X)

• Result 3: If X1, . . . ,Xn are independent random variables, then Var(X1+· · ·+Xn) = Var(X1) +· · ·+Var(Xn).

Note that this is only true for independent r.v.s; variance is not linear in general.

Proof: Denote EX_i by \mu_i. For n = 2, E(X_1 + X_2) = \mu_1 + \mu_2 and therefore

Var(X_1 + X_2) = E[(X_1 + X_2 - \mu_1 - \mu_2)^2] = E[(X_1 - \mu_1)^2 + (X_2 - \mu_2)^2 + 2(X_1 - \mu_1)(X_2 - \mu_2)]

Taking expectations term by term, we get

E[(X_1 - \mu_1)^2 + (X_2 - \mu_2)^2 + 2(X_1 - \mu_1)(X_2 - \mu_2)] = Var(X_1) + Var(X_2) + 2E[(X_1 - \mu_1)(X_2 - \mu_2)]

But since X_1 and X_2 are independent,

E[(X_1 - \mu_1)(X_2 - \mu_2)] = E(X_1 - \mu_1) E(X_2 - \mu_2) = (\mu_1 - \mu_1)(\mu_2 - \mu_2) = 0

It therefore follows that

Var(X_1 + X_2) = Var(X_1) + Var(X_2)

Using an induction argument, this can be established for any n.
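A minimal simulation sketch (assuming numpy is available; not part of the original slides): variances add for independent draws but not when the two r.v.s are dependent:

```python
# Var(X + Y) vs Var(X) + Var(Y), independent vs perfectly dependent case.
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(1, 7, size=1_000_000)                     # fair die rolls
Y_indep = rng.integers(1, 7, size=1_000_000)               # independent of X
print(np.var(X + Y_indep), np.var(X) + np.var(Y_indep))    # both ≈ 5.83
Y_dep = X                                                   # perfectly dependent
print(np.var(X + Y_dep), np.var(X) + np.var(Y_dep))        # ≈ 11.67 vs ≈ 5.83
```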

(10)

The Poisson distribution

Definition (Poisson distribution): An r.v. X has the Poisson distribution with parameter \lambda > 0 if the PMF of X is

P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}

for k = 0, 1, 2, \ldots. We write this as X \sim Pois(\lambda).

This is a valid PMF because the Taylor series 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \ldots converges to e^{\lambda}, so

\sum_{k} f(k) = \sum_{k=0}^{\infty} \frac{e^{-\lambda}\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1.

As \lambda gets larger, the PMF becomes more bell-shaped.

(11)

The Poisson mean and variance

The expectation is

E(X) = e^{-\lambda} \sum_{k=0}^{\infty} k \frac{\lambda^k}{k!} = e^{-\lambda} \sum_{k=1}^{\infty} k \frac{\lambda^k}{k!} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda

The variance is also \lambda. To see this, let's start by finding E(X^2):

Since \sum_{k=0}^{\infty} \frac{e^{-\lambda}\lambda^k}{k!} = 1, \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{\lambda}. Now differentiate both sides w.r.t. \lambda (and ignore the k = 0 term since this vanishes):

\sum_{k=1}^{\infty} \frac{k\lambda^{k-1}}{k!} = e^{\lambda} \quad \text{or} \quad \sum_{k=1}^{\infty} \frac{k\lambda^{k}}{k!} = \lambda e^{\lambda}

Differentiate again to get

\sum_{k=1}^{\infty} \frac{k^2\lambda^{k-1}}{k!} = e^{\lambda} + \lambda e^{\lambda} \quad \text{or} \quad \sum_{k=1}^{\infty} \frac{k^2\lambda^{k}}{k!} = e^{\lambda}\lambda(1+\lambda)

So E(X^2) = e^{-\lambda} \sum_{k=1}^{\infty} \frac{k^2\lambda^k}{k!} = \lambda(1+\lambda) and

Var(X) = E(X^2) - (EX)^2 = \lambda(1+\lambda) - \lambda^2 = \lambda

We’ll see later that both these can be derived easily from the moment generating function of the Poisson.
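A minimal simulation sketch (assuming numpy is available; not part of the original slides): sample mean and sample variance of Poisson draws are both close to \lambda:

```python
# Poisson mean and variance are both lambda.
import numpy as np

rng = np.random.default_rng(0)
lam = 3.0
X = rng.poisson(lam, size=1_000_000)
print(X.mean(), X.var())                       # both close to 3.0
```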

(12)

Result: If X \sim Pois(\lambda_1) and Y \sim Pois(\lambda_2) are independent, then X + Y \sim Pois(\lambda_1 + \lambda_2)

Proof: To get the PMF of X + Y, we use the law of total probability:

P(X + Y = k) = \sum_{j=0}^{k} P(X + Y = k \mid X = j) P(X = j)

= \sum_{j=0}^{k} P(Y = k - j) P(X = j)

= \sum_{j=0}^{k} \frac{e^{-\lambda_2}\lambda_2^{k-j}}{(k-j)!} \cdot \frac{e^{-\lambda_1}\lambda_1^{j}}{j!}

= \frac{e^{-(\lambda_1+\lambda_2)}}{k!} \sum_{j=0}^{k} \binom{k}{j} \lambda_1^{j} \lambda_2^{k-j}

= \frac{e^{-(\lambda_1+\lambda_2)} (\lambda_1+\lambda_2)^k}{k!} \quad \text{(using the binomial theorem)}
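A minimal simulation sketch (assuming numpy and scipy are available; not part of the original slides) of the result above: the empirical PMF of X + Y matches the Pois(\lambda_1 + \lambda_2) PMF:

```python
# Empirical PMF of a sum of independent Poissons vs Pois(lam1 + lam2).
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
lam1, lam2 = 2.0, 3.0
S = rng.poisson(lam1, 1_000_000) + rng.poisson(lam2, 1_000_000)
for k in range(5):
    print(k, np.mean(S == k), poisson.pmf(k, lam1 + lam2))   # pairs agree closely
```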

(13)

A Poisson process

Suppose that the number of type A outcomes that occur over a fixed interval of time [0, t] follows a process in which

1. The probability that precisely one type A outcome will occur in a small interval of time \Delta t is approximately proportional to the length of the interval:

g(1, \Delta t) = \gamma \Delta t + o(\Delta t)

where o(\Delta t) denotes a function of \Delta t having the property that \lim_{\Delta t \to 0} \frac{o(\Delta t)}{\Delta t} = 0.

2. The probability that two or more type A outcomes will occur in a small interval of time ∆t is negligible:

\sum_{x=2}^{\infty} g(x, \Delta t) = o(\Delta t)

3. The numbers of type A outcomes that occur in nonoverlapping time intervals are independent events.

These conditions imply a process which is stationary over the period of observation, i.e. the probability of an occurrence must be the same over the entire period, with neither busy nor quiet intervals.

(14)

Under these conditions, the number of type A outcomes occurring in a time interval of length t is a Poisson density with mean \lambda = \gamma t.

Applications:

• the number of weaving defects in a yard of handloom cloth or stitching defects in a shirt

• the number of traffic accidents on a motorway in an hour

• the number of particles of a noxious substance that come out of a chimney in a given period of time

• the number of times a machine breaks down each week

Example:

• let the probability of exactly one blemish in a foot of wire be \frac{1}{1000} and that of two or more blemishes be zero.

• we’re interested in the number of blemishes in 3, 000 feet of wire.

• if the numbers of blemishes in non-overlapping intervals are assumed to be independently distributed, then our random variable X follows a Poisson distribution with \lambda = \gamma t = 3.
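A minimal sketch of the blemish example (assuming scipy is available; not part of the original slides), with illustrative probabilities under \lambda = 3:

```python
# Blemish counts in 3,000 feet of wire modelled as Pois(3).
from scipy.stats import poisson

lam = 3.0
print(poisson.pmf(0, lam))                     # P(no blemishes) = e^{-3} ≈ 0.0498
print(poisson.cdf(5, lam))                     # P(at most 5 blemishes) ≈ 0.916
```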

(15)

The Poisson as a limiting distribution

We can show that a binomial distribution with large n and small p can be approximated by a Poisson (which is computationally easier).

• Useful result: e^{v} = \lim_{n \to \infty} \left(1 + \frac{v}{n}\right)^n

• We can rewrite the binomial density for non-zero values as

f(x; n, p) = \frac{\prod_{i=1}^{x}(n-i+1)}{x!} p^x (1-p)^{n-x}

If np = \lambda, we can substitute \frac{\lambda}{n} for p to get

\lim_{n \to \infty} f(x; n, p) = \lim_{n \to \infty} \frac{\prod_{i=1}^{x}(n-i+1)}{x!} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x}

= \lim_{n \to \infty} \frac{\prod_{i=1}^{x}(n-i+1)}{n^x} \cdot \frac{\lambda^x}{x!} \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-x}

= \lim_{n \to \infty} \left[\frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-x+1}{n} \cdot \frac{\lambda^x}{x!} \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-x}\right]

= \frac{e^{-\lambda}\lambda^x}{x!}

(using the above result and the property that the limit of a product is the product of the limits)
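A minimal sketch (assuming scipy is available; not part of the original slides) comparing exact Binomial(n, p) probabilities with the Pois(np) approximation for large n and small p:

```python
# Binomial(n, p) pmf vs the Poisson(np) approximation.
from scipy.stats import binom, poisson

n, p = 1000, 0.003
lam = n * p
for x in range(6):
    print(x, binom.pmf(x, n, p), poisson.pmf(x, lam))   # columns agree closely
```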

(16)

Poisson as a limiting distribution...example

• We have a 300-page novel with 1,500 letters on each page.

• Typing errors are as likely to occur for one letter as for another, and the probability of such an error is given by p = 10^{-5}.

• The total number of letters is n = 300 × 1500 = 450,000.

• Using \lambda = np = 4.5, the Poisson distribution function gives us the probability of the number of errors being less than or equal to 10 as:

P(X \le 10) \approx \sum_{x=0}^{10} \frac{e^{-4.5}(4.5)^x}{x!} = 0.9933

Rules of thumb: the Poisson approximation is close to the binomial probabilities when n \ge 20 and p \le 0.05, and excellent when n \ge 100.
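A minimal check of this calculation (assuming scipy is available; not part of the original slides):

```python
# P(X <= 10) for lambda = n*p = 4.5, matching the 0.9933 above.
from scipy.stats import poisson

n, p = 450_000, 1e-5
print(poisson.cdf(10, n * p))                  # ≈ 0.9933
```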
