SEOUL NATIONAL UNIVERSITY
School of Mechanical & Aerospace Engineering
446.358
Engineering Probability
9 Properties of Expectation
Recall

E[X] = ∑_x x p(x) : discrete random variable

E[X] = ∫_{−∞}^{∞} x f(x) dx : continuous random variable
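As a quick numerical sketch (not part of the original notes), both definitions can be evaluated directly; the fair die and the Uniform(0, 2) density below are arbitrary illustrative choices.

```python
import numpy as np

# Discrete case: a fair six-sided die, p(x) = 1/6 for x = 1, ..., 6
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
e_discrete = np.sum(values * pmf)        # sum_x x p(x) = 3.5

# Continuous case: X ~ Uniform(0, 2), f(x) = 1/2 on [0, 2], 0 elsewhere
xs = np.linspace(0.0, 2.0, 100_001)
fx = np.full_like(xs, 0.5)
e_continuous = np.trapz(xs * fx, xs)     # numerical integral of x f(x) dx ≈ 1.0

print(e_discrete, e_continuous)
```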
• If P{a ≤ X ≤ b} = 1, then a ≤ E[X] ≤ b.

Proof. (Discrete random variable)
E[X] = ∑_{x: p(x)>0} x p(x) ≥ ∑_{x: p(x)>0} a p(x) = a ∑_{x: p(x)>0} p(x) = a

Similarly, E[X] ≤ b.

Proposition
•If X & Y have a joint probability mass function p(x, y), then E[g(X, Y)] = X
y
X
x
g(x, y) p(x, y)
•If X & Y have a joint probability density function f(x, y), then E[g(X, Y)] =
Z ∞
−∞
Z ∞
−∞
g(x, y) f(x, y)dxdy
• Whenever E[X] & E[Y] are finite,
E[X + Y] = E[X] + E[Y]

• For random variables X & Y such that X ≥ Y,
E[X − Y] ≥ 0 ⇒ E[X] ≥ E[Y]

• If E[X_i] is finite for all i = 1, 2, ..., n, then
E[X_1 + ... + X_n] = E[X_1] + ... + E[X_n].
Example.
X_1, ..., X_n : i.i.d. random variables having distribution function F and expected value µ.
Such a sequence of random variables is said to constitute a sample from the distribution F.

X̄ := ∑_{i=1}^{n} X_i / n : sample mean

Then
E[X̄] = E[ ∑_{i=1}^{n} X_i / n ] = (1/n) E[ ∑ X_i ] = (1/n) ∑ E[X_i] = µ

∴ The expected value of the sample mean = the mean of the distribution.
When µ is unknown, the sample mean is often used in statistics to estimate it.
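A small simulation sketch (the exponential distribution and sample size are illustrative choices): averaging the sample means of many independent samples from a distribution with µ = 2 recovers µ, as E[X̄] = µ predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n, trials = 2.0, 10, 100_000

# Each row is one sample X_1, ..., X_n from an exponential distribution with mean mu
samples = rng.exponential(scale=mu, size=(trials, n))
sample_means = samples.mean(axis=1)      # X̄ for each trial

# The average of the sample means should be close to mu
print(sample_means.mean())               # ≈ 2.0
```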
Example.
A_1, ..., A_n : some events.

Indicator variables: X_i = 1 if A_i occurs, 0 otherwise.

X := ∑_{i=1}^{n} X_i : the number of the events A_i that occur

Y := 1 if X ≥ 1, 0 otherwise.

⇒ X ≥ Y ⇒ E[X] ≥ E[Y], where
E[X] = ∑_{i=1}^{n} E[X_i] = ∑_{i=1}^{n} P(A_i)
E[Y] = P{at least one of the A_i occurs} = P( ∪_{i=1}^{n} A_i )

∴ P( ∪_{i=1}^{n} A_i ) ≤ ∑_{i=1}^{n} P(A_i) : Boole's inequality
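A simulation sketch of Boole's inequality; the events A_i (standard normals with a shared common shock exceeding a threshold) are an arbitrary illustrative choice, deliberately dependent so the inequality is not trivially an equality.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 5, 200_000

# Events A_i = {Z_i > 1.5}, where the Z_i are standard normals sharing a common shock
common = rng.normal(size=(trials, 1))
z = 0.6 * common + 0.8 * rng.normal(size=(trials, n))
indicators = z > 1.5                          # X_i = 1 if A_i occurs

p_union = np.mean(indicators.any(axis=1))     # P(at least one A_i occurs)
sum_p = indicators.mean(axis=0).sum()         # sum over i of P(A_i)

print(p_union, sum_p, p_union <= sum_p)       # Boole: P(union of A_i) <= sum of P(A_i)
```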
Example. [A random walk in the plane]
Starting from a given point in the plane, take a sequence of steps of fixed length, each in a completely random direction, with the angle uniformly distributed over (0, 2π).
What is the expected square of the distance from the starting point after n steps?

Solution.
Let (X_i, Y_i) denote the change in position at the i-th step. Then
X_i = cos θ_i
Y_i = sin θ_i
θ_i, i = 1, ..., n : independent uniform (0, 2π) random variables.

After n steps, the position = ( ∑_{i=1}^{n} X_i, ∑_{i=1}^{n} Y_i )

We are interested in
D² = ( ∑ X_i )² + ( ∑ Y_i )²
   = ∑ (X_i² + Y_i²) + ∑∑_{i≠j} (X_i X_j + Y_i Y_j)
   = n + ∑∑_{i≠j} (cos θ_i cos θ_j + sin θ_i sin θ_j)

θ_i & θ_j (i ≠ j) are independent, and
E[cos θ_i] = (1/2π) ∫_0^{2π} cos u du = 0
E[sin θ_i] = (1/2π) ∫_0^{2π} sin u du = 0

∴ E[D²] = n.
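A Monte Carlo sketch of the result, assuming unit step length as in the derivation: the average of D² over many simulated walks should be close to n.

```python
import numpy as np

rng = np.random.default_rng(2)
n, walks = 20, 100_000

# n independent angles per walk, uniform on (0, 2*pi)
theta = rng.uniform(0.0, 2.0 * np.pi, size=(walks, n))

# Position after n unit-length steps, and squared distance from the start
x = np.cos(theta).sum(axis=1)
y = np.sin(theta).sum(axis=1)
d2 = x**2 + y**2

print(d2.mean())   # ≈ n = 20
```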
Example. [Analyzing the quick-sort algorithm]
We are given a set of n distinct values x_1, ..., x_n, and we want to sort them in increasing order.

Quick-Sort Algorithm
• n = 2 ⇒ compare the two values & put them in the appropriate order.
• n > 2 ⇒ one of the elements is chosen at random (say, x_i), then all the other values are compared with x_i:

{ values smaller than x_i }   x_i   { values larger than x_i }

This step is repeated on each of the brackets until all the values have been sorted.
Example.
5, 9, 3, 10, 11, 14, 8, 4, 17, 6. Choose one of the values at random.
Suppose that 10 is chosen:
    {5 9 3 8 4 6}  10  {11 14 17}
Say 6 is chosen from the left bracket and 11 from the right bracket:
    {5 3 4}  6  {9 8}  10  11  {14 17}
Continuing (say 4 is chosen next, and so on):
    {3} 4 {5}  6  8  9  10  11  14  17
X : the number of comparisons it takes the algorithm to sort n distinct numbers.
E[X] : a measure of the effectiveness of this algorithm.
E[X] = ?

Let 1 stand for the smallest value to be sorted, 2 for the next smallest, and so on.
For 1 ≤ i < j ≤ n, let
I(i, j) = 1 if i & j are ever directly compared, 0 otherwise.

Then,
X = ∑_{i=1}^{n−1} ∑_{j=i+1}^{n} I(i, j)

E[X] = E[ ∑_{i=1}^{n−1} ∑_{j=i+1}^{n} I(i, j) ]
     = ∑_{i=1}^{n−1} ∑_{j=i+1}^{n} E[I(i, j)]
     = ∑∑ P{i & j are ever compared}
Initially, i, i+1, ..., j−1, j will be in the same bracket, and they will remain in the same bracket if the number chosen for the first comparison is not between i & j.
If one of i+1, ..., j−1 is chosen for the comparison, i will go into the left bracket and j into the right bracket → i & j will never be compared.
If i or j is chosen, then there will be a direct comparison between i & j.
The probability that i or j is chosen among i, i+1, ..., j−1, j is 2/(j−i+1), so
P{i & j are ever compared} = 2/(j−i+1)

E[X] = ∑_{i=1}^{n−1} ∑_{j=i+1}^{n} 2/(j−i+1)

When n is large,
∑_{j=i+1}^{n} 2/(j−i+1) ≈ ∫_{i+1}^{n} 2/(x−i+1) dx
= 2 log(x−i+1) |_{i+1}^{n}
= 2 log(n−i+1) − 2 log 2
≈ 2 log(n−i+1)

∴ E[X] ≈ 2 ∑_{i=1}^{n−1} log(n−i+1)
       ≈ 2 ∫_{1}^{n−1} log(n−x+1) dx
       = 2 ∫_{2}^{n} log(y) dy
       = 2(y log y − y) |_{2}^{n}
       ≈ 2n log n.
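A simulation sketch of this analysis (the recursive partition and comparison counter below are illustrative, not the notes' own code): the random-pivot sort makes one comparison per non-pivot element at each level, and its average count matches the double sum ∑∑ 2/(j−i+1), with 2n log n as the large-n approximation.

```python
import random
import math

def quicksort_comparisons(values):
    """Sort distinct `values` with a random pivot; return (sorted list, number of comparisons)."""
    if len(values) <= 1:
        return list(values), 0
    pivot = random.choice(values)
    rest = [v for v in values if v != pivot]       # values are assumed distinct
    smaller = [v for v in rest if v < pivot]
    larger = [v for v in rest if v > pivot]
    left, c_left = quicksort_comparisons(smaller)
    right, c_right = quicksort_comparisons(larger)
    # every element of `rest` is compared with the pivot once at this level
    return left + [pivot] + right, len(rest) + c_left + c_right

n, trials = 500, 200
avg = sum(quicksort_comparisons(random.sample(range(10**6), n))[1]
          for _ in range(trials)) / trials

exact = sum(2.0 / (j - i + 1) for i in range(1, n) for j in range(i + 1, n + 1))
print(avg, exact, 2 * n * math.log(n))   # avg ≈ exact double sum; 2n log n is the large-n approximation
```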
9.1 Covariance

Proposition
If X & Y are independent, then for any functions h & g,
E[g(X)h(Y)] = E[g(X)] E[h(Y)]

Proof.
Suppose that X & Y are jointly continuous with joint density f(x, y).
E[g(X)h(Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) f(x, y) dx dy
            = ∫∫ g(x)h(y) f_X(x) f_Y(y) dx dy
            = ∫ g(x) f_X(x) dx ∫ h(y) f_Y(y) dy
            = E[h(Y)] E[g(X)]

Definition: The covariance between X & Y is
Cov(X, Y) = E[(X − E[X])(Y − E[Y])]

Cov(X, Y) = E[XY − E[X]Y − E[Y]X + E[X]E[Y]]
          = E[XY] − E[X]E[Y]

If X & Y are independent, then Cov(X, Y) = 0.
(The converse does not hold; see the counterexample in the text, page 328.)
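A numerical sketch of the definition (the particular random pairs are illustrative choices): the sample version of E[XY] − E[X]E[Y] is close to Var(X) when Y = X + noise, and close to 0 when the pair is generated independently.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 200_000

# Dependent pair: Y = X + noise, so Cov(X, Y) should be about Var(X) = 1
x = rng.normal(size=m)
y = x + 0.5 * rng.normal(size=m)
cov_xy = np.mean(x * y) - x.mean() * y.mean()    # E[XY] − E[X]E[Y]

# Independent pair: covariance should be about 0
u = rng.normal(size=m)
v = rng.normal(size=m)
cov_uv = np.mean(u * v) - u.mean() * v.mean()

print(cov_xy, cov_uv)   # ≈ 1.0 and ≈ 0.0
```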
9.1.1 Properties of Covariance
(i) Cov(X, Y) = Cov(Y, X)
(ii) Cov(X, X) = Var(X)
(iii) Cov(aX, Y) = a Cov(X, Y)
(iv) Cov( ∑_{i=1}^{n} X_i, ∑_{j=1}^{m} Y_j ) = ∑_i ∑_j Cov(X_i, Y_j)

Proof of (iv).
Let µ_i = E[X_i], ν_j = E[Y_j]. Then
E[ ∑_i X_i ] = ∑_i µ_i ,   E[ ∑_j Y_j ] = ∑_j ν_j

Cov( ∑_i X_i, ∑_j Y_j ) = E[ ( ∑_i X_i − ∑_i µ_i )( ∑_j Y_j − ∑_j ν_j ) ]
= E[ ∑_i (X_i − µ_i) ∑_j (Y_j − ν_j) ]
= E[ ∑_i ∑_j (X_i − µ_i)(Y_j − ν_j) ]
= ∑_i ∑_j E[(X_i − µ_i)(Y_j − ν_j)].
•From (ii) & (iv), Var
ÃXn
i=1
Xi
!
= Cov
Xn
i=1
Xi, Xn
j=1
Xj
= Xn
i=1
Xn
j=1
Cov (Xi,Xj)
= Xn
i=1
Var (Xi) + X X
i6=jCov (Xi,Xj)
∴ Var à n
X
i=1
Xi
!
= Xn
i=1
Var (Xi) + 2X X
i<jCov (Xi,Xj)
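A quick numerical check of the two-variable case, Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), with an illustrative correlated pair:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 200_000

x = rng.normal(size=m)
y = 0.7 * x + rng.normal(size=m)        # deliberately correlated with x

lhs = np.var(x + y)                                    # Var(X + Y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y)[0, 1]   # Var(X) + Var(Y) + 2 Cov(X, Y)

print(lhs, rhs)   # the two sides agree up to sampling error (≈ 3.89)
```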
• If X_1, ..., X_n are pairwise independent (i.e. X_i & X_j are independent for i ≠ j), then
Var( ∑_{i=1}^{n} X_i ) = ∑_{i=1}^{n} Var(X_i)   ... (∗)
Example.
X_1, ..., X_n : i.i.d. random variables with expected value µ and variance σ².

X̄ = (1/n) ∑_{i=1}^{n} X_i : sample mean
X_i − X̄ (i = 1, ..., n) : deviations
S² = ∑_{i=1}^{n} (X_i − X̄)² / (n−1) : sample variance

Then,
Var(X̄) = (1/n)² ∑_{i=1}^{n} Var(X_i) = σ²/n   by (∗)

(n−1)S² = ∑_{i=1}^{n} (X_i − µ + µ − X̄)²
        = ∑_i (X_i − µ)² + ∑_i (X̄ − µ)² − 2(X̄ − µ) ∑_i (X_i − µ)
        = ∑_{i=1}^{n} (X_i − µ)² − n(X̄ − µ)².

Take expectations:
(n−1)E[S²] = ∑_{i=1}^{n} E[(X_i − µ)²] − n E[(X̄ − µ)²]
           = nσ² − n Var(X̄)
           = (n−1)σ²

∴ E[S²] = σ²
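A simulation sketch of E[S²] = σ² (the normal samples are an illustrative choice): averaging the (n−1)-denominator sample variance over many samples recovers σ², while dividing by n instead is biased low.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2, n, trials = 4.0, 5, 200_000

# Many small samples from a Normal(0, sigma^2) distribution
samples = rng.normal(scale=np.sqrt(sigma2), size=(trials, n))

s2_unbiased = samples.var(axis=1, ddof=1)   # divide by n − 1 (the sample variance S²)
s2_biased = samples.var(axis=1, ddof=0)     # divide by n instead

print(s2_unbiased.mean(), s2_biased.mean())  # ≈ 4.0 and ≈ (n−1)/n · 4.0 = 3.2
```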
Example.
The variance of a binomial random variable X with parameters n & p.

Solution.
X : the number of successes in n independent trials, each with success probability p.
X = X_1 + ... + X_n, where
X_i = 1 if the i-th trial is a success (Bernoulli), 0 otherwise.

Var(X) = Var(X_1) + ... + Var(X_n)   by (∗)
Var(X_i) = E[X_i²] − (E[X_i])²
         = E[X_i] − (E[X_i])²   (since X_i² = X_i)
         = p − p²

∴ Var(X) = np(1 − p).
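A quick numerical check of Var(X) = np(1 − p), with illustrative parameters, built from Bernoulli indicators as in the solution:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, trials = 20, 0.3, 200_000

# X = X_1 + ... + X_n with the X_i i.i.d. Bernoulli(p)
x = rng.binomial(1, p, size=(trials, n)).sum(axis=1)

print(x.var(), n * p * (1 - p))   # both ≈ 4.2
```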
Definition
The correlation of two random variables X & Y is
ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y))   (as long as Var(X) Var(Y) > 0)

• −1 ≤ ρ(X, Y) ≤ 1

Proof.
Suppose that X & Y have variances σ_x² & σ_y². Then

0 ≤ Var( X/σ_x + Y/σ_y ) = Var(X)/σ_x² + Var(Y)/σ_y² + 2 Cov(X, Y)/(σ_x σ_y) = 2[1 + ρ(X, Y)]
∴ ρ(X, Y) ≥ −1

0 ≤ Var( X/σ_x − Y/σ_y ) = Var(X)/σ_x² + Var(Y)/σ_y² − 2 Cov(X, Y)/(σ_x σ_y) = 2[1 − ρ(X, Y)]
∴ ρ(X, Y) ≤ 1
Remarks
Var(Z) = 0 ⇒ Z is constant with probability 1 (to be proved in Chapter 8).
ρ(X, Y) = 1 ⇔ Y = a + bX with b = σ_y/σ_x > 0
ρ(X, Y) = −1 ⇔ Y = a + bX with b = −σ_y/σ_x < 0
Remarks
The correlation coefficient is a measure of the degree of linearity between X & Y:
• ρ(X, Y) near −1 or +1 indicates a high degree of linearity between X & Y.
• ρ(X, Y) < 0 : Y tends to decrease when X increases.
• ρ(X, Y) = 0 : X & Y are uncorrelated.
• ρ(X, Y) > 0 : Y tends to increase when X does.
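An illustrative sketch of how ρ behaves (the linear and independent constructions below are arbitrary choices): ρ equals ±1 for exact linear relationships, with the sign of b, and is near 0 for independent variables.

```python
import numpy as np

def rho(a, b):
    """Correlation coefficient Cov(a, b) / sqrt(Var(a) Var(b)) from samples."""
    return np.cov(a, b)[0, 1] / np.sqrt(np.var(a, ddof=1) * np.var(b, ddof=1))

rng = np.random.default_rng(7)
x = rng.normal(size=100_000)

print(rho(x, 3.0 + 2.0 * x))             # b > 0        ⇒ ρ = +1
print(rho(x, 1.0 - 0.5 * x))             # b < 0        ⇒ ρ = −1
print(rho(x, rng.normal(size=100_000)))  # independent  ⇒ ρ ≈ 0
```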
Example.
I_A, I_B : indicator variables for the events A & B. Then
E[I_A] = P(A),  E[I_B] = P(B),  E[I_A I_B] = P(AB)

So,
Cov(I_A, I_B) = P(AB) − P(A)P(B)
              = P(B)[P(A|B) − P(A)]

Hence I_A & I_B are
positively correlated ⇔ P(A|B) > P(A)
uncorrelated          ⇔ P(A|B) = P(A)
negatively correlated ⇔ P(A|B) < P(A)
Example.
X_1, ..., X_n : i.i.d. with variance σ². Then,

Cov(X_i − X̄, X̄) = Cov(X_i, X̄) − Cov(X̄, X̄)
                 = Cov( X_i, (1/n) ∑_j X_j ) − Var(X̄)
                 = (1/n) ∑_j Cov(X_i, X_j) − σ²/n
                 = σ²/n − σ²/n = 0
Remarks
X̄ and X_i − X̄ are uncorrelated, but in general they are not independent.
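A simulation sketch of this example (the exponential samples are an illustrative choice): the empirical covariance between X_1 − X̄ and X̄ is near zero, even though the two quantities are clearly not independent.

```python
import numpy as np

rng = np.random.default_rng(8)
n, trials = 5, 200_000

# Many i.i.d. samples; an exponential is used so the deviations are clearly non-normal
samples = rng.exponential(scale=1.0, size=(trials, n))
xbar = samples.mean(axis=1)
dev1 = samples[:, 0] - xbar               # X_1 − X̄ for each sample

print(np.cov(dev1, xbar)[0, 1])           # ≈ 0: uncorrelated, though not independent
```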