Conditional Expectation
Please see Hull's book (Section 9.6).
2.1 A Binomial Model for Stock Price Dynamics
Stock prices are assumed to follow this simple binomial model: the initial stock price during the period under study is denoted $S_0$. At each time step, the stock price either goes up by a factor of $u$ or down by a factor of $d$. It will be useful to visualize tossing a coin at each time step, and say that the stock price moves up by a factor of $u$ if the coin comes out heads ($H$), and down by a factor of $d$ if it comes out tails ($T$).
Note that we are not specifying the probability of heads here.
Consider a sequence of 3 tosses of the coin (see Fig. 2.1). The collection of all possible outcomes (i.e., sequences of tosses of length 3) is
$$\Omega = \{HHH,\ HHT,\ HTH,\ HTT,\ THH,\ THT,\ TTH,\ TTT\}.$$
A typical sequence of $\Omega$ will be denoted $\omega$, and $\omega_k$ will denote the $k$th element in the sequence $\omega$. We write $S_k(\omega)$ to denote the stock price at "time" $k$ (i.e., after $k$ tosses) under the outcome $\omega$. Note that $S_k(\omega)$ depends only on $\omega_1\omega_2\ldots\omega_k$. Thus in the 3-coin-toss example we write, for instance,
$$S_1(\omega) \triangleq S_1(\omega_1\omega_2\omega_3) = S_1(\omega_1),$$
$$S_2(\omega) \triangleq S_2(\omega_1\omega_2\omega_3) = S_2(\omega_1\omega_2).$$
Each $S_k$ is a random variable defined on the set $\Omega$. More precisely, let $\mathcal{F} = \mathcal{P}(\Omega)$. Then $\mathcal{F}$ is a $\sigma$-algebra and $(\Omega, \mathcal{F})$ is a measurable space. Each $S_k$ is an $\mathcal{F}$-measurable function $\Omega \to \mathbb{R}$; that is, $S_k^{-1}$ is a function $\mathcal{B} \to \mathcal{F}$, where $\mathcal{B}$ is the Borel $\sigma$-algebra on $\mathbb{R}$. We will see later that $S_k$ is in fact measurable under a sub-$\sigma$-algebra of $\mathcal{F}$.
[Figure 2.1: A three-coin-period binomial model. The tree shows $S_1(H) = uS_0$, $S_1(T) = dS_0$; $S_2(HH) = u^2S_0$, $S_2(HT) = S_2(TH) = udS_0$, $S_2(TT) = d^2S_0$; and the values of $S_3$, from $S_3(HHH) = u^3S_0$ down to $S_3(TTT) = d^3S_0$, according to the toss sequence.]
Recall that the Borel $\sigma$-algebra $\mathcal{B}$ is the $\sigma$-algebra generated by the open intervals of $\mathbb{R}$. In this course we will always deal with subsets of $\mathbb{R}$ that belong to $\mathcal{B}$. For any random variable $X$ defined on a sample space $\Omega$ and any $y \in \mathbb{R}$, we will use the notation
$$\{X \le y\} \triangleq \{\omega \in \Omega : X(\omega) \le y\}.$$
The sets $\{X < y\}$, $\{X \ge y\}$, $\{X = y\}$, etc., are defined similarly. Similarly, for any subset $B$ of $\mathbb{R}$, we define
$$\{X \in B\} \triangleq \{\omega \in \Omega : X(\omega) \in B\}.$$
Assumption 2.1 $u > d > 0$.
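As a concrete illustration of the model above, here is a minimal Python sketch (not part of the original notes; the parameter values $u = 2$, $d = 1/2$, $S_0 = 4$ are illustrative assumptions) that enumerates $\Omega$ for three tosses and evaluates $S_k(\omega)$, confirming that $S_k$ depends only on the first $k$ tosses.

```python
from itertools import product

u, d, S0 = 2.0, 0.5, 4.0          # illustrative parameters satisfying u > d > 0

# The sample space: all sequences of 3 coin tosses.
Omega = ["".join(seq) for seq in product("HT", repeat=3)]

def S(k, omega):
    """Stock price after k tosses under outcome omega (a string like 'HTH')."""
    price = S0
    for toss in omega[:k]:        # only the first k tosses matter
        price *= u if toss == "H" else d
    return price

for omega in Omega:
    print(omega, [S(k, omega) for k in range(4)])
```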
2.2 Information
Definition 2.1 (Sets determined by the first $k$ tosses.) We say that a set $A \subset \Omega$ is determined by the first $k$ coin tosses if, knowing only the outcome of the first $k$ tosses, we can decide whether the outcome of all tosses is in $A$. In general we denote the collection of sets determined by the first $k$ tosses by $\mathcal{F}_k$. It is easy to check that $\mathcal{F}_k$ is a $\sigma$-algebra.
Note that the random variable $S_k$ is $\mathcal{F}_k$-measurable, for each $k = 1, 2, \ldots, n$.
Example 2.1 In the 3 coin-toss example, the collection $\mathcal{F}_1$ of sets determined by the first toss consists of:

1. $A_H \triangleq \{HHH, HHT, HTH, HTT\}$,
2. $A_T \triangleq \{THH, THT, TTH, TTT\}$,
3. $\Omega$,
4. $\emptyset$.
The collection $\mathcal{F}_2$ of sets determined by the first two tosses consists of:

1. $A_{HH} \triangleq \{HHH, HHT\}$,
2. $A_{HT} \triangleq \{HTH, HTT\}$,
3. $A_{TH} \triangleq \{THH, THT\}$,
4. $A_{TT} \triangleq \{TTH, TTT\}$,
5. the complements of the above sets,
6. any union of the above sets (including the complements),
7. $\Omega$ and $\emptyset$.
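The defining condition of Definition 2.1 can be checked mechanically on this small sample space. The sketch below (an illustration, not from the original notes; it reuses `Omega` from the earlier sketch) tests whether a given set $A$ is determined by the first $k$ tosses by asking whether membership in $A$ is constant on every group of outcomes sharing the same first $k$ tosses.

```python
def determined_by_first_k(A, Omega, k):
    """True iff membership in A depends only on the first k tosses."""
    for prefix in {omega[:k] for omega in Omega}:
        group = [omega for omega in Omega if omega[:k] == prefix]
        # Membership must be all-or-nothing within each prefix group.
        if 0 < sum(omega in A for omega in group) < len(group):
            return False
    return True

A_H = {"HHH", "HHT", "HTH", "HTT"}
print(determined_by_first_k(A_H, Omega, 1))                 # True: A_H is in F_1
print(determined_by_first_k({"HHH", "TTT"}, Omega, 1))      # False: not in F_1
```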
Definition 2.2 (Information carried by a random variable.) Let $X$ be a random variable $\Omega \to \mathbb{R}$. We say that a set $A \subset \Omega$ is determined by the random variable $X$ if, knowing only the value $X(\omega)$ of the random variable, we can decide whether or not $\omega \in A$. Another way of saying this is that for every $y \in \mathbb{R}$, either $X^{-1}(y) \subset A$ or $X^{-1}(y) \cap A = \emptyset$. The collection of subsets of $\Omega$ determined by $X$ is a $\sigma$-algebra, which we call the $\sigma$-algebra generated by $X$, and denote by $\sigma(X)$.
If the random variable $X$ takes finitely many different values, then $\sigma(X)$ is generated by the collection of sets
$$\{X^{-1}(X(\omega)) \mid \omega \in \Omega\};$$
these sets are called the atoms of the $\sigma$-algebra $\sigma(X)$.

In general, if $X$ is a random variable $\Omega \to \mathbb{R}$, then $\sigma(X)$ is given by
$$\sigma(X) = \{X^{-1}(B) : B \in \mathcal{B}\}.$$
Example 2.2 (Sets determined by $S_2$.) The $\sigma$-algebra generated by $S_2$ consists of the following sets:

1. $A_{HH} = \{HHH, HHT\} = \{\omega \in \Omega : S_2(\omega) = u^2 S_0\}$,
2. $A_{TT} = \{TTH, TTT\} = \{S_2 = d^2 S_0\}$,
3. $A_{HT} \cup A_{TH} = \{S_2 = udS_0\}$,
4. complements of the above sets,
5. any union of the above sets,
6. $\emptyset = \{S_2(\omega) \in \emptyset\}$,
7. $\Omega = \{S_2(\omega) \in \mathbb{R}\}$.
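The atoms of $\sigma(S_2)$ can be computed by grouping outcomes according to the value of $S_2$. The following sketch (illustrative, not from the notes; it reuses `Omega` and `S` from the first sketch) recovers exactly the three atoms $A_{HH}$, $A_{HT} \cup A_{TH}$, and $A_{TT}$ listed above.

```python
from collections import defaultdict

# Group outcomes by the value S_2(omega): each group is an atom of sigma(S_2).
atoms_of_S2 = defaultdict(set)
for omega in Omega:
    atoms_of_S2[S(2, omega)].add(omega)

for value, atom in sorted(atoms_of_S2.items()):
    print(f"S_2 = {value}: {sorted(atom)}")
# With u=2, d=0.5, S0=4 this prints the atoms {TTH,TTT}, {HTH,HTT,THH,THT}, {HHH,HHT}.
```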
2.3 Conditional Expectation
In order to talk about conditional expectation, we need to introduce a probability measure on our coin-toss sample space $\Omega$. Let us define:
- $p \in (0,1)$ is the probability of $H$;
- $q \triangleq 1 - p$ is the probability of $T$;
- the coin tosses are independent, so that, e.g., $\mathbb{P}(HHT) = p^2 q$, etc.;
- $\mathbb{P}(A) \triangleq \sum_{\omega \in A} \mathbb{P}(\omega)$, for all $A \subset \Omega$.

Definition 2.3 (Expectation.)
$$\mathbb{E}X \triangleq \sum_{\omega \in \Omega} X(\omega)\,\mathbb{P}(\omega).$$

If $A \subset \Omega$, then
$$\mathbb{I}_A(\omega) \triangleq \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A, \end{cases}$$
and
$$\mathbb{E}(\mathbb{I}_A X) = \int_A X\, d\mathbb{P} = \sum_{\omega \in A} X(\omega)\,\mathbb{P}(\omega).$$

We can think of $\mathbb{E}(\mathbb{I}_A X)$ as a partial average of $X$ over the set $A$.
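Definition 2.3 and the partial average are straightforward to compute on the three-toss space. The sketch below (illustrative, not from the notes; it reuses `Omega`, `S`, and `A_H` from the earlier sketches and assumes $p = 0.5$ purely as an example) computes $\mathbb{E}S_1$ and the partial average $\mathbb{E}(\mathbb{I}_{A_H} S_1)$.

```python
p = 0.5                                   # assumed probability of heads (example value)
q = 1 - p

def prob(omega):
    """IP(omega) = p^{#heads} * q^{#tails}, since the tosses are independent."""
    return p ** omega.count("H") * q ** omega.count("T")

def expect(X):
    """IE X = sum over Omega of X(omega) IP(omega)."""
    return sum(X(omega) * prob(omega) for omega in Omega)

def partial_average(X, A):
    """IE(I_A X) = sum over A of X(omega) IP(omega)."""
    return sum(X(omega) * prob(omega) for omega in Omega if omega in A)

print(expect(lambda w: S(1, w)))                  # (pu + qd) S0
print(partial_average(lambda w: S(1, w), A_H))    # p * u * S0
```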
2.3.1 An example
Let us estimate $S_1$, given $S_2$. Denote the estimate by $\mathbb{E}(S_1|S_2)$. From elementary probability, $\mathbb{E}(S_1|S_2)$ is a random variable $Y$ whose value at $\omega$ is defined by
$$Y(\omega) = \mathbb{E}(S_1 \mid S_2 = y),$$
where $y = S_2(\omega)$. Properties of $\mathbb{E}(S_1|S_2)$:
- $\mathbb{E}(S_1|S_2)$ should depend on $\omega$, i.e., it is a random variable.
- If the value of $S_2$ is known, then the value of $\mathbb{E}(S_1|S_2)$ should also be known. In particular:
  – If $\omega = HHH$ or $\omega = HHT$, then $S_2(\omega) = u^2 S_0$. If we know that $S_2(\omega) = u^2 S_0$, then even without knowing $\omega$, we know that $S_1(\omega) = uS_0$. We define
  $$\mathbb{E}(S_1|S_2)(HHH) = \mathbb{E}(S_1|S_2)(HHT) = uS_0.$$
  – If $\omega = TTT$ or $\omega = TTH$, then $S_2(\omega) = d^2 S_0$. If we know that $S_2(\omega) = d^2 S_0$, then even without knowing $\omega$, we know that $S_1(\omega) = dS_0$. We define
  $$\mathbb{E}(S_1|S_2)(TTT) = \mathbb{E}(S_1|S_2)(TTH) = dS_0.$$
  – If $\omega \in A \triangleq \{HTH, HTT, THH, THT\}$, then $S_2(\omega) = udS_0$. If we know $S_2(\omega) = udS_0$, then we do not know whether $S_1 = uS_0$ or $S_1 = dS_0$. We then take a weighted average:
  $$\mathbb{P}(A) = p^2 q + pq^2 + p^2 q + pq^2 = 2pq.$$
  Furthermore,
  $$\int_A S_1\, d\mathbb{P} = p^2 q\, uS_0 + pq^2\, uS_0 + p^2 q\, dS_0 + pq^2\, dS_0 = pq(u+d)S_0.$$
  For $\omega \in A$ we define
  $$\mathbb{E}(S_1|S_2)(\omega) = \frac{\int_A S_1\, d\mathbb{P}}{\mathbb{P}(A)} = \frac{1}{2}(u+d)S_0.$$
  Then
  $$\int_A \mathbb{E}(S_1|S_2)\, d\mathbb{P} = \int_A S_1\, d\mathbb{P}.$$
In conclusion, we can write
$$\mathbb{E}(S_1|S_2)(\omega) = g(S_2(\omega)),$$
where
$$g(x) = \begin{cases} uS_0 & \text{if } x = u^2 S_0, \\ \frac{1}{2}(u+d)S_0 & \text{if } x = udS_0, \\ dS_0 & \text{if } x = d^2 S_0. \end{cases}$$
In other words, $\mathbb{E}(S_1|S_2)$ is random only through its dependence on $S_2$. We also write
$$\mathbb{E}(S_1 \mid S_2 = x) = g(x),$$
where $g$ is the function defined above.
The random variable $\mathbb{E}(S_1|S_2)$ has two fundamental properties:

- $\mathbb{E}(S_1|S_2)$ is $\sigma(S_2)$-measurable.
- For every set $A \in \sigma(S_2)$,
  $$\int_A \mathbb{E}(S_1|S_2)\, d\mathbb{P} = \int_A S_1\, d\mathbb{P}.$$
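The construction in this example amounts to averaging $S_1$ over each atom of $\sigma(S_2)$. The sketch below (illustrative, not from the notes; it reuses `Omega`, `S`, `prob`, and `atoms_of_S2` from the earlier sketches) computes $\mathbb{E}(S_1|S_2)(\omega)$ this way and checks the partial averaging property on each atom.

```python
def cond_exp_S1_given_S2(omega):
    """E(S_1 | S_2)(omega): average of S_1 over the atom of sigma(S_2) containing omega."""
    atom = atoms_of_S2[S(2, omega)]
    pa = sum(prob(w) for w in atom)
    return sum(S(1, w) * prob(w) for w in atom) / pa

for omega in Omega:
    print(omega, cond_exp_S1_given_S2(omega))     # uS0, (u+d)S0/2, or dS0

# Partial averaging over each atom A of sigma(S_2):
for atom in atoms_of_S2.values():
    lhs = sum(cond_exp_S1_given_S2(w) * prob(w) for w in atom)
    rhs = sum(S(1, w) * prob(w) for w in atom)
    assert abs(lhs - rhs) < 1e-12
```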
2.3.2 Definition of Conditional Expectation
Please see Williams, p. 83. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. Let $X$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$. Then $\mathbb{E}(X|\mathcal{G})$ is defined to be any random variable $Y$ that satisfies:
(a) $Y$ is $\mathcal{G}$-measurable,
(b) for every set $A \in \mathcal{G}$, we have the "partial averaging property"
$$\int_A Y\, d\mathbb{P} = \int_A X\, d\mathbb{P}.$$
Existence. There is always a random variable $Y$ satisfying the above properties (provided that $\mathbb{E}|X| < \infty$); i.e., conditional expectations always exist.

Uniqueness. There can be more than one random variable $Y$ satisfying the above properties, but if $Y'$ is another one, then $Y = Y'$ almost surely, i.e., $\mathbb{P}\{\omega \in \Omega : Y(\omega) = Y'(\omega)\} = 1$.
Notation 2.1 For random variables $X$, $Y$, it is standard notation to write
$$\mathbb{E}(X|Y) \triangleq \mathbb{E}(X \mid \sigma(Y)).$$
Here are some useful ways to think about $\mathbb{E}(X|\mathcal{G})$:

- A random experiment is performed, i.e., an element $\omega$ of $\Omega$ is selected. The value of $\omega$ is partially but not fully revealed to us, and thus we cannot compute the exact value of $X(\omega)$. Based on what we know about $\omega$, we compute an estimate of $X(\omega)$. Because this estimate depends on the partial information we have about $\omega$, it depends on $\omega$; i.e., $\mathbb{E}[X|\mathcal{G}](\omega)$ is a function of $\omega$, although the dependence on $\omega$ is often not shown explicitly.
- If the $\sigma$-algebra $\mathcal{G}$ contains finitely many sets, there will be a "smallest" set $A$ in $\mathcal{G}$ containing $\omega$, which is the intersection of all sets in $\mathcal{G}$ containing $\omega$. The way $\omega$ is partially revealed to us is that we are told it is in $A$, but not told which element of $A$ it is. We then define $\mathbb{E}[X|\mathcal{G}](\omega)$ to be the average (with respect to $\mathbb{P}$) value of $X$ over this set $A$. Thus, for all $\omega$ in this set $A$, $\mathbb{E}[X|\mathcal{G}](\omega)$ will be the same.
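For a finite $\sigma$-algebra, the second bullet above is directly computable: partition $\Omega$ into the atoms of $\mathcal{G}$ and average $X$ over the atom containing $\omega$. Here is a generic sketch (illustrative, not from the notes; it reuses `Omega`, `S`, and `prob` from the earlier sketches, and uses the prefix groups as the atoms of $\mathcal{F}_1$).

```python
def cond_exp(X, atoms):
    """Return E(X | G) as a dict omega -> value, where atoms partition Omega."""
    Y = {}
    for atom in atoms:
        pa = sum(prob(w) for w in atom)
        avg = sum(X(w) * prob(w) for w in atom) / pa
        for w in atom:
            Y[w] = avg                      # constant on each atom of G
    return Y

# Atoms of F_1: outcomes grouped by the first toss.
atoms_F1 = [[w for w in Omega if w[0] == t] for t in "HT"]
print(cond_exp(lambda w: S(2, w), atoms_F1))   # E(S_2 | F_1)
```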
2.3.3 Further discussion of Partial Averaging
The partial averaging property is
$$\int_A \mathbb{E}(X|\mathcal{G})\, d\mathbb{P} = \int_A X\, d\mathbb{P}, \qquad \forall A \in \mathcal{G}. \tag{3.1}$$
We can rewrite this as
$$\mathbb{E}\left[\mathbb{I}_A \cdot \mathbb{E}(X|\mathcal{G})\right] = \mathbb{E}\left[\mathbb{I}_A \cdot X\right]. \tag{3.2}$$
Note that $\mathbb{I}_A$ is a $\mathcal{G}$-measurable random variable. In fact, the following holds:
Lemma 3.10 If $V$ is any $\mathcal{G}$-measurable random variable, then provided $\mathbb{E}|V \cdot \mathbb{E}(X|\mathcal{G})| < \infty$,
$$\mathbb{E}\left[V \cdot \mathbb{E}(X|\mathcal{G})\right] = \mathbb{E}\left[V \cdot X\right]. \tag{3.3}$$
Proof: To see this, first use (3.2) and linearity of expectations to prove (3.3) when $V$ is a simple $\mathcal{G}$-measurable random variable, i.e., $V$ is of the form $V = \sum_{k=1}^n c_k \mathbb{I}_{A_k}$, where each $A_k$ is in $\mathcal{G}$ and each $c_k$ is constant. Next consider the case that $V$ is a nonnegative $\mathcal{G}$-measurable random variable, but is not necessarily simple. Such a $V$ can be written as the limit of an increasing sequence of simple random variables $V_n$; we write (3.3) for each $V_n$ and then pass to the limit, using the Monotone Convergence Theorem (see Williams), to obtain (3.3) for $V$. Finally, the general $\mathcal{G}$-measurable random variable $V$ can be written as the difference of two nonnegative random variables $V = V^+ - V^-$, and since (3.3) holds for $V^+$ and $V^-$, it must hold for $V$ as well. Williams calls this argument the "standard machine" (p. 56).
Based on this lemma, we can replace the second condition in the definition of a conditional expectation (Section 2.3.2) by:

(b') For every $\mathcal{G}$-measurable random variable $V$, we have
$$\mathbb{E}\left[V \cdot \mathbb{E}(X|\mathcal{G})\right] = \mathbb{E}\left[V \cdot X\right]. \tag{3.4}$$
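Property (3.4) is easy to check numerically on the three-toss model. The sketch below (illustrative, not from the notes; it reuses `Omega`, `S`, `prob`, `cond_exp`, and `atoms_F1` from the earlier sketches) takes $\mathcal{G} = \mathcal{F}_1$, $X = S_2$, and an arbitrary $\mathcal{F}_1$-measurable $V$, and verifies $\mathbb{E}[V \cdot \mathbb{E}(X|\mathcal{G})] = \mathbb{E}[V \cdot X]$.

```python
# An F_1-measurable random variable: it depends only on the first toss.
V = lambda w: 3.0 if w[0] == "H" else -1.0

Y = cond_exp(lambda w: S(2, w), atoms_F1)            # Y = E(S_2 | F_1)

lhs = sum(V(w) * Y[w] * prob(w) for w in Omega)      # E[V * E(S_2|F_1)]
rhs = sum(V(w) * S(2, w) * prob(w) for w in Omega)   # E[V * S_2]
assert abs(lhs - rhs) < 1e-12
```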
2.3.4 Properties of Conditional Expectation
Please see Williams, p. 88. Proof sketches of some of the properties are provided below.
(a) $\mathbb{E}(\mathbb{E}(X|\mathcal{G})) = \mathbb{E}(X)$.

Proof: Just take $A$ in the partial averaging property to be $\Omega$.
The conditional expectation of $X$ is thus an unbiased estimator of the random variable $X$.

(b) If $X$ is $\mathcal{G}$-measurable, then
$$\mathbb{E}(X|\mathcal{G}) = X.$$
Proof: The partial averaging property holds trivially when $Y$ is replaced by $X$. And since $X$ is $\mathcal{G}$-measurable, $X$ satisfies requirement (a) of a conditional expectation as well.

If the information content of $\mathcal{G}$ is sufficient to determine $X$, then the best estimate of $X$ based on $\mathcal{G}$ is $X$ itself.
(c) (Linearity)
$$\mathbb{E}(a_1 X_1 + a_2 X_2 \mid \mathcal{G}) = a_1\, \mathbb{E}(X_1|\mathcal{G}) + a_2\, \mathbb{E}(X_2|\mathcal{G}).$$
(d) (Positivity) If $X \ge 0$ almost surely, then
$$\mathbb{E}(X|\mathcal{G}) \ge 0.$$
Proof: Take $A = \{\omega \in \Omega : \mathbb{E}(X|\mathcal{G})(\omega) < 0\}$. This set is in $\mathcal{G}$ since $\mathbb{E}(X|\mathcal{G})$ is $\mathcal{G}$-measurable. Partial averaging implies $\int_A \mathbb{E}(X|\mathcal{G})\, d\mathbb{P} = \int_A X\, d\mathbb{P}$. The right-hand side is greater than or equal to zero, and the left-hand side is strictly negative, unless $\mathbb{P}(A) = 0$. Therefore, $\mathbb{P}(A) = 0$.
(h) (Jensen's Inequality) If $\varphi : \mathbb{R} \to \mathbb{R}$ is convex and $\mathbb{E}|\varphi(X)| < \infty$, then
$$\mathbb{E}(\varphi(X)\mid\mathcal{G}) \ge \varphi(\mathbb{E}(X|\mathcal{G})).$$
Recall the usual Jensen's Inequality: $\mathbb{E}\,\varphi(X) \ge \varphi(\mathbb{E}(X))$.
(i) (Tower Property) If $\mathcal{H}$ is a sub-$\sigma$-algebra of $\mathcal{G}$, then
$$\mathbb{E}(\mathbb{E}(X|\mathcal{G})\mid\mathcal{H}) = \mathbb{E}(X|\mathcal{H}).$$
That $\mathcal{H}$ is a sub-$\sigma$-algebra of $\mathcal{G}$ means that $\mathcal{G}$ contains more information than $\mathcal{H}$. If we estimate $X$ based on the information in $\mathcal{G}$, and then estimate the estimator based on the smaller amount of information in $\mathcal{H}$, then we get the same result as if we had estimated $X$ directly based on the information in $\mathcal{H}$.
(j) (Taking out what is known) If $Z$ is $\mathcal{G}$-measurable, then
$$\mathbb{E}(ZX|\mathcal{G}) = Z \cdot \mathbb{E}(X|\mathcal{G}).$$
When conditioning on $\mathcal{G}$, the $\mathcal{G}$-measurable random variable $Z$ acts like a constant.
Proof: Let $Z$ be a $\mathcal{G}$-measurable random variable. A random variable $Y$ is $\mathbb{E}(ZX|\mathcal{G})$ if and only if

(a) $Y$ is $\mathcal{G}$-measurable;
(b) $\int_A Y\, d\mathbb{P} = \int_A ZX\, d\mathbb{P}$, $\forall A \in \mathcal{G}$.

Take $Y = Z \cdot \mathbb{E}(X|\mathcal{G})$. Then $Y$ satisfies (a) (a product of $\mathcal{G}$-measurable random variables is $\mathcal{G}$-measurable). $Y$ also satisfies property (b), as we can check below:
$$\begin{aligned}
\int_A Y\, d\mathbb{P} &= \mathbb{E}(\mathbb{I}_A \cdot Y) \\
&= \mathbb{E}\left[\mathbb{I}_A Z\, \mathbb{E}(X|\mathcal{G})\right] \\
&= \mathbb{E}\left[\mathbb{I}_A Z \cdot X\right] \qquad \text{((b') with } V = \mathbb{I}_A Z\text{)} \\
&= \int_A ZX\, d\mathbb{P}.
\end{aligned}$$
(k) (Role of Independence) If $\mathcal{H}$ is independent of $\sigma(\sigma(X), \mathcal{G})$, then
$$\mathbb{E}(X \mid \sigma(\mathcal{G}, \mathcal{H})) = \mathbb{E}(X|\mathcal{G}).$$
In particular, if $X$ is independent of $\mathcal{H}$, then
$$\mathbb{E}(X|\mathcal{H}) = \mathbb{E}(X).$$
If $\mathcal{H}$ is independent of $X$ and $\mathcal{G}$, then nothing is gained by including the information content of $\mathcal{H}$ in the estimation of $X$.
2.3.5 Examples from the Binomial Model
Recall that $\mathcal{F}_1 = \{\emptyset, \Omega, A_H, A_T\}$. Notice that $\mathbb{E}(S_2|\mathcal{F}_1)$ must be constant on $A_H$ and on $A_T$. Now since $\mathbb{E}(S_2|\mathcal{F}_1)$ must satisfy the partial averaging property,
$$\int_{A_H} \mathbb{E}(S_2|\mathcal{F}_1)\, d\mathbb{P} = \int_{A_H} S_2\, d\mathbb{P},$$
$$\int_{A_T} \mathbb{E}(S_2|\mathcal{F}_1)\, d\mathbb{P} = \int_{A_T} S_2\, d\mathbb{P}.$$
We compute
$$\int_{A_H} \mathbb{E}(S_2|\mathcal{F}_1)\, d\mathbb{P} = \mathbb{P}(A_H) \cdot \mathbb{E}(S_2|\mathcal{F}_1)(\omega) = p\, \mathbb{E}(S_2|\mathcal{F}_1)(\omega), \qquad \forall \omega \in A_H.$$
On the other hand,
$$\int_{A_H} S_2\, d\mathbb{P} = p^2 u^2 S_0 + pq\, udS_0.$$
Therefore,
$$\mathbb{E}(S_2|\mathcal{F}_1)(\omega) = pu^2 S_0 + q\, udS_0, \qquad \forall \omega \in A_H.$$
We can also write
$$\mathbb{E}(S_2|\mathcal{F}_1)(\omega) = pu^2 S_0 + q\, udS_0 = (pu + qd)\,uS_0 = (pu + qd)\,S_1(\omega), \qquad \forall \omega \in A_H.$$
Similarly,
$$\mathbb{E}(S_2|\mathcal{F}_1)(\omega) = (pu + qd)\,S_1(\omega), \qquad \forall \omega \in A_T.$$
Thus in both cases we have
$$\mathbb{E}(S_2|\mathcal{F}_1)(\omega) = (pu + qd)\,S_1(\omega), \qquad \forall \omega \in \Omega.$$
A similar argument one time step later shows that
$$\mathbb{E}(S_3|\mathcal{F}_2)(\omega) = (pu + qd)\,S_2(\omega).$$
We leave the verification of this equality as an exercise. We can also verify the Tower Property; for instance, from the previous equations we have
$$\begin{aligned}
\mathbb{E}\left[\mathbb{E}(S_3|\mathcal{F}_2)\mid\mathcal{F}_1\right] &= \mathbb{E}\left[(pu+qd)S_2 \mid \mathcal{F}_1\right] \\
&= (pu+qd)\,\mathbb{E}(S_2|\mathcal{F}_1) \qquad \text{(linearity)} \\
&= (pu+qd)^2 S_1.
\end{aligned}$$
This final expression is $\mathbb{E}(S_3|\mathcal{F}_1)$.
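The exercise above, and the tower-property computation, can be checked numerically. The sketch below (illustrative, not from the notes; it reuses `Omega`, `S`, `prob`, `cond_exp`, and `atoms_F1` from the earlier sketches) verifies $\mathbb{E}(S_2|\mathcal{F}_1) = (pu+qd)S_1$, $\mathbb{E}(S_3|\mathcal{F}_2) = (pu+qd)S_2$, and $\mathbb{E}[\mathbb{E}(S_3|\mathcal{F}_2)\mid\mathcal{F}_1] = (pu+qd)^2 S_1$.

```python
m = p * u + q * d                     # one-step growth factor (pu + qd)

# Atoms of F_2: outcomes grouped by the first two tosses.
atoms_F2 = [[w for w in Omega if w[:2] == pre] for pre in ("HH", "HT", "TH", "TT")]

E_S2_F1 = cond_exp(lambda w: S(2, w), atoms_F1)
E_S3_F2 = cond_exp(lambda w: S(3, w), atoms_F2)
tower   = cond_exp(lambda w: E_S3_F2[w], atoms_F1)   # E[E(S_3|F_2) | F_1]

for w in Omega:
    assert abs(E_S2_F1[w] - m * S(1, w)) < 1e-12
    assert abs(E_S3_F2[w] - m * S(2, w)) < 1e-12
    assert abs(tower[w] - m**2 * S(1, w)) < 1e-12
```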