Conditional Expectation
Please see Hull's book (Section 9.6).
2.1 A Binomial Model for Stock Price Dynamics
Stock prices are assumed to follow this simple binomial model: the initial stock price during the period under study is denoted $S_0$. At each time step, the stock price either goes up by a factor of $u$ or down by a factor of $d$. It will be useful to visualize tossing a coin at each time step, and say that the stock price moves up by a factor of $u$ if the coin comes out heads ($H$), and down by a factor of $d$ if it comes out tails ($T$).
Note that we are not specifying the probability of heads here.
Consider a sequence of 3 tosses of the coin (see Fig. 2.1). The collection of all possible outcomes (i.e., sequences of tosses of length 3) is
$$\Omega = \{HHH,\ HHT,\ HTH,\ HTT,\ THH,\ THT,\ TTH,\ TTT\}.$$
A typical sequence of $\Omega$ will be denoted $\omega$, and $\omega_k$ will denote the $k$th element in the sequence $\omega$. We write $S_k(\omega)$ to denote the stock price at "time" $k$ (i.e., after $k$ tosses) under the outcome $\omega$. Note that $S_k(\omega)$ depends only on $\omega_1\omega_2\ldots\omega_k$. Thus in the 3-coin-toss example we write, for instance,
$$S_1(\omega) \triangleq S_1(\omega_1\omega_2\omega_3) = S_1(\omega_1),$$
$$S_2(\omega) \triangleq S_2(\omega_1\omega_2\omega_3) = S_2(\omega_1\omega_2).$$
Each $S_k$ is a random variable defined on the set $\Omega$. More precisely, let $\mathcal{F} = \mathcal{P}(\Omega)$. Then $\mathcal{F}$ is a $\sigma$-algebra and $(\Omega, \mathcal{F})$ is a measurable space. Each $S_k$ is an $\mathcal{F}$-measurable function $\Omega \to \mathbb{R}$; that is, $S_k^{-1}$ is a function $\mathcal{B} \to \mathcal{F}$, where $\mathcal{B}$ is the Borel $\sigma$-algebra on $\mathbb{R}$. We will see later that $S_k$ is in fact measurable under a sub-$\sigma$-algebra of $\mathcal{F}$.
[Figure 2.1: A three-coin-period binomial model. The tree shows $S_1(H) = uS_0$, $S_1(T) = dS_0$; $S_2(HH) = u^2S_0$, $S_2(HT) = S_2(TH) = udS_0$, $S_2(TT) = d^2S_0$; and the values of $S_3$, from $S_3(HHH) = u^3S_0$ down to $S_3(TTT) = d^3S_0$, according to the toss sequence.]
Recall that the Borel $\sigma$-algebra $\mathcal{B}$ is the $\sigma$-algebra generated by the open intervals of $\mathbb{R}$. In this course we will always deal with subsets of $\mathbb{R}$ that belong to $\mathcal{B}$. For any random variable $X$ defined on a sample space $\Omega$ and any $y \in \mathbb{R}$, we will use the notation
$$\{X \le y\} \triangleq \{\omega \in \Omega : X(\omega) \le y\}.$$
The sets $\{X < y\}$, $\{X \ge y\}$, $\{X = y\}$, etc., are defined similarly. Similarly, for any subset $B$ of $\mathbb{R}$, we define
$$\{X \in B\} \triangleq \{\omega \in \Omega : X(\omega) \in B\}.$$
Assumption 2.1 $u > d > 0$.
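As a concrete illustration of the model above, here is a minimal Python sketch (not part of the original notes; the parameter values $u = 2$, $d = 1/2$, $S_0 = 4$ are illustrative assumptions) that enumerates $\Omega$ for three tosses and evaluates $S_k(\omega)$, confirming that $S_k$ depends only on the first $k$ tosses.

```python
from itertools import product

u, d, S0 = 2.0, 0.5, 4.0          # illustrative parameters satisfying u > d > 0

# The sample space: all sequences of 3 coin tosses.
Omega = ["".join(seq) for seq in product("HT", repeat=3)]

def S(k, omega):
    """Stock price after k tosses under outcome omega (a string like 'HTH')."""
    price = S0
    for toss in omega[:k]:        # only the first k tosses matter
        price *= u if toss == "H" else d
    return price

for omega in Omega:
    print(omega, [S(k, omega) for k in range(4)])
```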
2.2 Information
Definition 2.1 (Sets determined by the first $k$ tosses.) We say that a set $A \subset \Omega$ is determined by the first $k$ coin tosses if, knowing only the outcome of the first $k$ tosses, we can decide whether the outcome of all tosses is in $A$. In general we denote the collection of sets determined by the first $k$ tosses by $\mathcal{F}_k$. It is easy to check that $\mathcal{F}_k$ is a $\sigma$-algebra.
Note that the random variable $S_k$ is $\mathcal{F}_k$-measurable, for each $k = 1, 2, \ldots, n$.
Example 2.1 In the 3 coin-toss example, the collection $\mathcal{F}_1$ of sets determined by the first toss consists of:

1. $A_H \triangleq \{HHH, HHT, HTH, HTT\}$,
2. $A_T \triangleq \{THH, THT, TTH, TTT\}$,
3. $\Omega$,
4. $\emptyset$.
The collection $\mathcal{F}_2$ of sets determined by the first two tosses consists of:

1. $A_{HH} \triangleq \{HHH, HHT\}$,
2. $A_{HT} \triangleq \{HTH, HTT\}$,
3. $A_{TH} \triangleq \{THH, THT\}$,
4. $A_{TT} \triangleq \{TTH, TTT\}$,
5. the complements of the above sets,
6. any union of the above sets (including the complements),
7. $\Omega$ and $\emptyset$.
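The defining condition of Definition 2.1 can be checked mechanically on this small sample space. The sketch below (an illustration, not from the original notes; it reuses `Omega` from the earlier sketch) tests whether a given set $A$ is determined by the first $k$ tosses by asking whether membership in $A$ is constant on every group of outcomes sharing the same first $k$ tosses.

```python
def determined_by_first_k(A, Omega, k):
    """True iff membership in A depends only on the first k tosses."""
    for prefix in {omega[:k] for omega in Omega}:
        group = [omega for omega in Omega if omega[:k] == prefix]
        # Membership must be all-or-nothing within each prefix group.
        if 0 < sum(omega in A for omega in group) < len(group):
            return False
    return True

A_H = {"HHH", "HHT", "HTH", "HTT"}
print(determined_by_first_k(A_H, Omega, 1))                 # True: A_H is in F_1
print(determined_by_first_k({"HHH", "TTT"}, Omega, 1))      # False: not in F_1
```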
Definition 2.2 (Information carried by a random variable.) Let $X$ be a random variable $\Omega \to \mathbb{R}$. We say that a set $A \subset \Omega$ is determined by the random variable $X$ if, knowing only the value $X(\omega)$ of the random variable, we can decide whether or not $\omega \in A$. Another way of saying this is that for every $y \in \mathbb{R}$, either $X^{-1}(y) \subset A$ or $X^{-1}(y) \cap A = \emptyset$. The collection of subsets of $\Omega$ determined by $X$ is a $\sigma$-algebra, which we call the $\sigma$-algebra generated by $X$, and denote by $\sigma(X)$.
If the random variable $X$ takes finitely many different values, then $\sigma(X)$ is generated by the collection of sets
$$\{X^{-1}(X(\omega)) \mid \omega \in \Omega\};$$
these sets are called the atoms of the $\sigma$-algebra $\sigma(X)$.

In general, if $X$ is a random variable $\Omega \to \mathbb{R}$, then $\sigma(X)$ is given by
$$\sigma(X) = \{X^{-1}(B) : B \in \mathcal{B}\}.$$
Example 2.2 (Sets determined by $S_2$.) The $\sigma$-algebra generated by $S_2$ consists of the following sets:

1. $A_{HH} = \{HHH, HHT\} = \{\omega \in \Omega : S_2(\omega) = u^2 S_0\}$,
2. $A_{TT} = \{TTH, TTT\} = \{S_2 = d^2 S_0\}$,
3. $A_{HT} \cup A_{TH} = \{S_2 = udS_0\}$,
4. complements of the above sets,
5. any union of the above sets,
6. $\emptyset = \{S_2(\omega) \in \emptyset\}$,
7. $\Omega = \{S_2(\omega) \in \mathbb{R}\}$.
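The atoms of $\sigma(S_2)$ can be computed by grouping outcomes according to the value of $S_2$. The following sketch (illustrative, not from the notes; it reuses `Omega` and `S` from the first sketch) recovers exactly the three atoms $A_{HH}$, $A_{HT} \cup A_{TH}$, and $A_{TT}$ listed above.

```python
from collections import defaultdict

# Group outcomes by the value S_2(omega): each group is an atom of sigma(S_2).
atoms_of_S2 = defaultdict(set)
for omega in Omega:
    atoms_of_S2[S(2, omega)].add(omega)

for value, atom in sorted(atoms_of_S2.items()):
    print(f"S_2 = {value}: {sorted(atom)}")
# With u=2, d=0.5, S0=4 this prints the atoms {TTH,TTT}, {HTH,HTT,THH,THT}, {HHH,HHT}.
```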
2.3 Conditional Expectation
In order to talk about conditional expectation, we need to introduce a probability measure on our coin-toss sample space $\Omega$. Let us define:
- $p \in (0,1)$ is the probability of $H$;
- $q \triangleq 1 - p$ is the probability of $T$;
- the coin tosses are independent, so that, e.g., $\mathbb{P}(HHT) = p^2 q$, etc.;
- $\mathbb{P}(A) \triangleq \sum_{\omega \in A} \mathbb{P}(\omega)$, for all $A \subset \Omega$.

Definition 2.3 (Expectation.)
$$\mathbb{E}X \triangleq \sum_{\omega \in \Omega} X(\omega)\,\mathbb{P}(\omega).$$

If $A \subset \Omega$, then
$$\mathbb{I}_A(\omega) \triangleq \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A, \end{cases}$$
and
$$\mathbb{E}(\mathbb{I}_A X) = \int_A X\, d\mathbb{P} = \sum_{\omega \in A} X(\omega)\,\mathbb{P}(\omega).$$

We can think of $\mathbb{E}(\mathbb{I}_A X)$ as a partial average of $X$ over the set $A$.
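Definition 2.3 and the partial average are straightforward to compute on the three-toss space. The sketch below (illustrative, not from the notes; it reuses `Omega`, `S`, and `A_H` from the earlier sketches and assumes $p = 0.5$ purely as an example) computes $\mathbb{E}S_1$ and the partial average $\mathbb{E}(\mathbb{I}_{A_H} S_1)$.

```python
p = 0.5                                   # assumed probability of heads (example value)
q = 1 - p

def prob(omega):
    """IP(omega) = p^{#heads} * q^{#tails}, since the tosses are independent."""
    return p ** omega.count("H") * q ** omega.count("T")

def expect(X):
    """IE X = sum over Omega of X(omega) IP(omega)."""
    return sum(X(omega) * prob(omega) for omega in Omega)

def partial_average(X, A):
    """IE(I_A X) = sum over A of X(omega) IP(omega)."""
    return sum(X(omega) * prob(omega) for omega in Omega if omega in A)

print(expect(lambda w: S(1, w)))                  # (pu + qd) S0
print(partial_average(lambda w: S(1, w), A_H))    # p * u * S0
```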
2.3.1 An example
Let us estimate $S_1$, given $S_2$. Denote the estimate by $\mathbb{E}(S_1|S_2)$. From elementary probability, $\mathbb{E}(S_1|S_2)$ is a random variable $Y$ whose value at $\omega$ is defined by
$$Y(\omega) = \mathbb{E}(S_1 \mid S_2 = y),$$
where $y = S_2(\omega)$. Properties of $\mathbb{E}(S_1|S_2)$:
- $\mathbb{E}(S_1|S_2)$ should depend on $\omega$, i.e., it is a random variable.
- If the value of $S_2$ is known, then the value of $\mathbb{E}(S_1|S_2)$ should also be known. In particular:
  – If $\omega = HHH$ or $\omega = HHT$, then $S_2(\omega) = u^2 S_0$. If we know that $S_2(\omega) = u^2 S_0$, then even without knowing $\omega$, we know that $S_1(\omega) = uS_0$. We define
  $$\mathbb{E}(S_1|S_2)(HHH) = \mathbb{E}(S_1|S_2)(HHT) = uS_0.$$
  – If $\omega = TTT$ or $\omega = TTH$, then $S_2(\omega) = d^2 S_0$. If we know that $S_2(\omega) = d^2 S_0$, then even without knowing $\omega$, we know that $S_1(\omega) = dS_0$. We define
  $$\mathbb{E}(S_1|S_2)(TTT) = \mathbb{E}(S_1|S_2)(TTH) = dS_0.$$
  – If $\omega \in A \triangleq \{HTH, HTT, THH, THT\}$, then $S_2(\omega) = udS_0$. If we know $S_2(\omega) = udS_0$, then we do not know whether $S_1 = uS_0$ or $S_1 = dS_0$. We then take a weighted average:
  $$\mathbb{P}(A) = p^2 q + pq^2 + p^2 q + pq^2 = 2pq.$$
  Furthermore,
  $$\int_A S_1\, d\mathbb{P} = p^2 q\, uS_0 + pq^2\, uS_0 + p^2 q\, dS_0 + pq^2\, dS_0 = pq(u+d)S_0.$$
  For $\omega \in A$ we define
  $$\mathbb{E}(S_1|S_2)(\omega) = \frac{\int_A S_1\, d\mathbb{P}}{\mathbb{P}(A)} = \frac{1}{2}(u+d)S_0.$$
  Then
  $$\int_A \mathbb{E}(S_1|S_2)\, d\mathbb{P} = \int_A S_1\, d\mathbb{P}.$$
In conclusion, we can write
$$\mathbb{E}(S_1|S_2)(\omega) = g(S_2(\omega)),$$
where
$$g(x) = \begin{cases} uS_0 & \text{if } x = u^2 S_0, \\ \frac{1}{2}(u+d)S_0 & \text{if } x = udS_0, \\ dS_0 & \text{if } x = d^2 S_0. \end{cases}$$
In other words, $\mathbb{E}(S_1|S_2)$ is random only through its dependence on $S_2$. We also write
$$\mathbb{E}(S_1 \mid S_2 = x) = g(x),$$
where $g$ is the function defined above.
The random variable $\mathbb{E}(S_1|S_2)$ has two fundamental properties:

- $\mathbb{E}(S_1|S_2)$ is $\sigma(S_2)$-measurable.
- For every set $A \in \sigma(S_2)$,
  $$\int_A \mathbb{E}(S_1|S_2)\, d\mathbb{P} = \int_A S_1\, d\mathbb{P}.$$
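The construction in this example amounts to averaging $S_1$ over each atom of $\sigma(S_2)$. The sketch below (illustrative, not from the notes; it reuses `Omega`, `S`, `prob`, and `atoms_of_S2` from the earlier sketches) computes $\mathbb{E}(S_1|S_2)(\omega)$ this way and checks the partial averaging property on each atom.

```python
def cond_exp_S1_given_S2(omega):
    """E(S_1 | S_2)(omega): average of S_1 over the atom of sigma(S_2) containing omega."""
    atom = atoms_of_S2[S(2, omega)]
    pa = sum(prob(w) for w in atom)
    return sum(S(1, w) * prob(w) for w in atom) / pa

for omega in Omega:
    print(omega, cond_exp_S1_given_S2(omega))     # uS0, (u+d)S0/2, or dS0

# Partial averaging over each atom A of sigma(S_2):
for atom in atoms_of_S2.values():
    lhs = sum(cond_exp_S1_given_S2(w) * prob(w) for w in atom)
    rhs = sum(S(1, w) * prob(w) for w in atom)
    assert abs(lhs - rhs) < 1e-12
```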
2.3.2 Definition of Conditional Expectation
Please see Williams, p. 83. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. Let $X$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$. Then $\mathbb{E}(X|\mathcal{G})$ is defined to be any random variable $Y$ that satisfies:
(a) $Y$ is $\mathcal{G}$-measurable,
(b) for every set $A \in \mathcal{G}$, we have the "partial averaging property"
$$\int_A Y\, d\mathbb{P} = \int_A X\, d\mathbb{P}.$$
Existence. There is always a random variable $Y$ satisfying the above properties (provided that $\mathbb{E}|X| < \infty$); i.e., conditional expectations always exist.

Uniqueness. There can be more than one random variable $Y$ satisfying the above properties, but if $Y'$ is another one, then $Y = Y'$ almost surely, i.e., $\mathbb{P}\{\omega \in \Omega : Y(\omega) = Y'(\omega)\} = 1$.
Notation 2.1 For random variables $X$, $Y$, it is standard notation to write
$$\mathbb{E}(X|Y) \triangleq \mathbb{E}(X \mid \sigma(Y)).$$
Here are some useful ways to think about $\mathbb{E}(X|\mathcal{G})$:

- A random experiment is performed, i.e., an element $\omega$ of $\Omega$ is selected. The value of $\omega$ is partially but not fully revealed to us, and thus we cannot compute the exact value of $X(\omega)$. Based on what we know about $\omega$, we compute an estimate of $X(\omega)$. Because this estimate depends on the partial information we have about $\omega$, it depends on $\omega$; i.e., $\mathbb{E}[X|\mathcal{G}](\omega)$ is a function of $\omega$, although the dependence on $\omega$ is often not shown explicitly.
- If the $\sigma$-algebra $\mathcal{G}$ contains finitely many sets, there will be a "smallest" set $A$ in $\mathcal{G}$ containing $\omega$, which is the intersection of all sets in $\mathcal{G}$ containing $\omega$. The way $\omega$ is partially revealed to us is that we are told it is in $A$, but not told which element of $A$ it is. We then define $\mathbb{E}[X|\mathcal{G}](\omega)$ to be the average (with respect to $\mathbb{P}$) value of $X$ over this set $A$. Thus, for all $\omega$ in this set $A$, $\mathbb{E}[X|\mathcal{G}](\omega)$ will be the same.
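For a finite $\sigma$-algebra, the second bullet above is directly computable: partition $\Omega$ into the atoms of $\mathcal{G}$ and average $X$ over the atom containing $\omega$. Here is a generic sketch (illustrative, not from the notes; it reuses `Omega`, `S`, and `prob` from the earlier sketches, and uses the prefix groups as the atoms of $\mathcal{F}_1$).

```python
def cond_exp(X, atoms):
    """Return E(X | G) as a dict omega -> value, where atoms partition Omega."""
    Y = {}
    for atom in atoms:
        pa = sum(prob(w) for w in atom)
        avg = sum(X(w) * prob(w) for w in atom) / pa
        for w in atom:
            Y[w] = avg                      # constant on each atom of G
    return Y

# Atoms of F_1: outcomes grouped by the first toss.
atoms_F1 = [[w for w in Omega if w[0] == t] for t in "HT"]
print(cond_exp(lambda w: S(2, w), atoms_F1))   # E(S_2 | F_1)
```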
2.3.3 Further discussion of Partial Averaging
The partial averaging property is
$$\int_A \mathbb{E}(X|\mathcal{G})\, d\mathbb{P} = \int_A X\, d\mathbb{P}, \qquad \forall A \in \mathcal{G}. \tag{3.1}$$
We can rewrite this as
$$\mathbb{E}\left[\mathbb{I}_A \cdot \mathbb{E}(X|\mathcal{G})\right] = \mathbb{E}\left[\mathbb{I}_A \cdot X\right]. \tag{3.2}$$
Note that $\mathbb{I}_A$ is a $\mathcal{G}$-measurable random variable. In fact, the following holds:
Lemma 3.10 If $V$ is any $\mathcal{G}$-measurable random variable, then provided $\mathbb{E}|V \cdot \mathbb{E}(X|\mathcal{G})| < \infty$,
$$\mathbb{E}\left[V \cdot \mathbb{E}(X|\mathcal{G})\right] = \mathbb{E}\left[V \cdot X\right]. \tag{3.3}$$
Proof: To see this, first use (3.2) and linearity of expectations to prove (3.3) when $V$ is a simple $\mathcal{G}$-measurable random variable, i.e., $V$ is of the form $V = \sum_{k=1}^n c_k \mathbb{I}_{A_k}$, where each $A_k$ is in $\mathcal{G}$ and each $c_k$ is constant. Next consider the case that $V$ is a nonnegative $\mathcal{G}$-measurable random variable, but is not necessarily simple. Such a $V$ can be written as the limit of an increasing sequence of simple random variables $V_n$; we write (3.3) for each $V_n$ and then pass to the limit, using the Monotone Convergence Theorem (see Williams), to obtain (3.3) for $V$. Finally, the general $\mathcal{G}$-measurable random variable $V$ can be written as the difference of two nonnegative random variables $V = V^+ - V^-$, and since (3.3) holds for $V^+$ and $V^-$, it must hold for $V$ as well. Williams calls this argument the "standard machine" (p. 56).
Based on this lemma, we can replace the second condition in the definition of a conditional expectation (Section 2.3.2) by:

(b') For every $\mathcal{G}$-measurable random variable $V$, we have
$$\mathbb{E}\left[V \cdot \mathbb{E}(X|\mathcal{G})\right] = \mathbb{E}\left[V \cdot X\right]. \tag{3.4}$$
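Property (3.4) is easy to check numerically on the three-toss model. The sketch below (illustrative, not from the notes; it reuses `Omega`, `S`, `prob`, `cond_exp`, and `atoms_F1` from the earlier sketches) takes $\mathcal{G} = \mathcal{F}_1$, $X = S_2$, and an arbitrary $\mathcal{F}_1$-measurable $V$, and verifies $\mathbb{E}[V \cdot \mathbb{E}(X|\mathcal{G})] = \mathbb{E}[V \cdot X]$.

```python
# An F_1-measurable random variable: it depends only on the first toss.
V = lambda w: 3.0 if w[0] == "H" else -1.0

Y = cond_exp(lambda w: S(2, w), atoms_F1)            # Y = E(S_2 | F_1)

lhs = sum(V(w) * Y[w] * prob(w) for w in Omega)      # E[V * E(S_2|F_1)]
rhs = sum(V(w) * S(2, w) * prob(w) for w in Omega)   # E[V * S_2]
assert abs(lhs - rhs) < 1e-12
```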
2.3.4 Properties of Conditional Expectation
Please see Williams, p. 88. Proof sketches of some of the properties are provided below.
(a) $\mathbb{E}(\mathbb{E}(X|\mathcal{G})) = \mathbb{E}(X)$.

Proof: Just take $A$ in the partial averaging property to be $\Omega$.
The conditional expectation of $X$ is thus an unbiased estimator of the random variable $X$.

(b) If $X$ is $\mathcal{G}$-measurable, then
$$\mathbb{E}(X|\mathcal{G}) = X.$$
Proof: The partial averaging property holds trivially when $Y$ is replaced by $X$. And since $X$ is $\mathcal{G}$-measurable, $X$ satisfies requirement (a) of a conditional expectation as well.

If the information content of $\mathcal{G}$ is sufficient to determine $X$, then the best estimate of $X$ based on $\mathcal{G}$ is $X$ itself.
(c) (Linearity)
$$\mathbb{E}(a_1 X_1 + a_2 X_2 \mid \mathcal{G}) = a_1\, \mathbb{E}(X_1|\mathcal{G}) + a_2\, \mathbb{E}(X_2|\mathcal{G}).$$
(d) (Positivity) If $X \ge 0$ almost surely, then
$$\mathbb{E}(X|\mathcal{G}) \ge 0.$$
Proof: Take $A = \{\omega \in \Omega : \mathbb{E}(X|\mathcal{G})(\omega) < 0\}$. This set is in $\mathcal{G}$ since $\mathbb{E}(X|\mathcal{G})$ is $\mathcal{G}$-measurable. Partial averaging implies $\int_A \mathbb{E}(X|\mathcal{G})\, d\mathbb{P} = \int_A X\, d\mathbb{P}$. The right-hand side is greater than or equal to zero, and the left-hand side is strictly negative, unless $\mathbb{P}(A) = 0$. Therefore, $\mathbb{P}(A) = 0$.
(h) (Jensen's Inequality) If $\varphi : \mathbb{R} \to \mathbb{R}$ is convex and $\mathbb{E}|\varphi(X)| < \infty$, then
$$\mathbb{E}(\varphi(X)\mid\mathcal{G}) \ge \varphi(\mathbb{E}(X|\mathcal{G})).$$
Recall the usual Jensen's Inequality: $\mathbb{E}\,\varphi(X) \ge \varphi(\mathbb{E}(X))$.
(i) (Tower Property) If $\mathcal{H}$ is a sub-$\sigma$-algebra of $\mathcal{G}$, then
$$\mathbb{E}(\mathbb{E}(X|\mathcal{G})\mid\mathcal{H}) = \mathbb{E}(X|\mathcal{H}).$$
That $\mathcal{H}$ is a sub-$\sigma$-algebra of $\mathcal{G}$ means that $\mathcal{G}$ contains more information than $\mathcal{H}$. If we estimate $X$ based on the information in $\mathcal{G}$, and then estimate the estimator based on the smaller amount of information in $\mathcal{H}$, then we get the same result as if we had estimated $X$ directly based on the information in $\mathcal{H}$.
(j) (Taking out what is known) If $Z$ is $\mathcal{G}$-measurable, then
$$\mathbb{E}(ZX|\mathcal{G}) = Z \cdot \mathbb{E}(X|\mathcal{G}).$$
When conditioning on $\mathcal{G}$, the $\mathcal{G}$-measurable random variable $Z$ acts like a constant.
Proof: Let $Z$ be a $\mathcal{G}$-measurable random variable. A random variable $Y$ is $\mathbb{E}(ZX|\mathcal{G})$ if and only if

(a) $Y$ is $\mathcal{G}$-measurable;
(b) $\int_A Y\, d\mathbb{P} = \int_A ZX\, d\mathbb{P}$, $\forall A \in \mathcal{G}$.

Take $Y = Z \cdot \mathbb{E}(X|\mathcal{G})$. Then $Y$ satisfies (a) (a product of $\mathcal{G}$-measurable random variables is $\mathcal{G}$-measurable). $Y$ also satisfies property (b), as we can check below:
$$\begin{aligned}
\int_A Y\, d\mathbb{P} &= \mathbb{E}(\mathbb{I}_A \cdot Y) \\
&= \mathbb{E}\left[\mathbb{I}_A Z\, \mathbb{E}(X|\mathcal{G})\right] \\
&= \mathbb{E}\left[\mathbb{I}_A Z \cdot X\right] \qquad \text{((b') with } V = \mathbb{I}_A Z\text{)} \\
&= \int_A ZX\, d\mathbb{P}.
\end{aligned}$$
(k) (Role of Independence) If $\mathcal{H}$ is independent of $\sigma(\sigma(X), \mathcal{G})$, then
$$\mathbb{E}(X \mid \sigma(\mathcal{G}, \mathcal{H})) = \mathbb{E}(X|\mathcal{G}).$$
In particular, if $X$ is independent of $\mathcal{H}$, then
$$\mathbb{E}(X|\mathcal{H}) = \mathbb{E}(X).$$
If $\mathcal{H}$ is independent of $X$ and $\mathcal{G}$, then nothing is gained by including the information content of $\mathcal{H}$ in the estimation of $X$.
2.3.5 Examples from the Binomial Model
Recall that $\mathcal{F}_1 = \{\emptyset, \Omega, A_H, A_T\}$. Notice that $\mathbb{E}(S_2|\mathcal{F}_1)$ must be constant on $A_H$ and on $A_T$. Now since $\mathbb{E}(S_2|\mathcal{F}_1)$ must satisfy the partial averaging property,
$$\int_{A_H} \mathbb{E}(S_2|\mathcal{F}_1)\, d\mathbb{P} = \int_{A_H} S_2\, d\mathbb{P},$$
$$\int_{A_T} \mathbb{E}(S_2|\mathcal{F}_1)\, d\mathbb{P} = \int_{A_T} S_2\, d\mathbb{P}.$$
We compute
$$\int_{A_H} \mathbb{E}(S_2|\mathcal{F}_1)\, d\mathbb{P} = \mathbb{P}(A_H) \cdot \mathbb{E}(S_2|\mathcal{F}_1)(\omega) = p\, \mathbb{E}(S_2|\mathcal{F}_1)(\omega), \qquad \forall \omega \in A_H.$$
On the other hand,
$$\int_{A_H} S_2\, d\mathbb{P} = p^2 u^2 S_0 + pq\, udS_0.$$
Therefore,
$$\mathbb{E}(S_2|\mathcal{F}_1)(\omega) = pu^2 S_0 + q\, udS_0, \qquad \forall \omega \in A_H.$$
We can also write
$$\mathbb{E}(S_2|\mathcal{F}_1)(\omega) = pu^2 S_0 + q\, udS_0 = (pu + qd)\,uS_0 = (pu + qd)\,S_1(\omega), \qquad \forall \omega \in A_H.$$
Similarly,
$$\mathbb{E}(S_2|\mathcal{F}_1)(\omega) = (pu + qd)\,S_1(\omega), \qquad \forall \omega \in A_T.$$
Thus in both cases we have
$$\mathbb{E}(S_2|\mathcal{F}_1)(\omega) = (pu + qd)\,S_1(\omega), \qquad \forall \omega \in \Omega.$$
A similar argument one time step later shows that
$$\mathbb{E}(S_3|\mathcal{F}_2)(\omega) = (pu + qd)\,S_2(\omega).$$
We leave the verification of this equality as an exercise. We can also verify the Tower Property; for instance, from the previous equations we have
$$\begin{aligned}
\mathbb{E}\left[\mathbb{E}(S_3|\mathcal{F}_2)\mid\mathcal{F}_1\right] &= \mathbb{E}\left[(pu+qd)S_2 \mid \mathcal{F}_1\right] \\
&= (pu+qd)\,\mathbb{E}(S_2|\mathcal{F}_1) \qquad \text{(linearity)} \\
&= (pu+qd)^2 S_1.
\end{aligned}$$
This final expression is $\mathbb{E}(S_3|\mathcal{F}_1)$.
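The exercise above, and the tower-property computation, can be checked numerically. The sketch below (illustrative, not from the notes; it reuses `Omega`, `S`, `prob`, `cond_exp`, and `atoms_F1` from the earlier sketches) verifies $\mathbb{E}(S_2|\mathcal{F}_1) = (pu+qd)S_1$, $\mathbb{E}(S_3|\mathcal{F}_2) = (pu+qd)S_2$, and $\mathbb{E}[\mathbb{E}(S_3|\mathcal{F}_2)\mid\mathcal{F}_1] = (pu+qd)^2 S_1$.

```python
m = p * u + q * d                     # one-step growth factor (pu + qd)

# Atoms of F_2: outcomes grouped by the first two tosses.
atoms_F2 = [[w for w in Omega if w[:2] == pre] for pre in ("HH", "HT", "TH", "TT")]

E_S2_F1 = cond_exp(lambda w: S(2, w), atoms_F1)
E_S3_F2 = cond_exp(lambda w: S(3, w), atoms_F2)
tower   = cond_exp(lambda w: E_S3_F2[w], atoms_F1)   # E[E(S_3|F_2) | F_1]

for w in Omega:
    assert abs(E_S2_F1[w] - m * S(1, w)) < 1e-12
    assert abs(E_S3_F2[w] - m * S(2, w)) < 1e-12
    assert abs(tower[w] - m**2 * S(1, w)) < 1e-12
```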