CCV.bar <- mean(CCV)
Var.CCV <- sum((CCV-CCV.bar)^2)/(N2-1)
SE <- sqrt(Var.CCV)
CI <- CCV.bar - 1.96*SE/sqrt(N2)
CI[2] <- CCV.bar + 1.96*SE/sqrt(N2)
CCV.bar
CI
For N1 = 500 and N2 = 50,000, we have a 95% confidence interval for CCV of [1.0023, 1.0247]. In this case, the estimated call price is 1.0135 with standard error 0.0057.
In using control variates, there are a number of features that should be kept in mind.
• What should constitute the appropriate control? We have seen that in simple cases, the underlying asset prices may be appropriate. In more complicated situations, we may use easily computed quantities that are highly correlated with the object of interest as control variates. For example, standard calls and puts frequently provide a convenient source of control variates for pricing exotic options, as does the underlying asset itself.

• The control variate estimator is usually unbiased by construction. Also, we can separate the estimation of the coefficients from the estimation of prices.

• The flexibility of choosing the coefficients suggests that we can sometimes make optimal use of information. In any event, we should exploit the specific features of the problem under consideration rather than apply routine methods generically.

• Because of its close relationship with linear regression, the control variate estimator is easily computed and explained (see the sketch following this list).

• We have only covered linear controls. In practice, one can consider using nonlinear control variates, for example $\bar{X}\bar{Y}/\mu_Y$. Statistical inference for nonlinear controls may be tricky, though.
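Because of the regression connection noted in the list above, the control coefficient can be estimated by a least-squares fit on a pilot sample and then applied to a fresh sample. The following R-style sketch illustrates this two-stage scheme; the objects Y (simulated discounted payoffs), X (control observations), and mu.X (the known mean of the control) are hypothetical placeholders rather than quantities defined in the text:

# Hypothetical sketch: Y = simulated discounted payoffs, X = control values,
# mu.X = known expectation of the control variate.
pilot <- 1:500                                  # pilot sample (N1 = 500, as in the text)
b.hat <- cov(Y[pilot], X[pilot])/var(X[pilot])  # least-squares estimate of the coefficient
CV <- Y[-pilot] - b.hat*(X[-pilot] - mu.X)      # adjusted observations on the fresh sample
mean(CV)                                        # control variate estimate of the price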
6.5 IMPORTANCE SAMPLING
This section introduces the importance sampling technique; the chapter then concludes with examples illustrating the different methods of variance reduction in risk management.
Suppose that we are interested in estimating
$$\theta = E[h(X)] = \int h(x) f(x)\,dx,$$
where $X = (X_1, \ldots, X_n)$ denotes an $n$-dimensional random vector having joint p.d.f. $f(x) = f(x_1, \ldots, x_n)$. Suppose that a direct simulation of the random vector $X$ is inefficient, so that computing $h(X)$ is infeasible. This inefficiency may be due to difficulties encountered in simulating $X$, or to the variance of $h(X)$ being too large, or to a combination of both. Suppose there exists another density $g(x)$ that is easy to simulate and satisfies the condition that $f(x) = 0$ whenever $g(x) = 0$. Then $\theta$ can be estimated by
$$\theta = E_f[h(X)] = \int h(x)\,\frac{f(x)}{g(x)}\,g(x)\,dx = E_g\left[\frac{h(X) f(X)}{g(X)}\right],$$
where the notation $E_g$ denotes the expectation of the random vector $X$ taken under the density $g$, i.e., $X$ has joint p.d.f. $g(x)$. It follows from this identity that $\theta$ can be estimated by generating $X$ with density $g$ and then using as the estimator the average of the values of $h(X) f(X)/g(X)$. In other words, we could construct a Monte Carlo estimator of $\theta = E(h(X))$ by first generating i.i.d. random vectors $X_i$ with p.d.f. $g$, then using the estimator
$$\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} \frac{h(X_i) f(X_i)}{g(X_i)}.$$
If a density $g(x)$ can be chosen so that the random variable $h(X) f(X)/g(X)$ has a small variance, then this approach, known as importance sampling, can result in an efficient estimator of $\theta$.
To see how it works, note that the ratio $f(X)/g(X)$ represents the likelihood ratio of obtaining $X$ under the respective densities $f$ and $g$. If $X$ is distributed according to $g$, then $f(X)$ would be small relative to $g(X)$, and therefore, when $X$ is simulated according to $g$, the likelihood ratio $f(X)/g(X)$ will usually be small in comparison to 1. On the other hand, it can be seen that
$$E_g\left[\frac{f(X)}{g(X)}\right] = \int \frac{f(x)}{g(x)}\,g(x)\,dx = \int f(x)\,dx = 1.$$
Thus, even though the likelihood ratio $f(X)/g(X)$ is usually smaller than 1, its mean is equal to 1, suggesting that it occasionally takes large values and results in a large variance.
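A small simulation makes this point concrete. Here the densities are our own choices for illustration, f = N(0,1) and g = N(2,1): under g the likelihood ratio averages to 1 but occasionally spikes.

x <- rnorm(100000, mean = 2)         # sample from g = N(2,1)
w <- dnorm(x)/dnorm(x, mean = 2)     # likelihood ratio f(X)/g(X) with f = N(0,1)
mean(w)                              # approximately 1
max(w)                               # occasionally very large, which inflates the variance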
To make the variance of $h(X) f(X)/g(X)$ small, we arrange for a density $g$ such that those values of $X$ for which $f(X)/g(X)$ is large are precisely the values for which $h(X)$ is small, thus making the ratio $h(X) f(X)/g(X)$ stay small. Since importance sampling requires $h$ to be small sometimes, it works best when estimating a small probability. Further discussion of importance sampling and likelihood methods is given in Glasserman (2003).

Example 6.10 Consider the problem of estimating $\theta = E(U^5)$, where $U$ is uniform on $(0,1)$. Suppose that we use the standard method, $\hat{\theta} = \frac{1}{n}\sum_{i=1}^n U_i^5$; then we oversample the data near the origin and undersample the data near 1. It is easy to compute that
$$\mathrm{Var}(\hat{\theta}) = \frac{1}{n}\left\{E(U_1^{10}) - (E(U_1^5))^2\right\} = \frac{1}{n}\left(\frac{1}{11} - \frac{1}{36}\right) = \frac{0.0631}{n}.$$
Now, suppose we use importance sampling, putting more weight near 1. Let $g(x) = 5x^4$ for $0 < x < 1$. Then
$$\theta = E_g\left(\frac{X^5}{5X^4}\right) = \frac{E_g(X)}{5} = \frac{1}{6}.$$
The variance of this method is
$$\mathrm{Var}(\hat{\theta}_I) = \frac{1}{25n}\left\{E_g(X^2) - (E_g X)^2\right\} = \frac{1}{25n}\left\{\int_0^1 x^2 (5x^4)\,dx - \left(\int_0^1 x (5x^4)\,dx\right)^2\right\} = \frac{1}{25n}\left(\frac{5}{7} - \frac{25}{36}\right) = \frac{0.000794}{n},$$
resulting in a variance reduction of 98.74%. □
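Example 6.10 is easy to verify by simulation. Since the c.d.f. of g(x) = 5x^4 is x^5, a draw from g is obtained by the inverse transform X = U^{1/5}; the sketch below compares the two estimators.

n <- 100000
u <- runif(n)
theta.std <- mean(u^5)               # standard estimator of E(U^5)
x <- runif(n)^(1/5)                  # inverse transform: X has density 5x^4 on (0,1)
theta.is <- mean(x^5/(5*x^4))        # importance sampling estimator, i.e., mean(x/5)
theta.std
theta.is                             # both close to 1/6, the latter with much smaller variance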
How do we choose $g$ in general? This requires the notion of the so-called tilted density. Recall that $M(t) = E(e^{tX})$ represents the moment generating function (m.g.f.) of the random variable $X$ with density $f$.

Definition 6.2 A density function
$$f_t(x) = \frac{e^{tx} f(x)}{M(t)}$$
is called a tilted density of a given $f$, $-\infty < t < \infty$.

Note that from this definition, a random variable with density $f_t$ tends to be larger than one with density $f$ when $t > 0$, and tends to be smaller when $t < 0$.
Example 6.11 Let $f$ be a Bernoulli density with parameter $p$. Then $f(x) = p^x (1-p)^{1-x}$, $x = 0, 1$. In this case, the m.g.f. is $M(t) = E(e^{tX}) = p e^t + (1 - p)$, so that
$$f_t(x) = \frac{e^{tx} f(x)}{M(t)} = \frac{(p e^t)^x (1-p)^{1-x}}{p e^t + 1 - p}.$$
Thus, the tilted density $f_t$ is a Bernoulli density with parameter $p_t = p e^t/(p e^t + 1 - p)$. □
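The tilted Bernoulli density is simple enough to code directly; a minimal R sketch (the function names are ours):

p.tilt <- function(p, t) p*exp(t)/(p*exp(t) + 1 - p)        # parameter p_t of the tilted density f_t
rbern.tilt <- function(n, p, t) rbinom(n, 1, p.tilt(p, t))  # n draws from f_t
p.tilt(0.4, log(6))                                         # = 0.8, the value used in Example 6.12 below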
In many instances, we are interested in sums of independent random variables. In these cases, the joint density $f(x)$ of $x = (x_1, \ldots, x_n)$ can be written as the product of the marginals $f_i$ of the $x_i$, so that $f(x) = f_1(x_1) \cdots f_n(x_n)$. In this situation, it is often useful to generate the $X_i$ according to their tilted densities with a common $t$.
Example 6.12 Let $X_1, \ldots, X_n$ be independent with marginal densities $f_i$. Suppose we are interested in estimating the quantity
$$\theta = P(S \ge a),$$
where $S = \sum_{i=1}^n X_i$ and $a > \sum_{i=1}^n E(X_i)$ is a given constant. We can apply tilted densities to estimate $\theta$. Let $I\{S \ge a\}$ equal 1 if $S \ge a$ and 0 otherwise. Then
$$\theta = E(I\{S \ge a\}),$$
where the expectation is taken with respect to the joint density $f$. Suppose we simulate the $X_i$ according to the tilted density functions $f_{t,i}$, where the value of $t > 0$ is to be specified. To construct the importance sampling estimator, note that $h(X) = I\{S \ge a\}$, $f(X) = \prod_i f_i(X_i)$, and $g(X) = \prod_i f_{t,i}(X_i)$. The importance sampling estimator would be
$$\hat{\theta} = I\{S \ge a\} \prod_i \frac{f_i(X_i)}{f_{t,i}(X_i)}.$$
Now $f_i(X_i)/f_{t,i}(X_i) = M_i(t)\, e^{-t X_i}$; therefore,
$$\hat{\theta} = I\{S \ge a\}\, M(t)\, e^{-tS}, \qquad \text{where } M(t) = \prod_i M_i(t).$$
Since it is assumed that $t > 0$, $S \ge a$ if and only if $e^{-tS} \le e^{-ta}$, and hence $I\{S \ge a\}\, e^{-tS} \le e^{-ta}$, so that
$$\hat{\theta} \le M(t)\, e^{-ta}.$$
We now find $t > 0$ such that the right-hand side of the above inequality is minimized. In that case, we obtain an estimator that lies between 0 and $\min_t M(t) e^{-ta}$. It can be shown that such a $t$ can be found by solving the equation
$$E_t(S) = a,$$
where $E_t$ denotes expectation under the tilted densities.
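When E_t(S) = a admits no closed-form solution, the root can be found numerically. Below is a sketch using R's uniroot, applied to the i.i.d. Bernoulli setup of the example that follows (p = 0.4, n = 20, a = 16) so the answer can be checked against the analytic solution:

p <- 0.4; n <- 20; a <- 16
Et.S <- function(t) n*p*exp(t)/(p*exp(t) + 1 - p)        # E_t(S) for i.i.d. Bernoulli(p) trials
t.star <- uniroot(function(t) Et.S(t) - a, c(0, 10))$root
exp(t.star)                                              # approximately 6, matching the example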
After solving for $t$, it can be utilized in the simulation. To be specific, suppose $X_1, \ldots, X_n$ are i.i.d. Bernoulli trials with $p = p_i = 0.4$. Let $n = 20$ and $a = 16$. Then
$$M(t) = (p e^t + 1 - p)^{20},$$
so that
$$\hat{\theta} = I\{S \ge 16\}\, (p e^t + 1 - p)^{20}\, e^{-tS}.$$
Recall from the preceding example that the tilted density $f_{t,i}$ is the p.d.f. of a Bernoulli trial with parameter $p^* = p e^t/(p e^t + 1 - p)$. It follows that
$$E_t(S) = \sum_{i=1}^{20} p^* = 20 p^* = \frac{20\, p e^t}{p e^t + 1 - p}.$$
Plugging in $n = 20$, $p = 0.4$, and $a = 16$, we have
$$\frac{20 \times 0.4 e^t}{0.4 e^t + 0.6} = 16,$$
which leads to $e^{t^*} = 6$. Therefore, we should generate Bernoulli trials with parameter $0.4 e^{t^*}/(0.4 e^{t^*} + 0.6) = 0.8$ as the $g$, and evaluate $M(t^*) = (0.4 e^{t^*} + 0.6)^{20} = 3^{20}$ and $e^{-t^* S} = (1/6)^S$. The importance sampling estimator is now
$$\hat{\theta} = I\{S \ge 16\}\, M(t^*)\, e^{-t^* S} = I\{S \ge 16\}\, 3^{20} (1/6)^S.$$
Furthermore, we know that
$$\hat{\theta} \le M(t^*)\, e^{-t^* a} = 3^{20} (1/6)^{16} = 0.001236.$$
Thus, in each iteration, the value of the importance sampling estimator lies between 0 and 0.001236.
On the other hand, we can also evaluate $\theta = P(S \ge 16)$ exactly: it equals the probability that a Binomial random variable with parameters 20 and 0.4 is at least 16, which turns out to be 0.000317. Recall that $h(X) = I\{S \ge 16\}$ is a Bernoulli trial with parameter $\theta = 0.000317$. Therefore, if we simulate directly from the $X_i$, the standard estimator $\hat{\theta}_S$ has variance
$$\mathrm{Var}(\hat{\theta}_S) = \theta(1 - \theta) = 3.169 \times 10^{-4}.$$
As $0 \le \hat{\theta} \le 0.001236$, it can be shown that
$$\mathrm{Var}(\hat{\theta}) \le (0.001236)^2/4 = 3.819 \times 10^{-7},$$
which is much smaller than the variance of the standard estimator $\hat{\theta}_S$. □
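The whole calculation can be checked by simulating the tilted estimator I{S >= 16} 3^20 (1/6)^S directly; a minimal R sketch:

N <- 100000
theta.hat <- numeric(N)
for (i in 1:N) {
  S <- sum(rbinom(20, 1, 0.8))         # 20 Bernoulli trials under the tilted density (p* = 0.8)
  theta.hat[i] <- (S >= 16) * 3^20 * (1/6)^S
}
mean(theta.hat)                         # close to the exact value 0.000317
var(theta.hat)                          # below the bound 3.819e-7, far below the standard 3.169e-4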
Another application of importance sampling is to estimate tail probabilities (recall that, as mentioned at the beginning, importance sampling works best for small probabilities). Suppose we are interested in estimating $P(X > a)$, where $X$ has p.d.f. $f$ and $a$ is a given constant. Let $I(X > a) = 1$ if $X > a$ and 0 otherwise. Then
$$P(X > a) = E_f(I(X > a)).$$
Take $g(x) = \lambda e^{-\lambda x}$, $x > 0$, an exponential density with parameter $\lambda$. Then the above derivation shows
$$P(X > a) = E_g\left[e^{\lambda X} f(X) \,\middle|\, X > a\right] e^{-\lambda a}/\lambda.$$
Using the so-called "memoryless property" of the exponential distribution, i.e., $P(X > s + t \mid X > s) = P(X > t)$, it can easily be seen that the distribution of an exponential random variable conditioned on $\{X > a\}$ is the same as the distribution of $a + X$. Therefore,
$$P(X > a) = \frac{e^{-\lambda a}}{\lambda} E_g\left[e^{\lambda(X + a)} f(X + a)\right] = \frac{1}{\lambda} E_g\left[e^{\lambda X} f(X + a)\right].$$
We can now estimate $\theta$ by generating $X_1, \ldots, X_n$ according to an exponential distribution with parameter $\lambda$ and using
$$\hat{\theta} = \frac{1}{\lambda n} \sum_{i=1}^{n} e^{\lambda X_i} f(X_i + a).$$
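In R, this tail-probability estimator takes only a few lines. The wrapper below is our own sketch, with the density f supplied as a function:

est.tail <- function(n, a, lambda, f) {
  x <- rexp(n, rate = lambda)                    # X_1, ..., X_n from the exponential density g
  mean(exp(lambda*x) * f(x + a)) / lambda        # (1/(lambda*n)) * sum of exp(lambda*X_i) f(X_i + a)
}
est.tail(100000, 3, 3, dnorm)                    # the standard normal case of Example 6.13: about 0.00135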
Example 6.13 Suppose we are interested in $\theta = P(X > a)$, where $X$ is standard normal. Then $f$ is the standard normal density. Let $g$ be an exponential density with $\lambda = a$. Then
$$P(X > a) = \frac{1}{a} E_g\left[e^{aX} f(X + a)\right].$$
We can therefore estimate $\theta$ by generating $X_1, \ldots, X_n$ from an exponential distribution with rate $a$ and then using
$$\hat{\theta} = \frac{1}{a n} \sum_{i=1}^{n} e^{a X_i} f(X_i + a)$$
to estimate $\theta$. To compute the variance of $\hat{\theta}$, we need to compute the quantities $E_g[e^{-X^2/2}]$ and $E_g[e^{-X^2}]$. These can be computed numerically and can be shown to be
$$E_g[e^{-X^2/2}] = a e^{a^2/2} \sqrt{2\pi}\,(1 - \Phi(a)), \qquad E_g[e^{-X^2}] = a e^{a^2/4} \sqrt{\pi}\,(1 - \Phi(a/\sqrt{2})).$$
For example, if $a = 3$ and $n = 1$, then $\mathrm{Var}(e^{-X^2/2}) = 0.0201$ and
$$\mathrm{Var}(\hat{\theta}) = \left(\frac{e^{-a^2/2}}{a\sqrt{2\pi}}\right)^2 \times 0.0201 \approx 4.38 \times 10^{-8}.$$
On the other hand, a standard estimator has variance $\theta(1 - \theta) = 0.00134$. □

Consider simulating a vanilla European call option price again, using the importance sampling technique. Suppose that we evaluate the value of a deep out-of-the-money ($S_0 \ll K$) European call option with a short maturity $T$. Many sample paths result in $S_T \le K$ and give zero values, so these samples are wasted. One possible way to deal with this problem is to increase the values of the $Z_i$ by sampling them from a distribution with a larger mean and a larger variance. Sample $Z_i$ from $N(m, s^2)$, so that $\sigma\sqrt{T} Z_i \sim N(\sigma\sqrt{T} m, \sigma^2 T s^2)$.
Note that $Z_i$ can be written as
$$Z_i = m + s \tilde{Z}_i, \qquad \tilde{Z}_i \sim N(0, 1).$$
The importance sampling estimator is then given by
$$C_I = e^{-rT} \frac{1}{N} \sum_{i=1}^{N} \max\{S_0 e^{(r - \sigma^2/2)T + \sigma\sqrt{T} Z_i} - K, 0\}\, R(Z_i),$$
where $R(z)$ is the likelihood ratio of the $N(0,1)$ density $f$ with respect to the $N(m, s^2)$ density $g$,
$$R(z) = \frac{f(z)}{g(z)} = s \exp\left\{-\frac{z^2}{2} + \frac{(z - m)^2}{2 s^2}\right\}.$$
Thus, $C_I$ can be expressed as
$$C_I = e^{-rT} \frac{1}{N} \sum_{i=1}^{N} \max\{S_0 e^{(r - \sigma^2/2)T + \sigma\sqrt{T}(m + s\tilde{Z}_i)} - K, 0\}\, s\, e^{\tilde{Z}_i^2/2 - (m + s\tilde{Z}_i)^2/2}.$$
Example 6.14 Let $S_0 = 100$, $K = 140$, $r = 0.05$, $\sigma = 0.3$, and $T = 1$. We simulate the value of this deep out-of-the-money European call option using the importance sampling technique and compare it with the result of the standard method.
The S-PLUS code is as follows:
######## Part (1): Standard method ########
N <- 10000
S0 <- 100
K <- 140
t <- 1
r <- 0.05
sigma <- 0.3
nu <- r - sigma^2/2
Ci <- rep(0,N)
for (i in 1:N) {
  z <- rnorm(1)
  ST <- S0*exp(nu*t+sigma*sqrt(t)*z)
  Ci[i] <- exp(-r*t)*max(ST-K,0)
}
C.bar <- mean(Ci)
SE <- sqrt(var(Ci)/(N-1))
C.bar
SE
######## Part (2): Importance sampling ########
N <- 10000
S0 <- 100
K <- 140
t <- 1
r <- 0.05
sigma <- 0.3
nu <- r - sigma^2/2
m <- 0.5
s <- 1.1
Ci <- rep(0,N)
for (i in 1:N) {
  z1 <- rnorm(1)
  z2 <- m + s*z1                        # Z ~ N(m, s^2)
  ST <- S0*exp(nu*t+sigma*sqrt(t)*z2)
  R <- s*exp(z1^2/2 - z2^2/2)           # likelihood ratio R(Z) = f(Z)/g(Z)
  Ci[i] <- exp(-r*t)*max(ST-K,0)*R
}
CI.bar <- mean(Ci)
SE <- sqrt(var(Ci)/(N-1))
CI.bar
SE