Dynamic random Weyl sampling for drastic reduction of randomness in Monte Carlo integration
Hiroshi Sugita
Faculty of Mathematics, Graduate School of Mathematics, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, 812-8581 Fukuoka, Japan
Abstract
To reduce randomness drastically in Monte Carlo (MC) integration, we propose a pairwise independent sampling method, the dynamic random Weyl sampling (DRWS). DRWS is applicable even if the length of the random bits needed to generate a sample may vary. The algorithm of DRWS is so simple that it works very fast, even though the pseudo-random generator, the source of randomness, might be slow. In particular, we can use a cryptographically secure pseudo-random generator for DRWS to obtain the most reliable numerical integration method for complicated functions.
© 2002 IMACS. Published by Elsevier Science B.V. All rights reserved.
Keywords: Numerical integration; Monte Carlo integration; i.i.d.-sampling; Pairwise independent sampling; Random Weyl sampling; Dynamic random Weyl sampling; Cryptographically secure pseudo-random generator
1. Introduction
Random Weyl sampling (RWS) was introduced in [7,9] to reduce the randomness used in Monte Carlo (MC) integration by utilizing pairwise independent samples instead of i.i.d. samples.
The reduction of randomness by RWS is quite drastic. For example, let a random variable W be a function of 500 tosses of a coin. To integrate W numerically, i.i.d.-sampling with 10^7 samples requires 500 × 10^7 = 5 × 10^9 random bits, while RWS with the same sample size requires only

(500 + ⌈log_2 10^7⌉) × 2 = 1048

random bits (Example 2 of [7]). In addition, both sampling methods have the same mean square error.
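The bit counts above can be checked with a short computation (a sketch; the 500-bit integrand and the sample size 10^7 are just the figures of this example, not fixed parameters of RWS):

```python
from math import ceil, log2

# Figures from the example above: W is a function of 500 coin tosses,
# integrated with 10**7 samples.
bits_per_sample = 500
n_samples = 10**7

# i.i.d.-sampling consumes fresh random bits for every sample.
iid_bits = bits_per_sample * n_samples

# RWS consumes one pair (x, alpha) of (500 + ceil(log2 N))-bit numbers.
rws_bits = (bits_per_sample + ceil(log2(n_samples))) * 2

print(iid_bits)  # 5000000000
print(rws_bits)  # 1048
```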
In practice, we use a pseudo-random generator as the source of randomness for RWS. Then, the drastic reduction of randomness has the following advantages:
(a) RWS is very insensitive to the quality of the pseudo-random numbers.
(b) With almost no slowdown in the speed of generating samples, we can use a slow but precise pseudo-random generator, such as a cryptographically secure one ([4], Remark 1), to get the most reliable results.
E-mail address: [email protected] (H. Sugita).
Thanks to advantage (a), we may be careless to some extent in choosing a pseudo-random generator (Section 4 of [9]). But, of course, advantage (b) is much more remarkable.
However, one problem remained: the RWS introduced in [7,9] assumed that the length of the random bits used to generate each sample is fixed. As a matter of fact, in many cases the length of the random bits needed to generate a sample may vary by circumstances; this is the case, for example, when we use von Neumann's rejection method ([2,5]).
In the present paper, we propose a pairwise independent sampling method that is applicable even if the length of the random bits needed to generate a sample may vary. We call it the dynamic random Weyl sampling (DRWS, [8]). The algorithm of DRWS is so simple that pairwise independent samples are generated very fast, even faster than by i.i.d.-sampling. In addition, translating an i.i.d.-sampling code into DRWS is very easy (see the main routine of DRWS). In a word, DRWS is applicable whenever i.i.d.-sampling is applicable, and it is faster and much more reliable than i.i.d.-sampling.
We can say that (D)RWS is a kind of pseudo-random generator in the sense that it makes a small amount of randomness look like a large amount (Remark 1, [4]). But its usage is limited to numerical integration only.
Remark 1. A function f_n : {0,1}^n → {0,1}^{ℓ(n)} is said to be a pseudo-random generator if n < ℓ(n). We consider f_n(Z_n) ∈ {0,1}^{ℓ(n)} to be the pseudo-random bits (numbers), where Z_n, a uniformly distributed n-bit random variable, is called the seed. f_n is said to be cryptographically secure if we cannot distinguish the distributions of f_n(Z_n) and Z_{ℓ(n)} by any feasible statistical test ([4]). If we use a cryptographically secure pseudo-random generator f_n in MC computation, we can say the following: Let Y = φ(Z_{ℓ(n)}) be the random variable under consideration, where φ should be feasible. Suppose we use Y' := φ(f_n(Z_n)) instead of Y in practice. Then the distributions of Y and Y' cannot be distinguished by any feasible test. (If they could be, then f_n would not be cryptographically secure!) This means that Y' can completely pretend to be Y in practical MC computation.
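As a concrete, purely illustrative instance of such an f_n, one common heuristic construction expands a short seed by iterating a cryptographic hash in counter mode. The sketch below uses SHA-256 from Python's standard library; it is an assumed example, not the abstract generator of this remark, and the security of the construction is a separate question:

```python
import hashlib

def expand_seed(seed: bytes, out_len: int) -> bytes:
    """Sketch of a pseudo-random generator f_n: stretch a short seed into
    out_len bytes by hashing seed || counter with SHA-256 (heuristic only)."""
    out = b""
    counter = 0
    while len(out) < out_len:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:out_len]

# A 16-byte (128-bit) seed Z_n expanded into 1024 bytes of pseudo-random bits.
stream = expand_seed(b"\x00" * 16, 1024)
print(len(stream))  # 1024
```

The output is a deterministic function of the seed, as the definition of f_n requires.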
2. Algorithm
Let us first introduce how to implement DRWS. In the sequel, we use the following notations to describe the precision of computation: for m ∈ N,

D_m := { i/2^m ; i = 0, ..., 2^m − 1 },  (1)

⌊x⌋_m := ⌊2^m x⌋ / 2^m ∈ D_m,  x ∈ [0,1).  (2)
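In code, the truncation (2) simply keeps the first m binary digits of x. A minimal sketch (the function name floor_m is ours):

```python
def floor_m(x: float, m: int) -> float:
    """Truncation (2): floor(2^m * x) / 2^m, an element of D_m."""
    return int(x * 2**m) / 2**m

# Example: 0.7 = 0.10110..._2; keeping 3 binary digits gives 0.101_2 = 5/8.
print(floor_m(0.7, 3))   # 0.625
print(floor_m(0.3, 4))   # 0.25  (0.3 = 0.01001..._2, truncated to 0.0100_2)
```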
Let W be a random variable that we wish to integrate numerically. We realize W as a functional of [0,1)-valued uniformly distributed i.i.d. random variables {X_1, X_2, ...}. Let the precision of the realization of each X_l in a computer be 2^{−K}, that is, we consider X_l ∈ D_K. Now we assume the following condition, which is obviously indispensable for W to be treated in MC computations.
Assumption 1. We need only a finite number of X_1, ..., X_L to compute W almost surely. Here the number L of the X_l may vary by circumstances, but for each l ∈ N, when X_1, ..., X_l are given, we should be able to judge whether or not the next X_{l+1} is needed to compute W, without knowing anything about X_{l'}, l' ≥ l + 1.
We next suppose that a virtual function

• function Random_m : D_m-valued;

returns a D_m-valued uniformly distributed random variable which is independent of any formerly generated random variables. (In practice, this function should be simulated by a pseudo-random generator.)
Let N denote the total number of samples of W for numerical integration.
2.1. Algorithm of i.i.d.-sampling
• Main routine

function Mean_of_W : Real;
begin
  S := 0.0;
  for i := 1 to N do begin
    X := Random_K;
    while (another X is needed to compute W) do X := Random_K;
    S := S + W;
  end;
  result := S/N;
end;
This is the algorithm of i.i.d.-sampling. The function Mean_of_W returns the value that is substituted for the variable result, namely, S/N. Here, the X_l needed to compute W are all generated by the random function Random_K.
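A Python sketch of this routine. Random_K is simulated here by the standard library's Mersenne Twister via random.getrandbits, and the integrand W, which happens to draw exactly two X_l, is a made-up placeholder:

```python
import random

K = 32  # precision: each X_l lies in D_K, i.e. has K binary digits

def random_K() -> float:
    """Simulate the virtual function Random_K with a pseudo-random generator."""
    return random.getrandbits(K) / 2**K

def mean_of_W(compute_W, N: int) -> float:
    """i.i.d.-sampling: every X_l is a fresh call to random_K."""
    S = 0.0
    for _ in range(N):
        S += compute_W(random_K)  # compute_W calls random_K as often as it needs
    return S / N

# Placeholder integrand W = X_1 + X_2, whose exact mean is 1.
random.seed(1)
estimate = mean_of_W(lambda draw: draw() + draw(), 10**5)
print(abs(estimate - 1.0) < 0.05)  # True
```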
2.2. Algorithm of DRWS
Next, we introduce the algorithm of DRWS. Assume that

K' ≥ K + ⌈log_2 N⌉.  (3)
• Global variables

l : integer;
{x_l, α_l}_l : array (variable length) of (D_{K'})^2-valued vectors;

• Functions

function First_RWS : D_K-valued;
begin
  l := 0;
  result := Next_RWS;
end;

function Next_RWS : D_K-valued;
begin
  l := l + 1;
  if x_l and α_l have not been generated then begin
    x_l := Random_{K'};  α_l := Random_{K'};
  end;
  x_l := x_l + α_l (mod 1);
  result := ⌊x_l⌋_K;
end;
• Main routine

function Mean_of_W : Real;
begin
  S := 0.0;
  for i := 1 to N do begin
    X := First_RWS;
    while (another X is needed to compute W) do X := Next_RWS;
    S := S + W;
  end;
  result := S/N;
end;
The main routine is very similar to that of i.i.d.-sampling. The only difference is the following: in DRWS, the X_l are not generated by direct calls of Random_K; they are generated by First_RWS and Next_RWS. The former is called to generate the first X_1 only, and the latter is called to generate the rest X_2, X_3, ..., if necessary.
The random function Random_{K'} is called only by Next_RWS, and only when x_l and α_l have not yet been generated. Thus, DRWS requires much less randomness than i.i.d.-sampling. Note that the random function used in DRWS is Random_{K'}, where the additional K' − K (≥ ⌈log_2 N⌉) bits are required to implement the Weyl transformation with precision 2^{−K} for N times exactly (see Lemma 1).
According to Theorem 1, the samples of DRWS are pairwise independent, and hence DRWS has the same mean square error as i.i.d.-sampling (Corollary 1). However, the fact that DRWS requires very little randomness assures the practical advantages (a) and (b) listed in Section 1.
The author personally uses the pseudo-random generator introduced in [6], which is slow but precise, as the source of randomness for DRWS ([8]). Nevertheless, DRWS works very fast, because in most cases the function Next_RWS merely performs "x_l := x_l + α_l (mod 1)", which can be done very quickly, rather than generating a new pair (x_l, α_l) by the slow pseudo-random generator.
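Under condition (3), the whole algorithm above can be sketched in Python as follows. Exact arithmetic in D_{K'} is done with integers, Random_{K'} is simulated by random.getrandbits, and the integrand W = X_1 + X_2 is a made-up placeholder:

```python
import random

K = 32                          # output precision: samples lie in D_K
N = 10**5                       # total number of samples
KP = K + (N - 1).bit_length()   # K' >= K + ceil(log2 N): condition (3)

pairs = []   # lazily generated pairs (x_l, alpha_l), as integers in [0, 2^K')
level = 0    # current index l

def first_rws() -> float:
    """First_RWS: reset l, then fall through to Next_RWS for X_1."""
    global level
    level = 0
    return next_rws()

def next_rws() -> float:
    """Next_RWS: Weyl rotation x_l := x_l + alpha_l (mod 1), truncated to D_K."""
    global level
    level += 1
    if len(pairs) < level:                      # (x_l, alpha_l) not yet generated:
        pairs.append([random.getrandbits(KP),   # the only calls to Random_{K'}
                      random.getrandbits(KP)])
    x, a = pairs[level - 1]
    x = (x + a) % 2**KP                         # exact addition mod 1 in D_{K'}
    pairs[level - 1][0] = x
    return (x >> (KP - K)) / 2**K               # floor(x_l)_K

def mean_of_W(compute_W, n: int) -> float:
    """Main routine: same shape as i.i.d.-sampling, only the source of X changes."""
    S = 0.0
    for _ in range(n):
        S += compute_W(first_rws, next_rws)     # first X from First_RWS, rest from Next_RWS
    return S / n

# Placeholder integrand W = X_1 + X_2 (exact mean 1).
random.seed(1)
estimate = mean_of_W(lambda first, nxt: first() + nxt(), N)
print(abs(estimate - 1.0) < 0.05)  # True
print(len(pairs))                  # 2: only two pairs (x_l, alpha_l) were ever drawn
```

Note the reduction of randomness: 10^5 samples, each consuming two coordinates, are produced from just two (x_l, α_l) pairs, i.e. 4K' random bits in total.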
Remark 2. In some large-scale computations, that is, when the probability that W requires too many X_l is not negligible, DRWS may exhaust the computer's memory in order to keep all the (x_l, α_l) generated so far. So we had better always check how much memory is currently being used for the (x_l, α_l).
3. Mathematical formulation
In this section, we describe the mathematical structure of DRWS. Let T^1 be the one-dimensional torus, i.e., the additive group [0,1) with addition x + y (mod 1). Let B be the set of all Borel subsets of T^1. We define an increasing sequence of sub-σ-fields {B_m}_m of B by

B_m := σ{[a, b) ; a, b ∈ D_m, a < b} = σ(d_1, ..., d_m),  m ∈ N.

Here, the function d_i(x) denotes the i-th digit of x ∈ T^1 = [0,1) in its dyadic expansion, that is,

d_1(x) := 1_{[1/2,1)}(x),  d_i(x) := d_1(2^{i−1} x),  x ∈ T^1.
A function τ : T^1 → N ∪ {∞} is called a {B_m}_m-stopping time ([1,2]) if

{τ ≤ m} := {x ∈ T^1 ; τ(x) ≤ m} ∈ B_m,  ∀m ∈ N.

For a {B_m}_m-stopping time τ, we define a sub-σ-field B_τ of B by

B_τ := {A ∈ B ; A ∩ {τ ≤ m} ∈ B_m, ∀m ∈ N}.

L^p(T^1, B_τ, dx) is simply denoted by L^p(B_τ). A function f : T^1 → R ∪ {±∞} is B_τ-measurable if and only if

f(x) = f(⌊x⌋_{τ(x)}),  x ∈ T^1.  (4)
Let us see the relation between the algorithm in Section 2 and the above mathematical formulation. Again, let the precision of computation be 2^{−K}. Set X_l(x) := Σ_{j=1}^{K} 2^{−j} d_{(l−1)K+j}(x), l ∈ N. Then {X_l}_{l=1}^{∞} is a sequence of D_K-valued uniformly distributed independent random variables on the probability space (T^1, B, dx). Suppose W is a functional of {X_l}_{l=1}^{∞}, which is expressed as a function of x ∈ T^1 in the following way:

W = f(x) = f( Σ_{i=1}^{∞} 2^{−i} d_i(x) ) = f( Σ_{l=1}^{∞} 2^{−(l−1)K} X_l(x) ).
Now, if W satisfies Assumption 1, defining a {B_m}_m-stopping time τ by

τ(x) := inf{ lK ; W = f(x) is calculated from X_1(x), ..., X_l(x) },  x ∈ T^1,

we see that f is B_τ-measurable. As was seen in Section 2, however, we need not be explicitly aware of τ or f in practice. They are only needed for the mathematical analysis below.
Let N ∈ N be the total number of samples of W, and let

(x_l, α_l) ∈ D_{K+⌈log_2 N⌉} × D_{K+⌈log_2 N⌉} ⊂ T^1 × T^1,  l ∈ N,

be uniformly distributed independent random variables. Set

x_n := Σ_{l=1}^{∞} 2^{−(l−1)K} ⌊x_l + ν_{n,l} α_l⌋_K ∈ T^1,  n = 1, ..., N,  (5)

where ν_{n,l} is a random variable defined by¹

ν_{n,l} := n, if l = 1;  ν_{n,l} := #{1 ≤ u ≤ n ; τ(⌊x_u⌋_{(l−1)K}) > (l−1)K}, if l > 1.  (6)

Since τ is a {B_m}_m-stopping time, it is easy to see that ν_{n,l} and x_n are well defined.
Now, we have the following theorem.
Theorem 1. If f is B_τ-measurable, the random variables {f(x_n)}_{n=1}^{N} are pairwise independent and identically distributed. Their common distribution coincides with that of f considered as a random variable defined on (T^1, B, dx).
Note that {x_n}_{n=1}^{N} are all uniformly distributed but are not pairwise independent (Lemma 2). Theorem 1 asserts that if they are composed with a B_τ-measurable f, then {f(x_n)}_{n=1}^{N} become pairwise independent.
Theorem 1 and the strong law of large numbers for identically distributed pairwise independent random variables ([3]) imply that if f ∈ L^1(B_τ) and N is sufficiently large, we have

(1/N) Σ_{n=1}^{N} f(x_n) ∼ E[f].  (7)

Thus, we can obtain an approximate value of E[f] by (7). In particular, if E[τ] < ∞, we have τ ∈ L^1(B_τ), and hence, putting f = τ in (7), we know that the whole computation ends in a finite time of order O(N).
Remark 3. If E[τ] = ∞, the total computation time would be impractically large for N ≫ 1. Consequently, in practice, we have to be careful about whether E[τ] = ∞ or not. For instance, if τ is the stopping time associated with von Neumann's rejection method, then it satisfies E[τ] < ∞, because its distribution should be a geometric distribution. Of course, this remark applies to both i.i.d.-sampling and DRWS.
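To illustrate the rejection-method case: the number of uniforms consumed per accepted sample is a (scaled) geometric variable, so its expectation is finite. A hypothetical sketch, sampling the quarter disc by von Neumann's rejection method:

```python
import random

def sample_quarter_disc(draw):
    """von Neumann's rejection method: draw (x, y) uniformly in [0,1)^2 until
    x^2 + y^2 < 1. Returns the accepted point and the number of uniforms used;
    the number of trials is geometric with success probability pi/4."""
    used = 0
    while True:
        x, y = draw(), draw()
        used += 2
        if x * x + y * y < 1.0:
            return (x, y), used

random.seed(1)
n = 10**4
total_used = sum(sample_quarter_disc(random.random)[1] for _ in range(n))
# E[uniforms per sample] = 2 * (4/pi) ~ 2.546, so the stopping time has
# finite expectation and the total work is O(N).
print(2.3 < total_used / n < 2.8)  # True
```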
Definition 1. Suppose that E[τ] < ∞ and f ∈ L^1(B_τ). The sampling method by means of {x_n}_{n=1}^{N}, that is, the method to approximate E[f] by

(1/N') Σ_{n=1}^{N'} f(x_n),  1 ≤ N' ≤ N,

is called the dynamic random Weyl sampling (DRWS).
The term “dynamic” indicates that the sampling points {x_n}_{n=1}^{N} vary according to the integrand f, more precisely, according to the stopping time τ.
Corollary 1. If f ∈ L^2(B_τ), DRWS has the same mean square error as i.i.d.-sampling; that is, Var[f] being the variance of f, we have

E[ ( (1/N') Σ_{n=1}^{N'} f(x_n) − E[f] )^2 ] = Var[f] / N',  1 ≤ N' ≤ N.  (8)

¹ For a set A, #A denotes the number of its elements.
4. Proof of theorem
First, we quote Example 2 of [7] as a lemma:

Lemma 1. Let (x, α) ∈ D_{K+⌈log_2 N⌉} × D_{K+⌈log_2 N⌉} be a uniformly distributed random variable. Then for any pair of integers 1 ≤ p < p' ≤ N, ⌊x + pα⌋_K and ⌊x + p'α⌋_K are both uniformly distributed in D_K and they are independent.
From now on, we fix 1 ≤ n < n' ≤ N. Let m(n, n') := max{l ; ν_{n,l} < ν_{n',l}}. Then, since ν_{n,l} < ν_{n',l} implies ν_{n,i} < ν_{n',i} for all i = 1, ..., l, we have

m(n, n') = max{l ; ν_{n,i} < ν_{n',i}, i = 1, ..., l}.  (9)
Finally, set

x̃_{n'} := Σ_{l=1}^{∞} 2^{−(l−1)K} ⌊x_l + ν̃_{n',l} α_l⌋_K,  ν̃_{n',l} := ν_{n',l} if l ≤ m(n, n'), and ν̃_{n',l} := n' if l > m(n, n').  (10)
Lemma 2.
(i) Each x_n is uniformly distributed in T^1 = [0,1).
(ii) x_n and x̃_{n'} are independent.

Proof. To prove both (i) and (ii), it is enough to show that for any M ∈ N, the random variables

⌊x_l + ν_{n,l} α_l⌋_K,  ⌊x_l + ν̃_{n',l} α_l⌋_K,  l = 1, ..., M,  (11)

are all uniformly distributed in D_K and they are independent.
Note that if l ≥ 2, both ν_{n,l} and ν̃_{n',l} depend on (x_1, α_1), ..., (x_{l−1}, α_{l−1}) but they are independent of (x_l, α_l). Note also that we always have ν_{n,l} < ν̃_{n',l}.
Let P be the probability measure that governs all the (x_l, α_l). Then, for any s_1, t_1, ..., s_M, t_M ∈ D_K, we see

P( ⌊x_l + ν_{n,l} α_l⌋_K < s_l, ⌊x_l + ν̃_{n',l} α_l⌋_K < t_l, l = 1, ..., M )
 = Σ_{p_l < p'_l, l=1,...,M} P( ⌊x_l + p_l α_l⌋_K < s_l, ν_{n,l} = p_l, ⌊x_l + p'_l α_l⌋_K < t_l, ν̃_{n',l} = p'_l, l = 1, ..., M )
 = Σ_{p_l < p'_l, l=1,...,M} P( ⌊x_M + p_M α_M⌋_K < s_M, ⌊x_M + p'_M α_M⌋_K < t_M )
   × P( ⌊x_l + p_l α_l⌋_K < s_l, ν_{n,l} = p_l, ⌊x_l + p'_l α_l⌋_K < t_l, ν̃_{n',l} = p'_l, l = 1, ..., M − 1, ν_{n,M} = p_M, ν̃_{n',M} = p'_M ).

Since p_M ≠ p'_M, Lemma 1 implies that the events {⌊x_M + p_M α_M⌋_K < s_M} and {⌊x_M + p'_M α_M⌋_K < t_M} are independent, so we have

 = Σ_{p_l < p'_l, l=1,...,M} P( ⌊x_M + p_M α_M⌋_K < s_M ) P( ⌊x_M + p'_M α_M⌋_K < t_M )
   × P( ⌊x_l + p_l α_l⌋_K < s_l, ν_{n,l} = p_l, ⌊x_l + p'_l α_l⌋_K < t_l, ν̃_{n',l} = p'_l, l = 1, ..., M − 1, ν_{n,M} = p_M, ν̃_{n',M} = p'_M )
 = s_M t_M Σ_{p_l < p'_l, l=1,...,M−1} P( ⌊x_l + p_l α_l⌋_K < s_l, ν_{n,l} = p_l, ⌊x_l + p'_l α_l⌋_K < t_l, ν̃_{n',l} = p'_l, l = 1, ..., M − 1 )
 = s_M t_M P( ⌊x_l + ν_{n,l} α_l⌋_K < s_l, ⌊x_l + ν̃_{n',l} α_l⌋_K < t_l, l = 1, ..., M − 1 ).

Repeating this procedure, we eventually obtain

P( ⌊x_l + ν_{n,l} α_l⌋_K < s_l, ⌊x_l + ν̃_{n',l} α_l⌋_K < t_l, l = 1, ..., M ) = Π_{i=1}^{M} s_i t_i,

which completes the proof. □
Proof of Theorem 1. First, by Lemma 2(i), f(x_n) and f are identically distributed.
Next, by (5) with n' substituted for n, and (10), we have

⌊x_{n'}⌋_{m(n,n')K} = ⌊x̃_{n'}⌋_{m(n,n')K}.  (12)

Let s := τ(x_{n'})/K. Then we have τ(x_{n'}) > (s − 1)K, and hence ν_{n,s} < ν_{n',s}. Consequently, it follows from (9) that

s ≤ m(n, n').  (13)

(12) and (13) imply

⌊x_{n'}⌋_{sK} = ⌊x̃_{n'}⌋_{sK}.  (14)

On the other hand, since τ(x_{n'}) ≤ sK and τ is a {B_m}_m-stopping time, the value of τ(x_{n'}) is determined by ⌊x_{n'}⌋_{sK}; namely, we see τ(x_{n'}) = τ(⌊x_{n'}⌋_{sK}). Then by (14), we must have

τ(x̃_{n'}) = τ(x_{n'}) ≤ sK.  (15)

Consequently, (14) and (15) imply

⌊x_{n'}⌋_{τ(x_{n'})} = ⌊x̃_{n'}⌋_{τ(x̃_{n'})}.  (16)

Since f is B_τ-measurable, (4) and (16) imply that f(x_{n'}) = f(x̃_{n'}). Finally, from Lemma 2(ii) it follows that f(x_n) and f(x_{n'}) are independent. □
Acknowledgements
Partially supported by a Grant-in-Aid for Scientific Research (11440034), Ministry of Education, Japan.
References
[1] P. Billingsley, Probability and Measure, 3rd ed., Wiley, New York, 1995.
[2] N. Bouleau, D. Lépingle, Numerical Methods for Stochastic Processes, Wiley, New York, 1994.
[3] N. Etemadi, An elementary proof of the strong law of large numbers, Z. Wahrsch. Verw. Gebiete 55 (1) (1981) 119–122.
[4] M. Luby, Pseudorandomness and Cryptographic Applications, Princeton University Press, Princeton, NJ, 1996.
[5] J. von Neumann, Various techniques used in connection with random digits, US Natl. Bureau Stand. Appl. Math. Ser. 12 (1951) 36–38.
[6] H. Sugita, Pseudo-random number generator by means of irrational rotation, Monte Carlo Methods Appl., VSP 1-1 (1995) 35–57.
[7] H. Sugita, Robust numerical integration and pairwise independent random variables, J. Comput. Appl. Math. 139 (2002) 1–8.
[8] H. Sugita, The Random Sampler: A C-language library for pseudo-random generation and dynamic random Weyl sampling, available at http://idisk.mac.com/hiroshi sugita/Public/imath/mathematics.html.
[9] H. Sugita, S. Takanobu, Random Weyl sampling for robust numerical integration of complicated functions, Monte Carlo Methods Appl., VSP 6-1 (1999) 27–48.