Dynamic random Weyl sampling for drastic reduction of randomness in Monte Carlo integration
Hiroshi Sugita
Faculty of Mathematics, Graduate School of Mathematics, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, 812-8581 Fukuoka, Japan
Abstract
To reduce randomness drastically in Monte Carlo (MC) integration, we propose a pairwise independent sampling method, the dynamic random Weyl sampling (DRWS). DRWS is applicable even if the length of the random bits needed to generate a sample may vary. The algorithm of DRWS is so simple that it works very fast, even though the pseudo-random generator, the source of randomness, might be slow. In particular, we can use a cryptographically secure pseudo-random generator for DRWS to obtain the most reliable numerical integration method for complicated functions.
© 2002 IMACS. Published by Elsevier Science B.V. All rights reserved.
Keywords: Numerical integration; Monte Carlo integration; i.i.d.-sampling; Pairwise independent sampling; Random Weyl sampling; Dynamic random Weyl sampling; Cryptographically secure pseudo-random generator
1. Introduction
Random Weyl sampling (RWS) was introduced in [7,9] to reduce the randomness used in Monte Carlo (MC) integration by utilizing pairwise independent samples instead of i.i.d. samples.
The reduction of randomness by RWS is quite drastic. For example, let a random variable W be a function of 500 tosses of a coin. To integrate W numerically, i.i.d.-sampling with 10^7 samples requires 500 × 10^7 = 5 × 10^9 random bits, while RWS with the same sample size requires only

(500 + ⌈log_2 10^7⌉) × 2 = 1048

random bits (Example 2 of [7]). In addition, both sampling methods have the same mean square error.
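The bit counts above can be checked with a short computation (a sketch; the 500-bit integrand and the sample size 10^7 are just the figures of this example, not fixed parameters of RWS):

```python
from math import ceil, log2

# Figures from the example above: W is a function of 500 coin tosses,
# integrated with 10**7 samples.
bits_per_sample = 500
n_samples = 10**7

# i.i.d.-sampling consumes fresh random bits for every sample.
iid_bits = bits_per_sample * n_samples

# RWS consumes one pair (x, alpha) of (500 + ceil(log2 N))-bit numbers.
rws_bits = (bits_per_sample + ceil(log2(n_samples))) * 2

print(iid_bits)  # 5000000000
print(rws_bits)  # 1048
```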
In practice, we use a pseudo-random generator as the source of randomness for RWS. Then, the drastic reduction of randomness has the following advantages:
(a) RWS is very insensitive to the quality of the pseudo-random numbers.
(b) With almost no slowdown in the speed of generating samples, we can use a slow but precise pseudo-random generator, such as a cryptographically secure one ([4], Remark 1), to get the most reliable results.
E-mail address: [email protected] (H. Sugita).
Thanks to advantage (a), we may be careless to some extent in choosing a pseudo-random generator (Section 4 of [9]). But, of course, advantage (b) is much more remarkable.
However, one problem remained: the RWS introduced in [7,9] assumed that the length of the random bits used to generate each sample is fixed. As a matter of fact, in many cases the length of the random bits needed to generate a sample may vary by circumstances; this is the case, for example, when we use von Neumann's rejection method ([2,5]).
In the present paper, we propose a pairwise independent sampling method that is applicable even if the length of the random bits needed to generate a sample may vary. We call it the dynamic random Weyl sampling (DRWS, [8]). The algorithm of DRWS is so simple that pairwise independent samples are generated very fast, even faster than by i.i.d.-sampling. In addition, translating an i.i.d.-sampling code into DRWS is very easy (see the main routine of DRWS). In a word, DRWS is applicable whenever i.i.d.-sampling is applicable, and it is faster and much more reliable than i.i.d.-sampling.
We can say that (D)RWS is a kind of pseudo-random generator in the sense that it makes a small amount of randomness look like a large amount (Remark 1, [4]). But its usage is limited to numerical integration only.
Remark 1. A function f_n : {0,1}^n → {0,1}^{ℓ(n)} is said to be a pseudo-random generator if n < ℓ(n). We consider f_n(Z_n) ∈ {0,1}^{ℓ(n)} to be the pseudo-random bits (numbers), where Z_n, a uniformly distributed n-bit random variable, is called the seed. f_n is said to be cryptographically secure if we cannot distinguish the distributions of f_n(Z_n) and Z_{ℓ(n)} by any feasible statistical test ([4]). If we use a cryptographically secure pseudo-random generator f_n in MC computation, we can say the following: Let Y = φ(Z_{ℓ(n)}) be the random variable under consideration, where φ should be feasible. Suppose we use Y' := φ(f_n(Z_n)) instead of Y in practice. Then the distributions of Y and Y' cannot be distinguished by any feasible test. (If they could be, then f_n would not be cryptographically secure!) This means that Y' can completely pretend to be Y in practical MC computation.
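As a concrete, purely illustrative instance of such an f_n, one common heuristic construction expands a short seed by iterating a cryptographic hash in counter mode. The sketch below uses SHA-256 from Python's standard library; it is an assumed example, not the abstract generator of this remark, and the security of the construction is a separate question:

```python
import hashlib

def expand_seed(seed: bytes, out_len: int) -> bytes:
    """Sketch of a pseudo-random generator f_n: stretch a short seed into
    out_len bytes by hashing seed || counter with SHA-256 (heuristic only)."""
    out = b""
    counter = 0
    while len(out) < out_len:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:out_len]

# A 16-byte (128-bit) seed Z_n expanded into 1024 bytes of pseudo-random bits.
stream = expand_seed(b"\x00" * 16, 1024)
print(len(stream))  # 1024
```

The output is a deterministic function of the seed, as the definition of f_n requires.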
2. Algorithm
Let us first introduce how to implement DRWS. In the sequel, we use the following notations to describe the precision of computation: for m ∈ N,

D_m := { i/2^m ; i = 0, ..., 2^m − 1 },  (1)

⌊x⌋_m := ⌊2^m x⌋ / 2^m ∈ D_m,  x ∈ [0,1).  (2)
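In code, the truncation (2) simply keeps the first m binary digits of x. A minimal sketch (the function name floor_m is ours):

```python
def floor_m(x: float, m: int) -> float:
    """Truncation (2): floor(2^m * x) / 2^m, an element of D_m."""
    return int(x * 2**m) / 2**m

# Example: 0.7 = 0.10110..._2; keeping 3 binary digits gives 0.101_2 = 5/8.
print(floor_m(0.7, 3))   # 0.625
print(floor_m(0.3, 4))   # 0.25  (0.3 = 0.01001..._2, truncated to 0.0100_2)
```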
Let W be a random variable that we wish to integrate numerically. We realize W as a functional of [0,1)-valued uniformly distributed i.i.d. random variables {X_1, X_2, ...}. Let the precision of the realization of each X_l in a computer be 2^{−K}, that is, we consider X_l ∈ D_K. Now we assume the following condition, which is obviously indispensable for W to be treated in MC computations.
Assumption 1. We need only a finite number of X_1, ..., X_L to compute W almost surely. Here the number L of the X_l may vary by circumstances, but for each l ∈ N, when X_1, ..., X_l are given, we should be able to judge whether or not the next X_{l+1} is needed to compute W, without knowing anything about X_{l'}, l' ≥ l + 1.
We next suppose that a virtual function

• function Random_m : D_m-valued;

returns a D_m-valued uniformly distributed random variable which is independent of any formerly generated random variables. (In practice, this function should be simulated by a pseudo-random generator.)
Let N denote the total number of samples of W for numerical integration.
2.1. Algorithm of i.i.d.-sampling
• Main routine

function Mean_of_W : Real;
begin
  S := 0.0;
  for i := 1 to N do begin
    X := Random_K;
    while (another X is needed to compute W) do X := Random_K;
    S := S + W;
  end;
  result := S/N;
end;
This is the algorithm of i.i.d.-sampling. The function Mean_of_W returns the value that is substituted for the variable result, namely, S/N. Here, the X_l needed to compute W are all generated by the random function Random_K.
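A Python sketch of this routine. Random_K is simulated here by the standard library's Mersenne Twister via random.getrandbits, and the integrand W, which happens to draw exactly two X_l, is a made-up placeholder:

```python
import random

K = 32  # precision: each X_l lies in D_K, i.e. has K binary digits

def random_K() -> float:
    """Simulate the virtual function Random_K with a pseudo-random generator."""
    return random.getrandbits(K) / 2**K

def mean_of_W(compute_W, N: int) -> float:
    """i.i.d.-sampling: every X_l is a fresh call to random_K."""
    S = 0.0
    for _ in range(N):
        S += compute_W(random_K)  # compute_W calls random_K as often as it needs
    return S / N

# Placeholder integrand W = X_1 + X_2, whose exact mean is 1.
random.seed(1)
estimate = mean_of_W(lambda draw: draw() + draw(), 10**5)
print(abs(estimate - 1.0) < 0.05)  # True
```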
2.2. Algorithm of DRWS
Next, we introduce the algorithm of DRWS. Assume that

K' ≥ K + ⌈log_2 N⌉.  (3)
• Global variables

l : integer;
{x_l, α_l}_l : array (variable length) of (D_{K'})^2-valued vectors;

• Functions

function First_RWS : D_K-valued;
begin
  l := 0;
  result := Next_RWS;
end;

function Next_RWS : D_K-valued;
begin
  l := l + 1;
  if x_l and α_l have not been generated then begin
    x_l := Random_{K'};  α_l := Random_{K'};
  end;
  x_l := x_l + α_l (mod 1);
  result := ⌊x_l⌋_K;
end;
• Main routine

function Mean_of_W : Real;
begin
  S := 0.0;
  for i := 1 to N do begin
    X := First_RWS;
    while (another X is needed to compute W) do X := Next_RWS;
    S := S + W;
  end;
  result := S/N;
end;
The main routine is very similar to that of i.i.d.-sampling. The only difference is the following: in DRWS, the X_l are not generated by direct calls of Random_K; they are generated by First_RWS and Next_RWS. The former is called to generate the first X_1 only, and the latter is called to generate the rest X_2, X_3, ..., if necessary.
The random function Random_{K'} is called only by Next_RWS, and only when x_l and α_l have not yet been generated. Thus, DRWS requires much less randomness than i.i.d.-sampling. Note that the random function used in DRWS is Random_{K'}, where the additional K' − K (≥ ⌈log_2 N⌉) bits are required to implement the Weyl transformation with precision 2^{−K} for N times exactly (see Lemma 1).
According to Theorem 1, the samples of DRWS are pairwise independent, and hence DRWS has the same mean square error as i.i.d.-sampling (Corollary 1). However, the fact that DRWS requires very little randomness assures the practical advantages (a) and (b) listed in Section 1.
The author personally uses the pseudo-random generator introduced in [6], which is slow but precise, as the source of randomness for DRWS ([8]). Nevertheless, DRWS works very fast, because in most cases the function Next_RWS merely performs "x_l := x_l + α_l (mod 1)", which can be done very quickly, rather than generating a new pair (x_l, α_l) by the slow pseudo-random generator.
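Under condition (3), the whole algorithm above can be sketched in Python as follows. Exact arithmetic in D_{K'} is done with integers, Random_{K'} is simulated by random.getrandbits, and the integrand W = X_1 + X_2 is a made-up placeholder:

```python
import random

K = 32                          # output precision: samples lie in D_K
N = 10**5                       # total number of samples
KP = K + (N - 1).bit_length()   # K' >= K + ceil(log2 N): condition (3)

pairs = []   # lazily generated pairs (x_l, alpha_l), as integers in [0, 2^K')
level = 0    # current index l

def first_rws() -> float:
    """First_RWS: reset l, then fall through to Next_RWS for X_1."""
    global level
    level = 0
    return next_rws()

def next_rws() -> float:
    """Next_RWS: Weyl rotation x_l := x_l + alpha_l (mod 1), truncated to D_K."""
    global level
    level += 1
    if len(pairs) < level:                      # (x_l, alpha_l) not yet generated:
        pairs.append([random.getrandbits(KP),   # the only calls to Random_{K'}
                      random.getrandbits(KP)])
    x, a = pairs[level - 1]
    x = (x + a) % 2**KP                         # exact addition mod 1 in D_{K'}
    pairs[level - 1][0] = x
    return (x >> (KP - K)) / 2**K               # floor(x_l)_K

def mean_of_W(compute_W, n: int) -> float:
    """Main routine: same shape as i.i.d.-sampling, only the source of X changes."""
    S = 0.0
    for _ in range(n):
        S += compute_W(first_rws, next_rws)     # first X from First_RWS, rest from Next_RWS
    return S / n

# Placeholder integrand W = X_1 + X_2 (exact mean 1).
random.seed(1)
estimate = mean_of_W(lambda first, nxt: first() + nxt(), N)
print(abs(estimate - 1.0) < 0.05)  # True
print(len(pairs))                  # 2: only two pairs (x_l, alpha_l) were ever drawn
```

Note the reduction of randomness: 10^5 samples, each consuming two coordinates, are produced from just two (x_l, α_l) pairs, i.e. 4K' random bits in total.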
Remark 2. In some large-scale computations, that is, when the probability that W requires too many X_l is not negligible, DRWS may exhaust the computer's memory in order to keep all the (x_l, α_l) generated so far. So we had better always check how much memory is currently being used for the (x_l, α_l).
3. Mathematical formulation
In this section, we describe the mathematical structure of DRWS. Let T^1 be the one-dimensional torus, i.e., the additive group [0,1) with addition x + y (mod 1). Let B be the set of all Borel subsets of T^1. We define an increasing sequence of sub-σ-fields {B_m}_m of B by

B_m := σ{[a, b) ; a, b ∈ D_m, a < b} = σ(d_1, ..., d_m),  m ∈ N.

Here, the function d_i(x) denotes the i-th digit of x ∈ T^1 = [0,1) in its dyadic expansion, that is,

d_1(x) := 1_{[1/2,1)}(x),  d_i(x) := d_1(2^{i−1} x),  x ∈ T^1.
A function τ : T^1 → N ∪ {∞} is called a {B_m}_m-stopping time ([1,2]) if

{τ ≤ m} := {x ∈ T^1 ; τ(x) ≤ m} ∈ B_m,  ∀m ∈ N.

For a {B_m}_m-stopping time τ, we define a sub-σ-field B_τ of B by

B_τ := {A ∈ B ; A ∩ {τ ≤ m} ∈ B_m, ∀m ∈ N}.

L^p(T^1, B_τ, dx) is simply denoted by L^p(B_τ). A function f : T^1 → R ∪ {±∞} is B_τ-measurable if and only if

f(x) = f(⌊x⌋_{τ(x)}),  x ∈ T^1.  (4)
Let us see the relation between the algorithm in Section 2 and the above mathematical formulation. Again, let the precision of computation be 2^{−K}. Set X_l(x) := Σ_{j=1}^{K} 2^{−j} d_{(l−1)K+j}(x), l ∈ N. Then {X_l}_{l=1}^{∞} is a sequence of D_K-valued uniformly distributed independent random variables on the probability space (T^1, B, dx). Suppose W is a functional of {X_l}_{l=1}^{∞}, which is expressed as a function of x ∈ T^1 in the following way:

W = f(x) = f( Σ_{i=1}^{∞} 2^{−i} d_i(x) ) = f( Σ_{l=1}^{∞} 2^{−(l−1)K} X_l(x) ).
Now, if W satisfies Assumption 1, defining a {B_m}_m-stopping time τ by

τ(x) := inf{ lK ; W = f(x) is calculated from X_1(x), ..., X_l(x) },  x ∈ T^1,

we see that f is B_τ-measurable. As was seen in Section 2, however, we need not be explicitly aware of τ or f in practice. They are only needed for the mathematical analysis below.
Let N ∈ N be the total number of samples of W, and let

(x_l, α_l) ∈ D_{K+⌈log_2 N⌉} × D_{K+⌈log_2 N⌉} ⊂ T^1 × T^1,  l ∈ N,

be uniformly distributed independent random variables. Set

x_n := Σ_{l=1}^{∞} 2^{−(l−1)K} ⌊x_l + ν_{n,l} α_l⌋_K ∈ T^1,  n = 1, ..., N,  (5)

where ν_{n,l} is a random variable defined by¹

ν_{n,l} := n, if l = 1;  ν_{n,l} := #{1 ≤ u ≤ n ; τ(⌊x_u⌋_{(l−1)K}) > (l−1)K}, if l > 1.  (6)

Since τ is a {B_m}_m-stopping time, it is easy to see that ν_{n,l} and x_n are well defined.
Now, we have the following theorem.
Theorem 1. If f is B_τ-measurable, the random variables {f(x_n)}_{n=1}^{N} are pairwise independent and identically distributed. Their common distribution coincides with that of f considered as a random variable defined on (T^1, B, dx).
Note that {x_n}_{n=1}^{N} are all uniformly distributed but are not pairwise independent (Lemma 2). Theorem 1 asserts that if they are composed with a B_τ-measurable f, then {f(x_n)}_{n=1}^{N} become pairwise independent.
Theorem 1 and the strong law of large numbers for identically distributed pairwise independent random variables ([3]) imply that if f ∈ L^1(B_τ) and N is sufficiently large, we have

(1/N) Σ_{n=1}^{N} f(x_n) ∼ E[f].  (7)

Thus, we can obtain an approximate value of E[f] by (7). In particular, if E[τ] < ∞, we have τ ∈ L^1(B_τ), and hence, putting f = τ in (7), we know that the whole computation ends in a finite time of order O(N).
Remark 3. If E[τ] = ∞, the total computation time would be impractically large for N ≫ 1. Consequently, in practice, we have to be careful about whether E[τ] = ∞ or not. For instance, if τ is the stopping time associated with von Neumann's rejection method, then it satisfies E[τ] < ∞, because its distribution should be a geometric distribution. Of course, this remark applies to both i.i.d.-sampling and DRWS.
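To illustrate the rejection-method case: the number of uniforms consumed per accepted sample is a (scaled) geometric variable, so its expectation is finite. A hypothetical sketch, sampling the quarter disc by von Neumann's rejection method:

```python
import random

def sample_quarter_disc(draw):
    """von Neumann's rejection method: draw (x, y) uniformly in [0,1)^2 until
    x^2 + y^2 < 1. Returns the accepted point and the number of uniforms used;
    the number of trials is geometric with success probability pi/4."""
    used = 0
    while True:
        x, y = draw(), draw()
        used += 2
        if x * x + y * y < 1.0:
            return (x, y), used

random.seed(1)
n = 10**4
total_used = sum(sample_quarter_disc(random.random)[1] for _ in range(n))
# E[uniforms per sample] = 2 * (4/pi) ~ 2.546, so the stopping time has
# finite expectation and the total work is O(N).
print(2.3 < total_used / n < 2.8)  # True
```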
Definition 1. Suppose that E[τ] < ∞ and f ∈ L^1(B_τ). The sampling method by means of {x_n}_{n=1}^{N}, that is, the method to approximate E[f] by

(1/N') Σ_{n=1}^{N'} f(x_n),  1 ≤ N' ≤ N,

is called the dynamic random Weyl sampling (DRWS).
The term “dynamic” indicates that the sampling points {x_n}_{n=1}^{N} vary according to the integrand f, more precisely, according to the stopping time τ.
Corollary 1. If f ∈ L^2(B_τ), DRWS has the same mean square error as i.i.d.-sampling; that is, Var[f] being the variance of f, we have

E[ ( (1/N') Σ_{n=1}^{N'} f(x_n) − E[f] )^2 ] = Var[f] / N',  1 ≤ N' ≤ N.  (8)

¹ For a set A, #A denotes the number of its elements.
4. Proof of theorem
First, we quote Example 2 of [7] as a lemma:

Lemma 1. Let (x, α) ∈ D_{K+⌈log_2 N⌉} × D_{K+⌈log_2 N⌉} be a uniformly distributed random variable. Then for any pair of integers 1 ≤ p < p' ≤ N, ⌊x + pα⌋_K and ⌊x + p'α⌋_K are both uniformly distributed in D_K and they are independent.
From now on, we fix 1 ≤ n < n' ≤ N. Let m(n, n') := max{l ; ν_{n,l} < ν_{n',l}}. Then, since ν_{n,l} < ν_{n',l} implies ν_{n,i} < ν_{n',i} for all i = 1, ..., l, we have

m(n, n') = max{l ; ν_{n,i} < ν_{n',i}, i = 1, ..., l}.  (9)
Finally, set

x̃_{n'} := Σ_{l=1}^{∞} 2^{−(l−1)K} ⌊x_l + ν̃_{n',l} α_l⌋_K,  ν̃_{n',l} := ν_{n',l} if l ≤ m(n, n'), and ν̃_{n',l} := n' if l > m(n, n').  (10)
Lemma 2.
(i) Each x_n is uniformly distributed in T^1 = [0,1).
(ii) x_n and x̃_{n'} are independent.

Proof. To prove both (i) and (ii), it is enough to show that for any M ∈ N, the random variables

⌊x_l + ν_{n,l} α_l⌋_K,  ⌊x_l + ν̃_{n',l} α_l⌋_K,  l = 1, ..., M,  (11)

are all uniformly distributed in D_K and they are independent.
Note that if l ≥ 2, both ν_{n,l} and ν̃_{n',l} depend on (x_1, α_1), ..., (x_{l−1}, α_{l−1}) but they are independent of (x_l, α_l). Note also that we always have ν_{n,l} < ν̃_{n',l}.
Let P be the probability measure that governs all the (x_l, α_l). Then, for any s_1, t_1, ..., s_M, t_M ∈ D_K, we see

P( ⌊x_l + ν_{n,l} α_l⌋_K < s_l, ⌊x_l + ν̃_{n',l} α_l⌋_K < t_l, l = 1, ..., M )
 = Σ_{p_l < p'_l, l=1,...,M} P( ⌊x_l + p_l α_l⌋_K < s_l, ν_{n,l} = p_l, ⌊x_l + p'_l α_l⌋_K < t_l, ν̃_{n',l} = p'_l, l = 1, ..., M )
 = Σ_{p_l < p'_l, l=1,...,M} P( ⌊x_M + p_M α_M⌋_K < s_M, ⌊x_M + p'_M α_M⌋_K < t_M )
   × P( ⌊x_l + p_l α_l⌋_K < s_l, ν_{n,l} = p_l, ⌊x_l + p'_l α_l⌋_K < t_l, ν̃_{n',l} = p'_l, l = 1, ..., M − 1, ν_{n,M} = p_M, ν̃_{n',M} = p'_M ).

Since p_M ≠ p'_M, Lemma 1 implies that the events {⌊x_M + p_M α_M⌋_K < s_M} and {⌊x_M + p'_M α_M⌋_K < t_M} are independent, so we have

 = Σ_{p_l < p'_l, l=1,...,M} P( ⌊x_M + p_M α_M⌋_K < s_M ) P( ⌊x_M + p'_M α_M⌋_K < t_M )
   × P( ⌊x_l + p_l α_l⌋_K < s_l, ν_{n,l} = p_l, ⌊x_l + p'_l α_l⌋_K < t_l, ν̃_{n',l} = p'_l, l = 1, ..., M − 1, ν_{n,M} = p_M, ν̃_{n',M} = p'_M )
 = s_M t_M Σ_{p_l < p'_l, l=1,...,M−1} P( ⌊x_l + p_l α_l⌋_K < s_l, ν_{n,l} = p_l, ⌊x_l + p'_l α_l⌋_K < t_l, ν̃_{n',l} = p'_l, l = 1, ..., M − 1 )
 = s_M t_M P( ⌊x_l + ν_{n,l} α_l⌋_K < s_l, ⌊x_l + ν̃_{n',l} α_l⌋_K < t_l, l = 1, ..., M − 1 ).

Repeating this procedure, we eventually obtain

P( ⌊x_l + ν_{n,l} α_l⌋_K < s_l, ⌊x_l + ν̃_{n',l} α_l⌋_K < t_l, l = 1, ..., M ) = Π_{i=1}^{M} s_i t_i,

which completes the proof. □
Proof of Theorem 1. First, by Lemma 2(i), f(x_n) and f are identically distributed.
Next, by (5) with n' substituted for n, and (10), we have

⌊x_{n'}⌋_{m(n,n')K} = ⌊x̃_{n'}⌋_{m(n,n')K}.  (12)

Let s := τ(x_{n'})/K. Then we have τ(x_{n'}) > (s − 1)K, and hence ν_{n,s} < ν_{n',s}. Consequently, it follows from (9) that

s ≤ m(n, n').  (13)

(12) and (13) imply

⌊x_{n'}⌋_{sK} = ⌊x̃_{n'}⌋_{sK}.  (14)

On the other hand, since τ(x_{n'}) ≤ sK and τ is a {B_m}_m-stopping time, the value of τ(x_{n'}) is determined by ⌊x_{n'}⌋_{sK}; namely, we see τ(x_{n'}) = τ(⌊x_{n'}⌋_{sK}). Then by (14), we must have

τ(x̃_{n'}) = τ(x_{n'}) ≤ sK.  (15)

Consequently, (14) and (15) imply

⌊x_{n'}⌋_{τ(x_{n'})} = ⌊x̃_{n'}⌋_{τ(x̃_{n'})}.  (16)

Since f is B_τ-measurable, (4) and (16) imply that f(x_{n'}) = f(x̃_{n'}). Finally, from Lemma 2(ii) it follows that f(x_n) and f(x_{n'}) are independent. □
Acknowledgements
Partially supported by a Grant-in-Aid for Scientific Research (11440034), Ministry of Education, Japan.
References
[1] P. Billingsley, Probability and Measure, 3rd ed., Wiley, New York, 1995.
[2] N. Bouleau, D. Lépingle, Numerical Methods for Stochastic Processes, Wiley, New York, 1994.
[3] N. Etemadi, An elementary proof of the strong law of large numbers, Z. Wahrsch. Verw. Gebiete 55 (1) (1981) 119–122.
[4] M. Luby, Pseudorandomness and Cryptographic Applications, Princeton University Press, Princeton, NJ, 1996.
[5] J. von Neumann, Various techniques used in connection with random digits, US Natl. Bureau Stand. Appl. Math. Ser. 12 (1951) 36–38.
[6] H. Sugita, Pseudo-random number generator by means of irrational rotation, Monte Carlo Methods Appl., VSP 1-1 (1995) 35–57.
[7] H. Sugita, Robust numerical integration and pairwise independent random variables, J. Comput. Appl. Math. 139 (2002) 1–8.
[8] H. Sugita, The Random Sampler: A C-language library for pseudo-random generation and dynamic random Weyl sampling, available at http://idisk.mac.com/hiroshi sugita/Public/imath/mathematics.html.
[9] H. Sugita, S. Takanobu, Random Weyl sampling for robust numerical integration of complicated functions, Monte Carlo Methods Appl., VSP 6-1 (1999) 27–48.