www.elsevier.com/locate/spa
Uniform iterated logarithm laws for martingales
and their application to functional estimation
in controlled Markov chains
R. Senoussi
∗INRA, Laboratoire de Biométrie, Domaine St. Paul, Site Agroparc, 84914 Avignon Cedex 9, France
Received 27 March 1995; received in revised form 28 February 2000; accepted 28 February 2000
Abstract
In the first part, we establish an upper bound of iterated logarithm type for a sequence of processes M_n(·) ∈ C(R^d; R^p) endowed with the topology of uniform convergence on compacts, where M_n(x) is a square-integrable martingale for each x in R^d. In the second part we present an iterative kernel estimator of the driving function f of the regression model

X_{n+1} = f(X_n) + ε_{n+1}.

Strong convergence and CLT results are proved for this estimator and then extended to controlled Markov models.
Résumé: The first part establishes an upper bound of iterated logarithm type for a sequence of stochastic processes M_n(x) ∈ C(R^d; R^p), endowed with the topology of uniform convergence on compacts, when M_n(x) is a square-integrable martingale for each x in R^d. The second part treats, by the kernel method, the problem of iterative estimation of the function f of the regression model

X_{n+1} = f(X_n) + ε_{n+1}.

We prove strong consistency and establish various convergence rates for this estimator. These results are then generalized to other examples, in particular to the controlled Markov model. © 2000 Elsevier Science B.V. All rights reserved.
MSC: primary 60F15; 62G05; secondary 60G42; 62M05
Keywords: Iterated logarithm law; Autoregressive model; Controlled model; Markov chain; Kernel estimator
0. Introduction
Part I of the paper proves a lim sup version of an iterated logarithm law for a sequence of random processes (M_n(·))_{n≥1} with values in R^p and arguments, or indices, in R^d. The processes (M_n(x))_{n≥1} are assumed to be square-integrable martingales for all
x ∈ R^d and to have almost surely continuous paths for all n ∈ N. Strong laws on general Banach spaces have been established already (Mourier, 1953; Kuelbs, 1976), but in our case the space C(R^d; R^p), endowed with the topology of uniform convergence on compacts, is not a Banach space. We first give simple conditions that ensure strong uniform convergence, and then strengthen this result into an iterated logarithm law. Unlike Strassen's law (Heyde and Scott, 1973) for sequences of real random variables, the result presented here is not an invariance principle but only a strong law for variables taking values in a function space.
Part II develops some aspects of function estimation in the context of autoregressive models. Most studies of density or regression estimators use an L^p criterion, but for controlled models almost sure convergence is crucial in order to adapt an optimal control process. For this reason, we prove the a.s. uniform convergence (Devroye, 1988; Hernandez-Lerma, 1991), an iterated logarithm law and the pointwise weak convergence of the estimator of the regression function f of the following controlled Markov model:

X_{n+1} = f(X_n) + C(X_n, U_n) + ε_{n+1}.

Our results are quite similar to those for classical density and regression kernel estimators of i.i.d. real sequences.
We now specify some notation that will be used intensively in this paper.
B_d(x, R) is the ball centered at x ∈ R^d with radius R in the Euclidean norm ||x|| = (Σ_i x_i²)^{1/2}. D generally denotes a countable dense subset of R^d, for instance D = ∪_{m≥0} D_m where D_m = Z^d/2^m. The function h(t) = (2t LL(t))^{1/2}, where LL(t) = log(log(t)), is used throughout the paper. We recall that C(R^d; R^p) is the metrisable space of continuous functions from R^d to R^p, endowed with the topology of uniform convergence on compacts. The modulus of continuity of a function f on [−N, +N]^d is denoted ω(f, N, δ) = sup(||f(x) − f(y)||; ||x − y|| ≤ δ, ||x|| ≤ N, ||y|| ≤ N). For the probability part, the existence of a stochastic basis (Ω, A, F = (F_n)_{n≥0}, P) satisfying the usual conditions is always assumed; that is, F is a P-complete, increasing and right-continuous family of sigma-fields.
The increasing process of an F-adapted, square-integrable vector martingale is the predictable, increasing sequence of positive semi-definite matrices

⟨M, M⟩_n = Σ_{k=1}^n E(ΔM_k · ᵗΔM_k | F_{k−1})  (also written ⟨M⟩_n),

where ΔM_{n+1} = M_{n+1} − M_n stands for the martingale difference. More generally, we write ⟨M(x)⟩_n for a sequence (M_n(·))_{n≥1} of random functions of C(R^d; R^p) such that (M_n(x))_{n≥1} is a discrete, F-adapted, square-integrable vector martingale for each x. Sometimes, with no loss of rigor, we use the same notation for different constants to avoid their profusion. Finally, we simply refer to Duflo (1990) and Iosifescu and Grigorescu (1990) each time we need to recall classical results on martingales.
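Since the modulus of continuity ω(f, N, δ) and the dyadic grids D_m recur throughout the proofs, a small numerical sketch may help fix ideas. The following Python snippet is our illustration, not part of the paper: it approximates ω in dimension d = 1 by restricting the supremum to the grid D_m (the name `modulus` is ours).

```python
import numpy as np

def modulus(f, N, delta, m=8):
    """Approximate omega(f, N, delta) = sup{|f(x) - f(y)| : |x - y| <= delta,
    |x| <= N, |y| <= N} by restricting the supremum to the dyadic grid
    D_m = Z / 2^m (dimension d = 1)."""
    xs = np.arange(-N, N + 2.0**-m, 2.0**-m)   # grid points of D_m inside [-N, N]
    vals = f(xs)
    w = 0.0
    for j, x in enumerate(xs):
        near = np.abs(xs - x) <= delta          # points y with |x - y| <= delta
        w = max(w, float(np.max(np.abs(vals[near] - vals[j]))))
    return w
```

For f(x) = x on [−1, 1] and δ = 1/4, the grid value coincides with ω(f, 1, 1/4) = 1/4, since 1/4 is itself a dyadic point; finer grids (larger m) tighten the approximation for general f, in the spirit of inequality (1) below.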
Part A. Uniform strong laws
1. Strong laws in C(Rd;Rp)
limit in each point under the assumption E(sup_{0≤s≤1} ||X_n(s)||) < ∞. We give below a comparable result for non-stationary sequences of random functions of martingale type under Lipschitzian conditions. We first define a function on R^d to specify the Lipschitzian conditions we need in the multivariate case: if x = (x_1, …, x_d),

Π(x) = ∏_{i=1}^d (|x_i| + 𝟙_{{x_i = 0}}).
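In code, the product weight displayed above (call it Π, our notation for the extraction-damaged symbol) is a one-liner; a minimal Python sketch:

```python
import numpy as np

def Pi(x):
    """Pi(x) = prod_{i=1}^d (|x_i| + 1_{x_i = 0}): each zero coordinate
    contributes a factor 1 instead of 0, so Pi never vanishes."""
    x = np.asarray(x, dtype=float)
    return float(np.prod(np.abs(x) + (x == 0.0)))
```

Note that Π(0) = 1 and, for x with all coordinates nonzero, Π(x) = |x_1 ⋯ x_d|.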
Theorem 1.1. Let M_n(x) be a family of discrete martingales indexed by x ∈ R^d, with values in R^p, and assume that for some continuous increasing function a(·) on R^+ and constants α > 0, λ > 0:
(a) E(||M_n(0)||²) = O(n^α);
(b) for all integers N and x, y ∈ B_d(0, N),

E(||M_n(x) − M_n(y)||²) ≤ a(N) n^α ||x − y||^λ Π(x − y).

Then, for all γ > α/2, the sequence n^{−γ} M_n(·) converges a.s. to zero, uniformly on compacts of R^d.
Proof. First, conditions (a) and (b) imply that M_n(·) has a.s. continuous paths and that the strong law for square-integrable martingales applies (Neveu, 1964): lim_n n^{−γ} M_n(x) = 0 a.s. for all x. Hence, a.s., n^{−γ} M_n(x) converges to zero on every countable dense set D.
Second, by Ascoli's lemma we only have to prove that n^{−γ} M_n(·) is a.s. an equicontinuous sequence. If we consider the partial oscillation W(f, N, 2^{−m}) = sup(||f(x) − f(y)||; x, y ∈ B_d(0, N) ∩ D_m, ||x − y|| ≤ 2^{−m}) of f on the grid D_m = Z^d/2^m, we get

ω(f, N, 2^{−m}) ≤ C Σ_{r≥m} W(f, N, 2^{−r}).   (1)
Next, for N > 0, ε > 0 and ||x||, ||y|| ≤ N, let us define the events

A(n, x, y, ε) = {sup(k^{−γ} ||M_k(x) − M_k(y)||; 2^n ≤ k ≤ 2^{n+1}) ≥ ε}

and

B(n, m, N, ε) = ∪(A(n, x, y, ε); x, y ∈ B_d(0, N) ∩ D_m, ||x − y|| ≤ 2^{−m}).

On the one hand, by Kolmogorov's inequality for martingales, it follows that

P(A(n, x, y, ε)) ≤ a_1(N) ||x − y||^λ Π(x − y) ε^{−2} (2^n)^{−2γ} (2^{n+1})^α.

On the other hand, since the number of neighbors y ∈ D_m of x with ||y − x|| ≤ 2^{−m} is less than C 2^{md} (a factor compensated by Π(x − y) ≤ C 2^{−md} for such pairs), we get

P(B(n, m, N, ε)) ≤ a_2(N) ε^{−2} 2^{−mλ + n(α−2γ)}.
If δ < λ/2 and C(n, N) = ∪_{m≥1} B(n, m, N, 2^{−mδ}), then

P(C(n, N)) ≤ a_3(N) 2^{n(α−2γ)} Σ_{m≥1} 2^{m(2δ−λ)} ≤ a_4(N) 2^{n(α−2γ)}.

Since γ > α/2, Σ_n P(C(n, N)) < ∞, and it follows by the Borel–Cantelli lemma that, from some rank n*, we have, for all m ∈ N and x, y ∈ B_d(0, N) ∩ D_m with ||x − y|| ≤ 2^{−m},

n^{−γ} ||M_n(x) − M_n(y)|| ≤ 2^{−mδ}.

Setting D = ∪_m D_m, by (1) we obtain, for n ≥ n*, x, y ∈ B_d(0, N) ∩ D and ||x − y|| ≤ 2^{−k},

n^{−γ} ||M_n(x) − M_n(y)|| ≤ C Σ_{m≥k} 2^{−mδ} ≤ C 2^{−kδ}.

Since the paths are a.s. continuous, this inequality still holds on B_d(0, N), and this proves the a.s. equicontinuity of the sequence n^{−γ} M_n(·).
Corollary 1.1. Let (Z_i)_{i≥1} be an i.i.d. sequence of square-integrable r.v. in R^s, ρ a positive continuous function on R^d × R^d, and F a mapping from R^s × R^d to R^d which meets the following conditions for some constants λ, C_1, …, C_4:
(i) ||F(z, 0)|| ≤ C_1 ||z|| + C_2;
(ii) ||F(z, x) − F(z, y)||² ≤ C_3 ||x − y||^λ Π(x − y)(C_4 ||z||² + ρ(x, y)).
Then, if Δ_n(x) = Σ_{i=1}^n (F(Z_i, x) − E(F(Z_i, x))), the sequence n^{−γ} Δ_n(·) converges a.s. and uniformly on compacts to zero, for all γ > 1/2.
Proof. The square-integrable martingale Δ_n(0) satisfies assumption (a) of Theorem 1.1 with α = 1. Moreover, we have

E||Σ_{i=1}^n (F(Z_i, x) − F(Z_i, y))||² ≤ C n ||x − y||^λ Π(x − y)(C E||Z||² + ρ(x, y)).
2. Iterated logarithm law
Heyde and Scott (1973) generalized the invariance principle of Strassen's log-log law to discrete martingales and then to ergodic stationary sequences of r.v. by the Skorokhod representation method. However, for our purpose, we follow in this paper the classical approach via the exponential inequalities of Kolmogorov, adapted to randomly normed partial sums (Stout, 1970). Although we deal with the function space C(R^d; R^p) (Ledoux and Talagrand, 1986), we recall that the result proved below is not an invariance principle. Its proof relies on the following ILL for martingales, adapted from Stout (1970) and proved in Duflo (1990).
Theorem 2.1. (a) If M_n is an F-adapted real martingale and s_n² is an adapted sequence converging a.s. to +∞ that satisfy, for some F_0-measurable r.v. C < ∞:
(i) |ΔM_{n+1}| = |M_{n+1} − M_n| ≤ C s_n²/h(s_n²) and
(ii) ⟨M⟩_n ≤ s_{n−1}²; then limsup_n |M_n|/h(s_{n−1}²) ≤ 1 + C/2 a.s.
(b) This inequality continues to hold if the condition |ΔM_{n+1}| ≤ C_n s_n²/h(s_n²) is substituted for (i).
The interesting case is C = 0. This result can be extended to the topological space C(R^d; R^p).
Proof. The proofs of Theorems 1.1 and 2.2 are based on the maximal inequality for positive supermartingales, under the conditions of the theorem. Without loss of generality, we may assume that a ≥ 0, b ≥ 0 and bound sup(a, b) by a + b. Inequality (2) follows in this second case, since ε is arbitrary.
Inequality (2) holds a.s. for each x. Once equicontinuity is established, each cluster point ξ(·) is continuous, and then (2) holds a.s. on R^d.
(3) To prove the equicontinuity on compacts, it is enough to show that lim_{δ→0} sup_n ω(ξ_n, N, δ) = 0 a.s.
The F-martingale H_n has bounded increments, where the function C(N) depends only on N (but may vary below) and W(M_n, N, 2^{−m}) = sup{|ΔM_n(x, y)|: x, y ∈ D_m(0, N), y ∈ V_m(x)}.
We have, for all integers k and m, the events

B_k^m = ∪_{r≥m} {sup_{t_k ≤ n < t_{k+1}} W(ξ_n, N, 2^{−r}) ≥ 2^{−rτ} η(N)},

and

P(B_k^m) ≤ C(N) Σ_{r≥m} exp{r d log(2) − 2^{r(λ−q)} LL(t_k)}.
For large m, k, say m ≥ m*, k ≥ k*, and r ≥ m, we have

2^{r(λ−q)} LL(t_k) − r d log(2) ≥ r LL(t_k) = r u_k.

If m ≥ m*, then P(B_k^m) ≤ C(N) Σ_{r≥m} e^{−r u_k} ≤ C(N) e^{−m u_k}, and then

Σ_{k≥k*} P(B_k^m) ≤ C(N) Σ_{k≥k*} (k LL(θ))^{−m} < ∞.
By the Borel–Cantelli lemma, P(limsup_k B_k^m) = 0. We have thus proved that sup_{t_k ≤ n < t_{k+1}} W(ξ_n, N, 2^{−r}) ≤ η(N) 2^{−rτ} for large k, m and all r ≥ m.
Finally, part (2) of the proof implies that

sup_{t_k ≤ n} ω(ξ_n, N, 2^{−m}) ≤ η(N) Σ_{r≥m} 2^{−rτ} ≤ C(N) 2^{−mτ} a.s.

This proves lim_{δ→0} sup_n ω(ξ_n, N, δ) = 0 and the equicontinuity of the sequence ξ_n(·).
3. Examples
3.1. Usual rates
If the increments of the martingale family behave well, i.e., s_n² is of order n^α, the convergence rate of Theorem 2.2 can be made explicit.
Corollary 3.1. Let α > 0 and λ > 0 be constants such that:
(i) |ΔM_{n+1}(0)| ≤ b n^{α/2}/(LL(n))^{1/2}, tr⟨M(0)⟩_n ≤ a n^α;
(ii) for any integer N and x, y ∈ B_d(0, N),

||ΔM_n(x) − ΔM_n(y)|| ≤ ||x − y||^λ b(N) n^{α/2}/(LL(n))^{1/2},
tr⟨M(x) − M(y)⟩_n ≤ ||x − y||^λ a(N) n^α.

Then, for all δ > 0, sup_{||x||≤N} ||M_n(x)||/(n^{α/2}(LL(n))^{(1+δ)/2}) → 0 a.s.
3.2. Regression models
By a zero-mean noise (ε_n)_{n≥1} with finite conditional moment of order > 2, we mean an F-adapted sequence of r.v. such that:
H1 (i) E(ε_{n+1} | F_n) = 0 and E(ᵗε_{n+1} ε_{n+1} | F_n) = Γ;
(ii) for some δ > 0, sup_n E(||ε_{n+1}||^{2+2δ} | F_n) < ∞.
We also consider a sequence of processes Y_n(x), F-adapted for all x, and an increasing continuous function b(·) from R^+ to R^+, and assume there exist λ > 0 and an adapted sequence of r.v. η_n such that, a.s.:
H2 (i) |Y_n(0)| ≤ η_n;
(ii) |Y_n(x) − Y_n(y)| ≤ b(N) ||x − y||^λ η_n, ∀x, y ∈ B_d(0, N).
We study below the asymptotic behavior of the martingale family

M_n(x) = Σ_{k=1}^n Y_{k−1}(x) ε_k.
Proposition 3.1. Under assumptions H1, H2, and if, a.s.,

s_n² = Σ_{k=1}^n η_k² → ∞  and  Σ_{k=1}^∞ (η_k²)^{1+δ} (LL(s_k²))^δ/(s_k²)^{1+δ} < ∞,

the sequence M_n(·)/h(s_{n−1}²) is a.s. relatively compact in C(R^d; R^p).
Remark 3.1. If Σ_{k=1}^∞ (η_k²/s_k²)^{1+δ} is substituted for the above second series, we obtain the pointwise convergence result of Duflo et al. (1990).
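As a numerical sanity check (ours, not in the paper), take the simplest ingredients Y_k ≡ 1 and η_k ≡ 1 with i.i.d. N(0, 1) noise, so that s_n² = n and H1, H2 hold trivially; the normalized martingale M_n/h(s_n²) should then remain bounded, in line with Proposition 3.1 and the classical LIL:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
eps = rng.standard_normal(n)            # i.i.d. N(0, 1): H1 holds with Gamma = 1
M = np.cumsum(eps)                      # M_n = sum_{k<=n} Y_{k-1} eps_k with Y == 1
s2 = np.arange(1, n + 1, dtype=float)   # s_n^2 = n since eta_k == 1

def h(t):
    """LIL normalization h(t) = sqrt(2 t LL(t)), LL(t) = log(log(t))."""
    return np.sqrt(2.0 * t * np.log(np.log(t)))

idx = s2 >= 10                          # avoid small n where log(log(n)) <= 0
ratio = np.abs(M[idx]) / h(s2[idx])
# The classical LIL predicts limsup ratio = 1; over a finite run the
# running maximum stays bounded and of order 1.
print(ratio.max())
```

The sketch uses h(s_n²) rather than h(s_{n−1}²), an immaterial shift for this illustration.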
Proof. (a) If ||x|| ≤ N and a(N) = 1 + b(N)N^λ, we get |Y_n(x)| ≤ a(N) η_n. Next, let us define the events A_{n+1} = {a(N) η_n ||ε_{n+1}|| ≤ s_n²/h(s_n²)}, their complements A_{n+1}^c, and put

ε'_n = ε_n 𝟙_{A_n} − E(ε_n 𝟙_{A_n} | F_{n−1}),
ε''_n = ε_n 𝟙_{A_n^c} − E(ε_n 𝟙_{A_n^c} | F_{n−1}).

Since E(ε_n 𝟙_{A_n} | F_{n−1}) = −E(ε_n 𝟙_{A_n^c} | F_{n−1}), the martingale M_n(·) splits into two martingales M_n(·) = M'_n(·) + M''_n(·), where

M'_n(x) = Σ_{k=1}^n Y_{k−1}(x) ε'_k  and  M''_n(x) = Σ_{k=1}^n Y_{k−1}(x) ε''_k.

Next, set ξ_n(x) = M'_n(x)/h(s_{n−1}²) and ζ_n(x) = M''_n(x)/h(s_{n−1}²).
(b) Theorem 2.2 applies to the family M'_n(·). Since ||ε'_{n+1}|| ≤ 2 s_n²/(a(N) η_n h(s_n²)) and tr(E(ᵗε'_n ε'_n | F_n)) ≤ tr(Γ), we get
(i) ||ΔM'_{n+1}(0)|| ≤ |Y_n(0)| ||ε'_{n+1}|| ≤ b* s_n²/h(s_n²),

tr(⟨M'(0)⟩_n) ≤ tr(Γ) Σ_{k=1}^n Y²_{k−1}(0) ≤ a* s_{n−1}²;

(ii) ||ΔM'_{n+1}(x) − ΔM'_{n+1}(y)|| ≤ |Y_n(x) − Y_n(y)| ||ε'_{n+1}|| ≤ b*(N) ||x − y||^λ s_n²/h(s_n²).
(c)(i) The increasing process (in the semi-definite sense) of the martingale N_n(x) = Σ_{k=1}^n
On the other hand, the moment assumption yields the Chebychev-type inequality
E(||ε_k||² 𝟙
The previous moment inequality proves that
tr⟨W⟩_n ≤ C(N)
Proof. The asymptotic equivalence s_n²
Part B. Functional estimation
This part deals with the kernel estimation of the unknown but smooth regression function f from R^d to R^d that drives a controlled Markov model of the type (Duflo, 1990)

X_{n+1} = f(X_n) + C(X_n, U_n) + ε_{n+1}.   (3)

The control C is assumed to be known, and the sequence (ε_n) to be a white noise with respect to some filtration F = (F_n)_{n≥0}; i.e., all ε_i have the same distribution and ε_{n+1} is independent of F_n for all n. Model (3) extends the classical linear regression model X_{n+1} = A X_n + ε_{n+1} and the nonlinear model of Hernandez-Lerma (1991),

X_{n+1} = f(X_n) + ε_{n+1}.   (4)

In the sequel, we content ourselves with a smooth kernel K and a bandwidth well adapted to iterative computations and tracking (Masry and Györfi, 1987). We also assume that K is Lipschitzian with order λ and coefficient k, that is:
H3: K is a nonnegative and compactly supported function on R^d which satisfies

∫ K(z) dz = 1  and  |K(u) − K(v)| ≤ k ||u − v||^λ.
If the dynamical system (4) is stable and if the stationary distribution has a density h, a kernel estimator of h is defined, for all γ > 0, as follows:

ĥ_n(x) = (1/n) Σ_{i=1}^n i^{γd} K(i^γ(X_i − x));   (5)

next, the function f of model (3) (or (4) if C ≡ 0) can be estimated by

f̂_{n+1}(x) = [Σ_{i=1}^n i^{γd} K(i^γ(X_i − x))(X_{i+1} − C(X_i, U_i))] / [Σ_{i=1}^n i^{γd} K(i^γ(X_i − x))].   (6)

We assume that f̂_{n+1}(x) = 0 whenever ĥ_n(x) = 0.
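To make (5) and (6) concrete, here is a minimal Python sketch (ours, not the paper's) in dimension d = 1 for the non-controlled model (4). The triangular kernel is nonnegative, compactly supported, integrates to 1 and is Lipschitz, hence satisfies H3 with λ = 1; the map f(x) = x/2 + sin(x)/4 is an illustrative choice with limsup |f(x)|/|x| = 1/2 < 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def K(u):
    """Triangular kernel on [-1, 1]: satisfies H3 with lambda = 1."""
    return np.maximum(1.0 - np.abs(u), 0.0)

def f(x):
    # Illustrative driving function with a single stable fixed point at 0.
    return 0.5 * x + 0.25 * np.sin(x)

gamma, n = 0.2, 20_000            # bandwidth exponent (d = 1, gamma < 1/(2d))
X = np.zeros(n + 1)
for k in range(n):                # simulate model (4): X_{k+1} = f(X_k) + eps_{k+1}
    X[k + 1] = f(X[k]) + 0.5 * rng.standard_normal()

i = np.arange(1, n, dtype=float)  # indices i = 1 .. n-1
Xi, Xnext = X[1:n], X[2:n + 1]    # pairs (X_i, X_{i+1})

def h_hat(x):
    """Density estimator (5): (1/n) sum_i i^{gamma d} K(i^gamma (X_i - x))."""
    return float(np.mean(i**gamma * K(i**gamma * (Xi - x))))

def f_hat(x):
    """Regression estimator (6) with C == 0; returns 0 where the denominator vanishes."""
    w = i**gamma * K(i**gamma * (Xi - x))
    return float(np.dot(w, Xnext) / w.sum()) if w.sum() > 0 else 0.0

print(h_hat(0.5), f_hat(0.5), f(0.5))   # f_hat(0.5) should be close to f(0.5)
```

The increasing weights i^{γd} and shrinking bandwidths i^{−γ} are exactly what makes the estimator iterative: updating from step n to n + 1 only requires adding one term to the running sums.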
The a.s. and weak convergence rates, as well as the iterated logarithm laws, of these estimators are quite similar to those obtained in the i.i.d. case (Hall, 1981; Mack and Silverman, 1982; Devroye and Penrod, 1984; Liero, 1989). The results rely on stability criteria of Lyapounov type presented in Duflo (1990). Note that Iosifescu and Grigorescu (1990) present a wide range of pointwise a.s. log-log laws, weak convergence results and some invariance principles for dependent sequences (called random systems with complete connections). However, our proofs seem to have no counterparts in their framework.
We now briefly recall the main definitions of Duflo (1990) used in the sequel. A sequence (X_n, U_n) of r.v. adapted to a filtration F, with values in (E × U, E ⊗ U), is a controlled Markov chain if, for some transition probability π(x, u, dy) from E × U to E, the distribution of X_{n+1} conditionally on F_n is π(X_n, U_n, dx). E is called the state space and U the control space. Any sequence (d_n) of measurable functions d_n from E^{n+1} to U is called a strategy. The strategy determines the control at any time: U_n = d_n(X_0, …, X_n). If, for a fixed state x, the control d_n(x_0, …, X_{n−1}, x) belongs
every sequence (X_n) of r.v. gives rise to a sequence of empirical distributions λ_n(B) = 1/(n+1) Σ_{i=0}^n 𝟙_{X_i ∈ B}, B ∈ B(R^d). The sequence is said to be stable if the sequence λ_n converges weakly a.s. to a stationary distribution λ. In the controlled case (3), a class D of strategies stabilizes the sequence if a.s., for any admissible strategy in D, any initial distribution and all ε > 0, there exists a compact C such that liminf_n λ_n(C) ≥ 1 − ε.
4. Non-controlled models
4.1. Strong convergence
Theorem 4.1. For the autoregressive model (4), assume that:
1. f is continuous and limsup_{||x||→∞} ||f(x)||/||x|| < 1;
2. ε_n is a white noise with a density p of class C¹, and p and its gradient are bounded;
3. Model (4) is stable.
Then:
(A) The stationary distribution has a bounded density h of class C¹ which satisfies h(x) = ∫ p(x − f(z)) h(z) dz. Moreover, for all 0 < γ < 1/d and all initial distributions, ĥ_n(x) → h(x) a.s., uniformly on compacts.
(B) For all x ∈ S = {x: h(x) > 0}, f̂_n(x) converges pointwise to f(x) a.s. If the noise has a moment of order m > 2 (assumption H1) and if 0 < γ < 1/(2d), the pointwise convergence strengthens to uniform convergence on compacts.
Finally, if f is of class C¹, then for all δ > 0, N < ∞ and β = inf(γ, 1/2 − γ(d + δ)):

sup_{||x||≤N} ||f̂_n(x) − f(x)|| = O(n^{−β}(LL(n))^{1/2}) a.s.
We first prove a lemma which enables us to convert arithmetic means into weighted means of a certain type.
Let (x_n) be a sequence of real numbers, or elements of any normed space, and put S_n(α) = Σ_{i=1}^n i^α x_i, S_n = Σ_{i=1}^n x_i.
Lemma 4.1. If S_n/n → s, then:
(i) n^{−(1+α)} S_n(α) → s/(1 + α) if α > 0;
(ii) ||S_n(α)|| = O(n^{1+α}) if −1 < α < 0;
(iii) ||S_n(α)|| = O(log(n)) if α = −1;
(iv) ||S_n(α)|| = O(1) if α < −1.
Proof. If a_i = i((i + 1)^α − i^α), put σ_n = Σ_{i=1}^n a_i; then σ_n = (n + 1)^{α+1} − Σ_{i=1}^{n+1} i^α, and

S_n(α) = Σ_{i=1}^n i^α (S_i − S_{i−1}) = n^{α+1}(S_n/n) − Σ_{i=1}^{n−1} a_i (S_i/i).

(i) If α > 0, the result follows from S_i/i → s and the equivalence σ_n ~ α n^{α+1}/(1 + α); the last equivalence follows from the Taylor formula for the function x^α.
If −1 < α < 0, we get ||S_n(α)|| ≤ C n^{1+α}. Statements (iii) and (iv) follow by similar arguments.
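A quick numerical check of statement (i) (our sketch, with x_i constant equal to 3 so that S_n/n → s = 3):

```python
import numpy as np

# Lemma 4.1(i): with x_i = 3 and alpha = 2, the weighted sum
# S_n(alpha) = sum_i i^alpha x_i should satisfy n^{-(1+alpha)} S_n(alpha)
# -> s/(1+alpha) = 3/3 = 1.
alpha, s, n = 2.0, 3.0, 100_000
i = np.arange(1, n + 1, dtype=float)
S_n_alpha = np.sum(i**alpha * s)
print(S_n_alpha / n**(1 + alpha))   # close to 1 for large n (error of order 1/n)
```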
Proof of Theorem 4.1. Since the noise has a density p and f is continuous, the transition probability is strongly Fellerian, and in this case stability is equivalent to positive recurrence. We recall also that the assumption p > 0 (together with condition 1) is sufficient to ensure the stability of the chain (Duflo, 1990).
A.1. Properties of the stationary distribution. Thus (X_n) is positive recurrent, with an invariant distribution whose density satisfies h(x) = ∫ p(x − f(z)) h(z) dz.
Next, we can find constants a, b such that

|ΔM_n(x)| ≤ b n^{γd},  ⟨M(x)⟩_n ≤ a n^{1+γd},

and easily prove that |K(i^γ(z − x)) − K(i^γ(z − y))| ≤ C n^{γλ} ||x − y||^λ for all i ≤ n and some constant C, and thus

|ΔM_n(x) − ΔM_n(y)| = n^{γd} |K(n^γ(X_n − x)) − K(n^γ(X_n − y))| ≤ C n^{γ(d+λ)} ||x − y||^λ.
Since the chain is stable and p is bounded and continuous, we first get the a.s. equicontinuity of the sequence n^{−γ'} H'_n(·).
Finally, since grad p is bounded and K has compact support, we have

sup_{x∈R^d} |H̃_n(x) − H'_n(x)| ≤ C Σ_{i=1}^n i^{γd−γ} = o(n^{γ'}) a.s., if γ > 0.

To summarize, we have proved that, for all N < ∞ and 0 < γ < 1/d,

sup_{||x||≤N} |γ' n^{−γ'} H_n(x) − h(x) ∫ φ(z) dz| → 0 a.s.

Taking β = γd (so that γ' = 1) and φ = K ends the proof of statement A.
B. Study of f̂_n(x). We decompose the bias into
f̂_{n+1}(x) − f(x) = (n ĥ_n(x))^{−1}(W_{n+1}(x) + R_{n+1}(x)),   (7)

where K_i(x) = K(i^γ(X_i − x)), and

W_{n+1}(x) = Σ_{i=1}^n i^{γd} K_i(x) ε_{i+1},  R_{n+1}(x) = Σ_{i=1}^n i^{γd} K_i(x)(f(X_i) − f(x)).
B.1. Uniform convergence of R_n(x). Let N < ∞ and assume, with no loss of generality, that supp(K) ⊂ B_d(0, R). Since f is continuous, for all ε > 0 there exists δ = δ(ε) > 0 such that ||f(x) − f(y)|| ≤ ε for all ||x|| ≤ R + N, ||y|| ≤ R + N and ||x − y|| ≤ δ. Next, observe that if x ∈ B_d(0, N), we have either ||X_i − x|| > R i^{−γ}, and then K_i(x) = 0, or ||X_i − x|| ≤ R i^{−γ}, and then ||X_i|| ≤ R + N. In this last case, the first alternative R i^{−γ} ≤ δ yields ||f(X_i) − f(x)|| ≤ ε, and the second alternative R i^{−γ} > δ yields ||f(X_i) − f(x)|| ≤ 2 sup_{||z||≤R+N} ||f(z)|| = L.
We put n_1 = inf(n: R n^{−γ} ≤ δ) and get the inequalities

||R_n(x)|| ≤ C + ε Σ_{i=n_1}^{n−1} i^{γd} K_i(x) ≤ C + ε (n − 1) ĥ_{n−1}(x),

which prove the a.s. uniform convergence on compacts of R_n(·)/n; i.e., for all ε > 0, limsup_n sup_{||x||≤N} ||R_n(x)/n|| ≤ ε sup_{||x||≤N} h(x) a.s.
B.2. If f is of class C¹ and C = sup_{x∈B_d(0,R+N)} ||grad f(x)||, we have

K_i(x) ||f(X_i) − f(x)|| ≤ C R i^{−γ} K_i(x),

and ||R_n(x)|| ≤ C Σ_{i=1}^{n−1} i^{γ(d−1)} K_i(x). Then, Lemma 4.1 and the uniform convergence of ĥ_n(·) (part A) yield

sup_{||x||≤N} ||R_n(x)|| = O(n^{1−γ}) a.s.   (8)
B.3. Pointwise convergence of f̂_n(·). We have

⟨W(x)⟩_n = Γ Σ_{i=1}^{n−1} i^{2γd} K_i²(x).

Since K is Lipschitzian of order λ, the kernel K² is also Lipschitzian with the same order λ. If we take φ = K², β = 2γd, γ' = β − γd + 1 = 1 + γd, and proceed in the same way as in part A, we get

sup_{||x||≤N} ||(1 + γd) n^{−(1+γd)} ⟨W(x)⟩_n − Γ h(x) ∫ K²(z) dz|| → 0 a.s.   (9)
B.4. Uniform convergence of f̂_n(·). Clearly, if the noise has a moment of order m > 2, W_n meets assumption H1 of Proposition 3.1, and it is then enough to verify assumption H2.
Indeed, the inequality |Y_n(0)| ≤ C n^{γd} K_n(0) and arguments as in part A.2 show that |Y_n(x) − Y_n(y)| ≤ C n^{γ(d+λ)} ||x − y||^λ, λ > 0. Therefore, Corollary 3.1 applies and says that the sequence W_n(·)/(n^{1+2γ(d+λ)} LL(n))^{1/2} is a.s. relatively compact.
Since, for all β > (1 + 2γd)/2, there exists λ > 0 such that β > (1 + 2γ(d + λ))/2, we get sup_{||x||≤N} n^{−β} W_n(x) → 0 a.s.
In particular, the value β = 1 is possible if γ < 1/(2d). Summing up, we have proved that, if δ > 0 and 1/(2(d + δ + 1)) ≤ γ < 1/(2(d + δ)), then

sup_{||x||≤N} ||f̂_n(x) − f(x)|| = O((n^{2γ(d+δ)−1} LL(n))^{1/2}).

Note that the case γ ≥ 1/(2(d + δ)) is useless, since the uniform convergence of the estimator is not ensured, and that the other case 0 < γ ≤ 1/(2(d + δ + 1)) yields

sup_{||x||≤N} ||f̂_n(x) − f(x)|| = O(n^{−γ}(LL(n))^{1/2}).

We do not know yet if the value γ = 1/(2(d + 1)), which gives the best rate, is attainable.
4.2. Pointwise CLT and ILL
Theorem 4.2. If the assumptions of Theorem 4.1 hold, if f is of class C¹ and if 1/(d + 2) < γ < 1/d, then for x_1, …, x_q ∈ S:
(A) (Z_n(x_1), …, Z_n(x_q)), where Z_n(x_j) = n^{(1−γd)/2}(f̂_n(x_j) − f(x_j)), converges weakly to a Gaussian distribution in R^{d×q} which has q independent components N_d(0, (Γ/((1 + γd) h(x_j))) ∫ K²(z) dz), j = 1, …, q.
(B) Moreover, if the noise has a finite conditional moment of order m > 2, a pointwise iterated logarithm law holds on S:

limsup_n (n^{1−γd}/(2 LL(n))) ||f̂_n(x) − f(x)||² ≤ (tr Γ/((1 + γd) h(x))) ∫ K²(z) dz a.s.
Proof. Considering the decomposition (7), we have already proved that ĥ_n(x) → h(x) > 0 a.s. on S if γ < 1/d. Now, if f is C¹ and γ > 1/(d + 2), the upper bound (8) improves into sup_{||x||≤N} ||R_n(x)|| = o(n^{(1+γd)/2}) a.s., and it remains only to study the asymptotic behavior (CLT and ILL) of W_n(x).
(A) We start by checking the CLT assumptions for martingales (Duflo, 1990). By (9), we get (1 + γd) n^{−(1+γd)} ⟨W(x)⟩_n → Γ h(x) ∫ K²(z) dz a.s. For Lindeberg's condition, we note that V(t) = E(||ε||² 𝟙_{||ε||>t}) → 0 as t → ∞, and that for ε̄ > 0 and ||K|| = sup(K(u): u ∈ R^d), we have

Σ_{i=1}^n E(||ΔW_i(x)||² 𝟙_{||ΔW_i(x)|| > ε̄ n^{(1+γd)/2}}) ≤ Σ_{i=1}^n i^{2γd} K_i²(x) V(ε̄ n^{(1−γd)/2}/||K||) = o(n^{1+γd}).

This is enough to prove the weak convergence of each Z_n(x_i).
For the independence of the components, it is enough to prove that, a.s.,

lim_n ⟨W(x), W(y)⟩_n = lim_n Σ_{i=1}^n i^{2γd} K_i(x) K_i(y) < ∞  if x ≠ y.   (10)

Considering the events A_i = {X_i ∈ B_d(x, R i^{−γ}) ∩ B_d(y, R i^{−γ})}, we define the martingale M_n = Σ_{i=1}^n i^{2γd}(𝟙_{A_i} − P(A_i | F_{i−1})) and its increasing process ⟨M⟩_n. Since the density is bounded, it follows by integration on R^d that

P(A_i | F_{i−1}) = P(ε_i ∈ B_d(x − f(X_{i−1}), R i^{−γ}) ∩ B_d(y − f(X_{i−1}), R i^{−γ}) | F_{i−1})
≤ 𝟙_{||x−y||≤2Ri^{−γ}} P(ε_i ∈ B_d(x − f(X_{i−1}), 2R i^{−γ}) | F_{i−1})
≤ C i^{−γd} 𝟙_{||x−y||≤2Ri^{−γ}}.
Put N(x, y) = inf(i: ||x − y|| > 2R i^{−γ}) and observe that

⟨M⟩_n ≤ Σ_{i=1}^n i^{4γd} P(A_i | F_{i−1}) ≤ C Σ_{i=1}^{N(x,y)} i^{3γd} < ∞.

Thus, M_n converges a.s. to a finite r.v. M_∞. Moreover, since

Σ_{i=1}^n i^{2γd} P(A_i | F_{i−1}) ≤ C Σ_{i=1}^n i^{γd} 𝟙_{||x−y||≤2Ri^{−γ}} ≤ C N(x, y)^{1+γd} < ∞ a.s.,

the bound K_i(x) K_i(y) ≤ ||K||² 𝟙_{A_i} implies that (10) holds.
(B) The second part of the theorem is a simple consequence of Proposition 3.1 and Remark 3.1, if we take m = 2 + 2δ, η_i = i^{γd} K_i(x), s_n² = Σ_{i=1}^n η_i², and prove that Σ_{i=1}^∞ (η_i²/s_i²)^{1+δ} converges a.s.
Since lim_n (1 + γd) n^{−(1+γd)} s_n² = h(x) ∫ K²(z) dz, it is enough to show the convergence of Σ_{i=1}^∞ i^{−ρ} Z_i, where Z_i = i^{γd} K_i^{2(1+δ)}(x) and ρ = (1 − γd)(1 + δ) + γd. First, observe that Σ_{i=1}^n Z_i/n converges if γ < 1/d, and then apply Lemma 4.1 to Σ_{i=1}^∞ i^{−ρ} Z_i, since ρ > 1. Next, note that limsup_n (2 s_n² LL(s_n²))^{−1} ||W_n(x)||² ≤ tr Γ a.s. to complete the proof.
4.3. Noise density estimator
If, in addition to the functions f and h of the non-controlled model (4), we need to estimate the noise density p, we consider on R^d × R^d the autoregressive model Z_{n+1} = F(Z_n) + ε*_{n+1}, where

Z_{n+1} = (X_{n+1}, X_n),  F(x, y) = (f(x), f(y))  and  ε*_{n+1} = (ε_{n+1}, ε_n).

Put K* = K ⊗ K, choose a point x_0 with h(x_0) > 0, and define the following kernel estimate:

p̂_n(y) = (ĥ_n(x_0))^{−1} ĥ*_n(x_0, y + f̂_n(x_0)),
where ĥ*_n(x, y) = n^{−1} Σ_{i=1}^n i^{2γd} K*(i^γ(X_{i−1} − x), i^γ(X_i − y)).
Corollary 4.1. Under the assumptions of Theorem 4.1, p̂_n(·) converges a.s. to p(·) uniformly on S ∩ C, for every compact C, for all γ < 1/(2d).
Had f been known at some point x_0 with h(x_0) > 0, it would have been advantageous to substitute the value f(x_0) for f̂_n(x_0). Another method is to vary x_0, taking for example x_0 = y.
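Continuing the illustrative d = 1 code style used for (5) and (6), the two-step construction of p̂_n can be sketched as follows (our code, with assumed names; f(x) = x/2, Gaussian noise and x_0 = 0 are illustrative choices, x_0 = 0 being a point where h > 0):

```python
import numpy as np

rng = np.random.default_rng(2)

def K(u):                         # triangular kernel, satisfies H3
    return np.maximum(1.0 - np.abs(u), 0.0)

def f(x):                         # illustrative stable map (d = 1)
    return 0.5 * x

sigma, gamma, n = 0.5, 0.15, 40_000
X = np.zeros(n + 1)
for k in range(n):                # model (4): X_{k+1} = f(X_k) + eps_{k+1}
    X[k + 1] = f(X[k]) + sigma * rng.standard_normal()

ii = np.arange(1, n, dtype=float)
Xi, Xnext = X[1:n], X[2:n + 1]    # consecutive pairs of the chain

def h_hat(x):                     # estimator (5)
    return float(np.mean(ii**gamma * K(ii**gamma * (Xi - x))))

def f_hat(x):                     # estimator (6) with C == 0
    w = ii**gamma * K(ii**gamma * (Xi - x))
    return float(np.dot(w, Xnext) / w.sum())

def h_star(x, y):                 # joint estimator on R^2, product kernel K* = K (x) K
    w = ii**(2 * gamma) * K(ii**gamma * (Xi - x)) * K(ii**gamma * (Xnext - y))
    return float(np.mean(w))

x0 = 0.0
def p_hat(y):                     # p_hat_n(y) = h*_n(x0, y + f_hat(x0)) / h_hat(x0)
    return h_star(x0, y + f_hat(x0)) / h_hat(x0)

# p_hat(0) should approximate the true noise density N(0, sigma^2) at 0,
# i.e. roughly 1/(sigma sqrt(2 pi)), up to smoothing bias.
print(p_hat(0.0))
```

The two-step structure is visible in `p_hat`: the joint estimate of h(x) p(y − f(x)) is evaluated at (x_0, y + f̂_n(x_0)) and then divided by the marginal estimate ĥ_n(x_0).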
Sketch of the proof. (1) Note first that if K is of order λ and meets assumption H3, so does K* on R^{2d}. Note also that the noise ε* is no longer white, so the first part of the proof of Theorem 4.1 demands a slight modification. However, the stability of the chain (X_n) implies the stability of (Z_n), which has a stationary distribution λ* with density h*(x, y) = h(x) p(y − f(x)).
(2) Split H*_n(x, y) = Σ_{i=1}^n i^β K*(i^γ(X_{i−1} − x), i^γ(X_i − y)) into

H*_n = (H*_n − H̃*_n) + (H̃*_n − Ĥ*_n) + (Ĥ*_n − H̄*_n) + H̄*_n,

where H̄*_n(x, y) = p(y − f(x)) Σ_{i=1}^n i^{β−2γd} p(x − f(X_{i−2})),

H̃*_n(x, y) = Σ_{i=1}^n i^{β−γd} K_{i−1}(x) ∫ K(v) p(i^{−γ} v + y − f(X_{i−1})) dv,

Ĥ*_n(x, y) = Σ_{i=1}^n i^β ∫ K*(i^γ u, i^γ v) p(u + x − f(X_{i−2})) p(v + y − f(u + x)) du dv.

We first prove that lim_{n→∞} γ' n^{−γ'} H̄*_n(x, y) = h(x) p(y − f(x)) for all β > 2γd and γ' = β − 2γd + 1.
Note that H̃*_n is the F-compensator of H*_n, and that Ĥ*_n is the F*-compensator of H̃*_n, where F* = (F_{n−1})_{n≥1}.
(i) Following the proof of Theorem 4.1, we readily obtain the a.s. uniform convergence on compacts of γ' n^{−γ'} H̄*_n to h(x) p(y − f(x)).
(ii) Put e_n = sup(ω_n, 2R n^{−γ}), where ω_n = ω(f, R + N, R n^{−γ}) is the continuity modulus of f. Since p and its gradient are bounded and continuous, K has its support included in B_d(0, R), and ω_n → 0, we get for all x, y with ||x|| ≤ N, ||y|| ≤ N:

K*(u, v) |p(i^{−γ} u + x − f(X_{i−2})) p(i^{−γ} v + y − f(i^{−γ} u + x)) − p(x − f(X_{i−2})) p(y − f(x))|
≤ C(i^{−γ}(||u|| + ||v||) + ||f(i^{−γ} u + x) − f(x)||) 𝟙_{||u||≤R, ||v||≤R} ≤ C e_i.

Then, applying Lemma 4.1 to the sequence e_i, we obtain

sup_{||x||≤N, ||y||≤N} n^{−γ'} |Ĥ*_n − H̄*_n|(x, y) ≤ C n^{−γ'} Σ_{i=1}^n i^{β−2γd} e_i → 0.
5. Controlled model
The controlled models (3) have no stationary distribution in general, and the statistic ĥ_n(x) does not estimate anything actual. However, f̂_n(x) continues to estimate the regression function f, as shown below.
Theorem 5.1. Assume that:
(1) C is known and f is unknown but continuous;
(2) the noise (ε_n) is white, with a strictly positive C¹ density p such that p and its gradient are bounded;
(3) u(x) = sup_{u∈A(x)} ||C(x, u)|| is bounded on compacts and

limsup_{||x||→+∞} sup_{u∈A(x)} ||f(x) + C(x, u)||/||x|| < 1.

Then, for all initial distributions and admissible strategies, statement B of Theorem 4.1 continues to hold on R^d.
Proof. Only minor modifications of the proof of Theorem 4.1 are needed. First, there is nothing to change in the study of M_n = H_n − H̃_n, since if F(x, u) = f(x) + C(x, u), the process H_n(x) = n ĥ_n(x) has compensator

H̃_n = Σ_{i=1}^n ∫ K(z) p(i^{−γ} z + x − F(X_{i−1}, U_{i−1})) dz.
Next, we note that we need only bound ĥ_n(·) on compacts (and not deal with its convergence); the Lyapounov condition (3) enables us to stabilize the chain (Duflo, 1990) and then to get a constant M such that

limsup_n (1/n) Σ_{i=1}^n ||X_i||² ≤ M  and, for r² > M,  liminf_n λ_n(B_d(0, r)) ≥ 1 − M/r².
Set d(r) = sup(||F(x, u)||; u ∈ A(x), ||x|| ≤ r). It follows that m(r) = inf(p(z): ||z|| ≤ R + N + d(r)) > 0, since p > 0 and p is continuous.
For large r such that d(r) ≤ θr with θ < 1, we obtain

||p||_∞ ≥ ∫ K(z) p(i^{−γ} z + x − F(X_{i−1}, U_{i−1})) dz ≥ m(r) 𝟙_{||x||≤N, ||X_{i−1}||≤r} ∫ K(z) dz = m(r) 𝟙_{||x||≤N, ||X_{i−1}||≤r},

i.e., ||p||_∞ ≥ n^{−1} H̃_n(x) ≥ m(r) λ_n(B_d(0, r)) for all ||x|| ≤ N.
Thus, if r > √(2M), the Lyapounov condition gives the bounds

0 < m(r)/2 ≤ liminf_n inf_{||x||≤N} h̃_n(x) ≤ limsup_n sup_{||x||≤N} h̃_n(x) ≤ ||p||_∞ < ∞.
Since all the terms in Lemma 4.1 are positive, we get

n^{−(1+α)} S_n(α) = n^{−1} S_n − n^{−(1+α)} Σ_{i=1}^{n−1} a_i (S_i/i) ≤ S_n/n.
6. For further reading
The following reference is also of interest to the reader: Stute, 1982.
References
Devroye, L., 1988. An equivalent theorem for L1 convergence of the kernel regression estimate. J. Statist. Plann. Inference 18.
Devroye, L., Penrod, C., 1984. The consistency of automatic kernel density estimates. Ann. Statist. 12 (4).
Duflo, M., 1990. Méthodes récursives aléatoires. Masson, Paris.
Duflo, M., Senoussi, R., Touati, A., 1990. Sur la loi des grands nombres pour les martingales vectorielles et l'estimateur des moindres carrés du modèle de régression. Ann. Inst. H. Poincaré 26 (4).
Hall, P., 1981. Laws of the logarithm for nonparametric density estimators. Z. Wahrsch. Verw. Geb. 56.
Hernandez-Lerma, O., 1991. On integrated square errors of recursive nonparametric estimates of nonstationary Markov processes. Probab. Math. Statist. 12 (1).
Heyde, C., Scott, D., 1973. Invariance principles for the law of the iterated logarithm for martingales and processes with stationary increments. Ann. Probab. 1 (3).
Iosifescu, M., Grigorescu, S., 1990. Dependence with Complete Connections and its Applications. Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge.
Kuelbs, J., 1976. A strong convergence theorem for Banach space valued random variables. Ann. Probab. 4.
Ledoux, M., Talagrand, M., 1986. La loi du logarithme itéré dans les espaces de Banach. C.R. Acad. Sci. Paris 303 (2).
Liero, H., 1989. Strong uniform consistency of nonparametric regression function estimates. Probab. Theory Related Fields 82.
Mack, Y.P., Silverman, B.W., 1982. Weak and strong uniform consistency of kernel regression estimates. Z. Wahrsch. Verw. Geb. 61.
Masry, E., Györfi, L., 1987. Strong consistency and rates for recursive probability density estimators of stationary processes. J. Multivariate Anal. 22.
Mourier, E., 1953. Éléments aléatoires dans des espaces de Banach. Ann. Inst. H. Poincaré 13.
Neveu, J., 1964. Bases mathématiques du calcul des probabilités. Masson, Paris.
Rao, R.R., 1963. The law of large numbers for D([0,1]; R)-valued random variables. Theory Probab. Appl. 8.
Stout, W.F., 1970. A martingale analogue of Kolmogorov's law of the iterated logarithm. Z. Wahrsch. Verw. Geb. 15.