ON THE ASYMPTOTIC NORMALITY OF $L_2$ ESTIMATORS
Udjianna S. Pasaribu and Bambang Susanto
Abstract. We examine parametric estimation using the integrated squared error criterion. The idea is to find the parameter value that minimizes the $L_2$ distance between the true density and the fitted density. This $L_2$ method adapts a nonparametric criterion to parametric problems. We concentrate on the asymptotic normality of the $L_2$ estimators, which is established using a Taylor expansion of the appropriate gradient vector.
1. INTRODUCTION
In parametric estimation, methods which estimate the parameter by minimizing a data-based estimate of some appropriate divergence between the assumed model density and the true density underlying the data have a long history (Basu et al. (1998)[1]). These procedures include classical maximum likelihood as well as the minimum distance techniques studied by several statisticians. Parametric and nonparametric estimators seldom employ the same estimation criteria: parametric algorithms typically rely on maximum likelihood, while nonparametric algorithms favor the $L_2$, or integrated squared error, criterion. The first suggestion to replace the likelihood function with the $L_2$ distance was given by Terrell (1990)[10]. The L2E criterion for parametric problems was rediscovered by Hjort (1994)[4] and later by Scott (1997)[7], who developed parameter estimates with good robustness properties relative to maximum likelihood. The present note may be considered a supplement to the results obtained by Scott (2001)[8]. Our aim here is to prove the asymptotic normality of the $L_2$ estimators. A common way of proceeding is to start from a Taylor expansion of the vector of first derivatives of the appropriate function.

Received 18 August 2005, Revised 14 November 2005, Accepted 16 January 2006.
2000 Mathematics Subject Classification: 62F10.
Key words and Phrases: asymptotic normality, integrated squared error, $L_2$ estimators.
2. MAIN RESULTS
Consider a parametric family of models $\{F_\theta\}$ indexed by an unknown finite-dimensional parameter $\theta$ in an open connected subset $\Theta$ of a suitable Euclidean space, possessing densities $f(\cdot\,|\,\theta)$. Let $G$ be the distribution function underlying the data, having density $g$. Let $X_1, X_2, \dots, X_n$ be independent and identically distributed with distribution $G$. The $L_2$ estimator $\hat\theta$ is obtained by minimizing the following L2E criterion with respect to $\theta$:
$$\mathrm{L2E}(\theta) \;=\; \int f(z\,|\,\theta)^{2}\,dz \;-\; \frac{2}{n}\sum_{i=1}^{n} f(X_i\,|\,\theta).$$
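As a concrete illustration (ours, not part of the original note): for a normal model $N(\mu,\sigma^2)$ the integral term has the closed form $\int f(z\,|\,\theta)^{2}\,dz = 1/(2\sigma\sqrt{\pi})$, so the criterion can be minimized numerically. The sketch below is a minimal implementation under that assumption; the function and variable names are ours.

```python
# Minimal sketch of L2E fitting for a normal model N(mu, sigma^2),
# where the integral term has the closed form 1/(2*sigma*sqrt(pi)).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def l2e_criterion(theta, x):
    mu, log_sigma = theta                 # log-scale keeps sigma positive
    sigma = np.exp(log_sigma)
    integral = 1.0 / (2.0 * sigma * np.sqrt(np.pi))   # int f(z|theta)^2 dz
    fitted = norm.pdf(x, loc=mu, scale=sigma)         # f(X_i | theta)
    return integral - 2.0 * fitted.mean()             # the L2E criterion

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)
fit = minimize(l2e_criterion, x0=np.array([0.0, 0.0]), args=(x,),
               method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print(mu_hat, sigma_hat)                 # close to (1.0, 2.0)
```

Because each observation enters the criterion only through $f(X_i\,|\,\theta)$, which is small far from the bulk of the fitted model, gross outliers have little influence; this is the robustness property mentioned in the introduction.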
Here, and in many integral expressions in this note, we omit the variable of integration for convenience. Differentiating with respect to $\theta$, $\hat\theta$ can also be defined as a root of the equation
$$\sum_{i=1}^{n}\Psi(X_i,\theta) = 0 \quad\text{with}\quad \Psi(X_i,\theta) = \int \frac{\partial f(z\,|\,\theta)}{\partial\theta}\, f(z\,|\,\theta)\,dz \;-\; \frac{\partial f(X_i\,|\,\theta)}{\partial\theta}. \tag{0.1}$$
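As a worked instance (ours, for illustration, not part of the original note): in the normal location model $f(z\,|\,\theta)=\varphi(z-\theta)$, with $\varphi$ the standard normal density, $\partial f/\partial\theta = (z-\theta)\varphi(z-\theta)$ and the integral in (0.1) vanishes by symmetry, so the estimating equation becomes
$$\sum_{i=1}^{n}(X_i-\theta)\,\varphi(X_i-\theta) = 0,$$
i.e. $\hat\theta$ is a weighted mean of the observations with weights $\varphi(X_i-\hat\theta)$ that decay to zero for outlying points.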
Theorem 2.1. Let $\theta_0$ be the true value of $\theta$. Under suitable regularity conditions, the $L_2$ estimator $\hat\theta_n$ is consistent for $\theta_0$, and $\sqrt{n}(\hat\theta_n-\theta_0)$ is asymptotically normal with mean vector zero and covariance matrix $G^{-1}HG^{-1}$, where $G = G(\theta_0)$ and $H = H(\theta_0)$ are given by
$$H = \int \frac{\partial f(x\,|\,\theta_0)}{\partial\theta}\left(\frac{\partial f(x\,|\,\theta_0)}{\partial\theta}\right)^{T} f(x\,|\,\theta_0)\,dx \;-\; J J^{T},$$
$$G = \int \frac{\partial f(x\,|\,\theta_0)}{\partial\theta}\left(\frac{\partial f(x\,|\,\theta_0)}{\partial\theta}\right)^{T} dx,$$
with $J = \int \frac{\partial f(x\,|\,\theta_0)}{\partial\theta}\, f(x\,|\,\theta_0)\,dx$.
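For the unit-variance normal location model of the previous example, the quantities in Theorem 2.1 can be evaluated in closed form: $J = 0$, $G = 1/(4\sqrt{\pi})$ and $H = 1/(6\pi\sqrt{3})$, so the asymptotic variance is $G^{-1}HG^{-1} = 8/(3\sqrt{3}) \approx 1.54$, an efficiency of $3\sqrt{3}/8 \approx 0.65$ relative to the sample mean, illustrating the efficiency cost of the robustness discussed by Scott (2001)[8]. The Monte Carlo sketch below (ours, not from the original note) checks the normal approximation against this value; the helper names are illustrative.

```python
# Monte Carlo check of Theorem 2.1 for the N(theta, 1) location model,
# whose closed-form asymptotic variance is G^{-1} H G^{-1} = 8/(3*sqrt(3)).
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def l2e_location(x):
    # int f^2 dz = 1/(2*sqrt(pi)) does not depend on theta, so only the
    # data term -2/n * sum f(X_i | theta) needs to be minimized.
    obj = lambda t: -2.0 * norm.pdf(x, loc=t).mean()
    return minimize_scalar(obj, bounds=(x.min(), x.max()),
                           method="bounded").x

rng = np.random.default_rng(1)
n, reps, theta0 = 200, 2000, 0.0
root_n_errors = np.array([
    np.sqrt(n) * (l2e_location(rng.normal(theta0, 1.0, size=n)) - theta0)
    for _ in range(reps)])
print(root_n_errors.var())         # sample variance, close to ...
print(8.0 / (3.0 * np.sqrt(3.0)))  # ... the theoretical 1.5396
```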
3. PRELIMINARIES
Let $(a_n)$ be a sequence of positive real numbers, $(X_n)$ a sequence of random vectors in $\mathbb{R}^k$ on a probability space $(\Omega, \Lambda, P)$, and let $F_n$ denote the distribution function of $X_n$.
1. The sequence $(X_n)$ converges in distribution to $X$ with distribution function $F$, in symbols $X_n \to_d X$, if $\lim_{n\to\infty} F_n(t) = F(t)$ for all $t \in \mathbb{R}^k$ which are continuity points of $F$.
2. The sequence $(X_n)$ converges in probability to zero, written $X_n \to_p 0$ or $X_n = o_p(1)$, if and only if for every $\epsilon > 0$, $\lim_{n\to\infty} P(\{\omega\in\Omega : \|X_n(\omega)\| < \epsilon\}) = 1$.

3. The sequence $(X_n)$ converges in probability to a random vector $X$, in symbols $X_n \to_p X$, if and only if $X_n - X = o_p(1)$. When $X = c$ is a constant, $X_n$ is usually called a consistent estimator of $c$.

4. The sequence $(X_n)$ is of smaller order than $a_n$ in probability, written $X_n = o_p(a_n)$, if and only if $a_n^{-1}X_n = o_p(1)$.
R3. Convergence in probability to a constant is equivalent to convergence in distribution to that constant, and convergence in probability of a sequence of random vectors to a constant vector is equivalent to convergence in distribution of the component sequences to the associated constants. For a proof see [9, p. 19].
Proposition 3.1. Let $X_n \to_d X$ and let $(A_n)$ be a sequence of random matrices converging in probability to a constant matrix $A$. Then $A_n X_n \to_d AX$.
Proposition 3.2. Let $X_n \to_d X$ and let $(a_n)$ be a sequence of positive real numbers converging to zero. Then $a_n X_n \to_d 0$.
Theorem 3.1. Let $X_1, X_2, \dots$, $Y_1, Y_2, \dots$ and $X$ be random vectors defined on a probability space, and let $g$ be a continuous vector-valued function defined on $\mathbb{R}^k$. Suppose $X_n \to_d X$ and $Y_n \to_p c$, where $c$ is a finite constant. Then
$$X_n + Y_n \to_d X + c \quad\text{and}\quad g(X_n) \to_d g(X). \tag{0.2}$$
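A standard illustration (ours, not part of the original note): if $\sqrt{n}(\bar{X}_n-\mu) \to_d N(0,\sigma^2)$ and $s_n \to_p \sigma > 0$, then Proposition 3.1 with $A_n = s_n^{-1}$ gives the studentized limit $\sqrt{n}(\bar{X}_n-\mu)/s_n \to_d N(0,1)$, while Theorem 3.1 disposes of an additive $o_p(1)$ remainder; it is in exactly this way that these results are used at the end of the proof in Section 4.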
R6. Let $R$ be a function defined on a neighborhood of $0$ in $\mathbb{R}^k$ such that $R(0) = 0$, and let $(X_n)$ be a sequence of random vectors with $X_n \to_p 0$. The existence of a finite constant $c$ for which $\lim_{h\to 0}\|R(h)\|/\|h\| = c$ implies $R(X_n) = O_p(\|X_n\|)$.
Theorem 3.3. Let $X_1, X_2, \dots$ be independent and identically distributed random vectors with mean vector $\mu$ and finite covariance matrix $\Sigma$, and let $\bar{X}_n = n^{-1}\sum_{i=1}^{n} X_i$. Then $\sqrt{n}(\bar{X}_n-\mu)$ is asymptotically normal with mean vector zero and covariance matrix $\Sigma$, i.e.
$$\sqrt{n}(\bar{X}_n-\mu) \to_d N(0,\Sigma),$$
where $N(\mu,\Sigma)$ denotes the $k$-dimensional normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$.
It follows from Theorem 3.3 and Proposition 3.2 that $\bar{X}_n$ is a consistent estimator of $\mu$; i.e., asymptotic normality implies consistency.
4. PROOF OF THE THEOREM
Proof. We only give the proof of the asymptotic normality, because a consistency proof in a more general setting was given by Basu et al. [1].
Write $L_n(\theta) = n^{-1}\sum_{i=1}^{n}\Psi(X_i,\theta)$, so that $L_n(\hat\theta_n) = 0$. By using (0.4),
$$\sqrt{n}(\hat\theta_n-\theta_0) = \left[-\frac{\partial L_n(\theta_0)}{\partial\theta^{T}}\right]^{-1}\sqrt{n}\,L_n(\theta_0) + o_p(1). \tag{0.5}$$
Since $\Psi(X_1,\theta_0), \Psi(X_2,\theta_0), \dots$ are independent and identically distributed with mean vector zero and covariance matrix $H$, Theorem 3.3 gives
$$\sqrt{n}\,L_n(\theta_0) \to_d N(0,H). \tag{0.6}$$
Taking (0.7) into account, we get from Remark 7
$$\left[-\frac{\partial L_n(\theta_0)}{\partial\theta^{T}}\right]^{-1} \to_p -G^{-1}. \tag{0.8}$$
Using Proposition 3.1 by combining (0.6) and (0.8), we have
$$\left[-\frac{\partial L_n(\theta_0)}{\partial\theta^{T}}\right]^{-1}\sqrt{n}\,L_n(\theta_0) \to_d N(0, G^{-1}HG^{-1}). \tag{0.9}$$
Finally, using Theorem 3.1 and the facts (0.9) and (0.5), the theorem follows.
Remarks:
1. The basic idea of the proof follows that of [7, Theorem 1] for the univariate case.
2. Regularity conditions are typically very technical, and are usually satisfied in most reasonable problems. These conditions mainly concern the differentiability of the density, the ability to interchange differentiation and integration, and the boundedness of the remainder term. For more details and generality, see [5, Section 6.3].
Acknowledgement. This research was partially supported by the Mathematical Scientific Activities Grant (MSA-Grant), Department of Mathematics, Institut Teknologi Bandung. The authors would like to thank the referee(s) for their useful remarks and suggestions on an earlier version of this note.
REFERENCES
1. A. Basu, I.R. Harris, N.L. Hjort, and M.C. Jones, "Estimation by minimizing a density power divergence", Biometrika 85 (1998), 549–560.
2. H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton, 1946.
4. N.L. Hjort, "Minimum $L_2$ and robust Kullback-Leibler estimation", Proceedings of the 12th Prague Conference on Information Theory, Statistical Decision Functions and Random Processes, eds. P. Lachout and J.Á. Víšek, (1994), 102–105.
5. E.L. Lehmann and G. Casella, Theory of Point Estimation, 2nd ed., Springer-Verlag, New York, 1998.
6. W.R. Pestman and I.B. Alberink, Mathematical Statistics: Problems and Detailed Solutions, Walter de Gruyter, Berlin, 1998.
7. D.W. Scott, "Parametric modeling by minimum $L_2$ error", Technical Report 98-3, Rice University, Dept. of Statistics, Houston, 1997.
8. D.W. Scott, "Parametric statistical modeling by minimum integrated square error", Technometrics 43 (2001), 274–285.
9. R.J. Serfling, Approximation Theorems of Mathematical Statistics, John Wiley and Sons, New York, 1980.
10. G.R. Terrell, "Linear density estimate", Proceedings of the Statistical Computing Section, American Statistical Association, (1990), 297–302.
U.S. Pasaribu: Department of Mathematics, Institut Teknologi Bandung, Bandung 40132, Indonesia.
E-mail: [email protected].
B. Susanto: Department of Mathematics, Universitas Kristen Satya Wacana, Salatiga 50711, Indonesia.