In Honour of Distinguished Professor Dr. Mir Masoom Ali On the Occasion of his 75th Birthday Anniversary PJSOR, Vol. 8, No. 3, pages 381-391, July 2012

Inference for Noisy Samples

Ibrahim. A. Ahmad

Department of Statistics, Oklahoma State University, Stillwater, Oklahoma 74078

N. Herawati

Department of Mathematics, University of Lampung, Bandar Lampung, Indonesia 35145

Abstract

In the current work, some well-known inference procedures, including testing and estimation, are adjusted to accommodate noisy data that lead to nonidentically distributed samples. The two main cases addressed are the Poisson and the normal distributions. Both one- and two-sample cases are addressed. Other cases, including the exponential and the Pareto distributions, are briefly mentioned. In the Poisson case, the situation when the sample size is random is also mentioned.

Keywords: Noisy samples, Hypothesis testing and estimation, Poisson, Normal, Exponential, and Pareto.

1. Introduction

Statistical inference, with its estimation and hypothesis testing routes, is usually based on the concept of “random samples”. This means that the data are assumed to be “independent identically distributed” (iid) random variables. Although this setting is becoming less and less practical, it is still the way inference is presented to statistics students via textbooks, even at the graduate level; see Rohatgi and Saleh (2001) and Casella and Berger (2002). When the iid structure is violated, there are no guarantees that standard accepted procedures will continue to hold, and modifications often result in only partial or weaker solutions. This need not be the case in some noisy sampling schemes that are now in use. In the current work, it is shown that for some non-iid sampling schemes in the Poisson and normal cases, standard procedures can be modified while keeping the main characteristics of the estimates or test statistics intact. The sampling we discuss here includes the now popular rate sampling, cf. Thode (1997) and Ng and Tang (2005) for the Poisson distribution and Moser et al. (1989) and Sprott and Farewell (1993) for the normal distribution, among others.

variables $X_i$ such that $E(X_i) = \lambda C_i$, $i = 1, \ldots, n$, while in the Gaussian case we have $E(X_i) = \mu C_i$ and $V(X_i) = \sigma^2 C_i^2$, or, in the homogeneous variance case, just $\sigma^2$.

Variations of rate sampling in the Poisson case have been researched in the literature, including work by Detre and White (1970), Sichel (1973), Thode (1997), Krishnamoorthy and Thomson (2004), and Ng and Tang (2005). Rate sampling in the normal case with homogeneous variance is known as the regression through the origin case. The celebrated Behrens-Fisher problem is a two-sample variation, and solutions for several different forms of it were discussed by Bernard (1982), Moser et al. (1989), Sprott and Farewell (1993), and Moser and Stevens (1992).

2. Poisson Inference

Let $X_1, \ldots, X_n$ denote a random sample from Poisson distributions such that $X_i \sim P(\lambda_i)$ with $\lambda_i = \lambda C_i$, $i = 1, \ldots, n$, where $C_i > 0$ is a known constant. Our task is to do inference about $\lambda$. The usual iid case falls at $C_1 = \cdots = C_n = 1$.

The likelihood function is:

$$L(\lambda; X, C) = \prod_{i=1}^{n} \frac{e^{-\lambda C_i} (\lambda C_i)^{X_i}}{X_i!}. \qquad (1)$$

Clearly one obtains the maximum likelihood estimate (MLE) of $\lambda$ to be

$$\hat\lambda_{MLE} = \hat\lambda_1 = \frac{\sum_i X_i}{\sum_i C_i}.$$

This estimate is complete and sufficient for $\lambda$ as well as unbiased, so it is the uniformly minimum variance unbiased estimate (UMVUE) of $\lambda$. This estimate suggests the following class of unbiased estimates for $\lambda$:

$$\hat\lambda_r = \frac{\sum_i C_i^{r-1} X_i}{\sum_i C_i^{r}}, \quad r \ge 0 \text{ is an integer}.$$

Now clearly,

$$V(\hat\lambda_r) = \frac{\lambda \sum_i C_i^{2r-1}}{\left(\sum_i C_i^{r}\right)^2}. \qquad (2)$$
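The step behind (2) is short enough to write out. The following is a sketch that takes $\hat\lambda_r = \sum_i C_i^{r-1} X_i / \sum_i C_i^{r}$ and uses only the independence of the $X_i$ and the Poisson identity $E(X_i) = V(X_i) = \lambda C_i$:

```latex
\begin{align*}
E(\hat\lambda_r)
  &= \frac{\sum_i C_i^{r-1} E(X_i)}{\sum_i C_i^{r}}
   = \frac{\lambda \sum_i C_i^{r}}{\sum_i C_i^{r}} = \lambda, \\
V(\hat\lambda_r)
  &= \frac{\sum_i C_i^{2r-2} V(X_i)}{\bigl(\sum_i C_i^{r}\bigr)^{2}}
   = \frac{\lambda \sum_i C_i^{2r-1}}{\bigl(\sum_i C_i^{r}\bigr)^{2}}.
\end{align*}
```

Setting $r = 1$ gives $V(\hat\lambda_1) = \lambda / \sum_i C_i$.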

But since $\hat\lambda_1$ is the UMVUE, we have

$$V(\hat\lambda_1) = \frac{\lambda}{\sum_i C_i} \le V(\hat\lambda_r) = \frac{\lambda \sum_i C_i^{2r-1}}{\left(\sum_i C_i^{r}\right)^2}. \qquad (3)$$
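As a numerical illustration of the comparison in (3), the minimal sketch below evaluates the estimates $\hat\lambda_r = \sum_i C_i^{r-1} X_i / \sum_i C_i^{r}$ and their theoretical variances $\lambda \sum_i C_i^{2r-1} / (\sum_i C_i^{r})^2$ for a few values of $r$. The function names and the toy data are illustrative assumptions, not part of the paper.

```python
from typing import Sequence


def lambda_hat(x: Sequence[float], c: Sequence[float], r: int = 1) -> float:
    """Unbiased estimate sum(C_i^(r-1) * X_i) / sum(C_i^r); r = 1 is the MLE."""
    numerator = sum(ci ** (r - 1) * xi for xi, ci in zip(x, c))
    denominator = sum(ci ** r for ci in c)
    return numerator / denominator


def var_lambda_hat(lam: float, c: Sequence[float], r: int = 1) -> float:
    """Theoretical variance lam * sum(C_i^(2r-1)) / (sum(C_i^r))^2."""
    return lam * sum(ci ** (2 * r - 1) for ci in c) / sum(ci ** r for ci in c) ** 2


if __name__ == "__main__":
    counts = [2, 5, 9]          # hypothetical Poisson counts X_i
    rates = [1.0, 2.0, 4.0]     # hypothetical known constants C_i
    for r in (0, 1, 2, 3):
        print(r, lambda_hat(counts, rates, r), var_lambda_hat(2.0, rates, r))
```

For any positive constants, the variance at $r = 1$ should never exceed the variance at another $r$, which is the content of (3).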

Hence, as a by-product, we have a simple proof of the inequality:

$$\left(\sum_i C_i^{r}\right)^2 \le \left(\sum_i C_i\right)\left(\sum_i C_i^{2r-1}\right),$$

where $C_i > 0$, $i = 1, \ldots, n$. Note also that in the Poisson case $E(X_i) = V(X_i)$, $i = 1, \ldots, n$. Thus we might try to find an unbiased estimate of $\lambda$ based on the sample variance. Precisely, we have:

THEOREM (1): The following estimate is an unbiased estimate of $\lambda$:

.

PROOF: Note that

.

Summing and substituting into (6) yield that

.

The result now follows.

Let us briefly discuss the relation between $\hat\lambda_1 = \hat\lambda$ and $\hat{\hat\lambda}$. Since $\hat\lambda$ is the UMVUE of $\lambda$,

Hence $\hat\lambda$ and $\hat{\hat\lambda}$ are positively correlated with correlation coefficient equal to

2

We may combine $\hat\lambda$ and $\hat{\hat\lambda}$ to obtain yet another unbiased estimate of $\lambda$.

Note that (ˆˆ/ ˆ) 1ˆ (ˆˆ| ˆ) 1.

and this simplifies to


Hence

is an unbiased estimate of.

Sometimes, in sampling from a Poisson distribution, we do inverse sampling, where the sample size is random. Let $N$ be an integer-valued random variable, not necessarily independent of the $X_i$’s. We then estimate $\lambda$ by

,

On the other hand,

 

 

Thus the correlation coefficient between $\hat\lambda_N$ and $\hat{\hat\lambda}_N$ is

,

provided that

.

Finally, we can easily deduce the following unbiased estimate:


To test $H_0: \lambda = \lambda_0$ vs. $H_1: \lambda \ne \lambda_0$, $\lambda > \lambda_0$, or $\lambda < \lambda_0$, set any of the following test statistics, all of which are asymptotically normal:
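One statistic of this kind can be sketched directly from the variance of the MLE, $V(\hat\lambda_1) = \lambda / \sum_i C_i$: standardize $\hat\lambda_1 = \sum_i X_i / \sum_i C_i$ by its variance under $H_0$. This particular choice, the function name `poisson_rate_z_test`, and the toy data are assumptions made for illustration; it is a natural candidate rather than a statistic confirmed to be among those the paper lists.

```python
import math
from typing import Sequence, Tuple


def poisson_rate_z_test(x: Sequence[float], c: Sequence[float], lam0: float) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: lambda = lam0.

    z = (lambda_hat - lam0) / sqrt(lam0 / sum(C_i)), with lambda_hat = sum(X_i) / sum(C_i).
    The p-value uses the standard normal approximation.
    """
    total_c = sum(c)
    lam_hat = sum(x) / total_c
    z = (lam_hat - lam0) / math.sqrt(lam0 / total_c)
    # Standard normal CDF evaluated at |z| via the error function.
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return z, 2.0 * (1.0 - phi)


if __name__ == "__main__":
    z, p = poisson_rate_z_test([2, 5, 9], [1.0, 2.0, 4.0], lam0=2.0)
    print(f"z = {z:.4f}, two-sided p = {p:.4f}")
```

For a one-sided alternative, the corresponding tail probability would replace the two-sided p-value.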

Hence (16) leads to the null estimate of :

 

Set the test statistic:

) unbiased variance null estimate for we use the following:

THEOREM (2): Under $H_0$, the following is an unbiased estimate for the common parameter $\lambda_0$:

PROOF: Let


The result follows by noticing that:

, )

( 2 2 2 2 2

0 0

2

2

XiYjR CiR CiDjDj

E (21)

) (

) ˆ

( 2

0 2 2 0

2 2

0 2 0

0

 

i i i i j

j i

i C R C C R C D

D C

R R C

E

R  , (22)

) (

1 )

ˆ

( 2 2 2 2 2

0

 

j j j i j

j i

j

j D D D C D

D C

R Y

D

E (23)

and

2 0

2 2

)) ˆ ( ( ) ˆ ( ) ˆ

(

 

 

Ci

Dj

R E

V

E . (24)

The result follows from (21) to (24).

A test statistic based on $\hat{\hat\lambda}_0$ is:

) 1 (

ˆˆ

ˆ ˆ

0 0

2 1 0 4

 

j

i D

C R R Z

. (25)

Reject $H_0$ if $|Z_4| > Z_{\alpha}$ or $Z_{\alpha/2}$, according to the alternative.

3. Normal Inference

Let $X_1, X_2, \ldots, X_n$ denote a random sample from normal distributions such that $X_i \sim N(\mu C_i, \sigma^2)$, $i = 1, \ldots, n$. Then the likelihood function is given by

$$L(\mu, \sigma^2; X, C) = (2\pi)^{-n/2} (\sigma^2)^{-n/2} \exp\left[-\frac{1}{2\sigma^2} \sum_i (X_i - \mu C_i)^2\right]. \qquad (26)$$

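Hence the MLEs of $\mu$ and $\sigma^2$ are, respectively:

$$\hat\mu = \frac{\sum_i C_i X_i}{\sum_i C_i^2} \quad \text{and} \quad \hat\sigma^2 = \frac{1}{n} \sum_i (X_i - \hat\mu C_i)^2. \qquad (27)$$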

Clearly, $\hat\mu$ is sufficient, complete and unbiased for $\mu$, so it is the UMVUE, while $E(\hat\sigma^2) = (n-1)\sigma^2/n$; thus the UMVUE for $\sigma^2$ is $S_C^2 = \frac{1}{n-1} \sum_i (X_i - \hat\mu C_i)^2$.

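Furthermore, we can verify that $\hat\mu$ and $S_C^2$ are independent, that $\hat\mu$ is normal with mean $\mu$ and variance $\sigma^2 / \sum_i C_i^2$, and that $(n-1) S_C^2 / \sigma^2$ is $\chi^2(n-1)$, as expected, since

$$\sum_i (X_i - \mu C_i)^2 = \sum_i (X_i - \hat\mu C_i)^2 + (\hat\mu - \mu)^2 \sum_i C_i^2.$$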

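The one-sample normal estimates above are those of regression through the origin and can be computed in a few lines. The sketch below assumes the homogeneous-variance model $X_i \sim N(\mu C_i, \sigma^2)$ with $\hat\mu = \sum_i C_i X_i / \sum_i C_i^2$; the function name `origin_regression_estimates` and the toy data are illustrative assumptions.

```python
from typing import Sequence, Tuple


def origin_regression_estimates(x: Sequence[float], c: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mu_hat, sigma2_mle, s2_unbiased) for X_i ~ N(mu * C_i, sigma^2).

    mu_hat      = sum(C_i * X_i) / sum(C_i^2)
    sigma2_mle  = (1 / n)       * sum((X_i - mu_hat * C_i)^2)
    s2_unbiased = (1 / (n - 1)) * sum((X_i - mu_hat * C_i)^2)
    """
    n = len(x)
    if n < 2 or n != len(c):
        raise ValueError("need at least two observations and matching lengths")
    mu_hat = sum(ci * xi for xi, ci in zip(x, c)) / sum(ci * ci for ci in c)
    rss = sum((xi - mu_hat * ci) ** 2 for xi, ci in zip(x, c))
    return mu_hat, rss / n, rss / (n - 1)


if __name__ == "__main__":
    # Hypothetical data: observations X_i with known constants C_i.
    print(origin_regression_estimates([1.0, 3.0, 4.0], [1.0, 1.0, 2.0]))
```

Dividing the residual sum of squares by $n - 1$ rather than $n$ reflects the single estimated parameter $\mu$.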

Let us turn our attention to a general setting of the two-sample case: let $X_i \sim N(\mu_1 C_i, \sigma^2)$,

. Now, we need to set an unbiased null variance estimate which is independent of $\hat\mu_1$ and $\hat\mu_2$. This is done as follows:

THEOREM (3): Under $H_0: \mu_1 = \mu_2$ and with common variance $\sigma^2$, the MLEs of $\mu$ and $\sigma^2$ (adjusted to be unbiased) are:

Under $H_0$, the likelihood function of the two samples is given by:

differentiating with respect to 2

results in 2 through evaluating the following:


But

) 1

( ) ˆ (

2 2

2

2 0

j j

i D

C V

. The result now follows.

Hence, set the test statistic

(32)

4. Other Examples and Final Remarks

(A) Other examples:

The MLEs and hypothesis tests presented earlier for non-iid samples from the Poisson and the normal distributions can be extended to various other cases. We illustrate this by two further examples.

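Example (1): Let $X_i \sim \exp(\mu C_i, \sigma)$ with p.d.f. $f_i(x) = \sigma^{-1} \exp[-(x - \mu C_i)/\sigma]$, where $x \ge \mu C_i$ and $\sigma > 0$. The likelihood function is:

$$L(\mu, \sigma; X, C) = \sigma^{-n} \exp\left[-\frac{1}{\sigma} \sum_{i=1}^{n} (X_i - \mu C_i)\right], \quad \mu \le \min_{1 \le i \le n} (X_i / C_i). \qquad (33)$$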

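Hence the MLE of $\mu$ is $\hat\mu = \min_{1 \le i \le n} (X_i / C_i)$ and that of $\sigma$ is $\hat\sigma = \frac{1}{n} \sum_i (X_i - \hat\mu C_i)$. But clearly,

$$E(\hat\mu) = \mu + \frac{\sigma}{\sum_i C_i} \quad \text{and} \quad E(\hat\sigma) = \frac{n-1}{n}\,\sigma.$$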

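This leads to the UMVUEs of $\mu$ and $\sigma$ given by:

$$\hat\mu_{UB} = \min_{1 \le i \le n} \left(\frac{X_i}{C_i}\right) - \frac{n}{n-1} \cdot \frac{\hat\sigma}{\sum_i C_i} \qquad (34)$$

and

$$\hat\sigma_{UB} = \frac{1}{n-1} \sum_i (X_i - \hat\mu C_i). \qquad (35)$$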

When $C_1 = C_2 = \cdots = C_n = 1$, we get the known unbiased estimate of $\mu$,

$$\hat\mu_{UB} = \frac{n X_{(1)} - \bar{X}}{n - 1}.$$
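The exponential example can be checked numerically against the iid special case. The sketch below assumes a two-parameter exponential model with location $\mu C_i$ and scale $\sigma$ (the symbols $\mu$ and $\sigma$, the function name `shifted_exponential_estimates`, and the toy data are assumptions made for illustration). It computes the MLEs $\min_i (X_i / C_i)$ and $\frac{1}{n}\sum_i (X_i - \hat\mu C_i)$ and then the bias-adjusted versions.

```python
from typing import Sequence, Tuple


def shifted_exponential_estimates(
    x: Sequence[float], c: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Return (mu_mle, sigma_mle, mu_unbiased, sigma_unbiased).

    Assumed model: X_i = mu * C_i + sigma * E_i with E_i iid standard exponential,
    so E(mu_mle) = mu + sigma / sum(C_i) and E(sigma_mle) = (n - 1) * sigma / n.
    """
    n = len(x)
    if n < 2 or n != len(c):
        raise ValueError("need at least two observations and matching lengths")
    mu_mle = min(xi / ci for xi, ci in zip(x, c))
    sigma_mle = sum(xi - mu_mle * ci for xi, ci in zip(x, c)) / n
    sigma_ub = n * sigma_mle / (n - 1)        # removes the (n - 1) / n factor
    mu_ub = mu_mle - sigma_ub / sum(c)        # removes the sigma / sum(C_i) bias
    return mu_mle, sigma_mle, mu_ub, sigma_ub


if __name__ == "__main__":
    data = [3.0, 1.0, 2.0]
    ones = [1.0, 1.0, 1.0]
    mu_mle, sigma_mle, mu_ub, sigma_ub = shifted_exponential_estimates(data, ones)
    n, x_min, x_bar = len(data), min(data), sum(data) / len(data)
    # With all C_i = 1 the adjusted location estimate reduces to (n * X_(1) - mean) / (n - 1).
    print(mu_ub, (n * x_min - x_bar) / (n - 1))
```

With all $C_i = 1$, the two printed values should coincide, matching the iid formula $(n X_{(1)} - \bar{X})/(n - 1)$.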

Example (2): Let $X_i \sim f_i(x) = \theta^{-C_i}$, $0 < x < \theta^{C_i}$, $i = 1, 2, \ldots, n$.

). 2 (

~ 1

ˆ

ˆ ˆ

2 2 2

0

2

1

   

n m t

D C

T

j j

i


Thus the likelihood function is

$$L(\theta; X, C) = \theta^{-\sum_i C_i}, \quad \theta \ge \max_{1 \le i \le n} \left(X_i^{1/C_i}\right). \qquad (36)$$

Hence the MLE of $\theta$ is

$$\hat\theta = \max_{1 \le i \le n} \left(X_i^{1/C_i}\right). \qquad (37)$$

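To find the UMVUE of $\theta$ we need to adjust for bias. Note that

$$E(\hat\theta) = \int_0^{\theta} P(\hat\theta > u)\,du = \int_0^{\theta} \left[1 - F_{\hat\theta}(x)\right] dx = \frac{\theta \sum_i C_i}{1 + \sum_i C_i}. \qquad (38)$$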

Hence $\hat\theta_{UB} = \dfrac{1 + \sum_i C_i}{\sum_i C_i} \max_{1 \le i \le n} \left(X_i^{1/C_i}\right)$ is the UMVUE of $\theta$.

(B) Some Final Remarks:

(i) In the normal two-sample case, there is a major outstanding problem known as the “Behrens-Fisher” problem: when $\sigma_1^2 \ne \sigma_2^2$, there is no estimate for $\sigma_1^2/m + \sigma_2^2/n$ that can lead to a t-statistic, cf. Rohatgi and Saleh (2001). There are many suggestions of partial solutions. In this context, we offered one when we can assume $\sigma_1^2 = \sigma_2^2$, see Section 3. Intermediate, approximate or partial solutions have been offered in the literature. For example, when one variance is known, say $\sigma_1^2$, or in the general case of $\sigma_1^2$ and $\sigma_2^2$ both unknown, Satterthwaite (1941) and (1946) proposed a method that provides an approximate $\chi^2$ distribution for $\nu^{*}(S_1^2/m + S_2^2/n)/(\sigma_1^2/m + \sigma_2^2/n)$ with

$$\nu^{*} = \frac{(\sigma_1^2/m + \sigma_2^2/n)^2}{\sigma_1^4/[m^2(m-1)] + \sigma_2^4/[n^2(n-1)]}$$

degrees of freedom. Hence $\nu^{*}$ could be estimated by replacing $\sigma_1^2$ and $\sigma_2^2$ by $S_1^2$ and $S_2^2$, respectively; see also Ames and Webster (1991) and Moser and Stevens (1992). The case when $\sigma_1^2$ is given was discussed by Maity and Sherman (2006). The value of $\nu^{*}$ is obtained via matching

$$2\nu^{*} = V\left[\frac{\nu^{*}(\sigma_1^2/m + S_2^2/n)}{\sigma_1^2/m + \sigma_2^2/n}\right].$$

This gives

$$\nu^{*} = \frac{(\sigma_1^2/m + \sigma_2^2/n)^2}{\sigma_2^4/[n^2(n-1)]}.$$

Then one replaces $\sigma_2^2$ with its estimate $S_2^2$. The above method is based on matching the variance of the proposed (standardized) estimate of $\sigma_1^2/m + \sigma_2^2/n$ to the variance of a Chi-square, solving for the induced parameter, and then estimating it. Needless to say, matching variances is not enough to guarantee matching of distributions. In a subsequent work we will provide a new method based on matching the distributions.
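The plug-in degrees of freedom described in remark (i) are easy to compute. The sketch below implements the standard Welch-Satterthwaite approximation, with the sample variances $S_1^2$ and $S_2^2$ replacing the unknown $\sigma_1^2$ and $\sigma_2^2$. It assumes that the first sample has size $m$ and the second has size $n$, and that each sample variance uses the usual $m - 1$ or $n - 1$ divisor; the function name `satterthwaite_df` is illustrative.

```python
def satterthwaite_df(s1_sq: float, s2_sq: float, m: int, n: int) -> float:
    """Approximate degrees of freedom for S1^2/m + S2^2/n.

    nu = (S1^2/m + S2^2/n)^2 / [ (S1^2/m)^2/(m - 1) + (S2^2/n)^2/(n - 1) ]
    """
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two observations")
    a = s1_sq / m
    b = s2_sq / n
    return (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))


if __name__ == "__main__":
    # Hypothetical sample variances and sample sizes.
    print(satterthwaite_df(4.0, 9.0, 10, 15))
```

The result always lies between $\min(m-1, n-1)$ and $m + n - 2$, and it reaches $m + n - 2$ when the two samples have equal sizes and equal sample variances.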

(ii) Note that the essence of the Behrens-Fisher problem for testing ,

weighted sum of the and Chi-squares with unknown weights

independence and the problem becomes intractable. We shall provide in the near future some plausible solutions based on what the weighted sum of Chi-squares looks like.

(iii) There are situations where, even for the model we discussed above, namely that the parameters are known to be proportionate, there may not be an explicit solution. However, in some of these cases approximate solutions may be possible. Let us illustrate this via the example of nonidentical Bernoulli trials: let $X_1, \ldots, X_n$ be $n$ independent Bernoulli rv’s such that $P(X_i = 1) = p_i = C_i p$, $i = 1, \ldots, n$. As before, assume $C_i > 0$, $i = 1, \ldots, n$, to be known. Thus, the likelihood function is:

$$L(p; X, C) = \prod_{i=1}^{n} (C_i p)^{X_i} (1 - C_i p)^{1 - X_i}.$$

It is not difficult to show that the MLE of p exists and is unique but it is not explicit. If we need an explicit solution, we can use the approximation Cp Cip i n

giving an approximate estimate .
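Although the MLE of $p$ has no closed form, it is simple to compute numerically. The sketch below assumes the Bernoulli likelihood $\prod_i (C_i p)^{X_i} (1 - C_i p)^{1 - X_i}$, whose score $\sum_i X_i / p - \sum_i (1 - X_i) C_i / (1 - C_i p)$ is strictly decreasing on $(0, 1/\max_i C_i)$, and solves the score equation by bisection. Bisection is a choice made here for illustration, not a method taken from the paper, and the function name `bernoulli_rate_mle` is hypothetical.

```python
from typing import Sequence


def bernoulli_rate_mle(x: Sequence[int], c: Sequence[float], tol: float = 1e-12) -> float:
    """Numerically maximize prod (C_i p)^X_i (1 - C_i p)^(1 - X_i) over p."""
    successes = sum(x)
    p_max = 1.0 / max(c)              # p must satisfy C_i * p <= 1 for every i
    if successes == 0:
        return 0.0                    # likelihood is decreasing in p
    if successes == len(x):
        return p_max                  # likelihood is increasing in p

    def score(p: float) -> float:
        # Derivative of the log-likelihood with respect to p.
        failures = sum(ci / (1.0 - ci * p) for xi, ci in zip(x, c) if xi == 0)
        return successes / p - failures

    lo, hi = 0.0, p_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid                  # root lies to the right of mid
        else:
            hi = mid                  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # With all C_i = 1 the MLE reduces to the sample proportion.
    print(bernoulli_rate_mle([1, 0, 0, 1], [1.0, 1.0, 1.0, 1.0]))
```

When all $C_i$ equal a common constant $C$, the solution is the sample proportion divided by $C$, which gives a convenient sanity check.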

)

p Using the well known formula

, unbiased ifK=0 while it is positively biased ifK>0 in which case, the estimate


is asymptotically unbiased. Finally, using the well known formula

$$V\left(\frac{X}{Y}\right) \approx \left[\frac{E(X)}{E(Y)}\right]^2 \left\{ \frac{V(X)}{[E(X)]^2} + \frac{V(Y)}{[E(Y)]^2} - \frac{2\,\mathrm{cov}(X, Y)}{E(X)\,E(Y)} \right\}. \qquad (43)$$
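Formula (43) is the usual first-order (delta-method) approximation to the variance of a ratio. A minimal sketch of it follows; the function name `ratio_variance_approx` and the example moments are illustrative assumptions.

```python
def ratio_variance_approx(ex: float, ey: float, vx: float, vy: float, cov_xy: float) -> float:
    """First-order approximation to V(X / Y) from the moments of X and Y.

    V(X/Y) ~ (E X / E Y)^2 * [ V X / (E X)^2 + V Y / (E Y)^2 - 2 cov(X, Y) / (E X * E Y) ]
    """
    if ex == 0.0 or ey == 0.0:
        raise ValueError("the approximation requires nonzero means")
    ratio_sq = (ex / ey) ** 2
    return ratio_sq * (vx / ex ** 2 + vy / ey ** 2 - 2.0 * cov_xy / (ex * ey))


if __name__ == "__main__":
    # Hypothetical moments: E X = 2, E Y = 4, V X = 1, V Y = 2, cov(X, Y) = 0.5.
    print(ratio_variance_approx(2.0, 4.0, 1.0, 2.0, 0.5))
```

The approximation is most reliable when $Y$ is concentrated away from zero, since it comes from linearizing $X/Y$ around the means.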

We get that

}. 2 ]

) 1 ( [

) 1 (

) (

) 1

( { ] ) 1 ( [ ) ˆ (

2 3

2 2

2

 

 

 

i i

i

i i

i i i

i i

i

C p

C C

p C C

p C

p

p C C p

C C

C p

p

V (44)

References

1. Ames, M. H., and Webster, J. T. (1991). “On Estimating Approximate Degrees of Freedom,” The American Statistician, 45, 45-50.

2. Bernard, G. A. (1982). “A New Approach to the Behrens-Fisher Problem,” Utilitas Mathematica, 218, 261-277.

3. Chamberlin, S. R., and Sprott, D. A. (1989). “The Estimation of a Location Parameter when the Scale Parameter is Confined to a Finite Range: The Notion of Generalized Ancillary Statistic,” Biometrika, 76, 609-612.

4. Casella, G., and Berger, R. L. (2002). “Statistical Inference,” 2nd Ed., Duxbury, Pacific Grove, CA.

5. Detre, K., and White, C. (1970). “The Comparison for Two Poisson Distributed Observations,” Biometrics, 26, 851-854.

6. Krishnamoorthy, K., and Thomson, J. (2004). “A More Powerful Test for Comparing Two Poisson Means,” Journal of Statistical Planning and Inference, 119, 23-35.

7. Maity, A., and Sherman, M. (2006). “The Two-Sample T Test with One Variance Known,” The American Statistician, 60, 163-166.

8. Moser, B. K., and Stevens, G. R. (1992). “Homogeneity of Variance in the Two Sample Means Test,” The American Statistician, 46, 19-21.

9. Moser, B. K., Stevens, G. R., and Walts, C. L. (1989). “The Two Sample t Test Versus Satterthwaite’s Approximate F-Test,” Communications in Statistics-Theory and Methods, 18, 3963-3975.

10. Ng, H. K. T., and Tang, M. L. (2005). “Testing the Equality of Two Poisson Means Using the Rate Ratio,” Statistics in Medicine, 24, 955-965.

11. Rohatgi, V. K., and Saleh, Md. K. E. (2001). “An Introduction to Probability and Statistics,” 2nd Ed., Wiley & Sons, New York, NY.

12. Satterthwaite, F. E. (1941). “Synthesis of Variance,” Psychometrika, 6, 309-316.

13. Satterthwaite, F. E. (1946). “An Approximate Distribution of Estimates of Variance Components,” Biometrics Bulletin, 6, 110-114.

14. Sichel, H. S. (1973). “On a Significance Test for Two Poisson Variables,” Applied Statistics, 22, 50-58.

15. Sprott, D. A., and Farewell, V. T. (1993). “The Differences Between Two Normal Means,” The American Statistician, 47, 126-128.
