Variance estimation - General introduction and literature review

General introduction and literature review

1.6 Variance estimation


positive. In most cases, first-order inclusion probabilities are fixed in advance or easy to compute, whereas second-order inclusion probabilities are unknown under most sampling designs, including balanced and spatially balanced sampling designs. Even where second-order inclusion probabilities can be calculated through recursive methods, the calculation becomes computationally expensive for large populations. The sampling variance of the HT-estimator is therefore often estimated through approximations. Since the theory of the HT-estimator under πps sampling was introduced, many approximations of its sampling variance have been proposed in the literature; most of them involve only first-order inclusion probabilities, because these are usually known.

The variance estimator under pps sampling (Hansen and Hurwitz, 1943) is often used for variance estimation under πps sampling; however, it usually overestimates the sampling variance, since πps sampling tends to be more efficient than pps sampling. Hartley and Rao (1962) proposed an approximation for variance estimation under randomized systematic πps sampling under the assumption of N → ∞ for fixed n, while Hájek (1964) proposed a variance approximation under conditional Poisson sampling assuming N → ∞ and (N − n) → ∞. Rosén (1991) considered variance estimation under pps systematic sampling. Berger (1998a) extended Hájek’s approximation to some other sampling designs. Berger (1998b) proposed a variance estimator under Chao’s sampling scheme for πps sampling (Chao, 1982). Deville (1999) proposed a variance approximation based on maximum entropy. Based on (Hájek, 1964)’s approximation, Berger (2005) proposed a variance estimator under πps systematic sampling. Haziza et al. (2004) and Haziza et al. (2008) compared 12 estimators of the sampling variance of the HT-estimator under the Rao-Sampford πps sampling procedure (Rao, 1965; Sampford, 1967). Many other methodologies are used for variance estimation, including jackknife and bootstrap methods; see Wolter (2007) for details. The variance approximations that are used or discussed in later chapters of this thesis are described below.

Variance approximation based on pps sampling

A simple approximation of the sampling variance under πps sampling is the sampling variance under pps sampling (Hansen and Hurwitz, 1943), discussed by Durbin (1953), Cochran (1977, p. 252), Särndal et al. (1992, pp. 99, 422) and Wolter (2007, p. 12), among others. It does not require the computation of second-order inclusion probabilities. The variance estimator based on this approximation often overestimates the sampling variance, because πps sampling tends to have a smaller variance than pps sampling. The sampling variance of the HT-estimator under pps sampling is given in Eq. (1.4), and its unbiased estimator is given by


$$\hat V_{pps}(\hat Y_{HT}) = \frac{1}{n(n-1)} \sum_{i\in s} \left( \frac{y_i}{p_i} - \frac{1}{n} \sum_{j\in s} \frac{y_j}{p_j} \right)^2 \tag{1.14}$$

where $\pi_i = n p_i$.
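The Hansen-Hurwitz estimator in Eq. (1.14) can be sketched in a few lines. This is a minimal illustration, not code from the thesis: the function name `hansen_hurwitz_variance` and the sample values are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def hansen_hurwitz_variance(y, p):
    """Estimate the variance of the estimated total as in Eq. (1.14).

    y : values y_i of the n sampled units
    p : draw probabilities p_i of the same units (pi_i = n * p_i)
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    n = y.size
    if n < 2:
        raise ValueError("at least two sampled units are needed")
    expanded = y / p            # y_i / p_i
    centre = expanded.mean()    # (1/n) * sum_j y_j / p_j
    return float(np.sum((expanded - centre) ** 2) / (n * (n - 1)))


# Hypothetical sample of n = 3 units, with p_i recovered from pi_i = n * p_i.
pi = np.array([0.2, 0.5, 0.8])
y = np.array([4.0, 9.0, 20.0])
v_hat = hansen_hurwitz_variance(y, pi / y.size)
```

Only the sampled values and the first-order inclusion probabilities enter the computation, which is why this approximation is attractive when second-order inclusion probabilities are unavailable.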

(Deville and Tillé, 2005)’s variance approximation under balanced sampling

Deville and Tillé (2005) suggested that the sampling variance of the HT-estimator under the cube method can be approximated by the sampling variance of the GREG-estimator under a Poisson sampling design. This approximation is based on two arguments: first, the variance under Poisson sampling does not require the computation of second-order inclusion probabilities; second, Poisson sampling has maximum entropy, and a balanced sampling design can be viewed as a conditional Poisson sampling design; see Deville and Tillé (2005) for details.

Let $\tilde\pi_i$ denote the first-order inclusion probabilities under the Poisson sampling design. The sampling variance of the HT-estimator under Poisson sampling (Hájek, 1964) is given by

$$V_{PS}(\hat Y_{HT}) = \sum_{i\in U} \frac{y_i^2}{\pi_i^2}\, \tilde\pi_i (1-\tilde\pi_i) = z^\top \tilde\Delta z$$

where $z = (y_1/\pi_1, \ldots, y_N/\pi_N)^\top$ and $\tilde\Delta = \mathrm{Diag}[\tilde\pi_i(1-\tilde\pi_i)]_{i\in U}$ is a diagonal matrix. Note that the $\tilde\pi_i$ are unknown; Deville and Tillé (2005) gave four approximations for $\tilde\pi_i(1-\tilde\pi_i)$. Following (Hájek, 1964, 1981)’s residual technique, Deville and Tillé (2005) proposed a variance approximation under balanced sampling, given by

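$$V_{PS}(\hat Y_{HT} \mid \hat X_{HT} = X) \approx V_{PS}\big(\hat Y_{HT} + (X - \hat X_{HT})^\top \beta\big)$$

where

$$\beta = V_{PS}(\hat X_{HT})^{-1}\, \mathrm{Cov}_{PS}(\hat X_{HT}, \hat Y_{HT}),$$

$$V_{PS}(\hat X_{HT}) = \sum_{i\in U} \frac{x_i x_i^\top}{\pi_i^2}\, \tilde\pi_i(1-\tilde\pi_i), \qquad \mathrm{Cov}_{PS}(\hat X_{HT}, \hat Y_{HT}) = \sum_{i\in U} \frac{x_i y_i}{\pi_i^2}\, \tilde\pi_i(1-\tilde\pi_i).$$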
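The Poisson-sampling variance $z^\top \tilde\Delta z$ that underlies these expressions can be checked numerically on a small population. The sketch below is illustrative only: the function names and the toy values are hypothetical, and it assumes a population small enough to enumerate all 2^N Poisson samples. It compares the closed form with the exact variance of the sum of y_i/π_i over sampled units, where unit i enters the sample independently with probability π̃_i.

```python
import itertools

import numpy as np


def poisson_variance_formula(y, pi, pi_tilde):
    """Closed form z^T Delta~ z, with z_i = y_i / pi_i."""
    z = np.asarray(y, dtype=float) / np.asarray(pi, dtype=float)
    pi_tilde = np.asarray(pi_tilde, dtype=float)
    delta = np.diag(pi_tilde * (1.0 - pi_tilde))
    return float(z @ delta @ z)


def poisson_variance_enumerated(y, pi, pi_tilde):
    """Exact variance of sum_{i in s} y_i / pi_i over all 2^N Poisson samples."""
    z = np.asarray(y, dtype=float) / np.asarray(pi, dtype=float)
    pi_tilde = np.asarray(pi_tilde, dtype=float)
    first_moment = 0.0
    second_moment = 0.0
    for indicators in itertools.product((0, 1), repeat=z.size):
        ind = np.array(indicators)
        # Units enter the sample independently, each with probability pi_tilde_i.
        prob = float(np.prod(np.where(ind == 1, pi_tilde, 1.0 - pi_tilde)))
        estimate = float(np.sum(z * ind))
        first_moment += prob * estimate
        second_moment += prob * estimate ** 2
    return second_moment - first_moment ** 2


# Hypothetical population of N = 4 units.
y = [3.0, 1.0, 4.0, 1.0]
pi = [0.2, 0.5, 0.7, 0.4]
pi_tilde = [0.25, 0.45, 0.65, 0.35]
closed_form = poisson_variance_formula(y, pi, pi_tilde)
enumerated = poisson_variance_enumerated(y, pi, pi_tilde)
```

The two values agree up to floating-point error because the sample membership indicators are independent under Poisson sampling, so no second-order inclusion probabilities are needed.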


When the term $\tilde\pi_i(1-\tilde\pi_i)$ is approximated by $N\pi_i(1-\pi_i)/(N-q)$ (Hájek, 1981; Deville and Tillé, 2005), where $q$ is the number of balancing variables, the variance approximation under balanced sampling can be written as

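$$V(\hat Y_{HT}) \approx \frac{N}{N-q} \sum_{i\in U} \frac{\tilde e_i^2}{\pi_i^2}\, \pi_i(1-\pi_i) \tag{1.15}$$

where $\tilde e_i = z_i - \tilde z_i$, $\tilde z_i = A^\top (X \tilde\Delta X^\top)^{-1} X \tilde\Delta z$, $A = (x_1/\pi_1, \ldots, x_N/\pi_N)$ and $X = (x_1, \ldots, x_N)$.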
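A minimal sketch of this approximation at the population level follows. It rests on assumptions that are not stated in this form in the text: β is computed from the residual-technique formulas with π̃_i(1−π̃_i) replaced by Nπ_i(1−π_i)/(N−q), and the residuals are taken on the y scale, e_i = y_i − x_i^T β, so that dividing by π_i² matches the denominator of Eq. (1.15). The function name and the toy data are hypothetical, and the auxiliary matrix is stored with one row per unit (the transpose of X as written above).

```python
import numpy as np


def balanced_variance_approx(y, X, pi):
    """Population-level variance approximation in the spirit of Eq. (1.15).

    y  : length-N vector of study values
    X  : N x q matrix whose i-th row is x_i^T (an assumed layout)
    pi : length-N vector of first-order inclusion probabilities
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    pi = np.asarray(pi, dtype=float)
    N, q = X.shape
    if N <= q:
        raise ValueError("need more population units than balancing variables")
    # Stand-in for pi~_i (1 - pi~_i): N * pi_i * (1 - pi_i) / (N - q).
    c = N / (N - q) * pi * (1.0 - pi)
    w = c / pi ** 2
    v_x = (X * w[:, None]).T @ X          # sum_i x_i x_i^T c_i / pi_i^2
    cov_xy = X.T @ (w * y)                # sum_i x_i y_i c_i / pi_i^2
    beta = np.linalg.solve(v_x, cov_xy)
    e = y - X @ beta                      # residuals on the y scale (assumption)
    return float(np.sum(w * e ** 2))


# Hypothetical check: equal probabilities, balancing on x_i = pi_i only
# (fixed sample size), N = 4 and n = 2.
pi = np.full(4, 0.5)
y = np.array([1.0, 2.0, 3.0, 4.0])
approx = balanced_variance_approx(y, pi[:, None], pi)
```

In this toy case the result equals the simple random sampling variance of the estimated total, N²(1 − n/N)S²/n = 20/3, and the approximation is zero whenever y is an exact linear combination of the balancing variables.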

The corresponding variance estimator is given by

$$\hat V(\hat Y_{HT})_{DT} = \frac{n}{n-q} \sum_{i\in s} \frac{\hat{\tilde e}_i^2}{\pi_i}\, (1-\pi_i) \tag{1.16}$$

where $\hat{\tilde e}_i = z_i - \hat{\tilde z}_i$, $\hat{\tilde z}_i = A_s^\top (X_s \tilde\Delta_s X_s^\top)^{-1} X_s \tilde\Delta_s z_s$, and the subscript $s$ denotes quantities restricted to the sampled units. The subscript DT indicates (Deville and Tillé, 2005)’s variance estimator.

In (Deville and Tillé, 2005)’s variance approximation, the assumption of exact balancing may not always hold. The variance estimator based on this approximation can therefore be biased when the sampling design is not exactly balanced. For the cube method, assuming exact balancing means that the approximation accounts only for the flight phase; the bias can increase as the sampling variance due to the flight phase decreases (or the sampling variance due to the landing phase increases). Breidt and Chauvet (2011) also proposed a simulation-based approximation for balanced sampling using the cube method. In a simulation study, the variance estimator based on the simulation-based approximation was compared with (Deville and Tillé, 2005)’s variance estimator in Eq. (1.16). The simulation-based variance estimator was approximately unbiased but less efficient than (Deville and Tillé, 2005)’s estimator.

Under spatially balanced sampling, second-order inclusion probabilities of nearby units are likely to be zero or very close to zero. For example, in one- and two-dimensional systematic sampling, second-order inclusion probabilities are non-zero only for units that belong to the same sample. Similarly, in BSEC, second-order inclusion probabilities are non-zero only for non-contiguous units in the list. Unbiased variance estimation with the Sen-Yates-Grundy estimator is therefore not possible, because second-order inclusion probabilities appear in its denominator. According to Stevens Jr (1997), expressions for second-order inclusion probabilities can be derived under the GRTS design for continuous populations, although they are not known for finite (or discrete) populations. Because of near-zero second-order inclusion probabilities, these expressions may not give a stable variance estimator (Stevens Jr and Olsen, 2004). In another instance, Benedetti et al. (2017a) proposed a model-based variance estimator for two-dimensional systematic sampling and one-per-stratum (or maximal stratification) sampling. This estimator requires the second-order inclusion probabilities to be known, which is true for the two designs considered but not for more advanced designs such as LPMs and SCPS. Some variance estimators that are commonly used in practice or that often appear in the literature are described in the following.

Grafström and Lundström (2013) also proposed a variance estimator for the case where qualitative balancing variables are used for spatial or auxiliary balancing.

Local-mean (or local neighbourhood) variance estimator

Stevens Jr and Olsen (2003) proposed a variance estimator based on local neighbourhoods (NBH) for the GRTS design; it is also known as the local-mean variance estimator (Grafström et al., 2012; Grafström and Lundström, 2013). The local-mean estimator is given by

$$\hat V_{NBH}(\hat Y_{HT}) = \sum_{i\in s} \sum_{j\in D_i} w_{ij} \left( \frac{y_j}{\pi_j} - \bar y_{D_i} \right)^2 \tag{1.17}$$

where $D_i$ is a neighbourhood of unit $i$ containing at least four units, and $w_{ij}$ are weights that decrease as the distance between units $i$ and $j$ increases. The weights satisfy $\sum_j w_{ij} = 1$, and $\bar y_{D_i}$ is a neighbourhood total (Grafström et al., 2012). The local-mean variance estimator is often recommended for spatially balanced sampling unless a better estimator is available (Stevens Jr and Olsen, 2004; Grafström, 2012; Grafström et al., 2012; Robertson et al., 2013; Benedetti and Piersimoni, 2017).
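The structure of the local-mean estimator can be sketched as follows. Several choices here are assumptions made for illustration and are not taken from the text: the squared deviation is evaluated for each neighbour j of unit i, the neighbourhood quantity is taken to be the weighted mean of the expanded values y_k/π_k over D_i, and the helper `nearest_neighbour_weights` uses a hypothetical weighting rule (the k nearest sampled units, including the unit itself, with weights proportional to 1/(1 + distance)). The weights of Stevens Jr and Olsen (2003) are more elaborate.

```python
import numpy as np


def local_mean_variance(y, pi, W):
    """Local-mean variance estimate for a given n x n weight matrix W.

    W[i, j] is the weight w_ij; it is zero outside the neighbourhood D_i
    and every row sums to one.
    """
    z = np.asarray(y, dtype=float) / np.asarray(pi, dtype=float)
    W = np.asarray(W, dtype=float)
    if not np.allclose(W.sum(axis=1), 1.0):
        raise ValueError("each row of W must sum to one")
    local_mean = W @ z                                # assumed form of ybar_{D_i}
    dev2 = (z[None, :] - local_mean[:, None]) ** 2    # entry [i, j]: (z_j - ybar_{D_i})^2
    return float(np.sum(W * dev2))


def nearest_neighbour_weights(coords, k=4):
    """Hypothetical weights: k nearest sampled units, decreasing with distance."""
    coords = np.asarray(coords, dtype=float)
    if coords.ndim == 1:
        coords = coords[:, None]
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    W = np.zeros_like(d)
    for i in range(d.shape[0]):
        nbrs = np.argsort(d[i], kind="stable")[:k]    # includes unit i (distance 0)
        raw = 1.0 / (1.0 + d[i, nbrs])
        W[i, nbrs] = raw / raw.sum()
    return W


# Hypothetical sample of five units with planar coordinates.
coords = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
y = [2.0, 3.0, 5.0, 4.0, 9.0]
pi = [0.2, 0.2, 0.4, 0.4, 0.5]
W = nearest_neighbour_weights(coords, k=4)
v_nbh = local_mean_variance(y, pi, W)
```

Because each unit is compared only with its neighbours, a smooth spatial trend in y_i/π_i contributes little to the estimate, which is the motivation for using it under spatially balanced designs.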

(Grafström and Tillé, 2013)’s variance estimator under doubly balanced sampling

For doubly balanced sampling by the local cube method, Grafström and Tillé (2013) introduced a variance estimator that combines the local-mean variance estimator (Stevens Jr and Olsen, 2003) with the variance estimator for balanced sampling (Deville and Tillé, 2005), given by

$$\hat V_{DBS}(\hat Y_{HT}) = \frac{n}{n-p}\, \frac{p+1}{p} \sum_{i\in s} (1-\pi_i) \left( \frac{e_i}{\pi_i} - \bar e_i \right)^2 \tag{1.18}$$

From the document Balanced Two-Stage Equal Probability Sampling (pages 46-51)