General introduction and literature review
1.6 Variance estimation
positive. In most cases, first-order inclusion probabilities are fixed in advance or easy to compute, whereas second-order inclusion probabilities are unknown under most sampling designs, including balanced and spatially balanced sampling designs. Even when second-order inclusion probabilities can be calculated through recursive methods, the computation becomes expensive for large populations. Therefore, the sampling variance of the HT-estimator is often estimated through approximations. Since the theory of the HT-estimator under πps sampling was introduced, many approximations have been proposed in the literature to estimate its sampling variance; most of them involve only first-order inclusion probabilities because these are often known.
The variance estimator under pps sampling (Hansen and Hurwitz, 1943) is often used for variance estimation under πps sampling; however, it usually overestimates the sampling variance since πps sampling tends to be more efficient than pps sampling. Hartley and Rao (1962) proposed an approximation for variance estimation under randomized systematic πps sampling under the assumption N → ∞ for fixed n, while Hájek (1964) proposed a variance approximation under conditional Poisson sampling assuming N → ∞ and (N − n) → ∞. Rosén (1991) considered variance estimation under pps systematic sampling. Berger (1998a) extended Hájek's approximation to some other sampling designs. Berger (1998b) proposed a variance estimator under Chao's sampling scheme for πps sampling (Chao, 1982). Deville (1999) proposed a variance approximation based on maximum entropy. Based on Hájek's (1964) approximation, Berger (2005) proposed a variance estimator under πps systematic sampling. Haziza et al. (2004) and Haziza et al. (2008) compared 12 estimators of the sampling variance of the HT-estimator under the Rao-Sampford πps sampling procedure (Rao, 1965; Sampford, 1967). Many other methodologies are used for variance estimation, including jackknife and bootstrap methods; see Wolter (2007) for details. The variance approximations that are used or discussed in later chapters of this thesis are described below.
Variance approximation based on pps sampling
A simple approximation of the sampling variance under πps sampling is to use the sampling variance under pps sampling (Hansen and Hurwitz, 1943), discussed by Durbin (1953), Cochran (1977, p. 252), Särndal et al. (1992, pp. 99, 422) and Wolter (2007, p. 12), among others. It does not require the computation of second-order inclusion probabilities. A variance estimator based on this approximation often overestimates the sampling variance because πps sampling tends to have a smaller variance than pps sampling. The sampling variance of the HT-estimator under pps sampling is given in Eq. (1.4) and its unbiased estimator is given
by

$$\hat{V}_{pps}(\hat{Y}_{HT}) = \frac{1}{n(n-1)}\sum_{i\in s}\left(\frac{y_i}{p_i} - \frac{1}{n}\sum_{j\in s}\frac{y_j}{p_j}\right)^2, \tag{1.14}$$

where $\pi_i = n p_i$.
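As a minimal numerical sketch of Eq. (1.14), the Python function below (the name `v_pps` is ours, not from any library) takes the sampled values and their inclusion probabilities, recovers $p_i = \pi_i/n$, and returns $\hat V_{pps}$. Because $y_i/p_i = n y_i/\pi_i$, the mean of the expanded values equals $\hat Y_{HT}$.

```python
import numpy as np


def v_pps(y, pi):
    """Variance estimator of Eq. (1.14) based on the pps approximation.

    y  : values of the study variable for the n sampled units
    pi : first-order inclusion probabilities of those units (pi_i = n * p_i)
    """
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    n = y.size
    p = pi / n           # single-draw selection probabilities p_i
    u = y / p            # expanded values y_i / p_i; their mean is the HT estimate
    return np.sum((u - u.mean()) ** 2) / (n * (n - 1))


# Hypothetical sample of n = 4 units
print(v_pps([3.0, 5.0, 2.0, 8.0], [0.3, 0.5, 0.25, 0.6]))
```

When $y_i$ is exactly proportional to $\pi_i$, every $y_i/p_i$ is identical and the estimate is zero, which is the situation in which both πps and pps sampling are most efficient.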
Deville and Tillé's (2005) variance approximation under balanced sampling
Deville and Tillé (2005) suggested that the sampling variance of the HT-estimator under the cube method can be approximated by the sampling variance of the GREG-estimator under the Poisson sampling design. This approximation is based on two arguments: first, the variance under Poisson sampling does not require the computation of second-order inclusion probabilities; second, Poisson sampling has maximum entropy, and the balanced sampling design can be regarded as a Poisson sampling design conditioned on the balancing equations; see Deville and Tillé (2005) for details.
Let $\tilde{\pi}_i$ denote the first-order inclusion probabilities under the Poisson sampling design. The sampling variance of the HT-estimator under Poisson sampling (Hájek, 1964) is given by

$$V_{PS}(\hat{Y}_{HT}) = \sum_{i\in U}\frac{y_i^2}{\pi_i^2}\,\tilde{\pi}_i(1-\tilde{\pi}_i) = z^\top\tilde{\Delta}z,$$

where $z = (y_1/\pi_1,\dots,y_N/\pi_N)^\top$ and $\tilde{\Delta} = \mathrm{Diag}[\tilde{\pi}_i(1-\tilde{\pi}_i)]_{i\in U}$ is a diagonal matrix. Note that the $\tilde{\pi}_i$ are unknown; Deville and Tillé (2005) gave four approximations for $\tilde{\pi}_i(1-\tilde{\pi}_i)$. Following the residual technique of Hájek (1964, 1981), Deville and Tillé (2005) proposed a variance approximation under balanced sampling, given by
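$$V_{PS}(\hat{Y}_{HT}\mid\hat{X}_{HT} = X) \approx V_{PS}\left(\hat{Y}_{HT} + (X - \hat{X}_{HT})^\top\beta\right),$$

where

$$\beta = V_{PS}(\hat{X}_{HT})^{-1}\,\mathrm{Cov}_{PS}(\hat{X}_{HT},\hat{Y}_{HT}),$$

$$V_{PS}(\hat{X}_{HT}) = \sum_{i\in U}\frac{x_i x_i^\top}{\pi_i^2}\,\tilde{\pi}_i(1-\tilde{\pi}_i), \qquad \mathrm{Cov}_{PS}(\hat{X}_{HT},\hat{Y}_{HT}) = \sum_{i\in U}\frac{x_i y_i}{\pi_i^2}\,\tilde{\pi}_i(1-\tilde{\pi}_i).$$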
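The residual technique can be checked numerically at the population level. The sketch below assumes the whole population $(y_i, x_i, \pi_i)$ is available and, purely for illustration, treats $\tilde\pi_i$ as known (in practice it is not). It computes $\beta$ from $V_{PS}(\hat X_{HT})$ and $\mathrm{Cov}_{PS}(\hat X_{HT},\hat Y_{HT})$ and returns $V_{PS}(\hat Y_{HT} + (X-\hat X_{HT})^\top\beta) = \sum_{i\in U}(y_i - x_i^\top\beta)^2\,\tilde\pi_i(1-\tilde\pi_i)/\pi_i^2$, which holds because the total $X$ is a constant. The function name `residual_variance_ps` and the simulated population are hypothetical.

```python
import numpy as np


def residual_variance_ps(y, x, pi, pi_tilde):
    """Population-level residual technique under Poisson sampling.

    y        : (N,) study variable over the population U
    x        : (N, q) balancing variables; row i is x_i
    pi       : (N,) inclusion probabilities used in the HT-estimator
    pi_tilde : (N,) Poisson inclusion probabilities (assumed known here)
    Returns beta and V_PS(Y_HT + (X - X_HT)^T beta).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    pi = np.asarray(pi, dtype=float)
    pt = np.asarray(pi_tilde, dtype=float)
    d = pt * (1.0 - pt) / pi**2          # pi~_i (1 - pi~_i) / pi_i^2
    v_x = (x * d[:, None]).T @ x         # V_PS(X_HT) = sum_i d_i x_i x_i^T
    cov_xy = (x * d[:, None]).T @ y      # Cov_PS(X_HT, Y_HT) = sum_i d_i x_i y_i
    beta = np.linalg.solve(v_x, cov_xy)
    resid = y - x @ beta                 # y_i - x_i^T beta
    # X is a fixed total, so only -X_HT^T beta adds variance to Y_HT
    return beta, np.sum(d * resid**2)


rng = np.random.default_rng(1)
N = 200
x = np.column_stack([np.ones(N), rng.uniform(1.0, 5.0, N)])  # hypothetical population
y = x @ np.array([1.0, 2.0]) + rng.normal(0.0, 1.0, N)
pi = np.full(N, 0.1)
beta, v_res = residual_variance_ps(y, x, pi, pi)
v_plain = np.sum(pi * (1 - pi) / pi**2 * y**2)   # V_PS(Y_HT) without adjustment
print(beta, v_res, v_plain)
```

Because $\beta$ is a weighted least-squares coefficient, the returned variance can never exceed $V_{PS}(\hat Y_{HT})$ itself, which corresponds to $\beta = 0$.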
When the term $\tilde{\pi}_i(1-\tilde{\pi}_i)$ is approximated by $N\pi_i(1-\pi_i)/(N-q)$ (Hájek, 1981; Deville and Tillé, 2005), where $q$ is the number of balancing variables, the variance approximation under balanced sampling can be written as
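$$V(\hat{Y}_{HT}) \approx \frac{N}{N-q}\sum_{i\in U}\tilde{e}_i^2\,\pi_i(1-\pi_i), \tag{1.15}$$

where $\tilde{e}_i = z_i - \tilde{z}_i$, $(\tilde{z}_1,\dots,\tilde{z}_N)^\top = A^\top(X\tilde{\Delta}X^\top)^{-1}X\tilde{\Delta}z$, $A = (x_1/\pi_1,\dots,x_N/\pi_N)$ and $X = (x_1,\dots,x_N)$.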
The corresponding variance estimator is given by

$$\hat{V}_{DT}(\hat{Y}_{HT}) = \frac{n}{n-q}\sum_{i\in s}\hat{\tilde{e}}_i^2\,(1-\pi_i), \tag{1.16}$$

where $\hat{\tilde{e}}_i = z_i - \hat{\tilde{z}}_i$, $(\hat{\tilde{z}}_i)_{i\in s} = A_s^\top(X_s\tilde{\Delta}_sX_s^\top)^{-1}X_s\tilde{\Delta}_s z_s$, and the subscript $s$ denotes the values corresponding to the sampled units. The subscript DT denotes Deville and Tillé's (2005) variance estimator.
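A sample-based sketch of Eq. (1.16) follows. The function name `v_dt` is hypothetical, and the choice of $\tilde\Delta_s$ is an assumption: the residuals are taken from a weighted least-squares fit of $z_i = y_i/\pi_i$ on $x_i/\pi_i$ with weights $1-\pi_i$, which we take to be a reasonable reading of the weighting in Deville and Tillé (2005), not a verbatim transcription of it.

```python
import numpy as np


def v_dt(y, x, pi):
    """Sketch of the variance estimator of Eq. (1.16) for balanced sampling.

    y  : (n,) sampled values of the study variable
    x  : (n, q) sampled values of the q balancing variables
    pi : (n,) first-order inclusion probabilities of the sampled units
    Assumption: residuals come from a weighted least-squares fit of
    z_i = y_i / pi_i on a_i = x_i / pi_i with weights c_i = 1 - pi_i.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    pi = np.asarray(pi, dtype=float)
    n, q = x.shape
    z = y / pi
    a = x / pi[:, None]
    c = 1.0 - pi                    # pi_i (1 - pi_i) expanded by 1 / pi_i
    ca = a * c[:, None]
    b_hat = np.linalg.solve(ca.T @ a, ca.T @ z)
    e = z - a @ b_hat               # residuals on the z-scale
    return n / (n - q) * np.sum(c * e**2)


# Hypothetical sample balanced on pi itself (fixed sample size)
pi = np.array([0.2, 0.4, 0.5, 0.6, 0.3])
y = np.array([1.2, 2.9, 3.1, 4.4, 2.0])
print(v_dt(y, pi, pi))
```

When $x_i = \pi_i$ is the only balancing variable, $x_i/\pi_i = 1$ and the sketch reduces to $\frac{n}{n-1}\sum_{i\in s}(1-\pi_i)(z_i - \bar z_c)^2$, where $\bar z_c$ is the $(1-\pi_i)$-weighted mean of the $z_i$.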
In Deville and Tillé's (2005) variance approximation, the assumption of exact balancing may not always hold. Therefore, the variance estimator based on this approximation can be biased when the sampling design is not exactly balanced. For the cube method, assuming exact balancing means that the approximation accounts only for the flight phase; the bias can increase as the part of the sampling variance due to the flight phase decreases (or the part due to the landing phase increases). Breidt and Chauvet (2011) also proposed a simulation-based approximation for balanced sampling using the cube method. In a simulation study, the variance estimator based on the simulation-based approximation was compared with Deville and Tillé's (2005) variance estimator in Eq. (1.16). The simulation-based variance estimator was approximately unbiased but less efficient than Deville and Tillé's (2005) estimator.
Under spatially balanced sampling, second-order inclusion probabilities of nearby units are likely to be zero or very close to zero. For example, in one- and two-dimensional systematic sampling, second-order inclusion probabilities are non-zero only for units that belong to the same sample. Similarly, in BSEC, second-order inclusion probabilities are non-zero only for non-contiguous units in the list. Therefore, unbiased estimation of the variance using the Sen-Yates-Grundy estimator is not possible, as second-order inclusion probabilities appear in the denominator. According to Stevens Jr (1997), expressions for second-order inclusion probabilities can be derived under the GRTS design for continuous populations, although they are not known for finite (or discrete) populations. Because of near-zero second-order inclusion probabilities, these expressions may not give a stable variance estimator (Stevens Jr and Olsen, 2004). In another instance, Benedetti et al. (2017a)
proposed a model-based variance estimator for two-dimensional systematic sampling and one-per-stratum (or maximal stratification) sampling. This estimator requires second-order inclusion probabilities to be known, which holds for the two designs considered but not for more advanced designs, for instance LPMs and SCPS. Some variance estimators that are commonly used in practice or often appear in the literature are described in the following.
Grafström and Lundström (2013) also proposed a variance estimator for the case where qualitative balancing variables are used for spatial or auxiliary balancing.
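The zero second-order inclusion probabilities of systematic sampling mentioned above are easy to verify by enumerating the $k$ equally likely random starts of one-dimensional systematic sampling. The sketch assumes that $N$ is a multiple of the sampling interval $k$; the function name `systematic_pi2` is hypothetical.

```python
import numpy as np


def systematic_pi2(N, k):
    """Joint inclusion probabilities of 1-D systematic sampling.

    Units are 0, ..., N-1; the sampling interval is k (assumed to divide N);
    the random start r is uniform on {0, ..., k-1}.
    """
    pi2 = np.zeros((N, N))
    for r in range(k):
        s = np.arange(r, N, k)          # the systematic sample for start r
        pi2[np.ix_(s, s)] += 1.0 / k    # every pair in s is selected together
    return pi2


pi2 = systematic_pi2(N=12, k=3)
print(pi2[0, 3], pi2[0, 1])             # same sample vs. neighbouring units
```

For $N = 12$ and $k = 3$, unit 0 is sampled together with units 3, 6 and 9 with probability $1/3$, but never with its neighbour, unit 1, so $\pi_{01} = 0$ and the Sen-Yates-Grundy term for that pair cannot be formed.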
Local-mean (or local neighbourhood) variance estimator
Stevens Jr and Olsen (2003) proposed a variance estimator based on the local neighbourhood (NBH) for the GRTS design; it is also known as the local-mean variance estimator (Grafström et al., 2012; Grafström and Lundström, 2013). The expression for the local-mean estimator is given by
$$\hat{V}_{NBH}(\hat{Y}_{HT}) = \sum_{i\in s}\sum_{j\in D_i} w_{ij}\left(\frac{y_j}{\pi_j} - \bar{y}_{D_i}\right)^2, \tag{1.17}$$

where $D_i$ is a neighbourhood of unit $i$ containing at least four units, and $w_{ij}$ are weights that decrease as the distance between units $i$ and $j$ increases. The weights satisfy $\sum_j w_{ij} = 1$, and $\bar{y}_{D_i}$ is the local estimate of the total based on the neighbourhood $D_i$ (Grafström et al., 2012). The local-mean variance estimator is often recommended for spatially balanced sampling unless a better estimator is available (Stevens Jr and Olsen, 2004; Grafström, 2012; Grafström et al., 2012; Robertson et al., 2013; Benedetti and Piersimoni, 2017).
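The sketch below reproduces only the structure of Eq. (1.17). For each sampled unit it forms a neighbourhood $D_i$ of $m = 4$ sampled units (unit $i$ and its three nearest sampled neighbours in Euclidean distance), normalises the weights so that $\sum_j w_{ij} = 1$, and, as an assumption, takes $\bar y_{D_i}$ to be the weighted mean of $y_j/\pi_j$ over $D_i$. The weights $w_{ij} \propto 1/(1+d_{ij})$ are a hypothetical choice that merely decreases with distance; they are not the weights Stevens Jr and Olsen (2003) derive for the GRTS design, so the output should not be read as their estimator. The function name `v_nbh` is also hypothetical.

```python
import numpy as np


def v_nbh(y, pi, coords, m=4):
    """Structure of the local-mean (NBH) estimator of Eq. (1.17).

    y, pi  : (n,) sampled values and their inclusion probabilities
    coords : (n, d) spatial coordinates of the sampled units
    m      : size of D_i (unit i plus its m - 1 nearest sampled units)
    The weights are a hypothetical choice, not the Stevens-Olsen GRTS weights.
    """
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    coords = np.asarray(coords, dtype=float)
    t = y / pi                                            # expanded values y_j / pi_j
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    v = 0.0
    for i in range(len(t)):
        D = np.argsort(dist[i], kind="stable")[:m]        # D_i, including unit i itself
        w = 1.0 / (1.0 + dist[i, D])                      # hypothetical: decreases with distance
        w /= w.sum()                                      # so that sum_j w_ij = 1
        y_bar = np.sum(w * t[D])                          # assumed local estimate y-bar_{D_i}
        v += np.sum(w * (t[D] - y_bar) ** 2)
    return v


coords = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 2], [3, 1]])  # hypothetical sample
pi = np.full(6, 0.1)
y = np.array([4.0, 5.0, 3.5, 6.0, 9.0, 8.0])
print(v_nbh(y, pi, coords))
```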
Grafström and Tillé's (2013) variance estimator under doubly balanced sampling
For doubly balanced sampling by the local cube method, Grafström and Tillé (2013) introduced a variance estimator that combines the local-mean variance estimator (Stevens Jr and Olsen, 2003) and the variance estimator for balanced sampling (Deville and Tillé, 2005), given by
$$\hat{V}_{DBS}(\hat{Y}_{HT}) = \frac{n}{n-p}\,\frac{p+1}{p}\sum_{i\in s}(1-\pi_i)\left(\frac{e_i}{\pi_i} - \bar{e}_i\right)^2 \tag{1.18}$$