INTRODUCTION - Balanced Two-Stage Equal Probability Sampling

3. (2Se, GREG)
4. (2Sc, GREG)

Two-stage epsem is an intuitive choice when the survey variable can be described by a super-population model that assumes constant variance of the y-values across the finite population.

In this chapter, the four sampling strategies above, all involving two-stage epsem, are compared systematically. The comparison is based on their AMSE’s under the two-level regression model rather than on their MSE’s for a given finite population; a comparison of these strategies with respect to AMSE does not appear to have been reported in the literature. The AMSE represents the average performance of a sampling strategy for the finite population parameters under a given super-population model. The aim of the comparison is to single out a preferred sampling strategy that may be useful from a practical viewpoint.

Cluster sampling is usually less efficient than element sampling. Two-stage epsem based on two-stage cluster sampling (i.e. 2Sc) is therefore expected to be less efficient than two-stage element sampling (i.e. 2Se) when the sub-clusters have positive intra-cluster correlation.

Moreover, equal-sized sub-clusters are rarely found in practice, which may motivate constructing them deliberately. A simple case is when equal-sized sub-clusters are formed at random within each PSU. When a set of auxiliary variables is available, one may also try to form better sub-clusters using the known values of these variables. The following toy example shows that clusters can be arranged so that the sampling variance is zero.

Example 2.1. Let the response values be $y_i = i$, $i = 1, \dots, 16$, and let the sample size be $n = 4$. Consider the following arrangement of the $y_i$’s, with rows as clusters:

\[
\begin{pmatrix}
1 & 8 & 9 & 16 \\
2 & 7 & 10 & 15 \\
3 & 6 & 11 & 14 \\
4 & 5 & 12 & 13
\end{pmatrix}
\]

The rows of the matrix are the four possible samples. They all have the same sample mean, 8.5 (and total 34), but different sample variances. The sampling variance of the row-sample mean is therefore 0, so no design of size 4 can be more efficient for this population, SRS of elements included.
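The claim in Example 2.1 can be checked by brute force. The following is a minimal sketch, assuming one row is drawn with probability 1/4 and comparing against SRS without replacement of four elements; the function name `design_variance` is illustrative and not from the source.

```python
from fractions import Fraction
from itertools import combinations

# Population of Example 2.1, arranged as four row-clusters.
clusters = [
    [1, 8, 9, 16],
    [2, 7, 10, 15],
    [3, 6, 11, 14],
    [4, 5, 12, 13],
]
population = [y for row in clusters for y in row]
n = 4

def design_variance(sample_means):
    """Variance of the sample mean over equally likely samples."""
    m = len(sample_means)
    centre = sum(sample_means) / m
    return sum((v - centre) ** 2 for v in sample_means) / m

# Design 1: select one row (cluster), each with probability 1/4.
cluster_means = [Fraction(sum(row), n) for row in clusters]
var_cluster = design_variance(cluster_means)

# Design 2: SRS without replacement of n = 4 elements (all 1820 samples).
srs_means = [Fraction(sum(s), n) for s in combinations(population, n)]
var_srs = design_variance(srs_means)

print(var_cluster, var_srs)  # prints: 0 17/4
```

Every row has mean 17/2, so the cluster design has zero variance, whereas SRS of elements gives 17/4, which agrees with the usual expression (1 - n/N) S²/n for N = 16.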

2.1. INTRODUCTION

Forming clusters as in the example above requires the y-values, which are unknown in a sample survey. However, auxiliary variables related to the y-variable can be used for this purpose, as in other model-assisted approaches in survey sampling. This chapter explores one way of forming sub-clusters that are equal-sized and have equal means of the auxiliary variables. Assuming that such sub-clusters exist and can be constructed, a preliminary analysis indicates that SRS of these sub-clusters is equivalent, in terms of AMSE, to equal-probability balanced sampling of elements (balanced with respect to the auxiliary variables).

Such sub-clusters may be more difficult to achieve in practice than balanced sampling. They may nevertheless have one advantage over balanced sampling: because the design is SRS of sub-clusters, unbiased variance estimation is available.

The rest of the chapter is arranged as follows. Section 2.2 briefly describes the formation of sub-clusters using auxiliary variables, which turns out to be equivalent to balanced equal-probability sampling of elements. Section 2.3 compares the four sampling strategies involving two-stage epsem with respect to their AMSE’s under a two-level super-population model. Section 2.4 presents a simulation study that provides further insight into the comparison. Finally, Section 2.5 gives some conclusions. The following two subsections give the formulas of the HT-estimator and the GREG-estimator under two-stage sampling designs.

2.1.1 HT-estimators under two-stage sampling

Let a random sample $s_I$ of $n_I$ PSU’s be selected from the population $U_I$ of $N_I$ PSU’s ($U_I$ is defined in Section 1.2.1). Let $\pi_g$ and $\pi_{gh}$ denote the first- and second-order inclusion probabilities of the $g$th PSU and of the $(g, h)$th pair of PSU’s, respectively, where $g, h \in U_I$ and $g \neq h$. In two-stage element sampling, a random sample $s_g$ of $n_g$ elements is selected from each PSU in the first-stage sample, i.e. for each $g \in s_I$. Let $\pi_{i|g}$ and $\pi_{ij|g}$ denote the first- and second-order inclusion probabilities of the $i$th element and of the $(i, j)$th pair of elements in the $g$th PSU, where $i, j \in U_g$ and $i \neq j$. The HT-estimator of the population total $Y$ and its sampling variance under two-stage element sampling are given by

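\[
\hat{Y}_{HT} = \sum_{g=1}^{n_I} \frac{\hat{Y}_{gHT}}{\pi_g} \tag{2.1}
\]

\[
V(\hat{Y}_{HT}) = \sum_{g,h=1}^{N_I} \left( \frac{\pi_{gh}}{\pi_g \pi_h} - 1 \right) Y_g Y_h + \sum_{g=1}^{N_I} \frac{1}{\pi_g} \sum_{i,j=1}^{N_g} \left( \frac{\pi_{ij|g}}{\pi_{i|g} \pi_{j|g}} - 1 \right) y_{gi} y_{gj} \tag{2.2}
\]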


where $\hat{Y}_{gHT} = \sum_{i=1}^{n_g} y_{gi}/\pi_{i|g}$ is the HT-estimator of the $g$th PSU total $Y_g = \sum_{i \in U_g} y_{gi}$; see Särndal et al. (1992, p. 137).
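As a numerical check of Eqs. (2.1) and (2.2), the sketch below enumerates every possible two-stage sample from a small hypothetical population (three PSU’s, SRS without replacement at both stages) and compares the exact design variance of the HT-estimator with the value of the variance formula. The population values, the sample sizes and the helper names `srs_probs` and `ht_quadratic_form` are illustrative assumptions, not taken from the source.

```python
from fractions import Fraction
from itertools import combinations, product

psus = [[1, 2, 3], [4, 6, 8, 10], [5, 7, 9]]  # hypothetical y-values by PSU
n_I, n_g = 2, 2                                # PSUs drawn, elements drawn per PSU
N_I = len(psus)

def srs_probs(N, n):
    """First- and second-order inclusion probabilities under SRS without replacement."""
    return Fraction(n, N), Fraction(n * (n - 1), N * (N - 1))

def ht_quadratic_form(values, pi1, pi2):
    """sum_{a,b} (pi_ab / (pi_a pi_b) - 1) v_a v_b, taking pi_aa = pi_a."""
    total = Fraction(0)
    for a, va in enumerate(values):
        for b, vb in enumerate(values):
            pab = pi1 if a == b else pi2
            total += (pab / (pi1 * pi1) - 1) * va * vb
    return total

pi_g, pi_gh = srs_probs(N_I, n_I)
Y_g = [sum(u) for u in psus]

# Variance from Eq. (2.2): between-PSU term plus within-PSU term.
v_formula = ht_quadratic_form(Y_g, pi_g, pi_gh)
for u in psus:
    pi_i, pi_ij = srs_probs(len(u), n_g)
    v_formula += ht_quadratic_form(u, pi_i, pi_ij) / pi_g

# Exact design distribution of the estimator in Eq. (2.1) by enumeration.
outcomes = []  # pairs (probability, estimate)
psu_samples = list(combinations(range(N_I), n_I))
for s_I in psu_samples:
    per_psu = [list(combinations(psus[g], n_g)) for g in s_I]
    prob = Fraction(1, len(psu_samples))
    for samples in per_psu:
        prob /= len(samples)
    for picks in product(*per_psu):
        est = Fraction(0)
        for g, s_g in zip(s_I, picks):
            pi_i, _ = srs_probs(len(psus[g]), n_g)
            Y_g_hat = sum(Fraction(y) / pi_i for y in s_g)  # HT estimate of PSU total
            est += Y_g_hat / pi_g
        outcomes.append((prob, est))

expectation = sum(p * e for p, e in outcomes)
v_enumerated = sum(p * (e - expectation) ** 2 for p, e in outcomes)

print(expectation == sum(Y_g), v_enumerated == v_formula)
```

Exact rational arithmetic is used so that the two variances can be compared with `==` rather than with a floating-point tolerance.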

Under two-stage epsem by πps-SRS (i.e. 2Se), the sample size of elements is the same within all sampled PSU’s, i.e. $n_g \equiv n_0$. The sampling variance under 2Se can be written as

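\[
V(\hat{Y}_{HT}^{2Se}) = \sum_{g,h=1}^{N_I} \left( \frac{\pi_{gh}}{\pi_g \pi_h} - 1 \right) Y_g Y_h + \sum_{g=1}^{N_I} \frac{1}{\pi_g} \left( N_g^2 \, \frac{1 - f_g}{n_0} \, S_g^2 \right) \tag{2.3}
\]

where $f_g = n_0/N_g$, $S_g^2 = \frac{1}{N_g - 1} \sum_{i \in U_g} (y_{gi} - \bar{Y}_g)^2$ and $\bar{Y}_g = Y_g/N_g$.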
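The step from Eq. (2.2) to Eq. (2.3) replaces the within-PSU double sum by its closed form under SRS. The sketch below assumes the standard SRS result, in which the double sum equals $N_g^2 (1 - f_g) S_g^2 / n_0$ with $S_g^2$ computed with divisor $N_g - 1$, and checks it on one hypothetical PSU. The data and the function names are illustrative only.

```python
from fractions import Fraction

def within_term_double_sum(y, n0):
    """Within-PSU term of Eq. (2.2) under SRS of n0 elements without replacement."""
    N = len(y)
    pi_i = Fraction(n0, N)
    pi_ij = Fraction(n0 * (n0 - 1), N * (N - 1))
    total = Fraction(0)
    for i, yi in enumerate(y):
        for j, yj in enumerate(y):
            p = pi_i if i == j else pi_ij
            total += (p / (pi_i * pi_i) - 1) * yi * yj
    return total

def within_term_closed_form(y, n0):
    """N^2 (1 - f) S^2 / n0, with S^2 using divisor N - 1 (assumed convention)."""
    N = len(y)
    mean = Fraction(sum(y), N)
    S2 = sum((v - mean) ** 2 for v in y) / (N - 1)
    f = Fraction(n0, N)
    return N ** 2 * (1 - f) * S2 / n0

y_g = [4, 6, 8, 10, 15]  # hypothetical y-values in one PSU
n0 = 2

print(within_term_double_sum(y_g, n0) == within_term_closed_form(y_g, n0))
```

The factor $N_g^2$ appears because the quantity being estimated within each PSU is a total rather than a mean.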

For two-stage cluster sampling, suppose each PSU can be further divided into $N_{IIg}$ sub-clusters of unequal sizes $N_{gk}$, where $g \in U_I$ and $k = 1, \dots, N_{IIg}$. Let $U_{g1}, \dots, U_{gN_{IIg}}$ denote the second-stage clusters in the $g$th PSU. Let a random sample $s_{IIg}$ of $n_{IIg}$ second-stage clusters (or sub-clusters) be selected from each PSU in the first-stage sample, i.e. for each $g \in s_I$. Let $\pi_{k|g}$ and $\pi_{kl|g}$ denote the first- and second-order inclusion probabilities of the $k$th sub-cluster and of the $(k, l)$th pair of sub-clusters in the $g$th PSU, respectively. The HT-estimator of the population total $Y$ and its sampling variance under two-stage cluster sampling can be written as

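\[
\hat{Y}_{HT} = \sum_{g=1}^{n_I} \frac{1}{\pi_g} \sum_{k=1}^{n_{IIg}} \frac{Y_{gk}}{\pi_{k|g}} \tag{2.4}
\]

\[
V(\hat{Y}_{HT}) = \sum_{g,h=1}^{N_I} \left( \frac{\pi_{gh}}{\pi_g \pi_h} - 1 \right) Y_g Y_h + \sum_{g=1}^{N_I} \frac{1}{\pi_g} \sum_{k,l=1}^{N_{IIg}} \left( \frac{\pi_{kl|g}}{\pi_{k|g} \pi_{l|g}} - 1 \right) Y_{gk} Y_{gl} \tag{2.5}
\]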

where $Y_{gk} = \sum_{i \in U_{gk}} y_{gi}$ is the population total of the $k$th sub-cluster in the $g$th PSU.

Under two-stage epsem by πps-SRS (2Sc), assume that all sub-clusters are of equal size, i.e. $N_{gk} \equiv N_0$, and that an equal number of sub-clusters is sampled from each selected PSU, i.e. $n_{IIg} \equiv n_{II0}$. The sampling variance under 2Sc can be written as

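\[
V(\hat{Y}_{HT}^{2Sc}) = \sum_{g,h=1}^{N_I} \left( \frac{\pi_{gh}}{\pi_g \pi_h} - 1 \right) Y_g Y_h + \sum_{g=1}^{N_I} \frac{1}{\pi_g} \left( N_{IIg}^2 \, \frac{1 - f_{IIg}}{n_{II0}} \, S_{IIg}^2 \right) \tag{2.6}
\]

where $f_{IIg} = n_{II0}/N_{IIg}$, $S_{IIg}^2 = \frac{1}{N_{IIg} - 1} \sum_{k=1}^{N_{IIg}} (Y_{gk} - \bar{Y}_{IIg})^2$ and $\bar{Y}_{IIg} = Y_g/N_{IIg}$ is the mean per sub-cluster.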
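To illustrate how the within-PSU term under 2Sc depends on the way sub-clusters are formed, the sketch below evaluates it for a single hypothetical PSU with y-values 1, ..., 16 under two partitions into four sub-clusters, and compares it with the corresponding term under SRS of the same number of elements. It assumes the usual SRS expression $N^2 (1 - f) S^2 / n$ with $S^2$ on divisor $N - 1$. The partitions, the sample sizes and the name `srs_total_variance` are illustrative.

```python
from fractions import Fraction

def srs_total_variance(values, n):
    """N^2 (1 - n/N) S^2 / n: variance of the HT total under SRS of n units."""
    N = len(values)
    mean = Fraction(sum(values), N)
    S2 = sum((v - mean) ** 2 for v in values) / (N - 1)
    return N ** 2 * (1 - Fraction(n, N)) * S2 / n

y = list(range(1, 17))  # one hypothetical PSU with N_g = 16 elements

# Two ways of forming four sub-clusters of size N_0 = 4.
contiguous = [y[k:k + 4] for k in range(0, 16, 4)]
balanced = [[1, 8, 9, 16], [2, 7, 10, 15], [3, 6, 11, 14], [4, 5, 12, 13]]

n_II0 = 2  # sub-clusters sampled (8 elements in total)
n0 = 8     # elements sampled under element sampling

v_contiguous = srs_total_variance([sum(c) for c in contiguous], n_II0)
v_balanced = srs_total_variance([sum(c) for c in balanced], n_II0)
v_elements = srs_total_variance(y, n0)

print(float(v_contiguous), float(v_balanced), float(v_elements))
```

In this toy case the contiguous sub-clusters, which are internally homogeneous, give a larger within-PSU term than element sampling, while the sub-clusters with equal totals give zero.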


2.1.2 GREG-estimators under two-stage sampling

For the GREG-estimator, Särndal et al. (1992, p. 304) describe three scenarios for the availability of auxiliary data in a two-stage sampling design:

Case A: (PSU Auxiliaries). The auxiliary totals are available for all PSU’s in the population.

Case B: (Complete Element Auxiliaries). The auxiliary values are available for all elements in the entire population.

Case C: (Limited Element Auxiliaries). The auxiliary values are available for all the elements in selected PSU’s only.

Case A leads to a regression model at the PSU level, whereas Cases B and C lead to regression modelling of the element values. The three cases thus lead to three different GREG-estimators under a two-stage design. Here we consider Case B, in which complete auxiliary data are available.

Let $x_{gi}$ denote a column vector of $q$ auxiliary values for the $i$th element in the $g$th PSU. Let the relationship between $y$ and the auxiliary variables be expressed by an element-level general linear model, $y_{gi} = \mu_{gi} + \epsilon_{gi}$, where $\mu_{gi} = \mu(x_{gi}) = x_{gi}^\top \beta$ is the linear predictor, $\beta$ is a vector of $q$ regression coefficients, and $\epsilon_{gi}$ is the random error term associated with the $i$th element in the $g$th cluster, which is normally distributed with mean zero and variance $\sigma_{gi}^2$. A population-based estimate $B$ of the unknown vector $\beta$ is given in Eq. (1.8); it is itself an unknown finite population quantity. Let $s = \{s_1, \dots, s_{n_I}\}$ denote all the elements in the two-stage sample. A sample estimate $\hat{B}$ of $B$ is given in Eq. (1.7), where $\pi_i = \pi_g \pi_{i|g}$ under 2Se and $\pi_i = \pi_g \pi_{k|g}$ for all $i \in U_{gk}$ under 2Sc, and it is assumed that $\sigma_i^2 = 1$.

The GREG-estimator of the finite population total $Y$ under two-stage element sampling, from Särndal et al. (1992, p. 323), is given by

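\[
\hat{Y}_{GR}^{2Se} = \sum_{i \in U} \hat{y}_i + \sum_{i \in s} \frac{\hat{e}_i}{\pi_i} = \sum_{g=1}^{N_I} \sum_{i=1}^{N_g} \hat{y}_{gi} + \sum_{g=1}^{n_I} \frac{1}{\pi_g} \sum_{i=1}^{n_g} \frac{\hat{e}_{gi}}{\pi_{i|g}} \tag{2.7}
\]

where $\hat{y}_{gi} = x_{gi}^\top \hat{B}$ and $\hat{e}_{gi} = y_{gi} - \hat{y}_{gi}$. An approximate variance of the GREG-estimator in Eq. (2.7) is given by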

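\[
V(\hat{Y}_{GR}^{2Se}) \approx \sum_{g=1}^{N_I} \sum_{h=1}^{N_I} \left( \frac{\pi_{gh}}{\pi_g \pi_h} - 1 \right) e_g e_h + \sum_{g=1}^{N_I} \frac{1}{\pi_g} \sum_{i,j=1}^{N_g} \left( \frac{\pi_{ij|g}}{\pi_{i|g} \pi_{j|g}} - 1 \right) e_{gi} e_{gj}
\]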
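The point estimator in Eq. (2.7) can be sketched in a few lines for the simplest case of a single auxiliary variable ($q = 1$, no intercept) with $\sigma_i^2 = 1$. Since Eq. (1.7) is not reproduced here, the sketch assumes the usual π-weighted least-squares form $\hat{B} = \sum_{s} x_i y_i/\pi_i \,\big/ \sum_{s} x_i^2/\pi_i$. The function name `greg_total`, the data and the inclusion probabilities are hypothetical.

```python
from fractions import Fraction

def greg_total(x_pop, sample):
    """GREG estimate of the y-total with one auxiliary variable and sigma_i^2 = 1.

    x_pop  : auxiliary values for every element of the population (Case B)
    sample : list of (x_i, y_i, pi_i), where pi_i = pi_g * pi_{i|g}
    """
    sxx = sum(Fraction(x * x) / pi for x, _, pi in sample)
    sxy = sum(Fraction(x * y) / pi for x, y, pi in sample)
    B_hat = sxy / sxx                                   # assumed form of Eq. (1.7)
    predicted_total = sum(x * B_hat for x in x_pop)     # sum of y-hat over U
    residual_term = sum((y - x * B_hat) / pi for x, y, pi in sample)
    return predicted_total + residual_term

# Hypothetical population: three PSUs of sizes 3, 4 and 3.
x_by_psu = [[1, 2, 3], [4, 6, 8, 10], [5, 7, 9]]
x_pop = [x for psu in x_by_psu for x in psu]

# Two PSUs drawn by SRS (pi_g = 2/3), two elements per PSU (pi_{i|g} = 2/N_g).
pi_psu1 = Fraction(2, 3) * Fraction(2, 3)
pi_psu2 = Fraction(2, 3) * Fraction(2, 4)

# If y is exactly proportional to x, all residuals vanish.
exact_sample = [(1, 3, pi_psu1), (3, 9, pi_psu1), (6, 18, pi_psu2), (10, 30, pi_psu2)]
# A sample with some scatter around the line y = 3x.
noisy_sample = [(1, 4, pi_psu1), (3, 8, pi_psu1), (6, 19, pi_psu2), (10, 29, pi_psu2)]

print(greg_total(x_pop, exact_sample))
print(float(greg_total(x_pop, noisy_sample)))
```

When the working model fits the sampled values exactly, as in `exact_sample`, the estimator returns $3 \sum_{i \in U} x_i$ whatever sample is drawn, which is the property that makes the GREG-estimator attractive when the auxiliary variable is strongly related to $y$.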
