Balanced sampling - General introduction and literature review


and B is the vector of finite population regression coefficients given by

$$B = \left( \sum_{i \in U} \frac{x_i x_i^{\top}}{\sigma_i^2} \right)^{-1} \sum_{i \in U} \frac{x_i y_i}{\sigma_i^2} \qquad (1.8)$$

(Särndal et al., 1992, p. 227).
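As a minimal sketch of how Eq. (1.8) could be evaluated numerically, the following Python code computes B for a small artificial population. The function name `regression_coefficients`, the simulated data and the choice σ²_i proportional to the size variable are assumptions made purely for illustration; they are not taken from the source.

```python
import numpy as np

def regression_coefficients(x, y, sigma2):
    """Evaluate B of Eq. (1.8) for population arrays x (N, p), y (N,), sigma2 (N,)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sigma2, dtype=float)   # weights 1 / sigma_i^2
    xtwx = x.T @ (x * w[:, None])               # sum over U of x_i x_i^T / sigma_i^2
    xtwy = x.T @ (y * w)                        # sum over U of x_i y_i / sigma_i^2
    return np.linalg.solve(xtwx, xtwy)          # solve the system rather than invert

# Artificial population (illustrative assumption): intercept plus one size variable.
rng = np.random.default_rng(0)
N = 200
size = rng.uniform(1.0, 10.0, N)
x = np.column_stack([np.ones(N), size])
y = x @ np.array([2.0, 0.5])                    # exactly linear, so B recovers (2, 0.5)
B = regression_coefficients(x, y, sigma2=size)
print(B)
```

Solving the linear system instead of forming the inverse explicitly is a numerical convenience only; the quantity computed is the same as in Eq. (1.8).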

The main problem associated with GREG estimation is unusual calibration weights, including negative and very large positive values, which can reduce the efficiency of the GREG-estimator. This tends to happen when the number of auxiliary variables is large.

1.4 Balanced sampling

When a set of auxiliary variables correlated with the study variables is known before the sample is selected, samples balanced with respect to these auxiliary variables tend to be more efficient than unbalanced samples. A sampling design is said to be balanced if the HT-estimators of the totals of the auxiliary variables are equal to the corresponding population totals.

In other words, any sample s selected under the balanced sampling design satisfies the following equations:

$$\hat{X}_{HT}(s) = \sum_{i \in s} \frac{x_i}{\pi_i} = \sum_{i \in U} x_i = X \qquad (1.9)$$

where X̂_HT(s) denotes the vector of HT-estimators based on sample s. In the context of balanced sampling, auxiliary variables are sometimes called balancing variables, and the set of equations in (1.9) is referred to as the balancing equations. The SRS design is balanced with respect to the population size: for x_i ≡ 1, X̂_HT = ∑_{i∈s} x_i/π_i = (N/n) ∑_{i∈s} x_i = N. For πps sampling designs, in contrast, the HT-estimator of the population size, X̂_HT = ∑_{i∈s} x_i/π_i = ∑_{i∈s} 1/π_i with x_i ≡ 1, is random.
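The contrast between SRS and πps designs can be checked numerically. The sketch below is illustrative only: the size variable `z`, the population and sample sizes, and the use of systematic unequal-probability sampling as one convenient fixed-size πps design are all assumptions that do not come from the source.

```python
import numpy as np

def systematic_pips(pi, rng):
    """Systematic unequal-probability sampling with inclusion probabilities pi (all <= 1)."""
    cum = np.cumsum(pi)
    n = int(round(cum[-1]))                      # sample size = sum of the pi_i
    points = rng.uniform() + np.arange(n)        # one random start, then unit steps
    idx = np.searchsorted(cum, points)           # unit whose cumulative interval holds each point
    return np.minimum(idx, len(pi) - 1)          # guard against rounding in the last cumulative sum

rng = np.random.default_rng(1)
N, n = 1000, 100
ones = np.ones(N)                                # balancing variable x_i = 1

# SRS: pi_i = n / N, so the HT-estimator of N equals N for every sample.
pi_srs = np.full(N, n / N)
s = rng.choice(N, size=n, replace=False)
N_hat_srs = np.sum(ones[s] / pi_srs[s])

# pi-ps: pi_i proportional to a size variable z, so the HT-estimator of N varies.
z = rng.uniform(1.0, 10.0, N)
pi_pps = n * z / z.sum()
N_hat_pps = np.array([
    np.sum(ones[s] / pi_pps[s])
    for s in (systematic_pips(pi_pps, rng) for _ in range(200))
])
print(N_hat_srs, N_hat_pps.mean(), N_hat_pps.std())
```

Under SRS the estimate coincides with N for every sample, whereas under the πps design it is unbiased for N but varies from sample to sample, which is what balancing on x_i ≡ 1 would remove.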

According to Deville and Tillé (2004), the history of balanced sampling dates back to the early developments of finite population sampling at the beginning of the twentieth century.

An early concept of balanced sampling, named the ‘representative method’, was given by Kiær (1896). The idea was to select a sample that matches a known quantity. In the early work on balanced sampling, some purposive sampling methods were proposed. Yates (1946) and Thionet (1953) also advocated balanced sampling. Some partial solutions for balanced sampling were given by Ardilly (1991), Deville (1992), Hedayat and Majumdar (1995) and Deville et al. (1988). Deville and Tillé (2004) proposed the cube method for balanced sampling, which is widely used in practice. Chauvet and Tillé (2006) gave a faster implementation of the cube sampling method. Fuller (2009b) studied properties of the rejective method (Hájek, 1964, 1981) for balanced sampling. Chauvet et al. (2017) studied some sampling strategies involving the cube method and the rejective method for balanced sampling. Among recent developments, Benedetti et al. (2022) proposed a balanced sampling method based on a global optimisation algorithm called simulated annealing, and Leuenberger et al. (2022) suggested a modification which aims to improve the efficiency of the fast implementation of the cube method.

1.4.1 The cube method

Deville and Tillé (2004) gave a random sampling method, called the cube method, which aims to select samples that have fixed first-order inclusion probabilities and are balanced with respect to a set of known auxiliary variables. The name of the method is motivated by the geometric representation of the sampling design using an N-dimensional cube (or N-cube), where N denotes the number of sampling units in the population and the 2^N vertices of the N-cube represent all possible samples (of any size) from the population. In this geometric representation, the vector of first-order inclusion probabilities π = (π₁, ..., π_N) is expressed as a convex combination of the vertices of the N-cube.

The sampling design under the cube method assigns a selection probability p(s) to each vertex of the N-cube such that E(s) = π, that is, the fixed first-order inclusion probabilities are achieved. The set of balancing equations in Eq. (1.9) defines a hyperplane which intersects the N-cube. Selecting a balanced sample amounts to choosing a vertex of the N-cube that lies in this hyperplane. The balanced sampling algorithm of the cube method randomly reaches a vertex of the N-cube, starting from the vector π, in such a way that the balancing equations are satisfied exactly or approximately.
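The geometric picture can be made concrete for a very small population. The following sketch enumerates the vertices of the 3-cube, attaches a hypothetical design p(s) to them, and checks that E(s) = π and that the support of the design lies on a balancing hyperplane. The design probabilities and the choice of balancing variable x_i = π_i (which fixes the sample size) are illustrative assumptions, not values from the source.

```python
import itertools
import numpy as np

N = 3
vertices = list(itertools.product([0, 1], repeat=N))   # the 2^N vertices of the N-cube

# Hypothetical design: positive probability only on the samples of size 2.
design = {(1, 1, 0): 0.5, (1, 0, 1): 0.3, (0, 1, 1): 0.2}
p = np.array([design.get(v, 0.0) for v in vertices])

# E(s) = sum over vertices of p(s) * s: a convex combination of vertices.
pi = p @ np.array(vertices)
print(pi)                                              # (0.8, 0.7, 0.5) up to rounding

# Balancing on x_i = pi_i: the hyperplane sum_i s_i * x_i / pi_i = X.
x = pi.copy()
a = x / pi                                             # coefficients of the hyperplane
X = x.sum()                                            # population total of x
on_plane = [v for v in vertices if np.isclose(np.dot(a, v), X)]
print(on_plane)                                        # vertices satisfying the balancing equation
```

In this toy case the hyperplane passes exactly through the three vertices of size 2, so every sample with positive probability is balanced; with several balancing variables the hyperplane generally misses the vertices, which is the source of the approximation mentioned above.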

The algorithm for the cube method randomly transforms the elements of the vector π into sample membership indicators in {0,1}. It consists of two phases: the flight-phase and the landing-phase. In the flight-phase, a discrete-time stochastic process, called the balancing martingale, transforms the first-order inclusion probabilities into {0,1} indicators one by one, in such a way that the balancing equations and the fixed inclusion probabilities are respected. Starting from the vector π(0) = π, the following three steps are repeated at times t = 1, ..., T:

1. Generate any nonzero vector u(t) such that u(t) is in the kernel of the matrix A = (x₁/π₁, ..., x_N/π_N), and u_i(t) = 0 if π_i(t−1) is an integer.

2. Compute λ*₁(t) and λ*₂(t), the largest values of λ₁(t) and λ₂(t) such that 0 ≤ π(t−1) + λ₁(t)u(t) ≤ 1 and 0 ≤ π(t−1) − λ₂(t)u(t) ≤ 1.

3. Select

$$\pi(t) = \begin{cases} \pi(t-1) + \lambda_1^{*}(t)\,u(t) & \text{with probability } q(t), \\ \pi(t-1) - \lambda_2^{*}(t)\,u(t) & \text{with probability } 1 - q(t), \end{cases}$$

where q(t) = λ*₂(t)/[λ*₁(t) + λ*₂(t)].

These steps are repeated until step 1 can no longer be carried out. The flight-phase ends with the vector π(T). If all the inclusion probabilities have been transformed into {0,1} indicators, the algorithm is complete; otherwise, a landing-phase is required to obtain the sample. In the landing-phase, the balancing equations are relaxed in order to obtain a sample of fixed size while respecting the fixed inclusion probabilities. The situation in which the balancing equations cannot be satisfied exactly is referred to as the rounding problem (Deville and Tillé, 2004; Tillé, 2011).
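The three flight-phase steps can be sketched in Python as follows. This is a naive illustration under stated assumptions, not the implementation of Deville and Tillé (2004): the kernel vector in step 1 is taken from a singular value decomposition, the tolerance `tol` and the helper names `largest_step` and `flight_phase` are hypothetical, and the test population (equal inclusion probabilities, balancing on π_i and on one size variable) is made up.

```python
import numpy as np

def largest_step(p, d):
    """Largest lam >= 0 such that 0 <= p + lam * d <= 1 holds componentwise."""
    up, down = d > 0, d < 0
    bounds = np.concatenate([(1.0 - p[up]) / d[up], -p[down] / d[down]])
    return bounds.min()

def flight_phase(x, pi, rng, tol=1e-9):
    """Naive flight-phase for x of shape (N, p) and pi of shape (N,) with 0 < pi_i < 1."""
    x = np.asarray(x, dtype=float)
    pi_t = np.asarray(pi, dtype=float).copy()
    A = (x / pi_t[:, None]).T                    # p x N matrix with columns x_i / pi_i
    while True:
        free = np.flatnonzero((pi_t > tol) & (pi_t < 1.0 - tol))
        if free.size == 0:
            break                                # every unit is already 0 or 1
        # Step 1: kernel vector of A restricted to the units not yet decided.
        _, sv, vt = np.linalg.svd(A[:, free])
        if np.sum(sv > tol) == free.size:
            break                                # kernel is trivial: step 1 is impossible
        u = vt[-1]                               # right singular vector in the kernel
        # Step 2: largest admissible moves in the directions +u and -u.
        lam1 = largest_step(pi_t[free], u)
        lam2 = largest_step(pi_t[free], -u)
        # Step 3: random move chosen so that E[pi(t) | pi(t-1)] = pi(t-1).
        q = lam2 / (lam1 + lam2)
        step = lam1 * u if rng.uniform() < q else -lam2 * u
        pi_t[free] = pi_t[free] + step
        pi_t[np.abs(pi_t) < tol] = 0.0           # snap rounding noise onto the cube faces
        pi_t[np.abs(pi_t - 1.0) < tol] = 1.0
    return pi_t

# Made-up example: balance on pi_i (fixed sample size) and on a size variable z.
rng = np.random.default_rng(2)
N, n = 40, 10
z = rng.uniform(1.0, 10.0, N)
pi = np.full(N, n / N)
x = np.column_stack([pi, z])
pi_star = flight_phase(x, pi, rng)
print(np.sum((pi_star > 0) & (pi_star < 1)))     # units left undecided for the landing-phase
```

Each iteration fixes at least one more component at 0 or 1, so the loop ends after at most N iterations, and at most as many components as there are balancing variables remain undecided. Computing a full decomposition at every step is wasteful for large N; it is used here only to keep the sketch short.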

Let π(T) = π*. The sampling design for the units that remain undecided is then formulated as an optimization problem that minimizes the conditional sampling variance Var(X̂ | π*). The landing-phase can be implemented in two ways:

• Linear programming: The conditional variance Var(X̂ | π*) is minimized using linear programming.

• Dropping balancing equations: If a sample has not been obtained at the end of the flight-phase, the last variable in the set of auxiliary (or balancing) variables is dropped and the flight-phase is run again. This process continues until a sample is obtained. It is therefore advisable, with this version of the landing-phase, to supply the auxiliary variables to the algorithm in order of their importance.

Deville and Tillé (2004) advocated the sampling strategy that combines balanced sampling by the cube method with the GREG-estimator, since balanced sampling helps to avoid extreme weights for the GREG-estimator. In their simulation study, the percentage of samples with negative calibration weights fell from 32% to 0.1%.

In the implementation of the above algorithm for the cube method, the number of computational operations increases with the square of the population size, N². Chauvet and Tillé (2006) proposed a fast implementation of the cube method for which the number of computational operations increases in proportion to the population size N. In the fast implementation, the flight-phase of
