7.7 Binomial Sampling Using an Empirical Model
A probability model is not a prerequisite for developing a binomial count sample plan or for estimating its OC and ASN functions. A probability model for the p–µ relationship can be replaced by an empirical model. Several models have been proposed (see, e.g. Ward et al., 1986), but one that has been found most useful, and that can easily be fitted using linear regression, is as follows:
$$\ln(-\ln(1 - p)) = c_T + d_T \ln(\mu) \qquad (7.11)$$
where µ is the mean and p is the proportion of sample observations with more than T organisms. Note that this model has been shown to fit the relationship for values of T greater than 0 as well as for T = 0 (see, e.g. Gerrard and Chiang, 1970), so the parameters have a subscript T, which signifies that they depend on the tally number used. In a strict sense, there are problems with fitting Equation 7.11 using linear regression, because both p and µ are estimated and so d_T will tend to be underestimated (see, e.g. Schaalje and Butts, 1993). However, provided that the data cover a wide range of µ around cd (or, alternatively, if the range 0–1 is nearly covered by p), and the estimates of p and µ are relatively precise, the underestimation should be ignorable. It is more important to check that the data really are linear before fitting the model. Upon fitting the model, the critical proportion, cp_T (note the suffix T), is calculated as
$$cp_T = 1 - e^{-\left(e^{c_T}\,\mathrm{cd}^{d_T}\right)} = 1 - \exp\!\left(-e^{c_T}\,\mathrm{cd}^{d_T}\right) \qquad (7.12)$$

There will always be variability about the relationship expressed by Equation 7.12. This variability is not used to determine the critical proportion, but it is of critical importance for estimating the OC and ASN functions.

Once cp_T has been determined, OC and ASN functions for any sample plan that classifies binomial counts with respect to cp_T can be estimated using a binomial distribution, relating the range of true means (µ_i) to a range of probabilities (p_i) using Equation 7.11. Here, we must consider variability in Equation 7.11 because, while this model may provide a good average description of the data, not all data points fall exactly on the line. Corresponding to any true mean (µ_i) there will be a range of probability values around p_i = 1 − exp(−e^{c_T} µ_i^{d_T}), and the OC function at µ_i is an average of points, each of which corresponds to one of these probability values.

The procedure is similar to that followed for assessing the consequences of variation about TPL (Chapter 5, Appendix). We assume that actual values of ln(−ln(1 − p)) are approximately normally distributed about the regression line, as determined from the equation

$$\ln(-\ln(1 - p)) = c_T + d_T \ln(\mu) + z(0, \sigma_\varepsilon) \qquad (7.13)$$

where z(0, σ_ε) is a normally distributed random variable with mean 0 and standard deviation σ_ε. The OC and ASN functions are determined using simulation. Each time a sample plan is used to classify a particular density during the simulation, Equation 7.13 is used to generate the value of p that will be used to characterize the population of binomial variates being sampled. In other words, if, for a particular true mean, 500 simulation runs are used to determine an OC and ASN value, 500 different values for p would be determined using Equation 7.13 and sampling would be simulated from each of these values for p. To generate a value of p, a normally distributed random variable, z, with mean 0 and standard deviation σ_ε, is first generated, a random factor RF = e^z is calculated, and the value of p is 1 − exp(−e^{c_T} µ^{d_T} RF). If random error is not to be included in the simulations, the random factor is 1: RF = 1.

Binomial Counts 173
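The simulation procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' software: the intercept `C_T = -1.0` and the value `sigma_eps = 0.44` are hypothetical, the function names are invented for this sketch, and the fixed-sample-size decision rule (classify the density as below cd when the observed proportion does not exceed cp_T) is assumed.

```python
import math
import random


def critical_proportion(c_T, d_T, cd):
    """Equation 7.12: expected proportion of units with more than T organisms at mean cd."""
    return 1.0 - math.exp(-math.exp(c_T) * cd ** d_T)


def draw_p(c_T, d_T, mu, sigma_eps, rng):
    """Equation 7.13: one value of p for true mean mu, using the random factor RF = e^z."""
    z = rng.gauss(0.0, sigma_eps) if sigma_eps > 0 else 0.0
    rf = math.exp(z)  # RF = 1 when sigma_eps is 0 (no random error)
    return 1.0 - math.exp(-math.exp(c_T) * mu ** d_T * rf)


def oc_fixed_n(c_T, d_T, cd, mu, n, sigma_eps, runs=500, seed=1):
    """Fraction of simulated samples that classify the density as below cd."""
    rng = random.Random(seed)
    cp = critical_proportion(c_T, d_T, cd)
    below = 0
    for _ in range(runs):
        p = draw_p(c_T, d_T, mu, sigma_eps, rng)  # a fresh p for every run
        positives = sum(rng.random() < p for _ in range(n))
        if positives / n <= cp:  # assumed decision rule for a fixed sample size
            below += 1
    return below / runs


# Hypothetical intercept and sigma_eps; slope, cd and n are illustrative choices.
C_T, D_T, CD, N_UNITS = -1.0, 0.738, 6.0, 50
for mu in (2, 4, 6, 8, 10):
    print(mu,
          oc_fixed_n(C_T, D_T, CD, mu, N_UNITS, sigma_eps=0.0),
          oc_fixed_n(C_T, D_T, CD, mu, N_UNITS, sigma_eps=0.44))
```

Comparing the two printed columns shows how averaging over the generated values of p changes the estimated OC relative to the deterministic case RF = 1.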
The only question that remains to be answered before calculating OC and ASN functions for empirical binomial models is how to estimate σ_ε. Several writers have addressed this question (Binns and Bostanian, 1990b; Schaalje and Butts, 1992) and it has not yet been fully resolved. However, a reasonable and conservative approximation is

$$\sigma_\varepsilon \doteq \sqrt{\frac{mse}{N} + \left[\ln(m) - \overline{\ln(m)}\right]^2 s_d^2 + mse} \qquad (7.14)$$

where mse is the mean square error from the regression, N is the number of data points in the regression, $\overline{\ln(m)}$ is the average of ln(m), and s_d² is the estimated variance of d_T. The astute reader will note that when the same type of variability was considered for TPL, only mse was used to estimate σ_ε. Strictly speaking, an equation such as Equation 7.14 should be used to estimate variability about TPL. However, we use mse to simplify matters. This simplification is unlikely to have undesirable consequences because the last term in equations such as Equation 7.14, namely mse, is almost certain to account for most of the estimate of σ_ε. If this is so, then why have we included the additional terms in Equation 7.14?

We included them because much has been written on estimating the variability about the model expressed by Equation 7.11, whereas the variability about TPL has scarcely been considered, and we wished to maintain consistency with the existing literature. Nevertheless, when we deal with uncertainty in the incidence–mean relationship in order to predict OC and ASN functions for proposed sample plans, mse is the most important component to take into account. The other components are less important, except when the relationship (Equation 7.11) is based on few points and the standard error of the slope parameter d_T is large.

The OC and ASN functions calculated using the procedures just described are averages for the range of p that may occur for a particular true mean. Because these OC and ASN functions are averages, they say nothing about the extremes that may be encountered when using a sampling plan at a particular place and time. They display the average values that should be expected in the long run, if the sampling plan were used for these mean densities over and over again.

The strategies for estimating OC and ASN functions for the negative binomial and empirical models are similar but not identical, and can be summarized in a table (note that RF is the random factor defined above, and is defined in the same way for both binomial count methods):
| | Full count | Binomial count based on negative binomial | Binomial count based on empirical binomial |
|---|---|---|---|
| Criterion | cd | $cp_T = \mathrm{Prob}_{nb}(x > T \mid \mathrm{cd}, k)$ | $cp_T = 1 - \exp(-e^{c_T}\,\mathrm{cd}^{d_T})$ |
| Range of means | $\mu_i$ | $p_i = \mathrm{Prob}_{nb}(x > T \mid \mu_i, k)$ | $p_i = 1 - \exp(-e^{c_T}\,\mu_i^{d_T}\,RF)$ |
| Distribution for simulation | Negative binomial | Binomial | Binomial |

For the negative binomial, k at mean µ_i is

$$k_i = \frac{\mu_i^2}{a\,\mu_i^{b}\,RF - \mu_i}$$

As with all other binomial count models, the sample size and tally number influence the bias and precision of an empirical binomial count sample plan. To consider the effect of T on the OC and ASN, it is necessary to estimate the parameters for Equation 7.11 for several values of T. To minimize bias and maximize classification precision, a useful guide is to use a T for which mse is smallest, subject to the condition that the regression remains linear. These concepts are illustrated in Exhibit 7.4.
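Estimating the parameters of Equation 7.11 for one tally number is an ordinary least-squares fit on the transformed scale, which could be repeated for each candidate T and the resulting mse values compared. The sketch below is a minimal illustration, not the authors' procedure: the function name and the returned dictionary keys are invented here, occasions with p equal to 0 or 1 are simply dropped because the transformation is undefined there (an assumption), and the data in the usage example are synthetic.

```python
import math


def fit_empirical_model(means, proportions):
    """Least-squares fit of ln(-ln(1 - p)) = c_T + d_T * ln(mean) for one tally number."""
    pairs = [(math.log(m), math.log(-math.log(1.0 - p)))
             for m, p in zip(means, proportions)
             if m > 0 and 0.0 < p < 1.0]  # transformation undefined otherwise
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    d_T = sxy / sxx                      # slope
    c_T = y_bar - d_T * x_bar            # intercept
    sse = sum((y - (c_T + d_T * x)) ** 2 for x, y in pairs)
    mse = sse / (n - 2)                  # residual mean square
    s_d2 = mse / sxx                     # estimated variance of the slope
    return {"c_T": c_T, "d_T": d_T, "mse": mse, "s_d2": s_d2,
            "N": n, "mean_ln_m": x_bar}


# Synthetic check: data generated exactly on a known line should be recovered.
true_c, true_d = -1.0, 0.8
demo_means = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
demo_props = [1.0 - math.exp(-math.exp(true_c) * m ** true_d) for m in demo_means]
fit = fit_empirical_model(demo_means, demo_props)
print(fit["c_T"], fit["d_T"], fit["mse"])
```

With field data, a plot of the transformed points against ln(mean) would still be needed to check linearity, since a small mse alone does not establish it.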
Exhibit 7.4. Binomial classification sampling using an empirical model
This example is based on sampling Colorado potato beetle on stalks of potato plants. Binns et al. (1992) described data consisting of 50–200 counts of larvae taken on 74 occasions over 3 years. Critical density for spring larvae can vary depending on other stresses on the plant (poor growing conditions, other pests) and the grower’s perception of the threat, but the one suggested by the authors, cd = 6, is used here.
The development of a binomial count sample plan using an empirical model follows the steps used in previous examples, when a probability model was used to relate p to µ. The first three plans to be considered were as follows:
1. Fixed sample size binomial count (n = 50, T = 0).
2. SPRT binomial plan (µ1 = 7, µ0 = 5, α = β = 0.1, minn = 5, maxn = 50 and T = 0) with no variability in the p–µ model.
3. SPRT binomial plan (µ1 = 7, µ0 = 5, α = β = 0.1, minn = 5, maxn = 50 and T = 0) including variability in the p–µ model.
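For readers who want to see how the SPRT parameters listed above translate into stop lines, the following sketch uses the standard Wald formulas for a binomial proportion. It is an illustration only: the intercept `C_T = -1.0` is hypothetical, the function names are invented for this sketch, and the rule applied at maxn (compare the cumulative count with the line midway between the stop lines) is an assumption rather than a statement of how the plans in this exhibit were implemented.

```python
import math


def p_from_mean(c_T, d_T, mu):
    """Equation 7.11 solved for p at mean mu."""
    return 1.0 - math.exp(-math.exp(c_T) * mu ** d_T)


def sprt_lines(p0, p1, alpha, beta):
    """Wald SPRT stop lines for cumulative positives: lower + slope*n and upper + slope*n."""
    k = math.log(p1 * (1.0 - p0) / (p0 * (1.0 - p1)))
    slope = math.log((1.0 - p0) / (1.0 - p1)) / k
    lower = math.log(beta / (1.0 - alpha)) / k   # intercept of lower line (negative)
    upper = math.log((1.0 - beta) / alpha) / k   # intercept of upper line (positive)
    return slope, lower, upper


def classify(positives, n, slope, lower, upper, min_n=5, max_n=50):
    """Return 'low', 'high' or 'continue' after n sample units."""
    if n < min_n:
        return "continue"
    if positives <= lower + slope * n:
        return "low"
    if positives >= upper + slope * n:
        return "high"
    if n >= max_n:  # assumed rule when the maximum sample size is reached
        return "low" if positives <= slope * n else "high"
    return "continue"


C_T, D_T = -1.0, 0.738          # hypothetical intercept, illustrative slope
p0 = p_from_mean(C_T, D_T, 5.0)  # proportion corresponding to mu0 = 5
p1 = p_from_mean(C_T, D_T, 7.0)  # proportion corresponding to mu1 = 7
slope, lower, upper = sprt_lines(p0, p1, alpha=0.1, beta=0.1)
print(p0, p1, slope, lower, upper)
print(classify(0, 10, slope, lower, upper), classify(50, 50, slope, lower, upper))
```

Because α = β here, the two intercepts are equal in magnitude, and the slope of the stop lines lies between p0 and p1.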
The OC functions for fixed sample size and sequential binomial counts determined without p–µ model variability were nearly identical, but the sequential plan resulted in significantly smaller sample sizes (Fig. 7.14). However, the evaluation of the sequential plan gives an overly optimistic assessment of the precision of classification, as it assumes that the relationship between µ and p is completely deterministic. When variability in the p–µ model was included, the OC function became much flatter, showing that the sample plan would probably be unacceptable due to the great potential for errors. Further improvement of the sample plan is therefore necessary.
Increasing the tally number can increase the precision of classifications made by the binomial count sample plan. The fitted parameters of the p–µ model for T = 0, 2 and 4 are shown in Table 7.3. SPRT plans (µ1 = 7, µ0 = 5, α = β = 0.1, minn = 5 and maxn = 50) using these tally numbers were compared (Fig. 7.15). The slope of the p–µ model increases for each increase in T and, for T = 2 and 4, mse is less than for T = 0. There is a big improvement in the OC when increasing T from 0 to 2. There is also an improvement in the OC when increasing T from 2 to 4, although it is not as great.
Fig. 7.14. The OC (a) and ASN (b) functions for three binomial count sampling plans used to classify the density with respect to cd = 6. The model ln(−ln(1 − p)) = c_T + d_T ln(µ) was used to describe the data with T = 0. The first plan (___) used a fixed sample size with n = 50. The second ( … ) and third (- – -) plans were based on the SPRT with parameters µ0 = 5, µ1 = 7, α = β = 0.1, minn = 5 and maxn = 50. Variation about the p–µ model was included when determining the OC and ASN for the third plan.
Table 7.3. Parameters for the model fitted to Colorado potato beetle data.

| Parameter | T = 0 | T = 2 | T = 4 |
|---|---|---|---|
| Model slope (d_T) | 0.738 | 1.092 | 1.387 |
| Variance of slope, s_d² | 1.295 × 10⁻³ | 1.436 × 10⁻³ | 1.801 × 10⁻³ |
| Model mse | 0.187 | 0.121 | 0.126 |
| Mean of ln(m) | 0.773 | 1.286 | 1.641 |
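As a rough numerical check of the claim that mse dominates Equation 7.14, the values in Table 7.3 can be substituted into the approximation at the critical density. The sketch below makes three assumptions that are not stated in the exhibit: the number of regression points N is taken to be the 74 sampling occasions, the slope variances are read as multiples of 10⁻³, and the approximation is evaluated at m = cd = 6. The function name is invented for this sketch.

```python
import math

# Values transcribed from Table 7.3 (slope variances assumed to be x 10^-3).
TABLE_7_3 = {
    0: {"s_d2": 1.295e-3, "mse": 0.187, "mean_ln_m": 0.773},
    2: {"s_d2": 1.436e-3, "mse": 0.121, "mean_ln_m": 1.286},
    4: {"s_d2": 1.801e-3, "mse": 0.126, "mean_ln_m": 1.641},
}
N_POINTS = 74  # assumption: one regression point per sampling occasion
CD = 6.0       # critical density used in the exhibit


def sigma_eps(mse, n_points, ln_m, mean_ln_m, s_d2):
    """Approximation of Equation 7.14 for the standard deviation about the fitted line."""
    return math.sqrt(mse / n_points + (ln_m - mean_ln_m) ** 2 * s_d2 + mse)


for T, row in TABLE_7_3.items():
    full = sigma_eps(row["mse"], N_POINTS, math.log(CD),
                     row["mean_ln_m"], row["s_d2"])
    simple = math.sqrt(row["mse"])  # using mse alone
    print(f"T={T}: sigma_eps={full:.4f}  sqrt(mse)={simple:.4f}")
```

Under these assumptions the additional terms change σ_ε by only about 1% for each tally number, which is consistent with the statement that mse accounts for most of the estimate.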
Fig. 7.15. The OC (a) and ASN (b) functions for three binomial count SPRT plans used to classify the density with respect to cd = 6. The model ln(−ln(1 − p)) = c_T + d_T ln(µ) was used to describe the data with T = 0 (___), 2 ( … ) or 4 (- – -). SPRT parameters were µ0 = 5, µ1 = 7, α = β = 0.1, minn = 5 and maxn = 50. Variation about the p–µ model was included.

Although estimation is not the main emphasis of the book, we need to illustrate the relationship between estimation with full count sampling and estimation with binomial count sampling. Similar methods can be used for binomial sampling as for complete count sampling. In Section 3.2.1, we demonstrated how to calculate the sample size, n, when a given coefficient of variation, CV, is required:

$$n = \frac{\text{variance per sampling unit}}{\mu^2\, CV^2} \qquad (7.15)$$

This can be used here, based on the approximate formula in the appendix (7A.4), with the mean expressed as a function of the proportion, µ = f(p):

$$n = \frac{p(1 - p)\left(\dfrac{df}{dp}\right)^2}{f(p)^2\, CV^2} \qquad (7.16)$$

Writing p = g(µ) and noting the result in Equation 7.8, this can be written in terms of µ as

$$n = \frac{g(\mu)\,\bigl(1 - g(\mu)\bigr)}{\left(\dfrac{dg}{d\mu}\right)^2 \mu^2\, CV^2} \qquad (7.17)$$

Using either Equation 7.16 or 7.17, and substituting the incidence–mean relationship,