We can make the formula (Equation 7.9) a little more palatable by noting that dg/dµ is the slope of the p–µ relationship, p = g(µ). Equation 7.9 states that the variance of mbin decreases as this slope increases. Because changing T changes this slope, it also changes the variance. These relationships are illustrated in Fig. 7.6a.
When µ = 5, the slope, dg/dµ, is maximum when T = 4 or T = 5 (Fig. 7.6b). The slope is often greatest when T is near the true mean.
7.5 Binomial Sampling Based on the Negative Binomial Distribution
As we noted in Chapter 4, the Poisson distribution is rarely satisfactory for counts of arthropods or diseases, because the spatial pattern of counts is usually aggregated.
One distribution that can often be used to describe sample counts is the negative binomial. Of course, there are instances in which no probability distribution is satisfactory, and we discuss later in this chapter how to deal with that. Now, we focus on the situation in which the negative binomial distribution provides a good description of the distribution of sample counts.
Unlike the Poisson distribution, the negative binomial distribution has two parameters, µ and k, and this extra parameter, k, adds further difficulties for binomial sampling. It is not practical in sampling for decision-making to estimate both µ and k, so we must choose a value of k independently of the sample data.
In full count sampling, not knowing a precise value for k influences the precision of the estimate of µ but not the bias: the bias is zero whatever value is chosen for k. We observed this in Chapter 5, where the OC functions were all centred on cd.
Binomial Counts 165
Fig. 7.6. The relationship between p and µ (mean) for the Poisson distribution (a) when T = 0 (___), 1 ( … ), 3 (- – -) and 5 (– . – . –), and the slope (b) of the p–µ model for µ = 5 as a function of T.
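The two panels described in Fig. 7.6 can be reproduced numerically. The following is a minimal sketch, assuming that p is the probability that a sample unit holds more than T individuals, p = Prob(count > T). Under that assumption, the slope dg/dµ for the Poisson distribution equals Prob(count = T). The function names are illustrative and do not come from the text.

```python
import math

def poisson_pmf(x, mu):
    """Probability of exactly x individuals on a sample unit (Poisson, mean mu)."""
    return math.exp(-mu) * mu**x / math.factorial(x)

def prop_above_tally(mu, T):
    """p = g(mu): assumed here to be Prob(count > T)."""
    return 1.0 - sum(poisson_pmf(x, mu) for x in range(T + 1))

def slope(mu, T):
    """dg/dmu for the Poisson model; analytically this is Prob(count = T)."""
    return poisson_pmf(T, mu)

# Panel (a): p as a function of the mean for several tally numbers
for T in (0, 1, 3, 5):
    print(T, [round(prop_above_tally(mu, T), 3) for mu in (1, 2, 5, 10)])

# Panel (b): slope at mu = 5 as a function of T
mu = 5.0
slopes = {T: slope(mu, T) for T in range(11)}
largest = max(slopes.values())
print([T for T, s in slopes.items() if math.isclose(s, largest)])
```

For an integer mean, Prob(count = µ − 1) and Prob(count = µ) are equal under the Poisson distribution, which is why two adjacent tally numbers share the maximum slope.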
With binomial sampling, however, incomplete knowledge of k can result in bias. This occurs because the negative binomial distribution can take many forms, depending on k. In particular, the probability of zero incidence, Prob(count = 0), can correspond to a large number of mean pest densities, µ, each of which corresponds to a different value of k. If Prob(count = 0) is moderately large, µ must be small, and does not change much for different values of k. However, if Prob(count = 0) is small, µ must be large, and its estimated value depends greatly on the assumed value for k (Table 7.2). This is presented graphically in Fig. 7.7: for Prob(count = 0) = 0.6, the possible negative binomial distributions that have the same p, but different µ and k, are all close together, but for Prob(count = 0) = 0.2, they are greatly different.
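The dependence of µ on the assumed k can be checked directly. For the negative binomial distribution, Prob(count = 0) = (1 + µ/k)^(−k), so µ = k(Prob(count = 0)^(−1/k) − 1). The sketch below uses this standard result to recompute the means implied by a fixed probability of zero incidence; the function names are illustrative.

```python
def nb_prob_zero(mu, k):
    """Prob(count = 0) for a negative binomial with mean mu and parameter k."""
    return (1.0 + mu / k) ** (-k)

def mean_from_prob_zero(p_zero, k):
    """Invert nb_prob_zero: the mean implied by a given Prob(count = 0) and k."""
    return k * (p_zero ** (-1.0 / k) - 1.0)

# One row per probability of zero incidence, one column per value of k
for p_zero in (0.2, 0.4, 0.6, 0.8):
    row = [mean_from_prob_zero(p_zero, k) for k in (0.5, 1.0, 2.0)]
    print(p_zero, [f"{m:.2f}" for m in row])
```

The spread across k within a row widens as Prob(count = 0) falls, which is the source of the bias discussed above.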
Independently of the effect of k, T and n have effects on the variance and bias similar to those under the Poisson distribution. Formulae derived from Equations 7A.4 and 7A.6 in the Appendix summarize these effects. Increasing n decreases the bias and the variance, making the OC steeper. Increasing T to around the critical density also reduces bias, including the effect of uncertainty in k, and makes the OC steeper (Binns and Bostanian, 1990a). An OC obtained when sampling a population with k less than that used to formulate the stop boundaries will lie to the right of the nominal OC and will move towards the nominal OC (to the left) as T is increased. However, with continued increases in T to values greater than the critical density, the OC for the populations having k less than the nominal value will eventually cross the nominal OC and lie to the left of it (Fig. 7.8).

Table 7.2. Mean values, µ, for negative binomial distributions corresponding to fixed probabilities of zero incidence and a range of values of k.

Prob(count = 0)    k = 0.5    k = 1    k = 2
0.2                12         4        2.5
0.4                2.6        1.5      1.2
0.6                0.89       0.67     0.58
0.8                0.28       0.25     0.24

Fig. 7.7. Negative binomial distributions with common values for Prob(count = 0) but unequal values of k (k = 0.5 (+–+), k = 1 (…), k = 2 (o-o-o)). (a) Prob(count = 0) = 0.6 (µ = 0.9, 0.7, 0.6); (b) Prob(count = 0) = 0.2 (µ = 12, 4, 2.5).
When formulating binomial count classification sample plans based on the negative binomial distribution, it is essential that the effect of uncertainty in k be evaluated and perhaps ameliorated by manipulating T and the sample size. These points are illustrated in Exhibit 7.2.
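The direction of the shifts caused by a wrong k can be checked with a simplified calculation. The sketch below is not the sequential plan behind Fig. 7.8: it assumes a fixed sample size plan (n = 30, cd = 2, assumed k = 1) in which the density is classified as at or below cd when the number of sample units with more than T individuals does not exceed n times the proportion expected at cd. The decision rule and all function names are assumptions made for illustration.

```python
import math

def nb_prob_above(mu, k, T):
    """Prob(count > T) for a negative binomial with mean mu and parameter k."""
    term = (1.0 + mu / k) ** (-k)          # Prob(count = 0)
    cdf = term
    for x in range(T):
        # recurrence: P(x + 1) = P(x) * (k + x) / (x + 1) * mu / (mu + k)
        term *= (k + x) / (x + 1) * mu / (mu + k)
        cdf += term
    return 1.0 - cdf

def binom_cdf(c, n, p):
    """Prob(number of units above T <= c) for n independent sample units."""
    return sum(math.comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(c + 1))

def oc_fixed(mu, true_k, T, cd=2.0, n=30, assumed_k=1.0):
    """Probability of classifying the density as <= cd (assumed decision rule)."""
    critical_count = math.floor(n * nb_prob_above(cd, assumed_k, T))
    return binom_cdf(critical_count, n, nb_prob_above(mu, true_k, T))

# OC at the critical density for each tally number and each true k
for T in (0, 3, 8):
    print(T, [round(oc_fixed(2.0, k, T), 3) for k in (0.5, 1.0, 2.0)])
```

Under these assumptions, a true k below the assumed value raises the OC at cd when T = 0 and lowers it when T = 8, with the curves closest together at the intermediate tally number.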
Exhibit 7.2. Binomial classification sampling with the negative binomial distribution
In this example, we illustrate the influence of k on the OC functions for binomial count classification sample plans based on the negative binomial distribution, and how an appropriate choice of T can reduce bias due to not knowing the true value for k. The example is based on sampling European red mite (Panonychus ulmi), a small plant-feeding mite which is a common pest in commercial apple orchards.
These arthropods damage leaves by inserting their mouthparts into cells to remove fluids. In doing so, they reduce the leaf’s ability to photosynthesize. Severe red mite injury can lead to reduced crop yield and quality.
Nyrop and Binns (1992) found that the sampling distribution of counts of European red mite on leaves could frequently be described by a negative binomial distribution. Although k tended to increase with increasing mite density, there was considerable variability in k at any particular mite density. European red mites are small and difficult to count. As such, they are good candidates for binomial count sample plans. The potential for damage by the European red mite depends on the time of year, the crop load and the geographical location, so the critical density is not constant. One common working critical density is five mites per leaf.
Fig. 7.8. Binomial sampling based on the negative binomial distribution with cd = 2 and n = 30: the effect of T on OC when k is not known precisely. The assumed value for k is 1; true k is equal to 0.5 (___), 1 ( … ), 2 (- – -). (a) T = 0; (b) T = 3; (c) T = 8.
Nyrop and Binns (1992) reasoned that because k increased with µ, variability in k should be assessed for a restricted range of densities around each critical density and not over the entire range of densities observed. For cd = 5, they found that the median value for k was 0.8, with 10th and 90th percentiles equal to 0.4 and 1.5 respectively.
Initial comparisons were made among three sample plans: full count fixed sample size (n = 50), binomial count fixed sample size (T = 0, k = 0.8, n = 50), and binomial count SPRT (T = 0, k = 0.8, µ1 = 6, µ0 = 4, α = β = 0.1, min n = 5 and max n = 50). The results are shown in Fig. 7.9. The sequential and fixed sample size binomial count plans had nearly identical OC functions that were flatter than the OC for the complete enumeration plan. This reflects the greater variability inherent in binomial counts. The sequential plan resulted in some savings over the fixed sample size plan for small mean densities. However, the p–µ relationship for T = 0 is quite flat for densities greater than 5 (Fig. 7.6), so the variance is high (Equation 7.9). In turn, this means that the sampling plan requires close to the maximum number of sample units: the ASN function for means greater than 5 is relatively flat.
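To make the binomial count SPRT plan concrete, the sketch below converts µ0 and µ1 into proportions p0 and p1 through the negative binomial distribution and then applies Wald's standard stop lines for a binomial SPRT. It assumes p = Prob(count > T) and uses illustrative function names; the truncation at min n and max n is not shown here.

```python
import math

def nb_prob_above(mu, k, T):
    """Prob(count > T) for a negative binomial with mean mu and parameter k."""
    term = (1.0 + mu / k) ** (-k)          # Prob(count = 0)
    cdf = term
    for x in range(T):
        term *= (k + x) / (x + 1) * mu / (mu + k)
        cdf += term
    return 1.0 - cdf

def sprt_lines(mu0, mu1, k, T, alpha, beta):
    """Wald stop lines for the cumulative number of units with count > T.

    Sampling continues while
        lower_intercept + slope * n < d < upper_intercept + slope * n,
    where d is the number of units above T among the first n units.
    """
    p0 = nb_prob_above(mu0, k, T)
    p1 = nb_prob_above(mu1, k, T)
    denom = math.log(p1 * (1 - p0) / (p0 * (1 - p1)))
    slope = math.log((1 - p0) / (1 - p1)) / denom
    lower_intercept = math.log(beta / (1 - alpha)) / denom   # negative
    upper_intercept = math.log((1 - beta) / alpha) / denom   # positive
    return p0, p1, slope, lower_intercept, upper_intercept

# Plan parameters quoted in the text: T = 0, k = 0.8, mu0 = 4, mu1 = 6
p0, p1, slope, lower, upper = sprt_lines(4.0, 6.0, 0.8, 0, 0.1, 0.1)
print(f"p0={p0:.3f} p1={p1:.3f} slope={slope:.3f} "
      f"intercepts={lower:.2f}, {upper:.2f}")
```

With T = 0 the two proportions lie close together, so the intercepts are wide relative to the slope and many sample units are needed before either line is crossed.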
Fig. 7.9. OC (a) and ASN (b) functions for three sampling plans used to classify the density with respect to cd = 5. Sample counts were described by a negative binomial distribution (k = 0.8). Plan 1 (___) used a fixed sample size (n = 50) and complete counts. Plan 2 ( … ) used binomial counts with T = 0 and n = 50. Plan 3 (- – -) used binomial counts, was sequential and based on the SPRT, with the parameters T = 0, µ0 = 4, µ1 = 6, α = β = 0.1, min n = 5 and max n = 50.

The OC functions for these binomial count sample plans can be made to look more like the OC for the full count plan by increasing the sample size. However, the effect of imperfect knowledge about k is more important. A binomial count Sequential Probability Ratio Test (SPRT) plan (T = 0, k = 0.8, µ1 = 6, µ0 = 4, α = β = 0.1, min n = 5 and max n = 50) was set up and tested on negative binomial distributions with different values of k: k = 0.8 (corresponding to the parameters of the sample plan), k = 0.4 and k = 1.5 (representing possible lower and upper bounds for k). The effect of these values on the OC functions is extreme: the OC for k = 1.5 is shifted far to the left of the nominal OC and the OC for k = 0.4 is shifted far to the right (Fig. 7.10).
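The sensitivity of the plan to the true k can be explored with an exact forward recursion over the number of sample units examined. The sketch below is an illustration under stated assumptions, not the procedure used to produce the figures: it assumes p = Prob(count > T), Wald stop lines, no stopping before min n, and a forced decision at max n made by comparing the cumulative count with the slope line. All names are hypothetical.

```python
import math

def nb_prob_above(mu, k, T):
    """Prob(count > T) for a negative binomial with mean mu and parameter k."""
    term = (1.0 + mu / k) ** (-k)          # Prob(count = 0)
    cdf = term
    for x in range(T):
        term *= (k + x) / (x + 1) * mu / (mu + k)
        cdf += term
    return 1.0 - cdf

def sprt_oc_asn(true_p, p0, p1, alpha=0.1, beta=0.1, min_n=5, max_n=50):
    """Exact OC and ASN of a truncated binomial SPRT by forward recursion."""
    denom = math.log(p1 * (1 - p0) / (p0 * (1 - p1)))
    slope = math.log((1 - p0) / (1 - p1)) / denom
    lower = math.log(beta / (1 - alpha)) / denom
    upper = math.log((1 - beta) / alpha) / denom
    state = {0: 1.0}   # Prob(d units above T so far and still sampling)
    oc = asn = 0.0
    for n in range(1, max_n + 1):
        nxt = {}
        for d, pr in state.items():
            nxt[d + 1] = nxt.get(d + 1, 0.0) + pr * true_p
            nxt[d] = nxt.get(d, 0.0) + pr * (1 - true_p)
        state = {}
        for d, pr in nxt.items():
            if n >= min_n and d <= lower + slope * n:
                oc += pr                   # stop: classify as below cd
                asn += n * pr
            elif n >= min_n and d >= upper + slope * n:
                asn += n * pr              # stop: classify as above cd
            elif n == max_n:
                asn += n * pr              # forced decision (assumed rule)
                if d <= slope * n:
                    oc += pr
            else:
                state[d] = pr
    return oc, asn

def plan_oc_asn(mu, true_k, T, plan_k=0.8, mu0=4.0, mu1=6.0):
    """OC and ASN at mean mu when the population has true_k, the plan plan_k."""
    p0 = nb_prob_above(mu0, plan_k, T)
    p1 = nb_prob_above(mu1, plan_k, T)
    return sprt_oc_asn(nb_prob_above(mu, true_k, T), p0, p1)

for T in (0, 7):
    for true_k in (0.4, 0.8, 1.5):
        oc, asn = plan_oc_asn(5.0, true_k, T)
        print(f"T={T} true k={true_k}: OC={oc:.3f} ASN={asn:.1f}")
```

Evaluating `plan_oc_asn` over a grid of means, rather than only at cd = 5, gives full OC and ASN curves for each combination of T and true k.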
By increasing T, the differences among the three OC functions depicted in Fig. 7.10 can be greatly reduced, and in many instances effectively eliminated. This is shown in Fig. 7.11, where the sample plan parameters are identical to those described in the above paragraph, but with T equal to 3, 5, 8 and 11. The use of T = 7 would probably minimize bias due to imperfect knowledge about k. However, it would also make scoring samples much more time-consuming than using T = 0. It is necessary to estimate OC and ASN functions for different scenarios, so that the properties of any given plan can be assessed against user needs and expectations. For example, plans with T = 7 have been used successfully by growers and extension personnel in Quebec for several years.
Fig. 7.10. The OC (a) and ASN (b) functions for a binomial count SPRT plan used to classify the density with respect to cd = 5. Sample counts were described by a negative binomial distribution. Parameters for the sampling plan were T = 0, k = 0.8, µ0 = 4, µ1 = 6, α = β = 0.1, min n = 5 and max n = 50. The parameter k of the negative binomial distribution used to describe the populations sampled was 0.8 (___), 0.4 ( … ), and 1.5 (- – -).
Up to this point, we have assumed that while k may not remain constant, it does not change systematically with density. However, as noted in previous chapters, k often tends to increase with density and can be modelled using a variance–mean relationship. If a variance–mean relationship, such as Taylor’s Power Law (TPL) (Equation 3.14), holds for the pest in question, and the negative binomial distribution describes sample counts, the value of k can be regarded as a function of µ:
Fig. 7.11. The OC functions for binomial count SPRT plans with T = 3 (a), 5 (b), 8 (c) and 11 (d) used to classify the density with respect to cd = 5. Sample counts were described by a negative binomial distribution. Parameters for all sampling plans were k = 0.8, µ0 = 4, µ1 = 6, α = β = 0.1, min n = 5 and max n = 50. The parameter k of the negative binomial distribution used to describe the populations sampled was 0.8 (___), 0.4 ( … ), and 1.5 (- – -).