Classifying Pest Density 3 3
4.5 Testing the Goodness of Fit of a Probability Distribution
[Figure 4.3: plot of log likelihood (vertical axis) against true mean density (horizontal axis).]
Fig. 4.3. A maximum likelihood example based on the Poisson distribution. The curved line represents the log-likelihood for sample size n = 25, as a function of µ. The maximum is at µ = m, which here is equal to 5.
guide to one of them, the χ² test. In the simplest case, based on n sample units, the parameters of the probability distribution model are determined (usually by maximum likelihood), and the expected frequencies are calculated as n p(x_i|θ̂), where i = 1, 2, …, c and x_i = i − 1. Note that here we are dealing with frequency classes x_i, whereas in Equations 4.5 and 4.6, and all of Section 4.3, X_j refers to the jth sample unit, and j = 1, 2, …, n. The n p(x_i|θ̂) are used in a comparison with the observed frequencies, f_i, to calculate a statistic X²:
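$$X^2 = \sum_{i=1}^{c} \frac{\left(f_i - n\,p(x_i \mid \hat{\theta})\right)^2}{n\,p(x_i \mid \hat{\theta})} \qquad (4.8)$$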
where c is the number of frequency classes. Note that the capital letter, X, is used to distinguish the expression in Equation 4.8 from the theoretical χ² distribution. If the data truly come from the distribution p(x|θ), with θ = θ̂, X² follows what is called the χ² distribution (with a few provisos, as discussed below), whose properties are well documented. The χ² distribution has one parameter, its degrees of freedom. The number of degrees of freedom, df, is easily calculated as c − 1 − n_p, where c is the number of classes and n_p is the number of fitted parameters (n_p = 1 for the Poisson distribution). If the data do not come from the distribution p(x|θ), or, for some reason or other, θ is not equal to θ̂, X² tends to be large, because it is basically the weighted sum of squared differences between observed (f_i) and expected (n p(x_i|θ̂)) frequencies. Therefore, large values of X² indicate that the fit to the distribution p(x|θ) is not good.
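As a rough illustration of Equation 4.8 and the degrees-of-freedom rule, the sketch below fits a Poisson distribution to a set of frequencies and computes X² and df. The frequencies are invented for illustration, the function names are hypothetical, and the classes are taken literally as x_i = i − 1, with no pooling of tail classes.

```python
import math

def poisson_pmf(x, mu):
    """Poisson probability p(x | mu) of observing a count x."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

def chi_square_fit(observed, n_params=1):
    """Return (mu_hat, expected, X2, df) for a Poisson fit.

    observed[x] is the number of sample units that contained x individuals,
    so class i (i = 1..c) corresponds to the count x_i = i - 1.
    """
    n = sum(observed)
    # For the Poisson, the maximum likelihood estimate of mu is the sample mean.
    mu_hat = sum(x * f for x, f in enumerate(observed)) / n
    expected = [n * poisson_pmf(x, mu_hat) for x in range(len(observed))]
    # Equation 4.8: weighted sum of squared observed-minus-expected differences.
    x2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - 1 - n_params
    return mu_hat, expected, x2, df

# Hypothetical frequencies for counts 0, 1, 2, 3, 4 and 5 (n = 100 sample units).
observed = [14, 27, 27, 18, 9, 5]
mu_hat, expected, x2, df = chi_square_fit(observed)
print(f"mu_hat = {mu_hat:.2f}, X2 = {x2:.2f}, df = {df}")
```

Because the classes stop at the largest observed count, the expected frequencies here do not sum exactly to n; a fuller treatment would pool the upper tail into the last class.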
How large is large? Because we know the properties of the χ² distribution, we can find the probability of getting a χ² value equal to or greater than the calculated value of X². Suppose that this probability is 0.2. What this means is as follows:
• either the data come from the distribution p(x|θ) with θ = θ̂, but an event with probability of occurrence equal to 0.2 occurred, or
• the data did not come from the distribution p(x|θ) with θ = θ̂.
Our choice depends on how we view the probability 0.2 in connection with our preconceived notion that the data might conform to the distribution p(x|θ). If this preconception is strong, then why not accept that we have just been unlucky and got a result which should happen only one time in five – that’s not too unlikely. But what if the probability had been 0.02? Is our preconception so strong that a 1 in 50 chance is just unlucky? Or should we reject the possibility that the data conform to the distribution p(x|θ)? It has become standard usage that a probability equal to 0.05 is the borderline between (i) accepting that we have just been unlucky and (ii) having grave doubts about our preconceived notion.
The above procedure is an example of testing a hypothesis. Here the hypothesis is that the data conform to the distribution p(x|θ). The data are used to estimate parameters, X² is calculated and a χ² probability (P) is found. The hypothesis is accepted if P is greater than some standard, often 0.05, and rejected otherwise. If there are many data sets, the proportion of data sets where the hypothesis is rejected can be compared with the standard probability that was used. If all the data sets really do conform to the distribution p(x|θ), this proportion should be close to the standard probability itself. Testing statistical hypotheses in this way has become common in many branches of biological science, and some technical terms should be noted. X² is an example of a ‘test statistic’, and the χ² probability obtained from it is called a ‘significance probability’. If the significance probability, P, is less than 0.05, the hypothesis is said to be rejected at the 5% level. This test is often called the χ² goodness-of-fit test.
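The step from a test statistic to a significance probability can be sketched in a few lines. The code below is only an illustration: the function names are hypothetical, the X² values are invented so that P comes out near the 0.2 and 0.02 discussed above, and the tail probability is computed with a standard recurrence for the incomplete gamma function rather than with any particular statistics package.

```python
import math

def chi2_sf(x2, df):
    """Upper-tail probability P(chi-square with df degrees of freedom >= x2).

    Valid for integer df >= 1. Uses the recurrence
    Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), with z = x2 / 2,
    starting from the closed forms for df = 1 (a = 0.5) or df = 2 (a = 1).
    """
    z = x2 / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # tail probability for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # tail probability for df = 1
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def assess_fit(x2, df, level=0.05):
    """Return the significance probability and the verdict at the given level."""
    p = chi2_sf(x2, df)
    return p, ("rejected" if p < level else "accepted")

# Hypothetical test statistics, both with df = 4.
p1, verdict1 = assess_fit(5.99, 4)    # P is about 0.20
p2, verdict2 = assess_fit(11.67, 4)   # P is about 0.02
print(f"P = {p1:.3f}: hypothesis {verdict1} at the 5% level")
print(f"P = {p2:.3f}: hypothesis {verdict2} at the 5% level")
```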
Use of the χ² test implies that the observed frequencies can be regarded as following a probability model, with the class probabilities given by p(x_i|θ̂). However, a few conditions must be fulfilled before the results can be analysed by the χ² test:
1. None of the expected frequencies, n p(x_i|θ̂), should be small. Traditionally, 5 has been regarded as the minimum value, but some statisticians think that 1 is often large enough. If the tail class frequencies are too small, the classes should be grouped until all classes have acceptable expected frequencies. Some arbitrariness is unavoidable here. Note that if there is grouping, the number of degrees of freedom, df, must be recalculated, because the number of classes, c, has changed.³
2. The number of sample units, n, should not be small. This requirement is connected with the first because if n is small many of the expected frequencies will be small, and too much grouping is not good. Another reason for n to be large (generally, at least 100) is that the estimated parameters and the significance test will be more reliable.
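The grouping rule in the first condition can be sketched as follows. This is a minimal illustration with invented frequencies and a hypothetical function name; it pools only from the two tails inwards and assumes that any small expected frequencies lie in the tails.

```python
def group_tail_classes(observed, expected, minimum=1.0):
    """Pool tail classes until both end classes have expected >= minimum.

    Returns new (observed, expected) lists; the inputs are left unchanged.
    """
    obs, exp = list(observed), list(expected)
    # Pool the upper tail into its neighbour, one class at a time.
    while len(exp) > 1 and exp[-1] < minimum:
        last_exp, last_obs = exp.pop(), obs.pop()
        exp[-1] += last_exp
        obs[-1] += last_obs
    # Do the same for the lower tail.
    while len(exp) > 1 and exp[0] < minimum:
        first_exp, first_obs = exp.pop(0), obs.pop(0)
        exp[0] += first_exp
        obs[0] += first_obs
    return obs, exp

# Hypothetical observed and expected frequencies for counts 0..7 (n = 100).
observed = [14, 27, 27, 18, 9, 4, 1, 0]
expected = [14.1, 27.6, 27.1, 17.7, 8.7, 3.4, 1.1, 0.3]
n_params = 1  # one fitted parameter, as for the Poisson distribution

obs1, exp1 = group_tail_classes(observed, expected, minimum=1.0)
obs5, exp5 = group_tail_classes(observed, expected, minimum=5.0)
df1 = len(exp1) - 1 - n_params   # df recalculated after grouping
df5 = len(exp5) - 1 - n_params
print(f"minimum 1: c = {len(exp1)}, df = {df1}")
print(f"minimum 5: c = {len(exp5)}, df = {df5}")
```

With these invented numbers the stricter criterion leaves fewer classes and therefore fewer degrees of freedom, which is why df must be recalculated after any grouping.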
These concepts are illustrated in Exhibit 4.1.
Distributions 69
³ There are some theoretical difficulties when classes are grouped relating to degrees of freedom (Chernoff and Lehmann, 1954), but in the present context they should be ignorable.
Exhibit 4.1. Sampling a random pattern produces a Poisson distribution
A random spatial pattern can be generated as in Fig. 4.1a. Sample units can be defined by superimposing regular grid lines, as in Fig. 4.1b. Sample units of different shapes and sizes can be defined by using different grids. Four types of sample unit were created for the random spatial pattern of 500 points shown in Fig. 4.1a (see Fig. 4.4):
• 25 sample units in a 5 × 5 array (five sample units along each of the horizontal and vertical axes)
• 100 sample units in a 10 × 10 array
• 400 sample units in a 20 × 20 array
• 100 sample units in a 20 × 5 array.
The means and variances were as follows:

Number, n_x, of sample units along the x-axis         5       10       20       20
Number, n_y, of sample units along the y-axis         5       10       20        5
Grid (n_x × n_y)                                  5 × 5  10 × 10  20 × 20   20 × 5
Number of sample units (n = n_x × n_y)               25      100      400      100
Mean per sample unit                                 20        5     1.25        5
Calculated variance per sample unit                29.1     5.88     1.41     6.08
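The construction behind this table can be imitated with a short simulation. The sketch below is an assumption-laden illustration, not the procedure used for Fig. 4.1a: it scatters 500 points uniformly over a unit square with an arbitrary random seed, overlays the four grids, and computes the mean and the sample variance (divisor n − 1) of the counts per sample unit. The means are fixed at 500/n, but the variances will differ from the tabulated ones because the simulated pattern is a different one.

```python
import random

def grid_counts(points, nx, ny):
    """Count the points falling in each cell of an nx-by-ny grid on the unit square."""
    counts = [0] * (nx * ny)
    for x, y in points:
        col = min(int(x * nx), nx - 1)   # guard against rounding at the edge
        row = min(int(y * ny), ny - 1)
        counts[row * nx + col] += 1
    return counts

def mean_and_variance(counts):
    """Sample mean and sample variance (divisor n - 1) of the counts."""
    n = len(counts)
    m = sum(counts) / n
    v = sum((c - m) ** 2 for c in counts) / (n - 1)
    return m, v

rng = random.Random(1)  # arbitrary seed, chosen only for repeatability
points = [(rng.random(), rng.random()) for _ in range(500)]

results = {}
for nx, ny in [(5, 5), (10, 10), (20, 20), (20, 5)]:
    m, v = mean_and_variance(grid_counts(points, nx, ny))
    results[(nx, ny)] = (m, v)
    print(f"{nx} x {ny}: n = {nx * ny}, mean = {m:g}, variance = {v:.2f}")
```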
Although not exactly equal to the means, the variances were close to the means for all shapes and sizes of sample unit. The frequency distributions were calculated for each type of sample unit, and the Poisson distribution was fitted to each by maximum likelihood (Fig. 4.5). As noted above, the maximum likelihood estimate of µ for the Poisson distribution is always equal to the mean, m. The χ² significance probabilities, P, were calculated using the grouping criterion that the expected frequencies for the tail classes should be at least equal to 1:
Grid (n_x × n_y)                  5 × 5  10 × 10  20 × 20   20 × 5
X²                                 19.9      7.8      6.4      9.8
df = c − 1 − n_p (n_p = 1)           14        9        4        9
Significance probability, P        0.13     0.55     0.17     0.36
Fig. 4.4. A random spatial pattern of 500 points, with superimposed grids of sample units. (a) 5 × 5 grid, mean = 20, variance = 29.1; (b) 10 × 10 grid, mean = 5, variance = 5.88; (c) 20 × 20 grid, mean = 1.25, variance = 1.41; (d) 5 × 20 grid, mean = 5, variance = 6.08.
We know from experience that pests are often not distributed randomly. They tend to be found in aggregated spatial patterns. There are many biological mechanisms that lead to aggregation: the existence of ‘good’ and ‘bad’ resource patches, settling of offspring close to the parent, mate finding and so on. Likewise, there are many ways in which models for such processes can be combined to obtain probability distributions to account for the effect of spatial aggregation.
Because these are all greater than the standard value, P = 0.05, we would be justified in assuming a Poisson distribution in each case.
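One of the tabulated significance probabilities can be checked by hand. When df is even, the upper tail of the χ² distribution has a simple closed form (a standard result, not taken from the text above); applying it to the 20 × 20 grid, with X² = 6.4 and df = 4, reproduces the tabulated value to two decimal places:

```latex
P = \Pr\left(\chi^2_{4} \ge X^2\right)
  = e^{-X^2/2}\left(1 + \frac{X^2}{2}\right)
  = e^{-3.2}\,(1 + 3.2)
  \approx 0.0408 \times 4.2
  \approx 0.17
```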
Despite the non-significant P-value for the 5 × 5 grid, Fig. 4.5a is not convincing: the frequencies do not really look like the theoretical values. This is mainly because n = 25 is too few sample units for a good test. There are people who are reluctant to accept the results of a statistical significance test if the data do not look convincing. It is unwise to lean too heavily on a statistical crutch. Statistical tests are most useful when they summarize something that is not difficult to swallow.
Fig. 4.5. Observed frequency distributions for the four grid types shown in Fig. 4.4, with a Poisson distribution fitted to each, with grouping criterion equal to one. (a) 5 × 5; (b) 10 × 10; (c) 20 × 20; (d) 5 × 20.