
4.3 Random Patterns and the Poisson Probability Distribution

units (Fig. 4.1b). The spatial pattern of numbers in sample units can be displayed graphically as dots whose sizes represent the numbers of individual organisms in the sample units (Fig. 4.1c). The frequency distribution of these numbers can be calculated as in Chapter 2 (Fig. 4.1d). What kind of frequency distribution should one expect when points are spread out randomly like this?

Statistical theory shows that, when events occur at random locations in a region, the number of events which can be found in any specified area within the region follows a Poisson probability distribution. This implies that we can predict the shape of the frequency distribution if we know the rate at which the events occur or, equivalently, if we know the average number of events in a unit of area.
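A random pattern like that of Fig. 4.1 can be simulated in a few lines. The sketch below uses only the standard library; the 500 points and the 20 × 20 grid (400 sample units) follow the text, while the random seed is an arbitrary choice for reproducibility:

```python
import random
from collections import Counter

# Scatter 500 points at random over a field divided into a 20 x 20 grid
# (400 sample units), as in Fig. 4.1, and count the points per sample unit.
rng = random.Random(42)  # arbitrary seed, for reproducibility
counts = Counter()
for _ in range(500):
    cell = (rng.randrange(20), rng.randrange(20))
    counts[cell] += 1

# Frequency distribution: how many of the 400 units hold 0, 1, 2, ... points
freq = Counter(counts.values())
freq[0] = 400 - len(counts)  # sample units that received no points at all
```

Tabulating `freq` against the number of points per unit gives the bar chart of Fig. 4.1d for this particular simulation run.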

This average number of events constitutes the only parameter of the Poisson distribution. It is called the mean and is denoted by the Greek symbol µ. The Poisson probabilities of x = 0, 1, 2, … events per sample unit are defined as follows:

Fig. 4.1. (a) A random spatial pattern of points. (b) The random pattern in (a) with a 20 × 20 grid superimposed. (c) A spatial pattern derived from (b) as a summary of (a). (d) Frequencies derived from (b) (bars) and fitted Poisson frequencies.

p(x | µ) = e^(−µ) µ^x / x!         (4.3)

where x! denotes the product of the first x integers:

x! = 1 × 2 × 3 × … × x

The notation p(x|µ) is a common way of writing formulae for probabilities. It should be understood as the probability of getting x individual organisms in a sample unit when the parameter is equal to µ. The vertical bar is a convenient way of separating the data (x) from the parameter (µ). Typical shapes of the Poisson distribution are shown in Fig. 4.2.

Fig. 4.2. Poisson distributions with µ = 0.5, 4 and 20.

On the basis of Equations 4.2 and 4.3, we can compare the frequency distribution obtained by our simulated random process with the expected frequencies for a Poisson distribution. The parameter µ is 1.25 because, in Fig. 4.1, 500 points were placed in a field with n = 400 sample units. With µ = 1.25, the expected frequencies can be calculated from Equation 4.3:

E(f_x) = n p(x | 1.25) = 400 e^(−1.25) 1.25^x / x!         (4.4)

A comparison between the simulated results and theory is shown in Fig. 4.1d. The process of finding the theoretical model which is closest to the data is called model fitting. Here we have fitted the Poisson probability distribution to the observed frequencies. The fitting procedure is quite simple, because we need to calculate only the mean number of individuals per sample unit. This mean is an unbiased estimator of µ. We usually denote it by the arabic equivalent of µ, namely m.

How does this help us make decisions? From the decision-making perspective, the most important help it gives is that we now know the variance, σ², of the number of individuals in a sample unit. For a Poisson distribution, the variance is equal to the mean: σ² = µ. With this knowledge, we can predict the variance of the mean for samples of any size n as in Chapter 2. Therefore, assuming that we can use the normal distribution approximation as in Chapter 2, we can go as far as calculating probability of decision or operating characteristic (OC) functions.
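The fitted frequencies of Fig. 4.1d can be reproduced directly from Equation 4.3. This is a minimal sketch using only the standard library; µ = 1.25 and n = 400 come from the text, and the range of counts shown (0 to 7) is a convenient cut-off:

```python
import math

def poisson_pmf(x, mu):
    """Equation 4.3: p(x | mu) = e^(-mu) * mu^x / x!"""
    return math.exp(-mu) * mu ** x / math.factorial(x)

# Expected frequencies (Equation 4.4) for n = 400 sample units and mu = 1.25,
# as in the Poisson fit of Fig. 4.1d.
n, mu = 400, 1.25
expected = {x: n * poisson_pmf(x, mu) for x in range(8)}
```

Because the mean m of the observed counts is an unbiased estimator of µ, fitting the Poisson distribution to data amounts to substituting m for µ in `poisson_pmf`.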

The probability distribution that we have obtained depends on the size and shape of the sample unit. As a consequence, the OC function relates to the sample unit used in the preliminary work, and cannot be used directly for a different sample unit. For instance, when counting pests on a pair of adjacent plants, rather than on one plant, the mean, µ, changes to 2µ. Likewise, the variance, σ², changes to 2µ. This new mean and variance would have to be used for calculating the OC function. Therefore, if a Poisson distribution can be assumed for the new sample unit, these few mathematical calculations are all that is required to adjust the OC function.
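The calculation can be sketched as follows. The decision rule, the action threshold c = 1.5 and the sample size n = 50 are illustrative values, not from the text; what the sketch shows is that, under the normal approximation of Chapter 2, the OC function needs only µ and σ² = µ, and that switching to a pair-of-plants sample unit simply replaces µ by 2µ:

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def oc(mu, threshold, n):
    """P(sample mean <= threshold | true mean mu) under the normal
    approximation, using Var(count) = mu for a Poisson sample unit,
    so Var(sample mean) = mu / n."""
    return normal_cdf((threshold - mu) / math.sqrt(mu / n))

# Illustrative decision rule: intervene when the sample mean exceeds c = 1.5
# per plant, based on n = 50 single-plant sample units.
oc_single = oc(1.25, 1.5, 50)
# Pair-of-plants sample unit: mean and variance both become 2 * mu,
# and the threshold on the pair scale doubles as well.
oc_pair = oc(2 * 1.25, 2 * 1.5, 50)
```

Here `oc(µ, c, n)` is the probability of deciding that intervention is not needed when the true mean is µ; evaluating it over a range of µ values traces out the whole OC curve.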

4.4 Fitting a Distribution to Data Using the Maximum Likelihood
In the previous section, we used the sample mean to estimate the parameter µ of the Poisson distribution without indicating why this was a good idea. A motivation can be given by the maximum likelihood principle. According to this principle, the overall probability (or ‘likelihood’) of getting the data which were in fact obtained is calculated, and the model parameters for which this ‘likelihood’ is maximized are defined as the maximum likelihood estimates. These estimates have many desirable properties. For example, in most situations which might be encountered by pest managers, these estimates have the highest precision. Other properties are beyond the scope of this book, and are discussed in statistical textbooks. In general, the maximum likelihood principle forms the basis for estimating parameters of probability distributions. As an aside, it is also the principle that leads to least squares estimates in linear regression.

Suppose that we have sample data X1, X2, …, Xn, and we want to fit some kind of probability distribution p(x|θ) with parameter θ. Then we should look for the parameter value that gives the highest value for the product

L(X1, X2, …, Xn | θ) = p(X1|θ) p(X2|θ) … p(Xn|θ)         (4.5)

This product, L(X1, X2, …, Xn | θ), is called the likelihood function. Its value depends on the parameter (or parameters) θ and on the sample data. The parameter θ can be the mean, the variance or some other defining characteristic. For the Poisson distribution, the likelihood is

L(X1, X2, …, Xn | µ) = (e^(−µ) µ^X1 / X1!) × (e^(−µ) µ^X2 / X2!) × … × (e^(−µ) µ^Xn / Xn!)
= e^(−nµ) µ^(X1 + X2 + … + Xn) / (X1! X2! … Xn!)         (4.6)
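The principle can be checked numerically. In the sketch below the eight counts are invented for illustration; minimizing the negative logarithm of the likelihood in Equation 4.6 over a grid of candidate µ values recovers the sample mean m:

```python
import math

def neg_log_likelihood(mu, data):
    """Minus the log of Equation 4.6:
    n*mu - (sum of the X_i) * log(mu) + sum of log(X_i!)."""
    return (len(data) * mu
            - sum(data) * math.log(mu)
            + sum(math.log(math.factorial(x)) for x in data))

# Invented counts for eight sample units (illustration only)
data = [0, 2, 1, 3, 1, 0, 2, 1]

# Grid search over candidate values of mu; the maximum likelihood
# estimate should coincide with the sample mean m.
candidates = [i / 100 for i in range(1, 500)]
best = min(candidates, key=lambda mu: neg_log_likelihood(mu, data))
```

Working with the logarithm of L turns the product of Equation 4.5 into a sum, which is numerically far better behaved; since the logarithm is increasing, maximizing log L and maximizing L give the same estimate.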
