
The Distribution of Sample Means – the Central Limit Theorem

Basic Concepts of Sampling

2.9 The Distribution of Sample Means – the Central Limit Theorem

The properties of the normal distribution have been studied and described in detail in statistical texts. In particular, the cumulative distribution function of the normal distribution has been tabulated. The cumulative distribution function defines the probability that a random variable takes on values less than or equal to a specified constant. Published tables (available in many textbooks) give the theoretical probability of a sample mean being, say, 1.338 standard deviations away from the true mean, or any other number of standard deviations. The only thing that we need to be careful about is that the sample size is large enough for us to assume the normal distribution. There is much folklore on this subject, and no universally valid criterion exists. However, most workers in pest management are prepared to assume a normal distribution if their sample size is 25–30 or more.
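As a rough illustration of what such a table lookup provides, the following sketch evaluates the cumulative distribution function of the standard normal distribution with Python's standard library. The helper name `prob_within` is an assumption introduced here for illustration, not notation from the text.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k sd of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Cumulative probability of a value at most 1.338 sd above the mean.
p_below = standard_normal.cdf(1.338)
# Probability of a value more than 1.338 sd from the mean, in either direction.
p_beyond = 1.0 - prob_within(1.338)

print(f"P(Z <= 1.338) = {p_below:.4f}")
print(f"P(|Z| > 1.338) = {p_beyond:.4f}")
```

The same function reproduces any other entry of a published table by changing the number of standard deviations.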

Table 2.2. Notation for various statistical quantities based on the mean and the variance. There is some inconsistency in the literature (for example, s and […] are often referred to as standard deviations). In this book we use the definitions noted here.

True values

µ    Mean
σ²    Variance
σ    Standard deviation (sd)
σ/√n    Standard deviation of a sample mean (sdm)
σ/µ    Coefficient of variation
(σ/√n)/µ    Coefficient of variation of a sample mean

Sample estimates

m    Sample mean
V    Sample variance, often written as s²
s or √V    Sample sd, also called standard error (se)
s/√n or √(V/n)    Standard error of a sample mean (sem)
s/m or √V/m    Estimated coefficient of variation
(s/√n)/m or √(V/n)/m    Estimated coefficient of variation of a sample mean


Fig. 2.4. A typical normal distribution, with mean equal to 5 and standard deviation equal to 1. The range from 4 to 6 contains about two-thirds of the distribution, and the range from 3 to 7 contains about 95% of the distribution.

Exhibit 2.1. Variance of sample means and the Central Limit Theorem

Sampling was simulated on the computer by randomly selecting sample observations from Beall’s record of Colorado potato beetle counts. Three sample sizes were used: 5, 25 and 50 sample units. For each sample size, the sampling process was simulated 500 times, thereby resulting in 500 estimates of the mean. The variance, σ², of the original counts was 14.995 and the mean, µ, was 4.74. The mean and variance of the 500 simulated sample means were calculated and graphed as functions of the sample size. The theoretical variance of the sample mean (Equation 2.2) was also calculated and graphed as a function of the sample size. The results are shown in Figs 2.5 and 2.6.

The squares in Fig. 2.5 represent the mean and variance of the original counts, while the circles are the average of the sample means (the mean of the means) and variances of the sample means. The means are essentially the same for the original counts and for all three sample sizes. The variances of the sample means decrease with increasing sample size and closely follow the theoretical variance (solid line).

In Fig. 2.6, the 500 estimates of the sample means are arranged as frequency distributions and are compared to frequencies based on the normal distribution (lines). Frequencies are shown for an interval rather than a single value of the mean (e.g. 0.2–0.4 versus 0.3), because the estimated means are continuous; and in order to calculate a frequency, the number of means in an interval must be tallied. As the sample size increases, the distribution of means is more like a normal distribution.
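The simulation described in this exhibit can be sketched in a few lines. Beall's counts are not reproduced here, so the `population` below is a hypothetical skewed stand-in generated for illustration only, and the helper name `simulate_means` is likewise an assumption. Sampling is done with replacement, which is one simple way to make the variance of the sample mean comparable with σ²/n.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for the field counts (NOT Beall's data):
# 2000 skewed, non-negative integer counts per sample unit.
population = [int(random.expovariate(1 / 5.0)) for _ in range(2000)]
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)

def simulate_means(n: int, repeats: int = 500) -> list[float]:
    """Draw `repeats` random samples of n units and return their means."""
    return [
        statistics.fmean(random.choices(population, k=n))
        for _ in range(repeats)
    ]

for n in (5, 25, 50):
    means = simulate_means(n)
    print(
        f"n={n:2d}  mean of means={statistics.fmean(means):.2f}  "
        f"variance of means={statistics.variance(means):.3f}  "
        f"sigma^2/n={sigma2 / n:.3f}"
    )
```

Under these assumptions the mean of the simulated means should stay close to the population mean for every sample size, while their variance should shrink roughly in proportion to 1/n.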

Therefore we have two special reasons to be grateful to statistical theory:

1. For the formula which tells us how the variance is reduced when the sample size increases.

2. For defining the shape of the distribution of sample means when the sample size is large.

These are illustrated in Exhibit 2.1.
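As a worked illustration of the first point, assume the standard result that the variance of a sample mean is the variance of the original counts divided by the sample size, and take σ² = 14.995 from Exhibit 2.1. The symbol sd_m follows the "sdm" notation of Table 2.2; the rounded values are computed here, not quoted from the book.

```latex
\mathrm{sd}_m^2 = \frac{\sigma^2}{n}:
\qquad
\frac{14.995}{5} \approx 3.00,
\qquad
\frac{14.995}{25} \approx 0.60,
\qquad
\frac{14.995}{50} \approx 0.30
```

A tenfold increase in sample size, from 5 to 50 units, therefore reduces the variance of the sample mean tenfold.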

Fig. 2.6. Frequency distributions of 500 sample means for sample sizes of (a) n = 5, (b) n = 25 and (c) n = 50. Solid lines are theoretical frequencies based on the normal distribution.

Fig. 2.5. The (a) mean and (b) variance of the original count of Colorado potato beetles (▫) and of means and variances of means (○) for sample sizes 5, 25 and 50. The solid line in (b) is the theoretical variance of the sample mean, according to Equation 2.3. In both panels the x-axis is sample size (0 to 60); the y-axis is the mean in (a) and the variance in (b).

The purpose of sampling in pest management is to gather information on pest abundance so that a decision can be made on the need for some control action.

Using the results discussed above, we can make inferences based on sample data which allow us to make informed decisions. For example, suppose that, following the principles laid out in this and the preceding chapter, we have decided that the critical density (cd) for Colorado potato beetles in fields such as Beall’s is 3.5 beetles per sample unit, and that 25 sample units are sufficient. In practice, this would mean that the management protocol is to inspect 25 sample units, calculate the average number of beetles per unit, and recommend a control action if the sample mean is greater than 3.5. Typical values for the sample mean (obtained by 1000 simulations) are displayed in Fig. 2.7, along with an indication of cd.
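The management protocol in this example amounts to a simple decision rule, sketched below. The function name `recommend_control` and the list of counts are hypothetical, invented for illustration; only the critical density of 3.5 and the 25 sample units come from the text.

```python
from statistics import fmean

CRITICAL_DENSITY = 3.5  # cd, beetles per sample unit (from the text)
SAMPLE_UNITS = 25       # sample units inspected per field (from the text)

def recommend_control(counts: list[int], cd: float = CRITICAL_DENSITY) -> bool:
    """Return True if the sample mean exceeds the critical density."""
    if len(counts) != SAMPLE_UNITS:
        raise ValueError(f"expected {SAMPLE_UNITS} sample units, got {len(counts)}")
    return fmean(counts) > cd

# Hypothetical counts from one field visit (invented for illustration).
counts = [0, 2, 5, 7, 3, 1, 9, 4, 6, 2, 0, 8, 5,
          3, 4, 6, 1, 7, 2, 5, 3, 10, 4, 6, 2]
print(fmean(counts), recommend_control(counts))
```

The rule compares only the sample mean with cd, so the uncertainty of that mean, which is the subject of the rest of this section, determines how often the recommendation is wrong.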

By looking carefully at Fig. 2.7a, we can count that about 30 of the 1000 sample means were less than 3.5, equal to a proportion 0.030, or 3%. It is tedious and hard on the eyes to do this for each sample protocol, but a good approximation is available based on the central limit theorem. For this, it is most convenient to transform Fig. 2.7a into Fig. 2.7b, by changing the x-axis from simple average counts of beetles per sample unit to their standardized form:

$$ z = \frac{m - \mu}{\mathrm{sd}_m}, \quad \text{where } \mathrm{sd}_m = \frac{\sigma}{\sqrt{n}} \tag{2.4} $$

The transformed value, z, of the mean, m, is normally distributed (approximately), and has a mean value equal to zero and a variance equal to one. It is much easier to use published tables of the normal distribution using z than using m, but to do that, we need to transform cd in the same way:
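As a sketch of this transformation, the snippet below applies the standardization to cd, using µ = 4.74 and σ² = 14.995 from Exhibit 2.1 together with n = 25 and cd = 3.5 from the example. The variable names `sd_m`, `z_cd` and `p_below` are assumptions introduced here, and the final probability is the normal approximation rather than the tally from the simulated means.

```python
from math import sqrt
from statistics import NormalDist

mu = 4.74        # mean of the original counts (Exhibit 2.1)
sigma2 = 14.995  # variance of the original counts (Exhibit 2.1)
n = 25           # sample units per sample
cd = 3.5         # critical density, beetles per sample unit

sd_m = sqrt(sigma2 / n)           # standard deviation of the sample mean
z_cd = (cd - mu) / sd_m           # cd expressed on the standardized scale
p_below = NormalDist().cdf(z_cd)  # normal approximation to P(m < cd)

print(f"sd_m = {sd_m:.3f}, z_cd = {z_cd:.2f}, P(m < cd) ~ {p_below:.3f}")
```

Because cd lies below µ in this example, the standardized value is negative, and the probability read from the normal distribution is the lower-tail area to its left.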
