In Section 4D.2 we introduced two probability distributions commonly encountered when studying populations. The construction of confidence intervals for a normally distributed population was the subject of Section 4D.3. We have yet to address, however, how we can identify the probability distribution for a given population. In Examples 4.11–4.14 we assumed that the amount of aspirin in analgesic tablets is normally distributed. We are justified in asking how this can be determined without analyzing every member of the population. When we cannot study the whole population, or when we cannot predict the mathematical form of a population's probability distribution, we must deduce the distribution from a limited sampling of its members.
Sample Distributions and the Central Limit Theorem Let's return to the problem of determining a penny's mass to explore the relationship between a population's distribution and the distribution of samples drawn from that population. The data shown in Tables 4.1 and 4.10 are insufficient for our purpose because they are not large enough to give a useful picture of their respective probability distributions. A better picture of the probability distribution requires a larger sample, such as that shown in Table 4.12, for which X̄ is 3.095 and s² is 0.0012.
The data in Table 4.12 are best displayed as a histogram, in which the frequency of occurrence for equal intervals of data is plotted versus the midpoint of each interval. Table 4.13 and Figure 4.8 show a frequency table and histogram for the data in Table 4.12. Note that the histogram was constructed such that the mean value for the data set is centered within its interval. In addition, a normal distribution curve using X̄ and s² to estimate µ and σ² is superimposed on the histogram.
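A frequency table of this kind is straightforward to construct with a short script. The Python sketch below is only illustrative: it substitutes simulated masses (drawn from a normal distribution with the mean and variance quoted above) for the actual data of Table 4.12, and the bin edges are chosen to reproduce the 0.019 g intervals of Table 4.13.

```python
import numpy as np

# Illustrative stand-in for the 100 penny masses of Table 4.12: simulated
# values drawn from a normal distribution with mean 3.095 g and variance 0.0012.
rng = np.random.default_rng(1)
masses = rng.normal(loc=3.095, scale=np.sqrt(0.0012), size=100)

# Eleven bins, each 0.019 g wide, reproducing the intervals of Table 4.13
# (2.991-3.009, 3.010-3.028, ..., 3.181-3.199); values outside this range
# are simply not counted.
edges = 2.9905 + 0.019 * np.arange(12)

counts, _ = np.histogram(masses, bins=edges)
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo + 0.0005:.3f}-{hi - 0.0005:.3f}  {n:2d}")
```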
It is noteworthy that the histogram in Figure 4.8 approximates the normal distribution curve. Although the histogram for the mass of pennies is not perfectly symmetrical, it is roughly symmetrical about the interval containing the greatest number of pennies. In addition, we know from Table 4.11 that 68.26%, 95.44%, and 99.73% of the members of a normally distributed population are within, respectively, ±1σ, ±2σ, and ±3σ. If we assume that the mean value, 3.095 g, and the sample variance, 0.0012, are good approximations for µ and σ², we find that 73%, 95%, and 100% of the pennies fall within these limits. It is easy to imagine that increasing the number of pennies in the sample will result in a histogram that even more closely approximates a normal distribution.
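This percentage check is also easy to script. The sketch below again uses simulated, normally distributed masses as a stand-in for Table 4.12 (an assumption made only to keep the example self-contained); run on the actual data, the same calculation gives the 73%, 95%, and 100% figures quoted above.

```python
import numpy as np

# Stand-in for the data of Table 4.12 (simulated; see the note above).
rng = np.random.default_rng(1)
masses = rng.normal(loc=3.095, scale=np.sqrt(0.0012), size=100)

xbar = masses.mean()
s = masses.std(ddof=1)          # sample standard deviation (n - 1 denominator)

# Fraction of pennies falling within +/-1s, +/-2s, and +/-3s of the mean,
# to be compared with the 68.26%, 95.44%, and 99.73% expected for a normal
# distribution.
for k in (1, 2, 3):
    inside = np.abs(masses - xbar) <= k * s
    print(f"within +/-{k}s: {100 * inside.mean():.1f}%")
```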
histogram: A plot showing the number of times an observation occurs as a function of the range of observed values.
Table 4.13
Frequency Distribution for the Data in Table 4.12
Interval          Frequency
2.991–3.009 2
3.010–3.028 0
3.029–3.047 4
3.048–3.066 19
3.067–3.085 15
3.086–3.104 23
3.105–3.123 19
3.124–3.142 12
3.143–3.161 13
3.162–3.180 1
3.181–3.199 2
Table 4.12
Individual Masses for a Large Sample of U.S. Pennies in Circulation^a
Penny  Weight (g)   Penny  Weight (g)   Penny  Weight (g)   Penny  Weight (g)
1 3.126 26 3.073 51 3.101 76 3.086
2 3.140 27 3.084 52 3.049 77 3.123
3 3.092 28 3.148 53 3.082 78 3.115
4 3.095 29 3.047 54 3.142 79 3.055
5 3.080 30 3.121 55 3.082 80 3.057
6 3.065 31 3.116 56 3.066 81 3.097
7 3.117 32 3.005 57 3.128 82 3.066
8 3.034 33 3.115 58 3.112 83 3.113
9 3.126 34 3.103 59 3.085 84 3.102
10 3.057 35 3.086 60 3.086 85 3.033
11 3.053 36 3.103 61 3.084 86 3.112
12 3.099 37 3.049 62 3.104 87 3.103
13 3.065 38 2.998 63 3.107 88 3.198
14 3.059 39 3.063 64 3.093 89 3.103
15 3.068 40 3.055 65 3.126 90 3.126
16 3.060 41 3.181 66 3.138 91 3.111
17 3.078 42 3.108 67 3.131 92 3.126
18 3.125 43 3.114 68 3.120 93 3.052
19 3.090 44 3.121 69 3.100 94 3.113
20 3.100 45 3.105 70 3.099 95 3.085
21 3.055 46 3.078 71 3.097 96 3.117
22 3.105 47 3.147 72 3.091 97 3.142
23 3.063 48 3.104 73 3.077 98 3.031
24 3.083 49 3.146 74 3.178 99 3.083
25 3.065 50 3.095 75 3.054 100 3.104
^a Pennies are identified in the order in which they were sampled and weighed.
Figure 4.8
Histogram for data in Table 4.12 (frequency versus weight of pennies in grams). A normal distribution curve for the data, based on X̄ and s², is superimposed on the histogram.
We will not offer a formal proof that the sample of pennies in Table 4.12 and the population from which they were drawn are normally distributed; however, the evidence we have seen strongly suggests that this is true. Although we cannot claim that the results for all analytical experiments are normally distributed, in most cases the data we collect in the laboratory are, in fact, drawn from a normally distributed population. That this is generally true is a consequence of the central limit theorem.6 According to this theorem, in systems subject to a variety of indeterminate errors, the distribution of results will be approximately normal. Furthermore, as the number of contributing sources of indeterminate error increases, the results come even closer to approximating a normal distribution. The central limit theorem holds true even if the individual sources of indeterminate error are not normally distributed. The chief limitation to the central limit theorem is that the sources of indeterminate error must be independent and of similar magnitude so that no one source of error dominates the final distribution.
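The central limit theorem is easy to see in a simple simulation. In the sketch below (an illustration with arbitrarily chosen numbers, not an analysis from this chapter), each simulated result is the sum of several independent indeterminate errors drawn from a strongly skewed, decidedly non-normal distribution; as the number of contributing error sources grows, the skewness of the results shrinks toward the value of zero expected for a normal distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def skewness(x):
    """Sample skewness; approximately zero for a normal distribution."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

for n_sources in (1, 2, 10, 50):
    # Each of 100,000 simulated results is the sum of n_sources independent
    # indeterminate errors drawn from a skewed (exponential) distribution.
    results = rng.exponential(scale=1.0, size=(100_000, n_sources)).sum(axis=1)
    print(f"{n_sources:3d} error sources: skewness = {skewness(results):.2f}")
```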
Estimating µ and σ² Our comparison of the histogram for the data in Table 4.12 to a normal distribution assumes that the sample's mean, X̄, and variance, s², are appropriate estimators of the population's mean, µ, and variance, σ². Why did we select X̄ and s², as opposed to other possible measures of central tendency and spread? The explanation is simple; X̄ and s² are considered unbiased estimators of µ and σ².7,8 If we could analyze every possible sample of equal size for a given population (e.g., every possible sample of five pennies), calculating their respective means and variances, the average mean and the average variance would equal µ and σ². Although X̄ and s² for any single sample probably will not be the same as µ or σ², they provide a reasonable estimate for these values.
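This claim can be verified directly for a small, made-up population by enumerating every possible sample. In the sketch below the samples are drawn with replacement, so that each observation is an independent draw from the population; under that assumption the average of the sample means equals µ exactly and the average of the sample variances, computed with an n − 1 denominator, equals σ², whereas a denominator of n underestimates σ².

```python
import itertools
import numpy as np

population = np.array([3.05, 3.08, 3.10, 3.15])   # toy population of penny masses (g)
mu = population.mean()
sigma2 = population.var()                          # population variance (denominator N)

n = 3
means, var_unbiased, var_biased = [], [], []
# Enumerate every possible sample of three drawn with replacement (4**3 = 64 samples).
for sample in itertools.product(population, repeat=n):
    x = np.array(sample)
    means.append(x.mean())
    var_unbiased.append(x.var(ddof=1))   # n - 1 denominator
    var_biased.append(x.var(ddof=0))     # n denominator

print(f"mu      = {mu:.6f}   average sample mean     = {np.mean(means):.6f}")
print(f"sigma^2 = {sigma2:.6f}   average s^2 (n - 1)     = {np.mean(var_unbiased):.6f}")
print(f"                     average variance with n = {np.mean(var_biased):.6f}  (biased low)")
```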
central limit theorem: The distribution of measurements subject to indeterminate errors is often a normal distribution.
degrees of freedom: The number of independent values on which a result is based (ν).
Degrees of Freedom Unlike the population's variance, the variance of a sample includes the term n − 1 in the denominator, where n is the size of the sample

s² = Σ(Xᵢ − X̄)² / (n − 1)          4.12

Defining the sample's variance with a denominator of n, as in the case of the population's variance, leads to a biased estimation of σ². The denominators of the variance equations 4.8 and 4.12 are commonly called the degrees of freedom for the population and the sample, respectively. In the case of a population, the degrees of freedom is always equal to the total number of members, n, in the population. For the sample's variance, however, substituting X̄ for µ removes a degree of freedom from the calculation. That is, if there are n members in the sample, the value of the nth member can always be deduced from the remaining n − 1 members and X̄. For example, if we have a sample with five members, and we know that four of the members are 1, 2, 3, and 4, and that the mean is 3, then the fifth member of the sample must be
(X̄ × n) − X₁ − X₂ − X₃ − X₄ = (3 × 5) − 1 − 2 − 3 − 4 = 5
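A short numerical check of this reasoning follows; the numbers are the hypothetical ones from the example above, and the choice of denominator is made explicit through NumPy's ddof argument (ddof=1 corresponds to the n − 1 denominator of equation 4.12).

```python
import numpy as np

known = [1, 2, 3, 4]      # four known members of the five-member sample
xbar = 3                  # the sample mean

# The fifth member is fixed by the other four members and the mean; one
# degree of freedom has been spent in estimating the mean.
fifth = xbar * 5 - sum(known)
sample = np.array(known + [fifth])

print(fifth)                   # 5
print(sample.var(ddof=1))      # 2.5, sample variance with the n - 1 denominator
print(sample.var(ddof=0))      # 2.0, denominator n, a biased estimate of sigma^2
```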