In Section 4D.2 we introduced two probability distributions commonly encountered when studying populations. The construction of confidence intervals for a normally distributed population was the subject of Section 4D.3. We have yet to address, however, how we can identify the probability distribution for a given population. In Examples 4.11–4.14 we assumed that the amount of aspirin in analgesic tablets is normally distributed. We are justified in asking how this can be determined without analyzing every member of the population. When we cannot study the whole population, or when we cannot predict the mathematical form of a population's probability distribution, we must deduce the distribution from a limited sampling of its members.
Sample Distributions and the Central Limit Theorem Let's return to the problem of determining a penny's mass to explore the relationship between a population's distribution and the distribution of samples drawn from that population. The data shown in Tables 4.1 and 4.10 are insufficient for our purpose because they are not large enough to give a useful picture of their respective probability distributions. A better picture of the probability distribution requires a larger sample, such as that shown in Table 4.12, for which X̄ is 3.095 and s² is 0.0012.
The data in Table 4.12 are best displayed as a histogram, in which the frequency of occurrence for equal intervals of data is plotted versus the midpoint of each interval. Table 4.13 and Figure 4.8 show a frequency table and histogram for the data in Table 4.12. Note that the histogram was constructed such that the mean value for the data set is centered within its interval. In addition, a normal distribution curve using X̄ and s² to estimate µ and σ² is superimposed on the histogram.
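A frequency table of this kind is straightforward to construct with a short script. The Python sketch below is only illustrative: it substitutes simulated masses (drawn from a normal distribution with the mean and variance quoted above) for the actual data of Table 4.12, and the bin edges are chosen to reproduce the 0.019 g intervals of Table 4.13.

```python
import numpy as np

# Illustrative stand-in for the 100 penny masses of Table 4.12: simulated
# values drawn from a normal distribution with mean 3.095 g and variance 0.0012.
rng = np.random.default_rng(1)
masses = rng.normal(loc=3.095, scale=np.sqrt(0.0012), size=100)

# Eleven bins, each 0.019 g wide, reproducing the intervals of Table 4.13
# (2.991-3.009, 3.010-3.028, ..., 3.181-3.199); values outside this range
# are simply not counted.
edges = 2.9905 + 0.019 * np.arange(12)

counts, _ = np.histogram(masses, bins=edges)
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo + 0.0005:.3f}-{hi - 0.0005:.3f}  {n:2d}")
```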
It is noteworthy that the histogram in Figure 4.8 approximates the normal distribution curve. Although the histogram for the mass of pennies is not perfectly symmetrical, it is roughly symmetrical about the interval containing the greatest number of pennies. In addition, we know from Table 4.11 that 68.26%, 95.44%, and 99.73% of the members of a normally distributed population are within, respectively, ±1σ, ±2σ, and ±3σ. If we assume that the mean value, 3.095 g, and the sample variance, 0.0012, are good approximations for µ and σ², we find that 73%, 95%, and 100% of the pennies fall within these limits. It is easy to imagine that increasing the number of pennies in the sample will result in a histogram that even more closely approximates a normal distribution.
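This percentage check is also easy to script. The sketch below again uses simulated, normally distributed masses as a stand-in for Table 4.12 (an assumption made only to keep the example self-contained); run on the actual data, the same calculation gives the 73%, 95%, and 100% figures quoted above.

```python
import numpy as np

# Stand-in for the data of Table 4.12 (simulated; see the note above).
rng = np.random.default_rng(1)
masses = rng.normal(loc=3.095, scale=np.sqrt(0.0012), size=100)

xbar = masses.mean()
s = masses.std(ddof=1)          # sample standard deviation (n - 1 denominator)

# Fraction of pennies falling within +/-1s, +/-2s, and +/-3s of the mean,
# to be compared with the 68.26%, 95.44%, and 99.73% expected for a normal
# distribution.
for k in (1, 2, 3):
    inside = np.abs(masses - xbar) <= k * s
    print(f"within +/-{k}s: {100 * inside.mean():.1f}%")
```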
histogram: A plot showing the number of times an observation occurs as a function of the range of observed values.
Table 4.13
Frequency Distribution for the Data in Table 4.12
Interval          Frequency
2.991–3.009 2
3.010–3.028 0
3.029–3.047 4
3.048–3.066 19
3.067–3.085 15
3.086–3.104 23
3.105–3.123 19
3.124–3.142 12
3.143–3.161 13
3.162–3.180 1
3.181–3.199 2
Table 4.12
Individual Masses for a Large Sample of U.S. Pennies in Circulation^a
Penny  Weight (g)   Penny  Weight (g)   Penny  Weight (g)   Penny  Weight (g)
1 3.126 26 3.073 51 3.101 76 3.086
2 3.140 27 3.084 52 3.049 77 3.123
3 3.092 28 3.148 53 3.082 78 3.115
4 3.095 29 3.047 54 3.142 79 3.055
5 3.080 30 3.121 55 3.082 80 3.057
6 3.065 31 3.116 56 3.066 81 3.097
7 3.117 32 3.005 57 3.128 82 3.066
8 3.034 33 3.115 58 3.112 83 3.113
9 3.126 34 3.103 59 3.085 84 3.102
10 3.057 35 3.086 60 3.086 85 3.033
11 3.053 36 3.103 61 3.084 86 3.112
12 3.099 37 3.049 62 3.104 87 3.103
13 3.065 38 2.998 63 3.107 88 3.198
14 3.059 39 3.063 64 3.093 89 3.103
15 3.068 40 3.055 65 3.126 90 3.126
16 3.060 41 3.181 66 3.138 91 3.111
17 3.078 42 3.108 67 3.131 92 3.126
18 3.125 43 3.114 68 3.120 93 3.052
19 3.090 44 3.121 69 3.100 94 3.113
20 3.100 45 3.105 70 3.099 95 3.085
21 3.055 46 3.078 71 3.097 96 3.117
22 3.105 47 3.147 72 3.091 97 3.142
23 3.063 48 3.104 73 3.077 98 3.031
24 3.083 49 3.146 74 3.178 99 3.083
25 3.065 50 3.095 75 3.054 100 3.104
^a Pennies are identified in the order in which they were sampled and weighed.
Figure 4.8
Histogram for data in Table 4.12 (frequency versus weight of pennies in grams). A normal distribution curve for the data, based on X̄ and s², is superimposed on the histogram.
We will not offer a formal proof that the sample of pennies in Table 4.12 and the population from which they were drawn are normally distributed; however, the evidence we have seen strongly suggests that this is true. Although we cannot claim that the results for all analytical experiments are normally distributed, in most cases the data we collect in the laboratory are, in fact, drawn from a normally distributed population. That this is generally true is a consequence of the central limit theorem.6 According to this theorem, in systems subject to a variety of indeterminate errors, the distribution of results will be approximately normal. Furthermore, as the number of contributing sources of indeterminate error increases, the results come even closer to approximating a normal distribution. The central limit theorem holds true even if the individual sources of indeterminate error are not normally distributed. The chief limitation to the central limit theorem is that the sources of indeterminate error must be independent and of similar magnitude so that no one source of error dominates the final distribution.
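The central limit theorem is easy to see in a simple simulation. In the sketch below (an illustration with arbitrarily chosen numbers, not an analysis from this chapter), each simulated result is the sum of several independent indeterminate errors drawn from a strongly skewed, decidedly non-normal distribution; as the number of contributing error sources grows, the skewness of the results shrinks toward the value of zero expected for a normal distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def skewness(x):
    """Sample skewness; approximately zero for a normal distribution."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

for n_sources in (1, 2, 10, 50):
    # Each of 100,000 simulated results is the sum of n_sources independent
    # indeterminate errors drawn from a skewed (exponential) distribution.
    results = rng.exponential(scale=1.0, size=(100_000, n_sources)).sum(axis=1)
    print(f"{n_sources:3d} error sources: skewness = {skewness(results):.2f}")
```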
Estimating µ and σ² Our comparison of the histogram for the data in Table 4.12 to a normal distribution assumes that the sample's mean, X̄, and variance, s², are appropriate estimators of the population's mean, µ, and variance, σ². Why did we select X̄ and s², as opposed to other possible measures of central tendency and spread? The explanation is simple; X̄ and s² are considered unbiased estimators of µ and σ².7,8 If we could analyze every possible sample of equal size for a given population (e.g., every possible sample of five pennies), calculating their respective means and variances, the average mean and the average variance would equal µ and σ². Although X̄ and s² for any single sample probably will not be the same as µ or σ², they provide a reasonable estimate for these values.
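This claim can be verified directly for a small, made-up population by enumerating every possible sample. In the sketch below the samples are drawn with replacement, so that each observation is an independent draw from the population; under that assumption the average of the sample means equals µ exactly and the average of the sample variances, computed with an n − 1 denominator, equals σ², whereas a denominator of n underestimates σ².

```python
import itertools
import numpy as np

population = np.array([3.05, 3.08, 3.10, 3.15])   # toy population of penny masses (g)
mu = population.mean()
sigma2 = population.var()                          # population variance (denominator N)

n = 3
means, var_unbiased, var_biased = [], [], []
# Enumerate every possible sample of three drawn with replacement (4**3 = 64 samples).
for sample in itertools.product(population, repeat=n):
    x = np.array(sample)
    means.append(x.mean())
    var_unbiased.append(x.var(ddof=1))   # n - 1 denominator
    var_biased.append(x.var(ddof=0))     # n denominator

print(f"mu      = {mu:.6f}   average sample mean     = {np.mean(means):.6f}")
print(f"sigma^2 = {sigma2:.6f}   average s^2 (n - 1)     = {np.mean(var_unbiased):.6f}")
print(f"                     average variance with n = {np.mean(var_biased):.6f}  (biased low)")
```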
central limit theorem: The distribution of measurements subject to indeterminate errors is often a normal distribution.
degrees of freedom: The number of independent values on which a result is based (ν).
Degrees of Freedom Unlike the population's variance, the variance of a sample includes the term n − 1 in the denominator, where n is the size of the sample

s² = Σ(Xᵢ − X̄)² / (n − 1)          4.12

Defining the sample's variance with a denominator of n, as in the case of the population's variance, leads to a biased estimation of σ². The denominators of the variance equations 4.8 and 4.12 are commonly called the degrees of freedom for the population and the sample, respectively. In the case of a population, the degrees of freedom is always equal to the total number of members, n, in the population. For the sample's variance, however, substituting X̄ for µ removes a degree of freedom from the calculation. That is, if there are n members in the sample, the value of the nth member can always be deduced from the remaining n − 1 members and X̄. For example, if we have a sample with five members, and we know that four of the members are 1, 2, 3, and 4, and that the mean is 3, then the fifth member of the sample must be
(X̄ × n) − X₁ − X₂ − X₃ − X₄ = (3 × 5) − 1 − 2 − 3 − 4 = 5
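A short numerical check of this reasoning follows; the numbers are the hypothetical ones from the example above, and the choice of denominator is made explicit through NumPy's ddof argument (ddof=1 corresponds to the n − 1 denominator of equation 4.12).

```python
import numpy as np

known = [1, 2, 3, 4]      # four known members of the five-member sample
xbar = 3                  # the sample mean

# The fifth member is fixed by the other four members and the mean; one
# degree of freedom has been spent in estimating the mean.
fifth = xbar * 5 - sum(known)
sample = np.array(known + [fifth])

print(fifth)                   # 5
print(sample.var(ddof=1))      # 2.5, sample variance with the n - 1 denominator
print(sample.var(ddof=0))      # 2.0, denominator n, a biased estimate of sigma^2
```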