
Measures of Variability

4.4 The Variance

Another way to remove the negative signs when summing deviation scores is to square them, since a negative number multiplied by another negative number yields a positive number. This minor change defines the difference between the concept of a mean deviation and what is called the variance of a distribution. We can define the variance as the average squared deviation score; it is symbolized as $\sigma^2$ (pronounced “sigma squared”; $\sigma$ is the Greek lower case of $\Sigma$) for population variances and $s^2$ for sample variances. Please note that the formula for the variance of a population is slightly different from the formula for a sample variance.

Population variance, $\sigma^2$

$$\sigma^2 = \frac{\Sigma (X - \mu)^2}{N} \qquad \text{(Formula 4.6)}$$

where

$\sigma^2$ = the symbol for the population variance

$X$ = a raw score

$\mu$ = the population mean

$N$ = the number of scores in the population

1 The main problem with the mean deviation has to do with estimating the variability of a population from a sample of scores. The mean deviation of a sample does not bear a consistent relation to the mean deviation of the population from which the sample was drawn. Since much of the field of statistics involves using the characteristics of samples to infer the characteristics of populations, the mean deviation is rarely used.


Sample variance, $s^2$

$$s^2 = \frac{\Sigma (X - M)^2}{n - 1} \qquad \text{(Formula 4.7)}$$

where

$s^2$ = the symbol for the sample variance

$M$ = the sample mean

$n$ = the number of scores in the distribution
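To make the difference between the two denominators concrete, here is a minimal Python sketch of Formulas 4.6 and 4.7. The function names and the eight scores are hypothetical, chosen only for illustration; the last line cross-checks the results against `pvariance` and `variance` from Python's standard `statistics` module.

```python
import statistics


def population_variance(scores):
    """Formula 4.6: sum of squared deviations from the mean, divided by N."""
    big_n = len(scores)
    mu = sum(scores) / big_n
    return sum((x - mu) ** 2 for x in scores) / big_n


def sample_variance(scores):
    """Formula 4.7: sum of squared deviations from the mean, divided by n - 1."""
    n = len(scores)
    m = sum(scores) / n
    return sum((x - m) ** 2 for x in scores) / (n - 1)


# Hypothetical scores (not from the text), used only to exercise the functions.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(scores))  # 4.0 when the scores are treated as a population
print(sample_variance(scores))      # larger, because the same sum is divided by n - 1

# Cross-check against the standard library's implementations.
print(statistics.pvariance(scores), statistics.variance(scores))
```

The two functions share the same numerator; only the denominator changes, which is why the sample value is always the larger of the two.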

The Sample Variance as an Unbiased Estimate of the Population Variance

Recall that a sample is a subset of scores drawn from a population. Researchers are always interested in the characteristics of a population; samples are often used to make inferences about a population. Suppose we want to know the mean of a population, but a sample of scores from that population is all that is available. The best estimate of the mean of the population is the mean of the sample. Usually our estimate will be off; rarely is the sample mean identical to the population mean. Sometimes the sample mean will be larger than the population mean, and sometimes it will be smaller. It is important to note that the sample mean is just as likely to be smaller as it is to be larger than the actual population mean. Since both types of errors are equally likely, the sample mean is said to provide an unbiased estimate of the population mean. It would be said to be biased if one type of error was more likely than the other. Also, please note that the degree to which the sample mean differs from the population mean tends to decrease as the size of the sample increases. Larger sample sizes yield more accurate estimates of population parameters. This observation will prove to be very useful later on in the text.

In comparing the formulas for the variance of a population and a sample, note that the denominator of the sample variance is $n - 1$ instead of $N$ (compare Formulas 4.6 and 4.7). This difference is a correction factor designed to adjust for a bias that occurs when using sample variances to estimate population variances. To show this, suppose we take 100 samples from a population (always replacing the scores from a drawn sample before taking the next sample) and compute the variance of each sample, but using the population formula (Formula 4.6), which lacks the correction factor in the denominator. Assume that we know the true population variance. What we would discover is that, of the 100 computed variances, most would be smaller than the true population variance, and fewer would be larger. If we were to apply the population formula for variance to a single sample of scores, and then use that value as an estimate of the population variance, we would most likely underestimate the size of the population variance. Dividing by $n - 1$ provides a correction so that the formula for the sample variance becomes an unbiased estimate of the population variance: across many samples, its overestimates and underestimates balance out.
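The thought experiment of drawing repeated samples can be tried directly. The following is a rough simulation sketch, not a proof: the population (10,000 roughly normal scores), the sample size of eight, the seed, and all variable names are assumptions made for illustration, and 5,000 samples are used instead of 100 so that chance fluctuations average out.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 scores, roughly normal (mean 50, SD 10).
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_variance = statistics.pvariance(population)

n = 8                # scores per sample
num_samples = 5_000  # many more than 100, to smooth out chance

biased, corrected = [], []
for _ in range(num_samples):
    # Each sample is drawn from the full population, which is never depleted.
    sample = rng.sample(population, n)
    biased.append(statistics.pvariance(sample))    # divides by n (Formula 4.6)
    corrected.append(statistics.variance(sample))  # divides by n - 1 (Formula 4.7)

# Proportion of uncorrected estimates that fall below the true variance.
share_too_small = sum(v < true_variance for v in biased) / num_samples

print(f"true population variance:    {true_variance:.1f}")
print(f"average with N denominator:  {statistics.mean(biased):.1f}")
print(f"average with n - 1:          {statistics.mean(corrected):.1f}")
print(f"share of N-denominator estimates below the true value: {share_too_small:.2f}")
```

Under these assumptions, the average of the uncorrected estimates should sit clearly below the true variance, while the average of the corrected estimates should land close to it.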

Figure 4.3 gives a visual description of why the $n - 1$ correction factor is necessary when estimating the variance of a population. In this figure, the scores of the population assume a normal distribution. Since most of the population scores are found in the middle of the distribution, a random sample of, for instance, eight scores would likely come from the middle of the population distribution. As a result, the sample scores are not as spread out as the population scores. For this reason, the variance of a sample will tend to underestimate the variance of a population unless corrected. Placing $n - 1$ in the denominator of the sample variance formula effectively increases the value of the sample variance, providing a much less biased estimate of the population variance. The sample size also matters. As the size of the sample increases, the sample variance better approximates the population variance. (Recall that this relationship between sample size and statistical accuracy was also true for the mean.)
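The effect of sample size can also be illustrated with a small simulation. This is a sketch under stated assumptions: scores are drawn from a hypothetical normal population with a standard deviation of 10 (so the true variance is 100), and the helper name `typical_error`, the seed, and the three sample sizes are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for a repeatable run

# Hypothetical normal population with SD 10, so the true variance is 100.
TRUE_VARIANCE = 10 ** 2


def typical_error(n, num_samples=1_000):
    """Average absolute gap between the sample variance and the true variance."""
    gaps = []
    for _ in range(num_samples):
        sample = [rng.gauss(0, 10) for _ in range(n)]
        gaps.append(abs(statistics.variance(sample) - TRUE_VARIANCE))
    return statistics.mean(gaps)


errors = {n: typical_error(n) for n in (8, 32, 128)}
for n, err in errors.items():
    print(f"n = {n:3d}: typical error of the sample variance = {err:.1f}")
```

The typical error should shrink as $n$ grows, which matches the claim that larger samples approximate the population variance more closely.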

Figure 4.3 The variability of a sample of scores will tend to be less than the variability of the population from which the scores are taken. So that the variance of a sample is an unbiased estimate of the population variance, a correction factor ($n - 1$) is used in the denominator of the formula for the variance of a sample. [Figure labels: sample scores (eight X marks), sample variability, population variability, population distribution.]

Question: What is the variance of this sample of scores?

3, 4, 6, 8, 9


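Solution

| $X$ | $M$ | $X - M$ | $(X - M)^2$ |
|---|---|---|---|
| 3 | 6 | −3 | 9 |
| 4 | 6 | −2 | 4 |
| 6 | 6 | 0 | 0 |
| 8 | 6 | 2 | 4 |
| 9 | 6 | 3 | 9 |
| | | $\Sigma (X - M) = 0$ | $\Sigma (X - M)^2 = 26$ |

$$s^2 = \frac{\Sigma (X - M)^2}{n - 1} = \frac{26}{4} = 6.5$$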

Equivalent Formulas for the Variance

If we were to take a random sample of introductory-level statistics textbooks and turn to the chapters covering measures of variability, we would be surprised, and maybe confused, by the many different formulas that can be used to compute the variance of a distribution. All of the formulas will give us the same answer (provided we note the distinction between a population and sample of scores). Several formulas for the variance are offered here for two reasons. First, one formula may be easier to use when performing hand calculations with raw data, and a different formula may be helpful in reminding us of the conceptual basis of variance. Second, by presenting a few formulas for the variance, we will be more easily able to make the transition to other textbooks. Formulas 4.6 and 4.7 are the basic formulas for the variances of the population and sample. They are called definitional formulas since, in reading them, we can be reminded of the concept behind the measurement: the average squared deviation score. In Chapter 3, we learned that a single score minus the mean, $X - \mu$, could be expressed as $x$ (“little x”), a deviation score. Therefore, Formulas 4.6 and 4.7 can be rewritten as Formulas 4.8 and 4.9, respectively. These formulas are called deviation score formulas.

The deviation score formulas

Population variance

$$\sigma^2 = \frac{\Sigma x^2}{N} \qquad \text{(Formula 4.8)}$$

Sample variance

$$s^2 = \frac{\Sigma x^2}{n - 1} \qquad \text{(Formula 4.9)}$$

The numerators of both sets of variance formulas direct us to sum all the squared deviation scores. For this reason, the numerator of a variance formula is referred to as the sum of squares (or SS). Hence, $SS = \Sigma (X - \mu)^2$ or $\Sigma (X - M)^2 = \Sigma x^2$. Substituting SS in the numerator of the population and sample formulas for the variance defines the SS manner of expression. The SS is a component of numerous statistical formulas.

The sum of squares formulas

Population variance

$$\sigma^2 = \frac{SS}{N} \qquad \text{(Formula 4.10)}$$

Sample variance

$$s^2 = \frac{SS}{n - 1} \qquad \text{(Formula 4.11)}$$

When working with raw scores, a computational (or raw score) formula eases the calculation task. Formulas 4.12 and 4.13 are used to compute the population and sample variances, respectively. (Yes, they look more involved, but they are actually much easier to use when performing hand calculations, especially as the sample size grows.) When using a computational formula, pay close attention to the difference between $\Sigma X^2$ and $(\Sigma X)^2$! The $\Sigma X^2$ is found by first squaring each raw score and then summing all squared values. The quantity $(\Sigma X)^2$ requires that we first sum the raw scores and then square the final total. This algebraic distinction comes up frequently in hand calculations of statistical values.

If calculating by hand, it is recommended to simply create two columns, one containing the raw data (labeled $X$) and the other containing the square of each raw number (labeled $X^2$). Simply sum up both columns. The sum of the raw score column is $\Sigma X$; by squaring this value we will get $(\Sigma X)^2$. The sum of the squared column is $\Sigma X^2$.
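The two-column procedure translates directly into code. The sketch below uses the five scores from the worked question in this section; the variable names are assumptions chosen to keep the two easily confused quantities apart.

```python
# Scores from the worked question in this section.
scores = [2, 4, 5, 7, 9]

squares = [x ** 2 for x in scores]  # the X^2 column

sum_x = sum(scores)           # sum of X: total of the raw-score column
sum_x_squared = sum(squares)  # sum of X^2: square each score first, then sum
square_of_sum = sum_x ** 2    # (sum of X)^2: sum the scores first, then square

print(f"{'X':>3} {'X^2':>4}")
for x, x2 in zip(scores, squares):
    print(f"{x:>3} {x2:>4}")

print("sum of X     =", sum_x)          # 27
print("sum of X^2   =", sum_x_squared)  # 175
print("(sum of X)^2 =", square_of_sum)  # 729
```

The last two printed values differ widely (175 versus 729), which is why mixing them up in a computational formula produces a badly wrong variance.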

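The computational formulas

Population variance

$$\sigma^2 = \frac{\Sigma X^2 - \dfrac{(\Sigma X)^2}{N}}{N} \qquad \text{(Formula 4.12)}$$

Sample variance

$$s^2 = \frac{\Sigma X^2 - \dfrac{(\Sigma X)^2}{n}}{n - 1} \qquad \text{(Formula 4.13)}$$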

Keep in mind that all sample formulas lead to the same answer, with any discrepancies accounted for by rounding errors. Of course, all population formulas also yield the same answer. Table 4.1 presents all of the formulas for the variance.
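The agreement between the definitional and computational forms follows from expanding the squared deviation. A short derivation for the sample case is sketched here, using only the fact that $M = \Sigma X / n$; the population case is identical with $\mu$ and $N$ in place of $M$ and $n$.

```latex
\begin{aligned}
\Sigma (X - M)^2 &= \Sigma \left( X^2 - 2MX + M^2 \right) \\
                 &= \Sigma X^2 - 2M\,\Sigma X + nM^2 \\
                 &= \Sigma X^2 - 2\,\frac{(\Sigma X)^2}{n} + \frac{(\Sigma X)^2}{n} \\
                 &= \Sigma X^2 - \frac{(\Sigma X)^2}{n}
\end{aligned}
```

The last line is the numerator of the computational formula, so dividing it by $n - 1$ (or by $N$ for a population) gives the same variance as the definitional formula.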

Question: Use the computational formulas to determine the variance of this distribution when it is a sample of scores and when it is a population of scores.

| $X$ | $X^2$ |
|---|---|
| 2 | 4 |
| 4 | 16 |
| 5 | 25 |
| 7 | 49 |
| 9 | 81 |
| $\Sigma X = 27$ | $\Sigma X^2 = 175$ |

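Solution

Sample formula

$$
\begin{aligned}
s^2 &= \frac{\Sigma X^2 - \dfrac{(\Sigma X)^2}{n}}{n - 1} \\
    &= \frac{175 - \dfrac{(27)^2}{5}}{5 - 1} \\
    &= \frac{175 - \dfrac{729}{5}}{5 - 1} \\
    &= \frac{175 - 145.80}{4} \\
    &= \frac{29.20}{4} \\
    &= 7.30
\end{aligned}
$$

If the distribution were a sample of scores, the variance would be 7.30. If the scores were a population, we would use the following formula.

Population formula

$$
\begin{aligned}
\sigma^2 &= \frac{\Sigma X^2 - \dfrac{(\Sigma X)^2}{N}}{N} \\
         &= \frac{175 - \dfrac{729}{5}}{5} \\
         &= \frac{175 - 145.80}{5} \\
         &= \frac{29.20}{5} \\
         &= 5.84
\end{aligned}
$$

Viewing the scores as a population, the variance is 5.84.

Table 4.1 Several equivalent expressions of the population and sample variances.

| Variance formulas | Population variance | Sample variance |
|---|---|---|
| Definitional formulas | $\sigma^2 = \dfrac{\Sigma (X - \mu)^2}{N}$ (Formula 4.6) | $s^2 = \dfrac{\Sigma (X - M)^2}{n - 1}$ (Formula 4.7) |
| Deviation score formulas | $\sigma^2 = \dfrac{\Sigma x^2}{N}$ (Formula 4.8) | $s^2 = \dfrac{\Sigma x^2}{n - 1}$ (Formula 4.9) |
| Sum of squares formulas | $\sigma^2 = \dfrac{SS}{N}$ (Formula 4.10) | $s^2 = \dfrac{SS}{n - 1}$ (Formula 4.11) |
| Computational formulasᵃ | $\sigma^2 = \dfrac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N}$ (Formula 4.12) | $s^2 = \dfrac{\Sigma X^2 - \frac{(\Sigma X)^2}{n}}{n - 1}$ (Formula 4.13) |

ᵃ Use these two formulas when working from raw data and calculating by hand.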
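The worked solution for the scores 2, 4, 5, 7, 9 can be checked with a few lines of Python. This is only a verification sketch with assumed variable names; it computes the sum of squares both from the computational numerator and from the definitional one, then applies the sample and population denominators.

```python
import math

scores = [2, 4, 5, 7, 9]  # the distribution from the worked question
n = len(scores)

sum_x = sum(scores)                          # 27
sum_x_squared = sum(x ** 2 for x in scores)  # 175

# Numerator of Formulas 4.12 and 4.13: sum of X^2 minus (sum of X)^2 / n.
ss_computational = sum_x_squared - sum_x ** 2 / n

# Numerator of Formulas 4.6 and 4.7 (the sum of squares): squared deviations from the mean.
mean = sum_x / n
ss_definitional = sum((x - mean) ** 2 for x in scores)

sample_variance = ss_computational / (n - 1)  # Formula 4.13
population_variance = ss_computational / n    # Formula 4.12

print(f"SS (computational)  = {ss_computational:.2f}")     # 29.20
print(f"SS (definitional)   = {ss_definitional:.2f}")      # 29.20
print(f"sample variance     = {sample_variance:.2f}")      # 7.30
print(f"population variance = {population_variance:.2f}")  # 5.84

# The two numerators agree up to floating-point rounding.
print(math.isclose(ss_computational, ss_definitional))
```

Both routes to the sum of squares give 29.20, so the choice among the equivalent formulas affects only the amount of arithmetic, not the answer.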

Sometimes an investigator will learn something important about a phenomenon when the dispersion of scores is examined. Box 4.1 presents a finding in which the variability of scores reflects an interesting aspect of aging.

Box 4.1 The Substantive Importance of the Variance

Measures of variation are essential indices for describing the degree of dispersion among scores of a distribution. The variance and its square root, the standard deviation, can both be used as descriptive measures of dispersion; however, the standard deviation is the more useful measure because it is stated in the original units of the measured variable. Yet the variance is still used in many statistical formulas designed to answer research questions.

In experimental research, comparisons are typically made between the means of two conditions. Evaluating two methods for improving communication skills would entail a comparison between the group means of some measure of communication. Discovering ways to help children overcome their shyness would involve comparing mean ratings of shyness after different treatments. In other words, in the experimental context, investigators examine group means to determine the effect of the independent variable on the dependent variable. However, sometimes between-group differences in variability are important as well. They reveal an important facet of the phenomenon under investigation. An example in which variability has substantive importance comes from the literature on aging. Chronological age is intrinsically a poor predictor of almost any measure of psychological functioning (Woods & Rusin, 1988). However, an investigator who compares different age groups would find that the within-group variability, on a number of cognitive and physiological measures, increases with age (Krauss, 1980). In other words, older individuals are more unlike each other than are younger individuals; their distributions are more spread out compared with the distributions of younger people. As a result, researchers investigating questions related to the aged need to pay careful attention to individual differences. A treatment, for instance, that seems ineffective for some older people may prove highly beneficial to other older people.
