Measures of Variability

4.7 Deciding Which Measure of Variability to Use

In Chapter 3 we discussed the relative strengths and weaknesses of various measures of central tendency to help us select the right measure for a given situation. Here we will look at some of the issues that influence the selection of variability measures.

Extreme Scores

The presence of extreme scores in a distribution, depending on the degree of extremity and the percentage of scores considered extreme, can affect most of the measures of variability. Researchers often look at extreme scores with some degree of suspicion, wondering whether they are an accurate measurement. If they are not accurate, then any measure of variability that is influenced by extreme scores will convey a misleading impression of the actual dispersion of scores in the distribution. The range is clearly the statistic that is most vulnerable to extreme scores. Since only the highest and lowest scores of a distribution are used to compute the range, one extreme erroneous score can lead to a very inaccurate view of dispersion.

The IQR and SIQR are not much influenced by a small number of extreme scores, thereby offering a more reasonable statement of variability when extreme scores are found in a distribution. The variance and standard deviation are also affected by extreme scores. Since both measures use squared deviations, an extreme score that is a great distance from the mean will have a disproportionate effect on the variance, especially for small data sets. We must exercise caution when using the variance and standard deviation as measures of variability when there are extreme scores.
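The contrast described above can be checked with a small sketch. The scores below are invented for illustration, and the quartiles come from Python's `statistics.quantiles` with the "inclusive" method, which is only one of several quartile conventions and may differ slightly from the hand method used in this chapter.

```python
import statistics

def spread_summary(scores):
    """Return the range, IQR, and sample standard deviation of the scores."""
    # Assumption: "inclusive" quartiles; other conventions give slightly different cut points.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "range": max(scores) - min(scores),
        "iqr": q3 - q1,
        "sd": statistics.stdev(scores),
    }

# Hypothetical data: nine plausible scores.
clean = [10, 12, 13, 14, 15, 16, 17, 18, 20]
# Same data, but the top score is replaced by a (hypothetical) recording error of 95.
with_outlier = clean[:-1] + [95]

clean_summary = spread_summary(clean)
outlier_summary = spread_summary(with_outlier)

for name, summary in (("clean", clean_summary), ("with outlier", outlier_summary)):
    print(name, summary)
```

In this made-up example the single extreme score stretches the range and inflates the standard deviation several times over, while the IQR is the same in both lists.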

Sometimes researchers who suspect that an extreme score is erroneous consider discarding that score in an effort to generate a more accurate measure of variability. However, what if there are several extreme scores, as in a skewed distribution? There is no justification for discarding several scores. In these instances, the variability of a distribution is best described by the IQR or SIQR statistic. Furthermore, if the scale of measurement does not allow for the calculation of a mean (e.g., a nominal or ordinal scale), then deviation scores cannot be calculated. This eliminates the mean deviation, variance, and standard deviation from consideration.

An Arbitrary End Point to the Distribution

Recall the study discussed in Chapter 3 about self-control and pain tolerance (Grimm & Kanfer, 1976). Participants were asked to place their hands in ice water and were told that they could remove them whenever they wanted. The number of seconds that the participants kept their hands in the water was the dependent variable. The researchers found that some of the participants did not remove their hands from the ice water and would have continued to keep their hands in it for an unknown length of time. The researchers decided to terminate the task at 300 seconds, an arbitrary end point for the high side of the distribution.

Another example of an arbitrary end point to a distribution occurs when participants are asked to complete a problem-solving task. What score should be assigned to participants who cannot figure out the answer? At some point, the researcher has to stop them and assign a score, which is supposedly the time it took them to complete the problem-solving task. These situations present a problem when we would like to describe the variability of the distribution. Since, in these two examples, the highest score is arbitrary, any measure of variability that relies on these scores will be under-representative of the actual variability and therefore unreliable as a true measure of dispersion. The IQR and SIQR, however, are relatively impervious to arbitrary cutoffs at the tail end of a distribution, provided there are not too many arbitrary scores.
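A minimal sketch of this point follows. The 300-second ceiling is taken from the study described above, but the tolerance times are invented, and the quartile method ("inclusive" in Python's `statistics.quantiles`) is an assumption rather than the chapter's own procedure.

```python
import statistics

CEILING = 300  # seconds; the arbitrary end point described in the text

# Hypothetical tolerance times (in seconds) if the task had never been stopped.
uncapped = [35, 48, 60, 75, 90, 110, 140, 190, 420, 650]
# What the researcher actually records: anything above the ceiling becomes 300.
capped = [min(t, CEILING) for t in uncapped]

def iqr(scores):
    """Interquartile range using the 'inclusive' quartile convention (an assumption)."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q3 - q1

def score_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)

print("range:", score_range(uncapped), "->", score_range(capped))
print("sd:   ", statistics.stdev(uncapped), "->", statistics.stdev(capped))
print("IQR:  ", iqr(uncapped), "->", iqr(capped))
```

Because only the two highest scores are touched by the ceiling, the quartiles, and therefore the IQR, are identical in both lists, whereas the range and standard deviation of the capped data understate the spread of the uncapped data.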

Common Practice

In common practice, it is rather rare to hear a researcher in the social or behavioral sciences report the IQR or SIQR when describing a distribution. Researchers in these academic disciplines simply do not have an “intuitive feel” for these measures. Telling a colleague that the SIQR of our data is 6 will likely produce a blank stare. The range, despite all its vulnerabilities, is much more likely to be identified than the IQR or SIQR. However, in many instances where the range is presented, it is only indirectly stated; the highest and lowest scores in the distribution are identified. The variance, despite its essential role in statistical formulas, is also rarely stated among researchers. There is no “intuitive feel” to the variance because it is measured in squared units.

If the data are normally distributed, by far the most commonly reported measure of variability is the standard deviation, the square root of the variance. Therefore, if someone says, “The mean was 50,” we can bet that the first question asked by another researcher will be, “What was the standard deviation?” Moreover, articles in scientific journals often include a table of means and standard deviations. We will rarely see a table of means and variances or other measures of variability.

Although the standard deviation is the most popular variability measure, we should not ignore all the other measures. Indeed, researchers may err in relying too much on the standard deviation as the measure of dispersion when describing a distribution. It is the responsibility of the researcher who has the most knowledge of the characteristics of the data to choose the most appropriate measure of variability. Finally, there is no rule in the social or behavioral sciences restricting us to reporting only one measure. If we believe it would be helpful, we should feel free to report more than one measure of variability or central tendency.

and Shrinking Variation

Throughout the text, a series of boxes asks whether the scientific method is broken in light of the nonreproducibility problem currently plaguing the social, behavioral, and medical sciences. In Box 1.1 we looked at the “wallpaper effect” and the difficulty in identifying and controlling all extraneous variables. In Box 2.3 we looked at, among other things, different ways the collection of data may be biased through wording effects and order effects. In this box, let us explore some of the problems that occur in the data gathering process.

Sometimes researchers, even those with the best of intentions and who take precautions to remove their own biases, can influence how others respond merely by their involvement in the study. In general, these influences are referred to as “demand characteristics” or “experimenter effects.” Sometimes the unintentional involvement of the researcher can take a response that might ordinarily be quite varied in a population and shrink it to almost nothing. Perhaps the most famous historical example of demand characteristics features “Clever Hans,” the horse that could do mathematics (Pfungst, 1911). Hans was owned by Wilhelm Von Osten, a German schoolteacher who claimed to have taught his horse to add, subtract, multiply, divide, and work with fractions. Hans would answer a question by tapping out numbers with his hoof; his accuracy, though not perfect, was remarkable. It was so remarkable that the initial 1904 investigation by a team of academics concluded that it was not a trick. Three years later, however, a local psychologist, Oskar Pfungst, concluded that Hans was indeed clever, but was not doing mathematics. Instead, Hans had learned a “start” cue and a “stop” cue. The “start” cue was the act of being addressed by Von Osten, who then looked down at the horse's hooves. As Hans approached the proper answer, Von Osten, believing Hans was about to stop stomping, would make subtle straightening movements of the body and head. This served as the “stop” cue for the clever horse; his reward of food would be forthcoming. Hans usually landed on the right answer or very close to it. The normal variation of a horse periodically stomping its hooves was now being unintentionally managed by cues from the handler such that the variance of stomps less than or greater than the “right” number shrank dramatically.

The same problem can take place when researchers are gathering data from other people. Numerous studies suggest that the hopes, expectations, and fears of data gatherers can subtly influence people’s responses (e.g., Nichols & Maner, 2008; Rosenthal & Fode, 1963; Rubin, Paolini, & Crisp, 2010), taking what might ordinarily be a very diffuse set of responses and pulling them tightly together around the researchers’ desired or expected response.

Demand characteristics and the role of the experimenter in the data gathering process are potential problems for many issues that social and behavioral scientists investigate. Even under the best of conditions, responses that may naturally be quite varied can begin to narrow around a certain response and mislead us about the true nature of reality. Perhaps those of us in the social and behavioral sciences need to take the human element into more careful consideration when gathering and interpreting data.

Summary

Measures that reflect the amount of variation in the scores of a distribution are called measures of dispersion or measures of variability. The range defines the overall span of scores. It is calculated by subtracting the lowest score of the distribution from the highest score. Unfortunately, the range is extremely sensitive to extreme scores. The IQR avoids this by measuring the span of scores between the first and third quartiles (the 25th and 75th percentiles). The SIQR is the IQR divided by 2.

Another family of dispersion measures takes into account the magnitude of difference between each raw score and the mean: the deviation scores. For example, taking the average of the absolute values of all deviation scores is called the mean deviation. Taking the average of all the squared deviation scores is called the variance. The population variance formula is used when the scores represent a population. If our intent is to infer the variance of a population based on a sample of scores, then the formula for a sample variance is used. It contains a correction factor in the denominator to make it an unbiased estimate of the population variance.
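As a rough illustration of these deviation-based measures, the sketch below computes the mean deviation by hand and uses Python's `statistics.pvariance` (divides by N) and `statistics.variance` (divides by N - 1) for the two variance formulas. The eight scores are invented for the example.

```python
import statistics

# Hypothetical scores; their mean is exactly 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
mean = statistics.mean(scores)

# Deviation scores: each raw score minus the mean.
deviations = [x - mean for x in scores]

# Mean deviation: average of the absolute deviation scores.
mean_deviation = sum(abs(d) for d in deviations) / n

# Population variance: sum of squared deviations divided by N.
population_variance = statistics.pvariance(scores)

# Sample variance: same sum of squares, but divided by N - 1.
sample_variance = statistics.variance(scores)

print("mean deviation:     ", mean_deviation)
print("population variance:", population_variance)
print("sample variance:    ", sample_variance)
print("sample SD:          ", statistics.stdev(scores))
```

The sum of squared deviations here is 32, so the population formula gives 32/8 and the sample formula gives 32/7. The sample value is always the larger of the two, and the gap narrows as the number of scores grows.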

The variance measure of dispersion is particularly important because it is used in many statistical formulas. It is not, however, the best descriptive measure of variability. This is because the variance is a squared value; it is not stated in the original units of the measured variable. The standard deviation is the square root of the variance. As a descriptive measure, the standard deviation improves on the variance by converting the measure back to the original units of the measured variable.

If the scores are normally distributed, the standard deviation can be used to make probabilistic statements about a data set. The 68-95-99.7 rule states that the mean plus and minus one standard deviation encompasses roughly 68% of the total number of scores in a distribution; plus and minus two standard deviations include approximately 95% of the total number of scores; and plus and minus three standard deviations comprise virtually all scores (99.7%) in a normally distributed data set.
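The three percentages can be recovered from the normal curve itself. The sketch below uses `statistics.NormalDist` (available in Python 3.8 and later); the mean of 50 and standard deviation of 10 are arbitrary illustrative values, and the proportions would be the same for any normal distribution.

```python
from statistics import NormalDist

def share_within(k, mean=50.0, sd=10.0):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

# Proportion of scores within 1, 2, and 3 standard deviations of the mean.
shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.4f}")
```

The printed proportions round to 68%, 95%, and 99.7%, which is where the rule gets its name. The rule is a statement about the theoretical normal curve, so real data that are only approximately normal will match it only approximately.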

Transforming the original scores of a distribution by the four basic arithmetic operations will have a predictable effect on the mean and variance. Adding or subtracting a constant to each score will alter the mean by that constant. There will be no effect on the variance. Multiplying or dividing every score by a constant will multiply or divide the mean by that constant and will multiply or divide the variance by the square of that constant. The standard deviation, in turn, is multiplied or divided by the absolute value of the constant.
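These transformation effects are easy to verify numerically. The sketch below uses invented scores and an arbitrary constant of 3; it relies on the population variance, although the sample variance responds to these transformations in the same way.

```python
import statistics

# Hypothetical scores (mean 5, population variance 4) and an arbitrary constant.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
c = 3

def mean_and_variance(values):
    """Return (mean, population variance) for a list of scores."""
    return statistics.mean(values), statistics.pvariance(values)

original = mean_and_variance(scores)
shifted = mean_and_variance([x + c for x in scores])  # add a constant
scaled = mean_and_variance([x * c for x in scores])   # multiply by a constant
divided = mean_and_variance([x / c for x in scores])  # divide by a constant

print("original:", original)
print("x + c:   ", shifted)   # mean rises by c, variance unchanged
print("x * c:   ", scaled)    # mean times c, variance times c squared
print("x / c:   ", divided)   # mean over c, variance over c squared
```

Adding 3 moves the mean from 5 to 8 and leaves the variance at 4. Multiplying by 3 gives a mean of 15 and a variance of 36, and dividing by 3 gives a mean of 5/3 and a variance of 4/9.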

Deciding the most appropriate measure of variability to use depends on various features of the distribution. Factors such as extreme scores, sample size, arbitrary end points of a distribution, common practices, and the importance of a stable estimate of the population variability will influence a researcher’s decision as to which measure of variability is most desirable.

Using Microsoft® Excel and SPSS® to Find