
Measures of Central Tendency

3.4 The Mean

The mean, colloquially referred to as the “average,” is the most frequently used measure of central tendency. It is also commonly used in formulas designed to test experimental hypotheses. As a descriptive measure, the mean has some advantages and disadvantages, which will be discussed later. Formula 3.1a shows how to find the mean for a population.

Population mean: $\mu = \dfrac{\Sigma X}{N}$ (Formula 3.1a)

where

μ = (pronounced “mew”) the symbol for the mean of a population

X = a score in the distribution

N = the total number of scores in the population (or population size)

Σ = (pronounced “sigma”) a notation that directs one to sum up a set of scores.

Thus, $\Sigma X = X_1 + X_2 + X_3 + \cdots + X_N$.

The formula for the sample mean is identical to the population formula, with the exception of two different symbols. These symbols clarify whether the data set is considered a sample or a population. As different formulas are presented in the text, we will see that Greek letters are used to represent features of a population, while Roman letters are used to represent features of samples.

Sample mean: $M = \dfrac{\Sigma X}{n}$ (Formula 3.1b)

where

M= the symbol for the mean of a sample

n = the total number of scores in the sample (or sample size)

Question: What is the mean of this population of scores?

5, 8, 10, 11, 12

Solution: $\mu = \dfrac{46}{5} = 9.20$

Notice that the answer of 9.2 would be the same whether the set of scores is considered a population or a sample. Although the designation is theoretically important, it does not impact the calculation of the statistic.
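The calculation above can also be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the text. It applies Formula 3.1a (or 3.1b) directly and compares the result with the standard library's `statistics.mean`.

```python
from statistics import mean

# The five scores from the worked question above.
scores = [5, 8, 10, 11, 12]

# Formula 3.1a / 3.1b: sum the scores, then divide by how many there are.
mean_by_formula = sum(scores) / len(scores)

# The standard library computes the same quantity.
mean_by_library = mean(scores)

print(mean_by_formula)  # 9.2
print(mean_by_library)  # 9.2
```

The code makes no distinction between a population and a sample, which mirrors the point that the designation does not change the arithmetic.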


The common practice in many statistics books is to use $\bar{X}$ (pronounced “X bar”) to represent the sample mean. This is the more traditional symbol. However, most recent published manuscripts in the social and behavioral sciences report sample means using an M. Since students are much more likely to encounter this symbol in their readings and to use this symbol in their professional writing, M will be the symbol used in this textbook. If we see the $\bar{X}$ symbol elsewhere, keep in mind that it also stands for the sample mean.

There are three measures of central tendency discussed in this chapter: the mean, median, and mode. Each measure is designed to communicate where scores tend to center or group in the distribution. However, each measure approaches the concept of “centeredness” differently. In what way does the mean reflect the center of a distribution? Or stated in other words, what does the mean mean?

Each raw score in a distribution can be thought of as being “off” of some middle point or deviating from some middle point by a certain amount, even if that amount is zero. The mean is the value where the sum of those raw score deviations across a data set equals zero. To clarify, let us tackle this a different way.

A deviation score (sometimes referred to as an error score) is the distance a raw score is from the mean (X − M), and let us symbolize it as x (pronounced “little x”). Therefore, x = X − M. So, if the mean of a distribution is 10, a raw score of 12 has a deviation (or error) score of 2.

In Table 3.1, the deviation score for each raw score is listed in the fourth column. Note that a raw score has a negative deviation score when it falls below the mean and a positive deviation score when it falls above the mean. The sum of all the deviation scores equals 0; this is how the mean defines the middle or the center of a distribution. Stated mathematically, Σ(X − M) = Σx = 0. In Table 3.1, both distributions have identical scores except for Participant 5. A score of 30, instead of 10, is obtained by Participant 5 in Distribution B. As a consequence, the M of Distribution B (10) is greater than the M of Distribution A (6). However, the deviation scores still sum to 0. In a manner of speaking, the mean has adjusted itself so that the Σx is still = 0. It is in precisely this sense that the mean is the center of a distribution. For every distribution, no matter what its shape or number of raw scores, the sum of the deviation scores off of the mean always equals 0.

Table 3.1 Deviation scores always sum to zero.

Distribution A

| Participant | Score | M | X − M (x) |
|---|---|---|---|
| P1 | 2 | 6 | −4 |
| P2 | 4 | 6 | −2 |
| P3 | 6 | 6 | 0 |
| P4 | 8 | 6 | +2 |
| P5 | 10 | 6 | +4 |
| | | | Σx = 0 |

Distribution B

| Participant | Score | M | X − M (x) |
|---|---|---|---|
| P1 | 2 | 10 | −8 |
| P2 | 4 | 10 | −6 |
| P3 | 6 | 10 | −4 |
| P4 | 8 | 10 | −2 |
| P5 | 30 | 10 | +20 |
| | | | Σx = 0 |
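The zero-sum property can be verified numerically for both distributions in Table 3.1. The sketch below is illustrative only; the helper name `deviation_scores` is an assumption introduced here, not something defined in the text.

```python
def deviation_scores(scores):
    """Return the mean M and the list of deviation scores x = X - M."""
    m = sum(scores) / len(scores)
    return m, [x - m for x in scores]

# Raw scores from Table 3.1.
distribution_a = [2, 4, 6, 8, 10]
distribution_b = [2, 4, 6, 8, 30]

mean_a, dev_a = deviation_scores(distribution_a)
mean_b, dev_b = deviation_scores(distribution_b)

print(mean_a, dev_a, sum(dev_a))  # 6.0 [-4.0, -2.0, 0.0, 2.0, 4.0] 0.0
print(mean_b, dev_b, sum(dev_b))  # 10.0 [-8.0, -6.0, -4.0, -2.0, 20.0] 0.0
```

With data whose mean is not a whole number, floating-point rounding can leave the computed sum a tiny distance from zero, so a comparison with a small tolerance is the safer check in general.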

The Weighted Mean

Imagine the mean SAT Writing and Language scores from three high schools in one school district are 425, 470, and 410. If we wanted to find the mean SAT score for the district, would we be justified in taking the mean of the three high school means? No, not unless each school had the same number of students. For instance, imagine the school with the highest SAT average has twice the number of students compared with the other schools. Failing to take that into account would generate a combined mean that would be too low. We need a system of taking each mean into account based on the number of scores that were used to create it. Formula 3.2 accomplishes this task by computing the weighted mean (or grand mean).

Weighted mean: $M = \dfrac{n_1 M_1 + n_2 M_2 + \cdots + n_n M_n}{n_1 + n_2 + \cdots + n_n}$ (Formula 3.2)

where

$n_1, n_2$ = the number of scores in the first group, the second group, and so forth

$n_n$ = the number of scores in the last group

$M_1, M_2$ = the mean of the first group, the second group, and so forth

$M_n$ = the mean of the last group

Question: What would be the weighted mean, assuming the following values?

| School 1 | School 2 | School 3 |
|---|---|---|
| $n_1 = 220$ | $n_2 = 178$ | $n_3 = 192$ |
| $M_1 = 425$ | $M_2 = 470$ | $M_3 = 410$ |

Solution:

$M = \dfrac{220(425) + 178(470) + 192(410)}{220 + 178 + 192} = \dfrac{255{,}880}{590}$

$M = 433.69$
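Formula 3.2 translates directly into code. The following is a minimal sketch using the school values from the question; the function name `weighted_mean` and the variable names are assumptions made for illustration. It also computes the simple mean of the three school means for comparison.

```python
def weighted_mean(sizes, means):
    """Formula 3.2: weight each group mean by its group size."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    weighted_sum = sum(n * m for n, m in zip(sizes, means))
    return weighted_sum / sum(sizes)

# Values from the three-school question.
sizes = [220, 178, 192]
means = [425, 470, 410]

district_mean = weighted_mean(sizes, means)
mean_of_means = sum(means) / len(means)  # ignores group sizes

print(round(district_mean, 2))  # 433.69
print(round(mean_of_means, 2))  # 435.0
```

Here the two values differ by a little more than one point because the groups are of similar size; the gap grows as the group sizes become more unequal.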


Question: The mean blood pressure for three age groups has been recorded. What is the overall mean blood pressure?

| Age | 20–39 | 40–59 | 60+ |
|---|---|---|---|
| Systolic | 118 | 128 | 145 |
| Diastolic | 70 | 78 | 82 |
| n | 13 | 12 | 16 |

Solution: $M_{\text{systolic}} \approx 131$ and $M_{\text{diastolic}} \approx 77$
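The intermediate steps behind these answers are not shown in the solution. Applying Formula 3.2 to the tabled values, they can be written out as follows; the products and quotients are computed here from the table, and the results round to the whole numbers given.

```latex
M_{\text{systolic}}
  = \frac{13(118) + 12(128) + 16(145)}{13 + 12 + 16}
  = \frac{1534 + 1536 + 2320}{41}
  = \frac{5390}{41} \approx 131.46

M_{\text{diastolic}}
  = \frac{13(70) + 12(78) + 16(82)}{13 + 12 + 16}
  = \frac{910 + 936 + 1312}{41}
  = \frac{3158}{41} \approx 77.02
```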

The Mean of a Frequency Distribution

Chapter 2 shows how a distribution of scores can be displayed in a table, which allows us to ascertain the frequency with which each score occurs. It is an easy matter to calculate the mean of a distribution displayed in such a fashion, whether it is considered a population or a sample. Simply use the following formula.

Mean of a frequency distribution: $\mu \text{ or } M = \dfrac{\Sigma Xf}{\Sigma f}$ (Formula 3.3)

where

f = the frequency with which a score appears

Table 3.2 includes a column of raw scores, a frequency column, and a column of cross products, Xf. The sums at the bottom of each column are used to find the mean.
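Formula 3.3 can be sketched in Python using the scores and frequencies from Table 3.2. Storing the table as a dictionary that maps each score X to its frequency f is an assumption made for this illustration, as are the variable names. As a cross-check, the sketch also expands the table back into individual raw scores and averages them directly.

```python
# Table 3.2 as a mapping from score X to frequency f.
frequencies = {7: 1, 6: 3, 5: 2, 4: 5, 3: 4, 2: 1, 1: 1}

sum_f = sum(frequencies.values())                      # n = sum of f
sum_xf = sum(x * f for x, f in frequencies.items())    # sum of cross products Xf
freq_mean = sum_xf / sum_f                             # Formula 3.3

# Cross-check: list every raw score individually and average them.
raw_scores = [x for x, f in frequencies.items() for _ in range(f)]
raw_mean = sum(raw_scores) / len(raw_scores)

print(sum_f, sum_xf, round(freq_mean, 2))  # 17 70 4.12
print(round(raw_mean, 2))                  # 4.12
```

Both routes give the same value because multiplying a score by its frequency is just a shortcut for adding that score repeatedly.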

The mean is an attractive measure of centrality not only because it incorporates every value in a data set but also because it includes each score’s interval distance away from the center. As we will see, the other measures of centrality cannot do this. However, this ability is a double-edged sword. In Table 3.1, we saw that replacing Participant 5’s raw score of 10 with the quite discrepant value of 30 shifted the mean tremendously, from 6 all the way to 10. This highlights a problem with using the mean: it is very sensitive to extreme scores. This problem intensifies as data sets get smaller. An extreme score has a greater influence on the resulting mean as the size of the sample or population shrinks.

To illustrate this, consider Congressman Ezra Windblows. The congressman is elected on the promise to bring prosperity to the district. During the next election, Windblows would like to convince the constituents that the promise has been kept. The definition of prosperity Windblows uses is the mean income of families living in the exceptionally small district. When the congressman was first elected, the mean income in the district was $50,000. Two years later, one couple moved into the district with a yearly income of $250,000. Everyone else’s income remained the same. Look what happens to the average family income when the mean is used as the measure of central tendency.

| Family income (beginning of Congressman Windblows's term) | Family income (end of Congressman Windblows's term) |
|---|---|
| $44,000 | $44,000 |
| $48,000 | $48,000 |
| $50,000 | $50,000 |
| $52,000 | $52,000 |
| $56,000 | $56,000 |
| | $250,000 |
| μ = $50,000 | μ = $83,333 |

Congressman Windblows could honestly report that the average income per family had dramatically increased during this short term in office. Since extreme scores in small samples can result in a mean that does not appear to represent the middle of a distribution, it is necessary to have an index of central tendency that is not particularly sensitive to extreme scores. Now imagine what would happen if that same family moved into a district with about 50,000 families. Would the mean change much?
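One way to explore that question is a short calculation. The sketch below rests on an assumption the text does not state: that the large district also has a mean family income of $50,000 before the newcomers arrive. The function name `mean_after_newcomer` is likewise hypothetical.

```python
def mean_after_newcomer(n_families, current_mean, newcomer_income):
    """Mean income after one additional family joins a district."""
    total_income = n_families * current_mean + newcomer_income
    return total_income / (n_families + 1)

# Congressman Windblows's five-family district (values from the text).
small_district = mean_after_newcomer(5, 50_000, 250_000)

# Hypothetical large district: 50,000 families, mean assumed to be $50,000.
large_district = mean_after_newcomer(50_000, 50_000, 250_000)

print(f"{small_district:,.2f}")  # 83,333.33
print(f"{large_district:,.2f}")  # 50,004.00
```

Under that assumption the same $250,000 income raises the mean by about $33,000 in the five-family district but by only about $4 in the large one, which illustrates how the influence of an extreme score fades as the data set grows.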

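Table 3.2 Calculating the mean from a frequency distribution.

| X | f | Xf |
|---|---|---|
| 7 | 1 | 7 |
| 6 | 3 | 18 |
| 5 | 2 | 10 |
| 4 | 5 | 20 |
| 3 | 4 | 12 |
| 2 | 1 | 2 |
| 1 | 1 | 1 |
| | n = Σf = 17 | ΣXf = 70 |

$M = \dfrac{\Sigma Xf}{\Sigma f} = \dfrac{70}{17} = 4.12$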


Unfortunately, many distributions possess more than one extreme score. Skewed distributions, in fact, can feature a moderate percentage of scores trailing well off to one side. If we are interested in accurately communicating where scores of a distribution are bunched, and the existence of extreme scores would lead to a misleading impression, then a different measure of centrality is needed.