• Tidak ada hasil yang ditemukan

Summarizing Data Using Descriptive Statistics

Dalam dokumen METHODS IN EDUCATIONAL RESEARCH Y (Halaman 103-116)

Descriptive statistics are used to summarize data from both preestablished and quantitative self-developed instruments using either graphical or mathemat-ical procedures. Almost every study using a quantitative measure will use de-scriptive statistics to depict the patterns in the data. Similarly, in educational practice, we often want ways to describe the overall performance of students, teachers, administrators, or schools on some measure. Descriptive statistics are one tool for summarizing this performance for both research purposes and, in-creasingly, for assessment of educational outcomes at the classroom, school, and district levels.

Frequency Distributions

One of the first questions educators ask when a group has been measured is,

“How did we do?” To answer this question, one needs a way to see how the group as a whole did as well as individual students. One way to depict the overall

per-formance of a group is to display the frequency of each score in a frequency distribution. The word distribution is used to describe the range of scores and their overall frequencies. A frequency distribution displays each score and the fre-quency with which it occurred in either a table (called a frefre-quency table) or in a graph. The scores are ordered from highest to lowest, and the number of per-sons obtaining each score is listed as the frequency in the second column. For ex-ample, in Table 4.2, the number of books that students read (scores) are reported in the first column, and the frequency or number of students who read that many books is reported in the second column. Sometimes possible score values are grouped rather than being listed individually to make the patterns clearer, as in Table 4.3.

Scores may also be displayed in a graph with the scores listed along the x or horizontal axis and the frequencies listed along the y or vertical axis. If the data represent an ordinal scale of measurement or higher, they are usually connected by a line, as in Figure 4.1, and the graph is referred to as a frequency polygon.

Measurement in Educational Research and Assessment 75

TABLE 4.2 FREQUENCY TABLE OF NUMBER OF BOOKS READ BY 30 STUDENTS.

Number of Books Number of Students

20 1

19 0

18 0

17 0

16 1

15 1

14 0

13 1

12 1

11 1

10 1

9 2

8 1

7 2

6 4

5 5

4 4

3 2

2 2

1 1

If the data are categorical, each category is represented by a separate bar, as in Figure 4.2, and the graph is called a histogram.

So what does a frequency distribution tell us about the scores? At a glance, it shows us how the scores distribute, that is, how closely bunched together or how spread out they are, which scores are most frequent and which scores are least fre-quent, and whether there are outlier scores that are very different from the rest.

Distributions of educational measures can take several different forms.

When large groups are randomly sampled and measured (an appropriate pro-cedure for establishing a norm group for a measurement), the distribution often has a shape that is called a normal distribution. If the distribution is

approx-TABLE 4.3 SAMPLE GROUPED FREQUENCY approx-TABLE.

Number of Books Number of Students

17 to 20 1

13 to 16 3

9 to 12 5

5 to 8 11

1 to 4 10

9 to 12 Number of Books Read

Number of Students

13 to 16 17 to 20 5 to 8

1 to 4 0

2 4 6 8 10 12

FIGURE 4.1 SAMPLE FREQUENCY POLYGON.

imately “normal,” it will look bell-shaped and symmetrical, with the highest point on the curve or most frequent scores clustered in the middle of the distribution.

Frequencies of scores will decrease as the distribution moves away from the mid-dle toward either the high or low end of the score values. Most standardized tests present data from a norm group that shows a normal distribution of scores. The symmetrical nature of this distribution allows one to examine a wide range of dif-ferences between scores through the full range of the scores. Skewed distribu-tions are asymmetrical, meaning the scores are distributed differently at the two ends of the distribution. In a negatively skewed distribution, most of the scores are high, but there are a small number of scores that are low. In a posi-tively skewed distribution, most of the scores are low, but there are a few high scores. Figure 4.3 shows how test scores for a class in educational research would look under two different conditions.

The outliers in a skewed distribution pull the “tail” of the distribution out in that direction. So a negatively skewed distribution has a low flat tail at the low end

Measurement in Educational Research and Assessment 77

FIGURE 4.2 SAMPLE HISTOGRAM.

Horror

Type of Book Read

Number of Students

Adventure Animal Friendship Total

Boys Girls

Humor Mystery

0 2 4 6 8 10 12 14 16

of the scores, and a positively skewed distribution has a low flat tail at the high end of the scores.

Some distributions, called bimodal distributions, have two clusters of fre-quent scores seen as the two humps in the distribution, as in Figure 4.4. What does the bimodal distribution show us about the distribution of grades in the distance education class?

What Is a Typical or Average Score? Measures of Central Tendency

Although frequency distributions show the patterns in scores, it is also useful to be able to summarize the performance of a group using a single score for the typical or average performance of a group. These are what researchers call measures of central tendency, and the three most common are the mode, mean, and median.

Mode. The mode is the score in a distribution that occurs most frequently. In a frequency polygon, the mode is the score represented by the highest point on the

FIGURE 4.3 EXAMPLES OF POSITIVELY SKEWED AND NEGATIVELY SKEWED DISTRIBUTIONS.

Graph A: Positively Skewed Distribution

Frequency

26-30 31-35 36-40 41-45 46-50 51-55 Test Scores

Graph B: Negatively Skewed Distribution

56-60 61-65 66-70 71-75 0

1 2 3 4 5 6 7

Note: Graph A is a positively skewed distribution and Graph B is a negatively skewed distribution.

curve. As you just saw in Figure 4.4, some distributions can have more than one mode if two or more scores have similarly high frequencies. (That is why the dis-tribution in Figure 4.4 is called a bimodal disdis-tribution.) The mode is a somewhat imprecise measure of central tendency because it simply summarizes the fre-quency of a single score. If the distribution is asymmetrical, the mode may not be a precise estimate of central tendency.

Mean. The mean is the arithmetical average of a set of scores. To find the mean, simply add up the scores and divide by the number of scores. If numbers are in a frequency table, first multiply each score by its frequency. Table 4.4 presents the number of pages printed at campus computer terminals by one class of students during a single semester and illustrates the calculation of the mean.

A college we know recently considered using the mean number of printed pages to set a new policy for the number of free pages students would be allowed.

They reasoned that because the mean represented the average number of pages printed, the new policy would allow the average student to do all or most of his or her printing for free. The mean is a widely used measure of central tendency in research and in educational practice. You will often find the mean symbolized as an X with a bar above it, as in X.

In a skewed distribution, however, the mean can be misleading as a measure of central tendency. The extreme scores in a positively skewed distribution will result in the mean being higher than the central tendency of the group, and the

Measurement in Educational Research and Assessment 79

FIGURE 4.4 GRADES IN DISTANCE EDUCATION CLASSES.

C Grades

Number of Students

D F

B A

0 10 20 30 40 50

extreme scores in a negatively skewed distribution will “pull the mean down”

toward the low end. Again, using the printing example above, the mean for all of the students at that college was much higher than the average for just the one class because a small number of students printed more than 1,000 pages a semester, and one even printed a million pages! (Of course, the new policy was intended to limit the printing of students like this who were abusing the system.) However, the mean estimated the average number of printed pages per student at a level much higher than was “typical” for most students.

Income is another measure for which the mean is likely to be a poor estimate of central tendency. Income distributions are typically positively skewed. Most of us earn middle to low incomes, but there are a small number of people who earn very high incomes. Imagine what would happen to the mean income of your com-munity if one of the residents was Bill Gates! In a skewed distribution, the best measure of central tendency is the median.

Median. The median is the score that divides a distribution exactly in half when scores are arranged from the highest to the lowest. It is the midpoint of distribu-tion or the score in the distribudistribu-tion at which half of the scores are lower and half are higher. To find the median, you would divide the number of scores in a dis-tribution by 2 and round up if there is a decimal. Then count down from the top of the distribution until you have counted that number of scores. See if you can calculate the median for printing for the class in Table 4.4. If a distribution has an even number of scores, the median is the score halfway between the two scores

TABLE 4.4 CALCULATION OF MEAN FROM A FREQUENCY TABLE.

Number of Number Number of Pages ✕

Pages Printed of Students Number of Students

300 1 300

280 3 840

260 5 1,300

240 6 1,440

220 4 880

200 2 400

180 2 360

160 1 160

Total = 24 Total = 5,680

Note: The mean is calculated as: 5,680 ÷ 24 = 236.67.

in the middle. If a distribution has an odd number of scores, the median is an ac-tual score.

The median is a stable measure of the central tendency of a set of scores.

This means that it is not affected much if there are a few outlier scores in a dis-tribution that are much different from the rest. To return to our example of in-come distributions, if Bill Gates moved into your community, the distribution of income would not change greatly because the addition of one person would sim-ply move the median up one number in the distribution.

Measures of Variability

So now you know several statistics that can be used to describe the average or typ-ical performance of a group. However, two groups might have the same mean even if they differ considerably in the spread of the scores in their distributions.

Consider the two distributions of test scores in Figure 4.5.

Both districts have approximately the same average performance in their stu-dents’ test scores; the mean for District A is 52.08, and the mean for District B is 52.59. However, District A’s scores are much more variable than those for District B. Descriptive statistics can also provide a way to summarize the amount of vari-ability in the scores in a distribution, that is, how widely spread out or scattered the scores are. Two measures of variability frequently used in educational research and assessment are the range and the standard deviation.

Range. The range is the difference between the highest and lowest scores in a dis-tribution. To find the range, you subtract the lowest score from the highest score.

This indicates how many points separate these two scores. Although it is easy to calculate, the range is not a precise or stable measure of variability because it can be affected by a change in just one score. For that reason, most educational re-searchers use the standard deviation as a measure of variability.

Standard Deviation. The standard deviation is the average distance between each of the scores in a distribution and the mean. Now before your eyes glaze over or you begin to wonder how researchers can come up with so many meaningless concepts, let us consider why you would want to calculate such a statistic. Remember that the goal is to summarize the amount of variability in a set of scores. We know that the mean represents the average performance of a group or the center of the group.

So one way to describe variability is to consider how far each score is from that cen-ter score. It is called the standard deviation because it represents the average amount by which the scores deviate from a mean. We will not discuss here how the standard deviation is calculated, but if seeing the numbers helps you understand concepts better, see Appendix C for an example of the calculation of a standard deviation.

Measurement in Educational Research and Assessment 81

FIGURE 4.5 DISTRIBUTIONS OF TEST SCORES FOR TWO SCHOOL DISTRICTS.

Number of Students

25-30 31-35 36-40 41-45 46-50 51-55 56-60 61-65 66-70 71-75 76-80 0

10 20 30 40 50 60 70

Number of Students

25-30 31-35 36-40 41-45 46-50 51-55 56-60 61-65 66-70 71-75 76-80 0

10 20 30 40 50 60 70 80 90 100

District A Mean = 52.09 SD. = 11.07

District B Mean = 52.59 SD = 6.4 –2 SD

(28.69)

–1 SD (40.39)

Mean (52.09)

+1 SD (63.79)

+2 SD (75.49)

–2 SD (39.79)

–1 SD (46.19)

Mean (52.59)

+1 SD (58.99)

+2 SD (65.39)

Most research studies and preestablished instruments will report the standard deviations for the data collected. For any distribution of scores, the standard de-viation will be a specific number, typically including decimals because its calcula-tion rarely results in a simple number. The larger the number, the more variable are the scores in the distribution. In Figure 4.5 the standard deviation for the scores in District A is 11.07 and is 6.4 for District B. If you examine the scores, you will see that, on average, most of them lie within 2 standard deviations of the mean for both districts. That is, if you subtracted each score from the mean, the resulting number would be a value that was less than two times the standard de-viation (22.14 for District A and 12.8 for District B). The standard dede-viation is widely used in part because of its relationship to the normal distribution, visually represented as the normal curve displayed in Figure 4.6, which forms the basis for comparing individual scores with those of a larger group.

Normal Curve. The normal curve is a distribution that has some unique and use-ful mathematical properties. One useuse-ful feature of the normal curve is that a cer-tain percentage of scores always falls between the mean and cercer-tain distances above and below the mean. These distances are described as how many standard deviations above or below the mean a score falls. Approximately 34% of the scores fall between the mean and one standard deviation above it. Similarly, approxi-mately 34% of the scores fall between the mean and one standard deviation below it. So about two thirds (more precisely, 68%) of the scores will be in the area be-tween one standard deviation above and one standard deviation below the mean.

Another 13.5% of the scores will fall in the area that is more than one standard deviation above the mean but less than two standard deviations above the mean.

So between the mean and two standard deviations are 47.5% of the scores (34%

plus 13.5%). Finally, another 2.5% of the scores are higher than two standard de-viations above the mean. Because the curve is symmetrical, 13.4% of the scores will also be in the area that is between one standard deviation below the mean and two standard deviations below the mean. Also, 2.5% of the scores will fall below two standard deviations below the mean. Figure 4.6 illustrates the per-centage of scores in each part of the normal curve. These perper-centages are useful in comparing the performances of persons who have taken an educational mea-surement. We will describe the types of scores used in educational measurement and then return to this scary but useful figure.

Types of Scores Used to Compare Performance on Educational Measures Several types of scores are used to compare performances on educational measures.

Raw Scores. Raw scores are scores that summarize a person’s performance on a measure by showing the number of responses or summing the scores for each

Measurement in Educational Research and Assessment 83

response. Performance on classroom tests is usually reported as raw scores, but this type of score is not used by most preestablished measures. The reason is that one cannot know what a raw score represents without knowing more about the mean and standard deviation of the distribution of scores.

Percentile Ranks. Percentile ranks are scores that indicate the percentage of persons scoring at or below a given score. So a percentile rank of 82% means that 82% of persons scored below that score. Figure 4.6 shows the percentile ranks

FIGURE 4.6 NORMAL CURVE.

Note: *CEEB refers to College Entrance Exam Board test, for example, SAT; **NCE refers to Normal Curve Equivalent.

Source: Adapted from Seashore (1980).

–4σ –3σ –2σ –1σ +1σ

13.59% 34.13% 34.13%

0.13%

Percentage of cases under portions of the normal curve

Standard deviations

5

1 10 20 30 4050 60 70 80 90 95 99

Percentile equivalents

2.14%

13.59%

2.14% 0.13%

+2σ +3σ +4σ

–4.0 –3.0 –2.0 –1.0 0 +1.0

Q1 Md

z scores

Q3

+2.0 +3.0 +4.0

*CEEB scores

1

1 2 3 4 5 6 7 8 9

4% 7% 12% 17% 20% 17% 12% 7% 4%

10 20 30 40 50 60 70 80 90 99

**NCE scores

Stanines

Percentage in stanine

2.3% 15.9% 50.0%

Cumulative percentages rounded

0.1% 84.1% 97.7%

2% 16% 50% 84% 98%

99.9%

200 300 400 500 600 700 800

that correspond to scores based on the number of standard deviations between the score and the mean. For example, reading down from the curve, one can see that a score that is one standard deviation above the mean has a percentile rank of approximately 84. A score that is one half of a standard deviation below the mean has a percentile rank of approximately 31. You can calculate the exact per-centile rank of a score if you know how many standard deviations it is above or below the mean. For a computer simulation that calculates the exact percentile rank based on the normal curve, go to the Web site http://www.stat.berkeley.edu/

users/stark/Java/NormHiLite.htm.

Standard Scores. Standard scores are scores that tell how far a score is from the mean of distribution in terms of standard deviation units. Because standard scores are based on the standard deviation, the normal curve can be used to deter-mine the percentile rank of a person’s score based on his or her standard score.

Therefore, a standard score shows immediately how well a person did in compari-son with the rest of the group taking that measure. Examples of standard scores are z scores and college entrance exams (referred to as CEEB Scores in Figure 4.6) such as the Scholastic Assessment Test (SAT) or Graduate Record Examinations (GRE).

The most basic standard score is the z score, which is calculated in the fol-lowing manner:

Z SCORE = (SCORE − MEAN)/STANDARD DEVIATION

Any score can be converted to a z score if the mean of a distribution and the stan-dard deviation are known. If Karyn got a score of 75 on a test with a mean of 65 and a standard deviation of 5, her z score would be as follows:

Z SCORE = (75 − 65)/5 = 10/5 = +2.0

As you can see, the z score represents simply the number of standard deviations away from the mean that the score is. Therefore, you can go to Figure 4.6 and de-termine Karyn’s percentile rank, knowing that she is two standard deviations above the mean. Try one more example. Peter got a score of 55 on the same test.

What would his z score be? What would his percentile rank be?

Many preestablished measures report student scores as standard scores, allow-ing a direct comparison of how two students did. Because many people do not like using negative numbers, some measures convert their raw scores into slightly dif-ferent standard scores. For example, SAT and GRE scores are simply z scores mul-tiplied by 100 + 500. So they have a mean of 500 and a standard deviation of 100.

Knowing this, you should be able to use Figure 4.6 to determine the percentage of people who score between 400 and 500 on the SAT. If you need a hint, this would be the area between one and two standard deviations below the mean.

Measurement in Educational Research and Assessment 85

Stanines. Stanine scores are a type of standard score that divides a distribu-tion into nine parts, each of which includes about one half of a standard devia-tion. Students are assigned scores ranging from 1 through 9 based on their performance. All students within a given stanine are considered to be equal in performance. Stanine 5 includes scores 0.25 standard deviations above and below the mean (thus z scores of +0.25 to -0.25). Each stanine above or below stanine 5 includes another half of the standard deviation of scores. Again, Figure 4.6 shows the percentage of scores within each stanine.

Percentages. Much of the school accountability data required of schools today is reported as percentages. A percentage is calculated by dividing the number of students or schools receiving a given score by the total number of students or schools and multiplying by 100. Although percentages are useful measurements, some types of data may be distorted when percentages are used. When one examines changes over time, the result may change substantially, based on the size of the group being measured. No Child Left Behind requires that schools report the percentage of chil-dren from each ethnic group who pass the state test. If a small rural district had a small population of one ethnic group, their results could change substantially if only a few families move into their district. For example, one of the authors of this book worked with two districts with different student populations. District A, a rural dis-trict, had only two African American students in grade 4, and District B had over 100 African American students in grade 4. Both of the students in District A failed the state test, resulting in a 100% failure rate. In District B, 50 students failed, re-sulting in a 50% failure rate. So based on percentage scores, it would seem that Dis-trict B is doing better even though they have many more students failing. The overall lesson here is that one should always examine both the total number and the per-centage of persons when comparing measures.

Grade-Equivalent Scores. A grade-equivalent score is reported in years and months. So 3.4 means third grade, fourth month. A grade-equivalent score re-ports the grade placement for which that score would be considered average. This means that the average student at the reported grade level could be expected to get a similar score on this test. (It does not mean that the person is capable of doing work at this grade level.)

Use of Correlation Coefficients in Evaluating Measures

Another type of descriptive statistic used in educational measurement is a correla-tion. As learned in Chapter One, correlations are measures of the relationship be-tween two variables. Because we visited Chapter One ages ago, let us review some

Dalam dokumen METHODS IN EDUCATIONAL RESEARCH Y (Halaman 103-116)