Using Tables to Organize Data

Scales of Measurement and Data Display


2.3 Using Tables to Organize Data

Box 2.1 Some Notes on the History of Statistics

Although ancient civilizations like the Egyptians and Chinese used tabulation and other simple statistics to keep track of tax collections, government expenditures, and the availability of soldiers, the modern use of statistics arguably began with the Englishman John Graunt (1620–1674). Graunt tabulated information on death rates in his hometown of London and noted that certain diseases, suicides, and accidents occurred with remarkable regularity from year to year. This realization, by the way, contributed to the establishment of insurance companies. Graunt also found that more biological males than biological females were born. However, due to the greater male mortality rate (occupational accidents and wars), the number of men and women at the marriageable age was about equal. Graunt believed that this arrangement was nature’s way of assuring monogamy (Campbell, 2001).

Most early uses of statistics revolved around simple descriptions of data, but starting around the seventeenth century advances in statistics began to take place, mostly springing from mathematicians’ interest in the “laws of chance” as they apply to gambling. The French mathematician Blaise Pascal (1623–1662) was asked the following question by Chevalier de Méré, a professional gambler: “In what proportion should two players of equal skill divide the stakes remaining on the gambling table if they are forced to stop playing the game?” Pascal and Pierre Fermat (1602–1665), another French mathematician, arrived at the same answer, although they offered different proofs. It was their correspondence in 1654 that established modern probability theory (Hald, 2003).

The work of Pascal and Fermat was actually anticipated a century earlier by the Italian mathematician and gambler Girolamo Cardano (1501–1576). His volume, The Book on Games of Chance, published posthumously in 1663, contains many tips on how to cheat when gambling and established some of the origins of probability theory. Cardano also practiced astrology. Indeed, by using astrological charts he even predicted the year of his death. Upon arriving at that year and finding himself in perfect health, he decided to drink poison to ensure the accuracy of his prediction (Gliozzi, 2008)!

Yet more advances in the field of statistics occurred in the nineteenth and early twentieth centuries. Many of the chapters of this, and every other statistics textbook, are based on the statistical advances of the period between 1850 and 1930. Sir Francis Galton (1822–1911), among other accomplishments, formalized a method for making predictions of one variable with knowledge of a second, related variable (regression analysis) (see also Spotlight 16.1). William Gosset (1876–1937) ushered in the era of modern experimental statistics by

Instead of displaying a frequency count for each score, the viewer learns how many participants obtained scores within a given range. Class intervals are groups of equal-sized ranges, determined by the researcher and based on how much information one is willing to sacrifice in exchange for

Table 2.3 The simple frequency distribution constructed from the unorganized data of Table 2.2.

 X    f      X    f
30    1     14    7
29    1     13    6
28    1     12    4
27    1     11    3
26    1     10    3
25    2      9    3
24    2      8    2
23    2      7    1
22    4      6    2
21    3      5    2
20    4      4    0
19    4      3    1
18    6      2    0
17    7      1    1
16    8      0    0
15    8

Table 2.4 A grouped frequency distribution based on the raw data from Table 2.2.

Lower limit   Class interval   Upper limit   Midpoint    f
   29.5           30–32           32.5          31        1
   26.5           27–29           29.5          28        3
   23.5           24–26           26.5          25        5
   20.5           21–23           23.5          22        9
   17.5           18–20           20.5          19       14
   14.5           15–17           17.5          16       23
   11.5           12–14           14.5          13       17
    8.5            9–11           11.5          10        9
    5.5            6–8             8.5           7        5
    2.5            3–5             5.5           4        3
   −0.5            0–2             2.5           1        1

developing analyses that could allow a researcher to make generalizations based on only a small number of observations (the t test) (see also Spotlight 9.1). Sir Ronald Fisher (1890–1962) made extensive contributions to the field of research design and developed statistical analyses that can be used to compare the relative influence of several different treatment variables on a dependent variable (the F test) (see also Spotlight 12.1). Contemporary statisticians are continuing to make advances in statistics, each advance allowing researchers to ask increasingly complex questions about the mysteries of human behavior.


simplicity. The class intervals, typically organized in descending order, cover the full range of scores with no gaps and no overlaps. Each particular score belongs to exactly one interval. The table on display in this chapter features class intervals of 3 units.

Class intervals have midpoints, and when depicting continuous variables, they also have upper and lower real limits. An interval of, say, 20–25 would have a midpoint of 22.5, a lower real limit of 19.5, and an upper real limit of 25.5. In Table 2.4 the midpoints, lower limits, and upper limits for each interval from the “need for achievement” data are represented. Rarely are the midpoints and real limits presented in published research. They are included here for educational purposes.
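The arithmetic behind the midpoint and real limits can be sketched in a few lines of Python. This is only an illustration: the function name and the `unit` parameter are assumptions introduced here, and scores are assumed to be whole numbers, so the real limits sit half a unit beyond the stated interval limits.

```python
def real_limits_and_midpoint(low, high, unit=1):
    """Return (lower real limit, upper real limit, midpoint) for one class interval.

    `unit` is the measurement unit of the scores (assumed to be 1 here).
    """
    half = unit / 2
    lower_real = low - half          # half a unit below the stated lower limit
    upper_real = high + half         # half a unit above the stated upper limit
    midpoint = (low + high) / 2      # center of the stated limits
    return lower_real, upper_real, midpoint


# The 15–17 interval from Table 2.4
print(real_limits_and_midpoint(15, 17))  # (14.5, 17.5, 16.0)
```

Applied to each row of Table 2.4, the same calculation gives the lower limit, upper limit, and midpoint columns.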

Conventional Rules for Establishing Class Intervals

A grouped frequency distribution sacrifices some information by collapsing numbers into a set of intervals, but it is assumed that this information loss is inconsequential and perhaps even beneficial. Being able to examine the pattern of scores over the range of potential scores is often more useful than knowing the frequency of occurrence for each individual score. Table 2.4 uses 11 intervals. As we view the frequency column of the table, we can now easily see that just a few people received scores at the extreme ends of the distribution. Most of the scores are in the middle of the distribution, with the greatest number of scores in the interval 15–17. (This pattern is not as easily seen in a simple frequency distribution.) If too few or too many intervals are used, it can be difficult to see how the numbers are concentrated. The use of about 10 class intervals is customary; however, the needs of the researcher vary from situation to situation. The proper number of intervals to use should be determined by what best illustrates a meaningful pattern or distribution of the scores.

Common interval sizes, symbolized by i, are i = 3, i = 5, i = 10, or i = some multiple of 10. There are no fixed rules for constructing a grouped frequency distribution. However, the following additional guidelines will be helpful:

1) Select an interval size that is suitable. As stated earlier, an interval size that leads to about 10 class intervals is usually ideal for interpretation.

2) Some graphs of continuous measures require the use of the interval midpoint. A midpoint that is a whole number makes a graph easier to read. Try to combine the interval width and the number of intervals in such a way that the midpoint is a whole number. Using an i that is an odd number will accomplish this.

3) The first number of the interval should be a multiple of i. If the interval width is 10, then the first number of the interval should be a multiple of 10. If the interval width is 2, then the first number of the interval should be a multiple of 2. This guideline is sometimes violated when the interval width is 5. For instance, instead of using an interval of 25–29, with a midpoint of 27, one may decide to use an interval of 23–27 so that the midpoint is a multiple of 5 (in this case, 25).
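As a rough illustration of these guidelines, the following Python sketch collapses the simple frequency distribution of Table 2.3 into class intervals of width i = 3, starting the lowest interval at a multiple of i. The function and variable names are hypothetical, and the approach is one plausible way to do the tally rather than a prescribed procedure.

```python
# Frequencies from Table 2.3, listed for scores 30 down to 0
counts = [1, 1, 1, 1, 1, 2, 2, 2, 4, 3, 4, 4, 6, 7, 8, 8,
          7, 6, 4, 3, 3, 3, 2, 1, 2, 2, 0, 1, 0, 1, 0]
simple_freq = dict(zip(range(30, -1, -1), counts))


def grouped_frequencies(freq, i):
    """Collapse {score: f} into (low, high, f) intervals of width i, highest first."""
    # Guideline 3: the first number of each interval is a multiple of i
    low = (min(freq) // i) * i
    top = max(freq)
    intervals = []
    while low <= top:
        high = low + i - 1
        f = sum(freq.get(score, 0) for score in range(low, high + 1))
        intervals.append((low, high, f))
        low += i
    return intervals[::-1]  # descending order, as in Table 2.4


for low, high, f in grouped_frequencies(simple_freq, 3):
    print(f"{low}-{high}: {f}")
```

With i = 3 the sketch yields 11 intervals, close to the customary 10, and because i is odd every midpoint is a whole number (guideline 2).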

Cumulative Frequency Distributions

A cumulative frequency distribution has an additional column that keeps a running tally of all scores up through each given interval. Table 2.5 presents the grouped frequency distribution data found in Table 2.4. The third column of Table 2.5 lists the cumulative frequencies, abbreviated Cum f. The arrows in the table show the additive procedure used to find the cumulative frequency at each interval. It is customary to start accumulating the scores from the bottom of the frequency distribution. For instance, for interval 15–17, the cumulative frequency is 58. That is the sum of frequencies found at that interval plus all preceding intervals (1 + 3 + 5 + 9 + 17 + 23 = 58). Note that the total number of scores in the distribution is the top number of the Cum f column.

Table 2.5 A cumulative frequency distribution based on the grouped frequency distribution in Table 2.4.

Class interval    f    Cum f
    30–32         1     90
    27–29         3     89
    24–26         5     86
    21–23         9     81
    18–20        14     72
    15–17        23     58
    12–14        17     35
     9–11         9     18
     6–8          5      9
     3–5          3      4
     0–2          1      1
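The running tally in the Cum f column can be reproduced with a short Python sketch. The variable names are illustrative assumptions; the frequencies are those of Table 2.4, listed from the top interval (30–32) to the bottom interval (0–2), and the accumulation starts from the bottom, as described above.

```python
from itertools import accumulate

# f column of Table 2.4, from the top interval (30–32) down to the bottom (0–2)
f_top_to_bottom = [1, 3, 5, 9, 14, 23, 17, 9, 5, 3, 1]

# Accumulate from the bottom of the distribution upward,
# then flip back so the result lines up with the table rows
cum_from_bottom = list(accumulate(reversed(f_top_to_bottom)))
cum_f = cum_from_bottom[::-1]

print(cum_f)     # [90, 89, 86, 81, 72, 58, 35, 18, 9, 4, 1]
print(cum_f[0])  # 90, the total number of scores
```

The first entry of the result is the total number of scores, matching the top number of the Cum f column.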
