carefully chosen samples of behavior to which a numerical or category system is applied according to some preestablished standards. Psychological testing is largely coextensive with the field of psychometrics, or psychological measurement, and is one of the primary tools for the science and practice of psychology.
The use of numbers in testing requires us to delve into statistics. For many students of psychology the use of statistics and quantitative data in general poses a problem that may seem insurmountable: namely, that dealing with numbers tends to cause some anxiety. This anxiety is connected with the distress that classes in mathematics and statistics often induce for reasons that may be related as much to emotional or attitudinal factors as to those subjects themselves or to the way they have been taught traditionally. This chapter presents the statistical concepts needed to understand the basic principles of psychological testing. Those who have mastered basic statistics may be able to skip all or most of the chapter. As for the rest, any motivated reader of this book can achieve a serviceable grasp of the concepts described here. It is important, however, to realize that these concepts follow a logical progression; in order to proceed to each new topic it is essential to master the preceding ones. Additional help in understanding basic statistical methods is readily available in many excellent textbooks, such as the ones listed in Rapid Reference 2.1.
VARIABLES AND CONSTANTS
One of the most basic distinctions we can make in any science is that between variables and constants. As the terms themselves imply, a variable is anything that varies whereas a constant is anything that does not. Our world has many variables and few constants. One example of a constant is π (pi), the ratio of the circumference of a circle to its diameter, a number that is usually rounded to 3.1416.
Variables, on the other hand, are everywhere and they can be classified in a multitude of ways. For example, some variables are visible (e.g., sex, color of eyes) and others invisible (e.g., personality, intelligence); some are defined so as to pertain to very small sets and others to very large sets (e.g., the number of children in a family or the average income of individuals in a country); and some are continuous, others discrete.
This last distinction is important for our purposes and bears some explaining.
Technically, discrete variables are those with a finite range of values, or a potentially infinite but countable range of values. Dichotomous variables, for instance, are discrete variables that can assume only two values, such as sex or the outcome of coin tosses. Polytomous variables are discrete variables that can assume more than two values, such as marital status, race, and so on. Other discrete variables can assume a wider range of values but can still be counted as separate units; examples of these are family size, vehicular traffic counts, and baseball scores. Although in practice it is possible to make errors in counting, in principle, discrete variables can be tallied precisely and without error.
Continuous variables such as time, distance, and temperature, on the other hand, have infinite ranges and really cannot be counted. They are measured with scales that could be theoretically subdivided into infinity and have no breaks in between their points, such as the scales in analog clocks, yardsticks, and glass thermometers.
Since our measuring instruments (even atomic clocks!) can never be calibrated with enough precision to measure continuous variables exactly, the measurements we take of such variables are more or less accurate approximations.
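The contrast between counting and measuring can be sketched in a few lines of Python. The family sizes, the thermometer reading, and the helper name `reading_bounds` are hypothetical and serve only as an illustration; the sketch also assumes that an instrument rounds to the nearest mark on its scale.

```python
from fractions import Fraction

# Discrete variable: family sizes are counted, so the tally is exact.
family_sizes = [2, 4, 3, 5, 4]  # made-up data
total_people = sum(family_sizes)

def reading_bounds(reading: Fraction, resolution: Fraction) -> tuple[Fraction, Fraction]:
    """Range of true values consistent with a reading rounded to the nearest scale mark."""
    half_step = resolution / 2
    return reading - half_step, reading + half_step

# Continuous variable: a hypothetical thermometer marked every 0.1 degree.
# A reading of 36.6 only tells us the temperature lies somewhere in an interval.
low, high = reading_bounds(Fraction("36.6"), Fraction("0.1"))
print(total_people, float(low), float(high))
```

A finer instrument narrows the interval but never shrinks it to a single point, which is the sense in which measurements of continuous variables remain approximations.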
36 ESSENTIALS OF PSYCHOLOGICAL TESTING

Rapid Reference 2.1

Advice on Statistics

Basic Premises
1. To understand psychological tests, one needs to deal with numbers and statistics.
2. Understanding statistics is possible for anyone who reads this book.
3. The best way to increase one’s grasp of statistical concepts is to apply them.

Recommended Sources of Help with Statistics

Books
• Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA: Duxbury.
• Kirk, R. E. (1999). Statistics: An introduction (4th ed.). Fort Worth, TX: Harcourt Brace.
• Urdan, T. C. (2001). Statistics in plain English. Mahwah, NJ: Erlbaum.
• Vogt, W. P. (1998). Dictionary of statistics and methodology: A nontechnical guide for the social sciences (2nd ed.). Thousand Oaks, CA: Sage.

Video
• Blatt, J. (Producer/Writer/Director). (1989). Against all odds: Inside statistics [VHS videocassette]. (Available from The Annenberg/CPB Project, 901 E St., NW, Washington, DC 20004-2006)

Before we start dealing with numbers, another word of caution is in order. In psychological testing, we are almost always interested in variables that are continuous (e.g., degrees of integrity, extraversion, or anxiety), yet we measure with tools, such as tests or inventories, that are not nearly as precise as those in the physical and biological sciences. Even in those sciences the discrete measurement of continuous variables poses some limitations on the accuracy of measurements. It therefore stands to reason that in the behavioral sciences we must be particularly aware of potential sources of error and look for pertinent estimates of error whenever we are presented with the results of any measurement process. For example, if polls taken from samples of potential voters are used to estimate the outcome of an election, the estimated margins of error have to be displayed alongside the results of the polls.
In summary, when we look at the results of any measurement process, we need to hold clearly in mind the fact that they are inexact. With regard to psychological testing in particular, whenever scores on a test are reported, the fact that they are estimates should be made clear; furthermore, the limits within which the scores might range as well as the confidence levels for those limits need to be given, along with interpretive information (see Chapter 4).
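To make the polling example concrete, here is a minimal sketch of how a margin of error might be attached to a poll result. It assumes a simple random sample and the usual normal approximation for a proportion; the poll figures and the function name `margin_of_error` are hypothetical, and the limits for test scores themselves are a separate matter taken up in Chapter 4.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes simple random sampling and the normal approximation;
    z = 1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 52% of 1,000 sampled voters favor a candidate.
p_hat, n = 0.52, 1000
moe = margin_of_error(p_hat, n)
lower, upper = p_hat - moe, p_hat + moe
print(f"{p_hat:.0%} plus or minus {moe:.1%} (95% limits: {lower:.1%} to {upper:.1%})")
```

In this made-up case the limits straddle 50%, so the poll alone would not justify calling the election; that is exactly the kind of information lost when results are reported without their margins of error.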
THE MEANING OF NUMBERS
Because numbers can be used in a multitude of ways, S. S. Stevens (1946) devised a system for classifying different levels of measurement on the basis of the relationships between numbers and the objects or events to which the numbers are applied. These levels of measurement or scales, outlined in Table 2.1, specify some of the major differences in the ways numbers may be used as well as the types of statistical operations that are logically feasible depending on how numbers are used.
DON’T FORGET
• Although numbers may seem precise, all measurements are prone to error.
• When we are measuring discrete variables, errors arise only from inaccurate counting. Good practice requires the prevention, detection, and correction of inaccurate counting.
• When we are measuring continuous variables, on the other hand, measurement error is inevitable, as a consequence of the limitations of measurement tools.
• As measurement tools, psychological tests are subject to many limitations. Hence, margins of error must always be estimated and communicated along with test results.

Table 2.1 Levels of Measurement

Scale Type | Defining Characteristic | Properties of Numbers | Examples
Nominal | Numbers are used instead of words. | Identity or equality | SS#s; football players’ jersey numbers; numerical codes for nonquantitative variables, such as sex or psychiatric diagnoses
Ordinal | Numbers are used to order a hierarchical series. | Identity + rank order | Ranking of athletes or teams; percentile scores
Interval | Equal intervals between units but no true zero. | Identity + rank order + equality of units | Fahrenheit and Celsius temperature scales; calendar time
Ratio | Zero means “none of” whatever is measured; all arithmetical operations possible and meaningful. | Identity + rank order + equality of units + additivity | Measures of length; periods of time

Nominal Scales
At the simplest level of his classification, Stevens placed what he called nominal scales. The word nominal is derived from the Latin root nomen, meaning name. As this implies, in such scales, numbers are used solely as labels to identify an individual or a class. The nominal use of numbers to label individuals is exemplified by the Social Security numbers (SS#s) that identify most people who live in the United States; these numbers are useful because each is assigned to only one person and can therefore serve to identify persons more specifically than their first and last names, which can be shared by many people. Numbers can also be used to label categorical data, which are data related to variables such as gender, political affiliation, color, and so forth (that is, data that derive from assigning people, objects, or events to particular categories or classes). When entering demographic data into a computer for analysis, for instance, investigators typically create a nominal scale that uses numbers to indicate the levels of a categorical variable.
For example, the number 1 (one) may be assigned to all females and 2 (two) to all males. The only requirement for this use of numbers is that all the members of a set designated by a given number should be equal with regard to the category assigned to that number. Naturally, while the numbers used in nominal scales can certainly be added, subtracted, multiplied, or divided, the results of such operations are not meaningful. When we use numbers to identify categories, such as pass-fail or psychiatric diagnoses, the only property of such numbers is identity; this means that all members of a category must be assigned the same number and that no two categories may share the same number. The only permissible arithmetic operation is counting the frequencies within each category. One can then, of course, manipulate those frequencies further by calculating proportions and doing further analyses based on them.
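The point about nominal codes can be checked with a short sketch. The data are made up, and the alternative labels 7 and 3 are arbitrary choices introduced here only to show that relabeling changes the arithmetic on the codes without changing the frequencies.

```python
from collections import Counter

# Coding scheme from the text: 1 = female, 2 = male.
labels = {1: "female", 2: "male"}
sex_codes = [1, 2, 2, 1, 1, 2, 1, 1]  # made-up data for eight respondents

# Counting frequencies is the one arithmetic operation that makes sense here,
# and proportions follow directly from the counts.
frequencies = Counter(sex_codes)
proportions = {labels[code]: frequencies[code] / len(sex_codes) for code in labels}

# The codes can be averaged, but the result describes no property of the respondents.
mean_of_codes = sum(sex_codes) / len(sex_codes)

# Swapping in different labels (7 for female, 3 for male) changes that mean
# but leaves the category frequencies untouched.
relabel = {1: 7, 2: 3}
recoded = [relabel[code] for code in sex_codes]
recoded_frequencies = Counter(recoded)
mean_of_recoded = sum(recoded) / len(recoded)

print(proportions, mean_of_codes, mean_of_recoded)
```

Because the mean depends entirely on which labels happened to be chosen, it carries no information; the frequencies and proportions, by contrast, survive any relabeling that respects identity.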
Ordinal Scales
The numbers used in ordinal scales convey one more bit of meaning than those in nominal scales, albeit a significant one. In these scales, in addition to identity, there is the property of rank order, which means that the elements in a set can be lined up in a series, from lowest to highest or vice versa, arranged on the basis of a single variable, such as birth order or level of academic performance within a given graduating class. Although rank order numbers convey a precise meaning in terms of position, they carry no information with regard to the distance between positions. Thus, the students in a class can be ranked in terms of their performance, but this ranking will not reflect the amount of difference between them, which could be great or small. Similarly, in any hierarchical organization, say, the U.S. Navy, ranks (e.g., ensign, lieutenant, commander, captain, admiral) denote different positions, from lowest to highest, but the differences between them in terms of accomplishments or prestige are not the same. If those ranks were assigned numbers, such as 1, 3, 7, 14, and 35, the order of precedence would be maintained, but no further meaning would be added.
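The Navy example can be expressed as a brief sketch. The helper names `order_of` and `gaps` are hypothetical; the two sets of numbers are the consecutive ranks 1 through 5 and the uneven set 1, 3, 7, 14, and 35 mentioned above.

```python
navy_ranks = ["ensign", "lieutenant", "commander", "captain", "admiral"]

# Two ways of numbering the same hierarchy.
consecutive = dict(zip(navy_ranks, [1, 2, 3, 4, 5]))
uneven = dict(zip(navy_ranks, [1, 3, 7, 14, 35]))

def order_of(codes: dict[str, int]) -> list[str]:
    """Names sorted from lowest to highest assigned number."""
    return sorted(codes, key=codes.get)

def gaps(codes: dict[str, int]) -> list[int]:
    """Differences between adjacent assigned numbers."""
    values = sorted(codes.values())
    return [b - a for a, b in zip(values, values[1:])]

# Both numberings yield the same order of precedence...
same_order = order_of(consecutive) == order_of(uneven)
# ...but the gaps between the numbers differ, and neither set of gaps
# says anything about differences in accomplishment or prestige.
print(same_order, gaps(consecutive), gaps(uneven))
```

Any order-preserving renumbering would serve equally well, which is why distances between ordinal numbers should not be interpreted.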
In psychological testing, the use of ordinal numbers to convey test results is pervasive. Rank ordered test scores are reported as percentile rank (PR) scores—not to be confused with the familiar percentage scores widely used in school grading.
Percentile scores are simply ordinal numbers set on a scale of 100, so that the rank indicates the percentage of individuals in a group who fall at or below a given level of performance. For example, the percentile rank score of 70 indicates a level of performance that equals or exceeds that of 70% of the people in the group in question. Percentile rank scores, often referred to simply as percentiles, are the main vehicle whereby test users convey normative information derived from tests, and thus they will be discussed again, at greater length, in the next chapter.
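A minimal sketch of the "at or below" definition given here follows. The group of scores and the function name `percentile_rank` are hypothetical, and other conventions for computing percentile ranks exist (for example, counting only half of the cases tied at the score), so this should be read as one possible implementation.

```python
def percentile_rank(score: float, group_scores: list[float]) -> float:
    """Percentage of the group scoring at or below the given score."""
    at_or_below = sum(1 for s in group_scores if s <= score)
    return 100 * at_or_below / len(group_scores)

# Made-up raw scores for a group of ten test takers.
group = [55, 60, 62, 65, 70, 71, 75, 80, 85, 90]

# Seven of the ten scores are at or below 75, so its percentile rank is 70.
pr_of_75 = percentile_rank(75, group)
print(pr_of_75)
```

The same raw score of 75 would be a percentage score of 75% on a 100-item test, which is a statement about the items answered correctly and says nothing about standing within a group.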
Numerical data from ordinal scales can be manipulated statistically in the same way as nominal data. In addition, there are a few statistical techniques, such as Spearman’s rho (rS) correlation coefficient for rank differences, that are specifically appropriate for use with ordinal data.
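As an illustration of a rank-based technique, the sketch below computes Spearman's rho with the standard rank-difference formula, 1 - 6Σd²/(n(n² - 1)). It assumes there are no tied values, since that formula does not apply when ties are present; the data and the helper names `ranks` and `spearman_rho` are hypothetical.

```python
def ranks(values: list[float]) -> list[int]:
    """Rank of each value, 1 = lowest. Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho from rank differences (valid only without ties)."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Made-up scores of five students on two tests.
test_a = [61, 70, 75, 82, 90]   # ranks 1, 2, 3, 4, 5
test_b = [55, 50, 68, 64, 77]   # ranks 2, 1, 4, 3, 5

rho = spearman_rho(test_a, test_b)
print(round(rho, 2))
```

Only the ranks enter the calculation, so the coefficient is unaffected by how far apart the raw scores are, which is what makes it suitable for ordinal data.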
Interval Scales
In interval scales, also known as equal-unit scales, numbers acquire yet one more important property. In these scales, the difference between any two consecutive numbers reflects an equal empirical or demonstrable difference between the objects or events that the numbers represent. An example of this is the use of days to mark the passage of calendar time. One day consists of 24 hours, each hour of 60 minutes, and each minute of 60 seconds; if two dates are 12 days apart, they are exactly three times as far apart as two dates that are only 4 days apart. Note, however, that calendar time in months is not an equal-unit scale because some months are longer than others. Furthermore, calendar time also typifies a characteristic of interval scales that limits the meaning of the numbers used in them, namely, that there is no true zero point. In the case of calendar time, there is no agreed upon starting point for the beginning of time. Different cultures have devised arbitrary starting points, such as the year Christ was presumed to have been born, to mark the passage of years. For instance, the much anticipated arrival of the new millennium at the end of the year 2000 of the Christian or Common Era came in the year 5761 of the Jewish calendar and in the year 4699 of the Chinese calendar, both of which start many years before the beginning of the Common Era.
In interval scales, the distances between numbers are meaningful. Thus, we can apply most arithmetical operations to those numbers and get results that make sense. However, because of the arbitrariness of the zero points, the numbers in an interval scale cannot be interpreted in terms of ratios.
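The Celsius and Fahrenheit scales, both interval scales, offer a quick way to see why differences are interpretable but ratios are not. The three temperatures below are arbitrary illustrative values, and the conversion is the standard F = C × 9/5 + 32.

```python
def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32

# Three arbitrary temperatures in degrees Celsius.
a_c, b_c, c_c = 10.0, 20.0, 40.0
a_f, b_f, c_f = (celsius_to_fahrenheit(t) for t in (a_c, b_c, c_c))

# Comparisons of differences agree on both scales:
# the gap from 20 to 40 is twice the gap from 10 to 20 either way.
diff_ratio_c = (c_c - b_c) / (b_c - a_c)
diff_ratio_f = (c_f - b_f) / (b_f - a_f)

# Ratios of the numbers themselves do not agree, because each scale
# places its zero at an arbitrary point.
ratio_c = b_c / a_c   # 20 / 10
ratio_f = b_f / a_f   # 68 / 50

print(diff_ratio_c, diff_ratio_f, ratio_c, ratio_f)
```

Since "twice as hot" holds on one scale and fails on the other, the statement reflects the choice of zero point and not anything about the temperatures themselves.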
Ratio Scales
Within ratio scales, numbers achieve the property of additivity, which means they can be added (as well as subtracted, multiplied, and divided) and the result expressed as a ratio, all with meaningful results. Ratio scales have a true or absolute zero point that stands for “none of” whatever is being measured. In the physical sciences, the use of this type of measurement scale is common; times, distances, weights, and volumes can be expressed as ratios in a meaningful and logically consistent way. For instance, an object that weighs 16 pounds is twice as heavy as one that weighs 8 pounds (16/8 = 2), just as an 80-pound object is twice as heavy as a 40-pound object (80/40 = 2). In addition, the zero point in the scale of weights indicates absolute weightlessness. In psychology, ratio scales are used primarily when we measure in terms of frequency counts or of time intervals, both of which allow for the possibility of true zeros.
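One way to check that weight ratios are meaningful is to confirm that they survive a change of units. The sketch below converts the 16-pound and 8-pound objects to kilograms using the defined conversion factor of 0.45359237 kg per pound; the function name `pounds_to_kg` is a hypothetical helper.

```python
KG_PER_POUND = 0.45359237  # exact by international definition

def pounds_to_kg(pounds: float) -> float:
    return pounds * KG_PER_POUND

ratio_in_pounds = 16 / 8
ratio_in_kg = pounds_to_kg(16) / pounds_to_kg(8)

# Zero pounds is also zero kilograms: the zero point is not arbitrary.
zero_in_kg = pounds_to_kg(0)

print(ratio_in_pounds, ratio_in_kg, zero_in_kg)
```

The heavier object is twice as heavy in either unit because both scales share the same true zero.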
Categorical or discrete data can be measured—or accounted for—only with nominal scales, or with ordinal scales if the data fall in a sequence of some kind.
Continuous, or metric, data can be measured with interval scales, or ratio scales if there is a true zero point. In addition, continuous data can be converted into classes or categories and handled with nominal or ordinal scales. For instance, we could separate people into just three categories (tall, medium, and short) by establishing a couple of arbitrary cutoff points in the continuous variable of height.
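The height example might look like this in code. The cutoff points of 160 cm and 180 cm are arbitrary assumptions made for the sketch, as are the heights and the function name `height_category`.

```python
def height_category(height_cm: float,
                    short_below: float = 160.0,
                    tall_from: float = 180.0) -> str:
    """Assign a height to one of three ordered categories using arbitrary cutoffs."""
    if height_cm < short_below:
        return "short"
    if height_cm >= tall_from:
        return "tall"
    return "medium"

# Made-up heights in centimeters.
heights = [152.0, 159.9, 160.0, 171.5, 180.0, 193.2]
categories = [height_category(h) for h in heights]
print(list(zip(heights, categories)))
```

The conversion discards information: 152.0 cm and 159.9 cm end up in the same category, while 159.9 cm and 160.0 cm, which are nearly identical, end up in different ones.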
When we move from nominal to ratio scales, we go from numbers that carry less information to numbers that carry more. As a consequence of this, going from one level of measurement to another requires us to be aware of whether the information that the numbers entail is preserved through whatever transformations or manipulations we apply to them.
DON’T FORGET
• In measurement there has to be a demonstrable link between the numbers applied to objects, events, or people and the reality the numbers represent.
• When the rules used to create this link are not understood, the results of the measurement process are easily misinterpreted.
• As we shift from one level of measurement to another, we must be aware of whether the information the numbers entail is preserved in the transformations or manipulations we apply.
• Scores are numbers with specific meanings. Unless the limitations in the meaning of scores are understood, inaccurate inferences are likely to be made from the scores.

Why Is the Meaning of Numbers Relevant to Psychological Testing?
Though it is not universally favored, Stevens’s system for classifying scales of measurement helps to keep the relativity in the meaning of numbers in proper perspective. The results of most psychological tests are expressed in scores, which are numbers that have specific meanings. Unless the limitations in the meaning of scores are understood, inaccurate inferences are likely to be made on the basis of those scores. Unfortunately, this is too often the case, as can be seen in the following examples.
Example 1: Specific limitations of ordinal scales. As mentioned earlier, many scores are reported in the form of percentile ranks, which are ordinal-level numbers that do not imply equality of units. If two scores are separated by 5 percentile rank units (e.g., the 45th and 50th percentiles), the difference between them, and what that difference represents in terms of what is being measured, cannot be equated with the difference separating any other two scores that are 5 percentile units apart (for example, the 90th and 95th percentiles). In a distribution of scores that approximates the normal curve, discussed later in this chapter and portrayed in Figure 2.2, the majority of test scores cluster around the center of the distribution. This means that in such distributions a given difference in percentile rank always corresponds to a greater difference in the underlying scores at the extremes or tails of the distribution than it does in the middle.
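The size of this effect can be estimated under the assumption that scores follow the normal curve exactly. The sketch below uses Python's standard `statistics.NormalDist` to convert percentile ranks into standard-deviation units (z scores); the function name `z_for_percentile` is a hypothetical helper.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def z_for_percentile(percentile_rank: float) -> float:
    """z score below which the given percentage of a normal distribution falls."""
    return standard_normal.inv_cdf(percentile_rank / 100)

# The same 5-unit gap in percentile rank, in the middle and in the tail.
gap_middle = z_for_percentile(50) - z_for_percentile(45)
gap_tail = z_for_percentile(95) - z_for_percentile(90)

print(round(gap_middle, 3), round(gap_tail, 3))
```

Under this assumption the gap between the 90th and 95th percentiles is nearly three times as large, in standard-deviation units, as the gap between the 45th and 50th.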
Example 2: The problem of ratio IQs. The original intelligence quotients devised for use with the Stanford-Binet Intelligence Scale (S-B) were ratio IQs. That is to say, they were real quotients, derived by dividing the mental age (MA) score a child had obtained on the S-B test by the child’s chronological age (CA) and multiplying the result by 100 to eliminate the decimals. The idea was that average children would have similar mental and chronological ages and IQs of approximately 100.
Children functioning below the average would have lower mental than chronological ages and IQs below 100, while those functioning above the average would have higher mental than chronological ages and IQs above 100. This notion worked fairly well for children in the early and middle school ages during which there tends to be a somewhat steady pace of intellectual growth from year to year.
However, the MA/CA ratio simply did not work for adolescents and adults because their intellectual development is far less uniform, and changes are often imperceptible, from year to year. The fact that the maximum chronological age used in calculating the ratio IQ of the original S-B was 16 years, regardless of the actual age of the person tested, created additional problems of interpretation.
Furthermore, the mental age and chronological age scales are not at the same level of measurement. Mental age, as assessed through the first intelligence tests, was basically an ordinal-level measurement, whereas chronological age can be measured on a ratio scale. For these reasons, dividing one number by the other to obtain a quotient simply did not lead to logically consistent and meaningful results. Rapid Reference 2.2 shows numerical examples highlighting some of the problems that have caused ratio IQs to be abandoned.
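The arithmetic of the ratio IQ, including the ceiling of 16 years on chronological age, can be sketched as follows. The ages used are hypothetical illustrations and are not the examples given in Rapid Reference 2.2; the function name `ratio_iq` and its `max_ca` parameter are likewise assumptions made for this sketch.

```python
def ratio_iq(mental_age: float, chronological_age: float, max_ca: float = 16.0) -> float:
    """Ratio IQ = (MA / CA) x 100, with CA capped at max_ca as in the original S-B."""
    capped_ca = min(chronological_age, max_ca)
    return 100 * mental_age / capped_ca

# Hypothetical children: MA above, equal to, and below CA.
iq_above = ratio_iq(mental_age=10, chronological_age=8)
iq_average = ratio_iq(mental_age=8, chronological_age=8)
iq_below = ratio_iq(mental_age=8, chronological_age=10)

# Hypothetical adult aged 40 with a mental age score of 16:
# the cap treats the adult as if aged 16.
iq_adult_capped = ratio_iq(mental_age=16, chronological_age=40)
iq_adult_uncapped = ratio_iq(mental_age=16, chronological_age=40, max_ca=float("inf"))

print(iq_above, iq_average, iq_below, iq_adult_capped, iq_adult_uncapped)
```

Without the cap, the quotient for the adult would fall steadily with every birthday even if nothing about the person's performance changed; with it, the divisor no longer represents the person's actual age. Either way the number is hard to interpret, which is consistent with the difficulties described above.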