Absolute and relative scores: raw scores and normative scores

The score a person gets on a test is known as the raw score. Raw scores are absolute quantities. For ability tests they are usually the number of items someone gets right, but they could also be the time taken to complete some activity or some other more complex measure of performance.

We have to be very careful when we interpret raw scores on a test.

The raw score is important for what it tells us about a person’s ability.

If someone gets a raw score of zero on a test, it does not mean that they have no ability – only that they have failed to reach the lowest point scored on that test. Imagine you lived in a subzero climate, and only had a thermometer which read from zero degrees upwards. According to your thermometer it would always be zero degrees. This does not mean that there is never any temperature at all, only that the temperature is too low to be measured by this scale. The problem lies in having the wrong thermometer, so you are unable to measure below zero degrees.
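The thermometer analogy can be sketched in a few lines of code. This is only an illustration: the function name `thermometer_reading` and the temperatures used are invented for the example and do not come from any real instrument or test.

```python
def thermometer_reading(true_temp, scale_min=0.0):
    """Return what a thermometer shows when it cannot read below scale_min."""
    # Anything colder than the bottom of the scale is shown as the bottom of the scale.
    return max(true_temp, scale_min)


# Hypothetical true temperatures in a subzero climate (illustrative values only).
true_temps = [-25.0, -10.0, -3.0]
readings = [thermometer_reading(t) for t in true_temps]

print(readings)  # [0.0, 0.0, 0.0]
```

Three quite different temperatures all produce the same reading. In the same way, a raw score of zero may hide real differences in ability that lie below the range the test can measure.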

In addition to raw scores, we make great use of normative scores or normed scores. These are derived from raw scores and provide a way of describing how well a person has done relative to other people. Deriving them is an important part of the process of test development and standardization.

In relation to height and weight, it is like saying: John Smith is 5ft 8in (raw score) which is very tall for his age (normative statement). Mary James weighs 8 stone (raw score) which is underweight for her age (normative statement).

In both examples, the raw score is an absolute value and provides one sort of information. The normative statement, on the other hand, provides additional information which helps us interpret the implications of the absolute score.

For many everyday measures, we ‘know’ what they mean because we are familiar with the scale. We know that 6ft 3in is tall for people and that 4ft 6in is short; we know that a four-mile walk will take about an hour; that 80 degrees F is quite hot. We have implicit normative information about these scales and can ‘think’ in them. When the UK currency changed from the complex £sd system (12 pence to one shilling and 20 shillings to one pound sterling) to the decimal £p system (100 ‘new’ pence to one pound), people had to go through an extensive period of translating the new money into ‘old’ money in order to know how much it was worth. Similarly, adjusting to the change from Fahrenheit to Centigrade for weather reports has been difficult for many people in the UK.

Norm-referenced, self-referenced, criterion-referenced and domain-referenced measures

Most psychological measures are carried out using raw score scales which have no implicit normative meaning. We have nothing we can directly refer them to in order to make sense of them. Therefore we have to relate them to something else.

There are four main ways we get round this problem of assigning ‘meaning’ to a score:

norm-referencing

self-referencing

criterion-referencing

domain-referencing

For test administration we will only consider the first two of these.

Norm-referenced scores – comparing people with other people

A norm-referenced score defines where a person’s raw score lies in relation to the scores obtained by other people (that is, the norm group). The reason for using norm-referenced scores is to see whether the person is below average, average, or above average. Such scores are relative measures as they depend on who the ‘other people’ are. A given ability score may be low when compared against a university graduate norm group and high when compared against a sample of people drawn from the general population. Typically, norm-referenced scores are expressed either as percentiles or on one of a number of standard score scales. We will look in some detail at these two types of score in a later section of this Module.
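To see how the same raw score can be high against one norm group and low against another, consider the short sketch below. It is an illustration only: the two lists of scores are invented, and `percentile_rank` uses one common convention (the percentage of the norm group scoring below the raw score, with ties counted as half), which may differ from the convention used in a particular test manual.

```python
def percentile_rank(raw_score, norm_group_scores):
    """Percentage of the norm group scoring below raw_score, counting ties as half."""
    below = sum(1 for s in norm_group_scores if s < raw_score)
    equal = sum(1 for s in norm_group_scores if s == raw_score)
    return 100.0 * (below + 0.5 * equal) / len(norm_group_scores)


# Hypothetical raw scores for two norm groups (invented for illustration only).
general_population = [8, 10, 12, 13, 15, 16, 18, 20, 22, 25]
graduates = [18, 20, 21, 22, 23, 24, 25, 26, 27, 28]

raw = 21  # the same raw score, interpreted against each norm group

print(percentile_rank(raw, general_population))  # 80.0
print(percentile_rank(raw, graduates))           # 25.0
```

A raw score of 21 is well above average in the general population sample but below average among the graduates, which is why a norm-referenced score should always be reported together with the norm group it refers to.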


Self-referenced or ipsative tests – comparing people with themselves

Self-referenced tests are those where people are asked to make choices between items from different scales. For example, you may be asked to say which of two statements is ‘Most like you’ or which of four statements are ‘Most like you’ and ‘Least like you’. There are some examples in the box below. This type of inventory is quite complicated to score, and in many cases you will find that hand-scoring keys are not published and that responses are scored electronically – see Module 2 for further details about different methods of scoring.

Self-referenced or ipsative tests

Self-referenced tests are sometimes referred to as ipsative tests. In an ipsative test, the scores on each scale are dependent on each other to some degree. In a fully ipsative test, the degree to which scores on one scale are dependent on scores on the others depends simply on the number of scales. For example, if you have only two ipsative scales, then whatever you score on one scale fixes what the other scale score must be. What you score on one reduces the freedom for what you might score on the other scales to vary. As the number of scales increases, so this reduction in ‘freedom to vary’ decreases.

Number of items attempted on a test and number of items not attempted are two ipsative scales. As items must either be attempted or not, then these two scales are totally dependent on each other – as the score on one goes up, so the other must go down.
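The dependence between ipsative scales can be illustrated with a very simple forced-choice scoring sketch. This is an assumption-laden illustration, not the scoring method of any published inventory: the function `score_forced_choice`, the scale names and the responses are all invented, and each item is assumed to award one point to whichever scale's statement was chosen as ‘Most like you’.

```python
from collections import Counter


def score_forced_choice(choices, scales):
    """Count how often each scale's statement was picked as 'Most like you'."""
    counts = Counter(choices)
    return {scale: counts.get(scale, 0) for scale in scales}


# Hypothetical two-scale inventory: one choice per item (invented responses).
scales = ["achievement", "people"]
choices = ["achievement", "people", "achievement",
           "achievement", "people", "achievement"]

scores = score_forced_choice(choices, scales)
total = sum(scores.values())

print(scores)  # {'achievement': 4, 'people': 2}
print(total)   # 6, always equal to the number of items

# With only two scales, one score fixes the other.
implied_people = len(choices) - scores["achievement"]
print(implied_people)  # 2
```

Because the scale scores must always add up to the number of items, knowing one score in a two-scale test tells you the other exactly, just as with items attempted and items not attempted. With more scales the total is still fixed, but each individual score is less tightly constrained by any one of the others.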

Ipsative tests will always have at least two – usually more – scales.

Ipsative tests are quite common amongst personality and interest measures where typical rather than maximum performance is being looked at. Ipsative tests are a useful complement to non-ipsative ones: each tells us different things about a person. However, considerable caution needs to be exercised when interpreting the results from ipsative tests. As the interpretation of results is the responsibility of the test user, we do not need to go into this in any detail here.

Self-referenced scores use the person taking the test as their own ‘norm’. To do this, the test scores have to be derived in a special way.

Let us first clarify how norm-referenced measures can be used to talk about both differences between people and differences within people.

Compare the following statements about two people’s scores on two personality scales (‘need for achievement’ and ‘need for other people’):

1. John’s need for achievement and his need for other people are both below average.

2. John’s need for achievement is stronger than his need to be with other people.

3. Huda’s need for achievement and her need for other people are both above average.

4. Huda’s need for achievement is stronger than her need to be with other people.

The first and third are norm-referenced statements. They tell us that John’s scores on the two scales are lower than the average scores on those scales for other people in the population, while Huda’s are higher.

The second and fourth compare the two scales within each person. They say that both John and Huda are more achievement-oriented than they are people-oriented, regardless of how either of them compares with other people.
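The four statements can be reproduced from norm-referenced scores with a small sketch. Everything numerical here is hypothetical: the norm-group means and standard deviations, the raw scores for John and Huda, and the helper names `z_score` and `describe` are invented for illustration, and a simple standard score (raw score minus norm mean, divided by norm standard deviation) is assumed.

```python
def z_score(raw, norm_mean, norm_sd):
    """Express a raw score as standard deviations above or below the norm-group mean."""
    return (raw - norm_mean) / norm_sd


# Hypothetical norm-group (mean, SD) for each scale and hypothetical raw scores.
NORMS = {"achievement": (50.0, 10.0), "people": (50.0, 10.0)}
RAW = {
    "John": {"achievement": 45.0, "people": 35.0},
    "Huda": {"achievement": 70.0, "people": 60.0},
}


def describe(name):
    """Summarise between-person and within-person comparisons for one person."""
    z = {scale: z_score(raw, *NORMS[scale]) for scale, raw in RAW[name].items()}
    return {
        "z": z,
        # Between people: where does each scale sit relative to the norm group?
        "both_below_average": all(v < 0 for v in z.values()),
        "both_above_average": all(v > 0 for v in z.values()),
        # Within the person: which of their own two scales is stronger?
        "achievement_stronger": z["achievement"] > z["people"],
    }


for name in RAW:
    print(name, describe(name))
```

With these invented figures John is below average on both scales and Huda is above average on both, yet the within-person comparison gives the same answer for each of them: need for achievement is the stronger of the two.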
