

From the document HANDBOOK OF HUMAN FACTORS AND ERGONOMICS (pages 77-82)

SENSATION AND PERCEPTION

2.2 Psychophysical Methods

The more direct concern in human factors and ergonomics is with behavioral measures, because our interest is primarily in what people can and cannot perceive and in evaluating specific perceptual issues in applied settings. Because many of the methods used for obtaining behavioral measures can be applied to evaluating aspects of displays and other human factors concerns, we cover them in some detail. The reader is referred to textbooks on psychophysical methods by Gescheider (1997) and Kingdom and Prins (2010) and to chapters by Schiffman (2003) and Rose (2006) for more thorough coverage.

2.2.1 Psychophysical Measures of Sensitivity

Classical Threshold Methods

The goal of one class of psychophysical methods is to obtain some estimate of sensitivity to detecting either the presence of some stimulation or differences between stimuli. The classical methods were based on the concept of a threshold, with an absolute threshold representing the minimum amount of stimulation necessary for an observer to tell that a stimulus was presented on a trial, and a difference threshold representing the minimal amount of difference in stimulation along some dimension required to tell that a comparison stimulus differs from a standard stimulus.

Table 1 Determination of Sensory Threshold by Method of Limits Using Alternating Ascending (A) and Descending (D) Series

Stimulus Intensity (Arbitrary Units) / Responses (Y = yes, N = no) by series: A D A D A D A D A D
15  Y
14  Y Y Y
13  Y Y Y
12  Y Y Y Y
11  Y Y Y Y Y Y
10  Y Y Y N Y Y Y Y
9   N Y Y N N N N Y Y N
8   N N N N N Y N
7   N N N N N N
6   N N N N N
5   N N N N
4   N N N
3   N N
2   N N
1   N
Transition points(a)  9.5 8.5 8.5 9.5 10.5 9.5 9.5 7.5 8.5 9.5

(a) Mean threshold value = 9.1.

Fechner (1860) developed several techniques for finding absolute thresholds, with the methods of limits and constant stimuli being among the most widely used.

To find a threshold using the method of limits, equally spaced stimulus values along the dimension of interest (e.g., magnitude of stimulation) that bracket the threshold are selected (see Table 1). Ascending and descending series are then presented in alternation, each beginning from a different, randomly chosen starting value below (ascending) or above (descending) the threshold. In an ascending series, the first response typically would be, “No, I do not detect the stimulus.” The procedure is repeated, incrementing the stimulus value each time, until the observer’s response changes to “yes,” and the average of that stimulus value and the last one to which a “no” response was given is taken as the threshold for that series. A descending series is conducted in the same manner, but from a stimulus above threshold, until the response changes from yes to no. The thresholds for the individual series are then averaged to produce the final threshold estimate. A particularly efficient variation of the method of limits is the staircase method (Cornsweet, 1962). Rather than having distinct ascending and descending series started from randomly selected values below and above threshold, this method uses a single continuous series in which the direction of the stimulus sequence (ascending or descending) is reversed whenever the observer’s response changes. The threshold is then taken to be the average of the stimulus values at which these transitions occur. The staircase method has the virtue of bracketing the threshold closely, thus minimizing the number of stimulus presentations needed to obtain a given number of response transitions on which to base the threshold estimate.
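The arithmetic of both procedures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to produce Table 1: the function names are hypothetical, responses are coded as booleans (True for “yes”), the starting value of the example series is assumed, and the staircase uses a simulated observer with a fixed, hypothetical threshold of 9.3 units. For the staircase, the sketch records the midpoint between the two stimulus values on either side of each response change, mirroring the method-of-limits convention; averaging the reversal values themselves is another option.

```python
from statistics import mean


def series_threshold(levels, responses):
    """Threshold for one ascending or descending series.

    levels: stimulus values in the order presented.
    responses: matching booleans (True means "yes, I detect it").
    Returns the midpoint between the value at which the response changed
    and the value presented just before it.
    """
    for i in range(1, len(responses)):
        if responses[i] != responses[i - 1]:
            return (levels[i - 1] + levels[i]) / 2
    raise ValueError("the response never changed in this series")


def method_of_limits(series):
    """Average the thresholds of several (levels, responses) series."""
    return mean(series_threshold(levels, responses) for levels, responses in series)


def staircase(detects, start, step, n_transitions, max_trials=10_000):
    """Single continuous up-down series (a simple staircase).

    detects: callable(level) -> bool standing in for the observer.
    The level goes down after a "yes" and up after a "no", so the direction
    reverses whenever the response changes. Returns the mean of the midpoints
    at which the response changed.
    """
    level = start
    previous = detects(level)
    midpoints = []
    for _ in range(max_trials):
        if len(midpoints) == n_transitions:
            return mean(midpoints)
        next_level = level - step if previous else level + step
        response = detects(next_level)
        if response != previous:
            midpoints.append((level + next_level) / 2)
        level, previous = next_level, response
    raise RuntimeError("not enough response transitions within max_trials")


# Transition points from Table 1; their mean is the tabled threshold.
table_1_transitions = [9.5, 8.5, 8.5, 9.5, 10.5, 9.5, 9.5, 7.5, 8.5, 9.5]
print(round(mean(table_1_transitions), 2))  # 9.1

# Ascending series with the first column's transition (N at 9, Y at 10);
# the starting value of 6 is assumed.
print(series_threshold([6, 7, 8, 9, 10], [False, False, False, False, True]))  # 9.5

# Simulated observer who detects anything at or above 9.3 units (hypothetical).
print(staircase(lambda level: level >= 9.3, start=12, step=1, n_transitions=6))  # 9.5
```

With a noiseless simulated observer the staircase simply oscillates between 9 and 10, which makes the bracketing behavior described above easy to see.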

The method of constant stimuli differs from the method of limits primarily in that the different stimulus values are presented in random order, with each value presented many times. The basic data in this case are the percentage of yes responses for each stimulus value. These typically plot as an S-shaped psychophysical function (see Figure 2). The threshold is taken to be the estimated stimulus value for which the percentage of yes responses would have been 50%.
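Reading the 50% point off constant-stimuli data can be sketched as straight-line interpolation between the two stimulus values whose “yes” proportions bracket 50%. This is a simplification chosen for brevity, not a claim about how the function in Figure 2 was estimated; fitting a smooth S-shaped function to all of the points is a common alternative. The helper names and the counts are hypothetical.

```python
def proportions(yes_counts, trials_per_level):
    """Convert "yes" counts at each stimulus value into proportions."""
    return [count / trials_per_level for count in yes_counts]


def threshold_at(levels, p_yes, target=0.5):
    """Stimulus value at which the proportion of "yes" responses reaches target.

    Linear interpolation between the two adjacent stimulus values that
    bracket the target; assumes the proportion rises with stimulus value.
    """
    points = sorted(zip(levels, p_yes))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target proportion is not bracketed by the data")


# Invented data: 7 stimulus values, each presented 50 times in random order.
levels = [1, 2, 3, 4, 5, 6, 7]
p_yes = proportions([1, 3, 10, 20, 35, 45, 49], trials_per_level=50)
print(round(threshold_at(levels, p_yes), 3))  # 4.333
```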

[Figure: percent detections (“yes” responses), 0 to 100, plotted against units of stimulus intensity, with the arbitrary absolute threshold marked at 50% “yes” responses.]
Figure 2 Typical S-shaped psychophysical function obtained with the method of constant stimuli. The absolute threshold is the stimulus intensity estimated to be detected 50% of the time. (From Schiffman, 1996.)

Both the methods of limits and constant stimuli can be extended to difference thresholds in a straightforward manner (see Gescheider, 1997). The most common extension is to use stimulus values for the comparison stimulus that range from being distinctly less than that of the standard stimulus to being distinctly greater. For the method of limits, ascending and descending series are conducted in which the observer responds “less,” then “equal,” and then “greater” as the magnitude of the comparison increases, or vice versa as it decreases. The average stimulus value at which the responses shift from less to equal is the lower threshold, and the average value at which they shift from equal to greater is the upper threshold. The difference between these two values is called the interval of uncertainty, and the difference threshold is found by dividing the interval of uncertainty by 2. The midpoint of this interval is the point of subjective equality, and the difference between this point and the true value of the standard stimulus reflects the constant error, or the influence of any factors that cause the observer to systematically overestimate or underestimate the value of the comparison in relation to that of the standard.
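The arithmetic that turns the two sets of transition values into the interval of uncertainty, difference threshold, point of subjective equality (PSE), and constant error can be sketched as follows. The class and function names and the transition values are hypothetical; each list holds one transition value per series.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LimitsDifferenceThreshold:
    lower_threshold: float          # mean "less" -> "equal" transition
    upper_threshold: float          # mean "equal" -> "greater" transition
    interval_of_uncertainty: float  # upper minus lower
    difference_threshold: float     # interval of uncertainty / 2
    pse: float                      # midpoint of the interval
    constant_error: float           # PSE minus the standard's true value


def limits_difference_threshold(less_to_equal, equal_to_greater, standard):
    """Summarize method-of-limits data for a difference threshold."""
    lower = mean(less_to_equal)
    upper = mean(equal_to_greater)
    interval = upper - lower
    pse = (lower + upper) / 2
    return LimitsDifferenceThreshold(
        lower, upper, interval, interval / 2, pse, pse - standard
    )


# Hypothetical transition values for a standard of 100 units.
result = limits_difference_threshold(
    less_to_equal=[96, 97, 95, 96],
    equal_to_greater=[103, 104, 102, 103],
    standard=100,
)
print(result.difference_threshold, result.pse, result.constant_error)  # 3.5 99.5 -0.5
```

The negative constant error here would mean that this invented observer treats a comparison slightly below 100 units as equal to the standard.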

When the method of constant stimuli is used to obtain difference thresholds, the order in which the standard and comparison are presented is varied, and the observer judges which of the two stimuli is greater. The basic data are then the percentages of trials on which the comparison is judged “greater” for each value of the comparison stimulus. The stimulus value corresponding to the 50th percentile is taken as the point of subjective equality. The difference between that stimulus value and the one corresponding to the 25th percentile is taken as the lower difference threshold, and the difference between the subjectively equal value and the stimulus value corresponding to the 75th percentile is the upper threshold. The two values are averaged to get a single estimate of the difference threshold.
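The percentile-based computation can be sketched with the same straight-line interpolation idea: find the comparison values at 25%, 50%, and 75% “greater” judgments, then take the two differences from the 50% point. Interpolation is an assumption made for brevity, and the function names and proportions are invented for illustration.

```python
def value_at(levels, p_greater, target):
    """Comparison value at which the proportion of "greater" judgments reaches
    target, by linear interpolation between bracketing neighbors (assumes the
    proportion rises with the comparison value)."""
    points = sorted(zip(levels, p_greater))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target proportion is not bracketed by the data")


def constant_stimuli_difference_threshold(levels, p_greater):
    """Return (PSE, lower threshold, upper threshold, averaged threshold)."""
    x25 = value_at(levels, p_greater, 0.25)
    pse = value_at(levels, p_greater, 0.50)
    x75 = value_at(levels, p_greater, 0.75)
    lower_dl = pse - x25
    upper_dl = x75 - pse
    return pse, lower_dl, upper_dl, (lower_dl + upper_dl) / 2


# Invented proportions of "greater" judgments for a standard of 100 units.
comparisons = [94, 96, 98, 100, 102, 104, 106]
p_greater = [0.04, 0.12, 0.25, 0.45, 0.75, 0.90, 0.97]
summary = constant_stimuli_difference_threshold(comparisons, p_greater)
print([round(v, 2) for v in summary])  # [100.33, 2.33, 1.67, 2.0]
```

Because the invented PSE (about 100.33) sits above the standard of 100, the lower and upper thresholds differ, and averaging them gives the single estimate described above.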

Although threshold methods are often used to investigate basic sensory processes, variants can be used to investigate applied problems as well. Shang and Bishop (2000) argued that the concept of visual threshold is of value for measuring and monitoring landscape attributes. They measured detection, recognition, and visual impact thresholds (the last reflecting changes in visual quality as a consequence of landscape modification) for two types of objects (a transmission tower and an oil refinery tank) as a function of size, contrast, and landscape type. Shang and Bishop obtained highly reliable thresholds and concluded that a visual variable combining the effects of contrast and size, which they called contrast weighted visual size, was the best predictor of all three thresholds.

Signal Detection Methods

Although many variants of the classical methods are still used, they are not as popular as they once were. The primary reason is that the threshold measures confound perceptual sensitivity, which they are intended to measure, with response criterion or bias (e.g., willingness to say yes), which they are not intended to measure. The threshold estimates can also be influenced by numerous other extraneous factors, although the impact of most of these factors can be minimized with appropriate control procedures. Alternatives to the classical methods, signal detection methods, have come to be preferred in many situations because they contain the means for separating sensitivity and response bias. Authoritative references for signal detection methods and theory include Green and Swets (1966), Macmillan and Creelman (2005), and Wickens (2001). Macmillan (2002) provides a briefer introduction to its principles and assumptions.

The typical signal detection experiment differs from the typical threshold experiment in that only a single stimulus value is presented for a series of trials, and the observer must discriminate trials on which the stimulus was not presented (noise trials) from trials on which it was (signal-plus-noise, or signal, trials). Thus, the signal detection experiment is much like a true–false test in that it is objective; the accuracy of the observer’s responses with respect to the state of the world can be determined. If the observer says yes most of the time on signal trials and no most of the time on noise trials, we know that the observer was able to discriminate between the two states of the world. If, on the other hand, the proportion of yes responses is equal on signal and noise trials, we know that the observer could not discriminate between them. Similarly, we can determine whether the observer has a bias toward one response or the other by considering the relative frequencies of yes and no responses regardless of the state of the world. If half of the trials included the signal and half did not, yet the observer said yes 70% of the time, we know that the observer had a bias to say yes.
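The bookkeeping behind these judgments can be sketched as a tally of the four possible outcomes. The `tally` helper is hypothetical, and the invented session reproduces the example above: half the trials contain the signal, yet the observer says yes on 70% of them. A hit rate (.90) well above the false-alarm rate (.50) shows discrimination, while the 70% overall yes rate against a 50% signal rate shows the bias toward yes.

```python
def tally(trials):
    """Summarize yes-no data.

    trials: iterable of (signal_present, said_yes) boolean pairs.
    Returns the hit rate, false-alarm rate, and overall proportion of "yes".
    """
    hits = misses = false_alarms = correct_rejections = 0
    for signal_present, said_yes in trials:
        if signal_present and said_yes:
            hits += 1
        elif signal_present:
            misses += 1
        elif said_yes:
            false_alarms += 1
        else:
            correct_rejections += 1
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections
    return {
        "hit_rate": hits / n_signal,
        "false_alarm_rate": false_alarms / n_noise,
        "yes_rate": (hits + false_alarms) / (n_signal + n_noise),
    }


# Invented session: 50 signal and 50 noise trials, "yes" on 70 of the 100.
session = (
    [(True, True)] * 45 + [(True, False)] * 5
    + [(False, True)] * 25 + [(False, False)] * 25
)
print(tally(session))
# {'hit_rate': 0.9, 'false_alarm_rate': 0.5, 'yes_rate': 0.7}
```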

Signal detection methods allow two basic measures to be computed, one corresponding to discriminability (or sensitivity) and the other to response bias. Thus, the key advantage of signal detection methods is that they allow the extraction of a pure measure of perceptual sensitivity separate from any response bias that exists, rather than combining the two in a single measure, as in the threshold techniques. There are many alternative measures of sensitivity (Swets, 1986) and bias (Macmillan and Creelman, 1990), based on a variety of psychophysical models and assumptions. We base our discussion on signal detection theory and the two most widely used measures of sensitivity and bias, d′ and β. Sorkin (1999) describes how signal detection measures can be calculated using spreadsheet application programs such as Excel.

[Figure: noise and signal + noise distributions (means μn and μs + n) divided by a criterion into false-alarm and hit regions, with distances d1 and d2 and d′ = d1 + d2.]
Figure 3 Equal-variance, normal probability distributions for noise and signal-plus-noise distributions on sensory continuum, with depiction of proportion of false alarms, proportion of hits, and computation of d′. Bottom panel shows both distributions on a single continuum. (From Proctor and Van Zandt, 2008.)

Signal detection theory assumes that the sensory effect of a signal or noise presentation on any given trial can be characterized as a point along a continuum of evidence indicating that the signal was in fact presented. Across trials, the evidence will vary, such that for either type of trial it will sometimes be higher (or lower) than at other times. For computation of d′ and β, it is assumed that the resulting distribution of values is normal (i.e., bell shaped and symmetric), or Gaussian, for both the signal and noise trials and that the variances for the two distributions are equal (see Figure 3). To the extent that the signal is discriminable from the noise, the distribution for the signal trials should be shifted to the right (i.e., higher on the continuum of evidence values) relative to that for the noise trials. The measure d′ is therefore the distance between the means of the signal and noise distributions, in standard deviation units. That is,

$$ d' = \frac{\mu_s - \mu_n}{\sigma} $$

where μs is the mean of the signal distribution, μn is the mean of the noise distribution, and σ is the standard deviation of both distributions. The assumption is that the observer will respond yes whenever the evidence value on any trial exceeds a criterion. The measure β, which is expressed by the formula

$$ \beta = \frac{f_s(C)}{f_n(C)} $$

where C is the criterion and fs and fn are the heights of the signal and noise distributions, respectively, is the likelihood ratio for the two distributions at the criterion. It indicates the placement of this criterion with respect to the distributions and thus reflects the relative bias to respond yes or no.

Computation of d′ and β is relatively straightforward. The placement of the distributions with respect to the criterion can be determined as follows. The hit rate is the proportion of signal trials on which the observer correctly said yes; this can be depicted graphically by placing the criterion with respect to the signal distribution so that the proportion of the distribution exceeding it corresponds to the hit rate. The false-alarm rate is the proportion of noise trials on which the observer incorrectly said yes. This corresponds to the proportion of the noise distribution that exceeds the criterion; when the noise distribution is placed so that the proportion exceeding the criterion is the false-alarm rate, the relative positions of the signal and noise distributions are depicted. Sensitivity, as measured by d′, is the difference between the means of the signal and noise distributions, and this difference can be found by separately calculating the distance of the criterion from each of the respective means and then combining those two distances. Computationally, this involves converting the false-alarm rate and hit rate into standard normal z scores. If the criterion is located between the two means, d′ is the sum of the two z scores taken in absolute value. If the criterion is located outside that range, the smaller absolute z score must be subtracted from the larger to obtain d′. The likelihood ratio measure of bias, β, can be found from the hit and false-alarm rates by using a z table that specifies the height of the distribution for each z value.

When β is 1.0, no bias exists to give one or the other response. A value of β greater than 1.0 indicates a bias to respond no, whereas a value less than 1.0 indicates a bias to respond yes.
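Under the equal-variance normal model just described, the z-table steps can be sketched with Python’s standard-library `statistics.NormalDist` (Python 3.8 or later). Writing d′ as z(H) − z(F), where z is the inverse of the standard normal distribution function, covers both cases in the text: when the criterion lies between the means, z(H) is positive and z(F) negative, so the difference adds their magnitudes. β is the ratio of the normal density heights at z(H) and z(F). The helper names are hypothetical, and rates of exactly 0 or 1 are rejected rather than corrected, because the text does not specify a correction.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z(p):
    """Inverse of the standard normal cumulative distribution (a z score)."""
    if not 0.0 < p < 1.0:
        raise ValueError("rates of exactly 0 or 1 need a correction before z-transforming")
    return STANDARD_NORMAL.inv_cdf(p)


def d_prime(hit_rate, false_alarm_rate):
    """Distance between the signal and noise means in standard deviation units."""
    return z(hit_rate) - z(false_alarm_rate)


def beta(hit_rate, false_alarm_rate):
    """Height of the signal distribution over the height of the noise
    distribution at the criterion (equal-variance normal model)."""
    return STANDARD_NORMAL.pdf(z(hit_rate)) / STANDARD_NORMAL.pdf(z(false_alarm_rate))


# Invented rates: H = .90, F = .50.
print(round(d_prime(0.9, 0.5), 3), round(beta(0.9, 0.5), 3))  # 1.282 0.44
```

Here β falls below 1.0, which by the rule above indicates a bias to respond yes.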

Although β has been used most often as the measure of bias to accompany d′, several investigations have indicated that an alternative bias measure, C, is better (Snodgrass and Corwin, 1988; Macmillan and Creelman, 1990; Corwin, 1994), where C is a measure of criterion location rather than likelihood ratio. Specifically,

$$ C = -0.5\,[z(H) + z(F)] $$

where H is the hit rate and F the false-alarm rate. Here C is superior to β on several grounds, including that it is less affected by the level of accuracy than is β and will yield a meaningful measure of bias when accuracy is near chance.
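The last point can be illustrated with a sketch that computes C next to β under the same equal-variance normal assumptions (helper names hypothetical). The invented case is an observer at chance who says yes on 70% of both signal and noise trials: β comes out at exactly 1.0, suggesting no bias, whereas C is clearly negative. The sign convention in the comment (negative C for a bias toward yes) is the usual one but is not stated in the text above.

```python
from statistics import NormalDist


def z(p):
    """Standard normal z score for a proportion strictly between 0 and 1."""
    return NormalDist().inv_cdf(p)


def criterion_c(hit_rate, false_alarm_rate):
    """Criterion location C = -0.5 * [z(H) + z(F)].

    0 is unbiased; by the usual convention, negative values indicate a bias
    toward "yes" and positive values a bias toward "no".
    """
    return -0.5 * (z(hit_rate) + z(false_alarm_rate))


def beta(hit_rate, false_alarm_rate):
    """Likelihood-ratio bias measure, for comparison."""
    std = NormalDist()
    return std.pdf(z(hit_rate)) / std.pdf(z(false_alarm_rate))


# Chance performance with a strong tendency to say "yes" (H = F = .70).
print(round(criterion_c(0.7, 0.7), 3))  # -0.524
print(beta(0.7, 0.7))                   # 1.0
```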

For a given d′, the possible combinations of hit rates and false-alarm rates that the observer could produce by adopting different criteria can be depicted in a receiver operating characteristic (ROC) curve (see Figure 4). The farther an ROC lies from the diagonal running from hit and false-alarm rates of 0 to rates of 1, which represents chance performance (i.e., d′ of 0), the greater the sensitivity. The procedure described above yields only a single point on the ROC, but in many cases it is advantageous to examine performance under several criterion settings, so that the form of the complete ROC is evident (Swets, 1986). One advantage is that the estimate of sensitivity will be more reliable when it is based on several points along the ROC than when it is based on only one. Another is that the empirical ROC can be compared to the ROC implied by the psychophysical model that underlies a particular measure of sensitivity to determine whether serious deviations occur. For example, when enough points are obtained to estimate complete ROC curves, it is possible to evaluate the assumptions of equal-variance, normal distributions on which the measures of d′ and β are based. When plotted on z-score coordinates, the ROC curve will be linear with a slope of 1.0 if both assumptions are supported; deviations from a slope of 1.0 mean that one distribution is more variable than the other, whereas systematic deviations from linearity indicate that the assumption of normality is violated. If either of these deviations is present, alternative measures of sensitivity and bias that do not rely on the assumptions of normality and equal variance should be used.

[Figure: proportion of hits (0 to 1) plotted against proportion of false alarms (0 to 1), with ROC curves for zero sensitivity, a moderately sensitive observer, and a very sensitive observer.]
Figure 4 ROC curves showing the possible hit and false-alarm rates for different sensitivities. (From Proctor and Van Zandt, 2008.)
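The ROC and z-score checks can be sketched by generating points from the normal model and fitting a straight line to them in z coordinates. This is an illustration with model-generated points, not empirical data; the function names, the criteria, the d′ of 1.5, and the signal standard deviation of 1.25 are all assumptions. Under this model the z-ROC slope equals the ratio of the noise to the signal standard deviation, so equal variances give a slope of 1.0 and a more variable signal distribution gives a slope below 1.0.

```python
from statistics import NormalDist, mean

NOISE = NormalDist(0.0, 1.0)


def roc_points(d_prime, criteria, signal_sd=1.0):
    """(false-alarm rate, hit rate) pairs produced by different criterion
    placements, for noise ~ N(0, 1) and signal ~ N(d_prime, signal_sd)."""
    signal = NormalDist(d_prime, signal_sd)
    return [(1 - NOISE.cdf(c), 1 - signal.cdf(c)) for c in criteria]


def zroc_line(points):
    """Least-squares slope and intercept of z(H) regressed on z(F)."""
    zf = [NOISE.inv_cdf(f) for f, _ in points]
    zh = [NOISE.inv_cdf(h) for _, h in points]
    mf, mh = mean(zf), mean(zh)
    slope = sum((x - mf) * (y - mh) for x, y in zip(zf, zh)) / sum(
        (x - mf) ** 2 for x in zf
    )
    return slope, mh - slope * mf


criteria = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
equal = zroc_line(roc_points(1.5, criteria))
unequal = zroc_line(roc_points(1.5, criteria, signal_sd=1.25))
print([round(v, 3) for v in equal])    # [1.0, 1.5]
print([round(v, 3) for v in unequal])  # [0.8, 1.2]
```

With equal variances the intercept of the z-ROC equals d′; with unequal variances a single d′ no longer summarizes the curve, which is why the text recommends alternative measures in that case.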

For cases in which a complete ROC curve is desired, several procedures exist for varying response criteria. The relative payoff structure may be varied across blocks of trials to make one or the other response more preferable; similarly, instructions may be varied regarding how the observer is to respond when uncertain. Another way to vary response criteria is to manipulate the relative probabilities of signal and noise trials: the response criterion should be conservative when signal trials are rare and become increasingly liberal as signal trials become more likely. One of the most efficient techniques is to use rating scales (e.g., from 1, meaning very sure that the signal was not present, to 5, meaning very sure that it was present) rather than yes–no responses. The ratings are then treated as a series of criteria, ranging from high to low, and hit and false-alarm rates are calculated with respect to each. Eng (2006) provides an online program for plotting an ROC curve and calculating summary statistics.
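Treating each rating boundary as a criterion can be sketched as below; the function name and the counts are hypothetical. Each successive point moves one more rating category to the “yes” side, so the points run from the strictest criterion (only ratings of 5 count as yes) to the most lenient (anything above 1 counts as yes).

```python
def rating_roc(signal_counts, noise_counts):
    """ROC points from rating-scale data.

    signal_counts[i] and noise_counts[i] are the numbers of signal and noise
    trials given rating i + 1, where 1 = very sure the signal was absent and
    the highest rating = very sure it was present. Each rating boundary is a
    criterion: "yes" means a rating at or above that boundary. Points are
    returned from the strictest criterion to the most lenient.
    """
    n_signal, n_noise = sum(signal_counts), sum(noise_counts)
    hits = false_alarms = 0
    points = []
    # Walk down from the highest rating; the lowest rating never counts as "yes".
    for s, n in zip(reversed(signal_counts[1:]), reversed(noise_counts[1:])):
        hits += s
        false_alarms += n
        points.append((false_alarms / n_noise, hits / n_signal))
    return points


# Invented counts for ratings 1-5 on 100 signal and 100 noise trials.
print(rating_roc([5, 10, 15, 30, 40], [40, 30, 15, 10, 5]))
# [(0.05, 0.4), (0.15, 0.7), (0.3, 0.85), (0.6, 0.95)]
```

A five-point scale thus yields four ROC points from a single block of trials, which is what makes the rating procedure efficient.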

Signal detection methods are powerful tools for investigating basic and applied problems pertaining not only to sensation and perception but also to many other areas in which an observer’s response must be based on probabilistic information, such as distinguishing normal from abnormal X-rays (Manning and Leach, 2002) or detecting whether severe weather will occur within the next hour (Harvey, 2003). Although most work on signal detection theory has involved discriminations along a single psychological continuum, it has also been extended to situations in which multidimensional stimuli are presumed to produce values on multiple psychological continua, such as color and shape (e.g., Macmillan, 2002). Such analyses allow evaluation of whether the stimulus dimensions are processed in a perceptually separable and independent manner and whether the decisions for each dimension are also separable. As these examples illustrate, signal detection methods can be extremely effective when used with discretion.

2.2.2 Psychophysical Scaling

Another concern in psychophysics is to construct scales for the relation between physical intensity and perceived magnitude (see Marks and Gescheider, 2002, for a review). One way to build such scales is from discriminative responses to stimuli that differ only slightly. Fechner (1860) established procedures for constructing psychophysical scales from difference thresholds. Later, Thurstone (1927) proposed a method for constructing a scale from paired comparison procedures, in which each stimulus is compared to all others. Thurstonian scaling methods can even be used for complex stimuli for which physical values are not known.

Work on scaling in this tradition continues to this day in what is called Fechnerian multidimensional scaling (Dzhafarov and Colonius, 2005), which “borrows from Fechner the fundamental idea of computing subjective dissimilarities among stimuli from the observers’ ability to tell apart very similar stimuli” (p. 3).
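Thurstone’s paired-comparison approach mentioned above can be sketched as follows. The sketch assumes Thurstone’s Case V (equal and uncorrelated variability for all stimuli), which is only one of the cases Thurstone (1927) described; the text does not say which case applies. The function name and proportions are hypothetical, and proportions of exactly 0 or 1 would have to be adjusted before z-transforming.

```python
from statistics import NormalDist, mean


def thurstone_case_v(p):
    """Scale values from a paired-comparison proportion matrix.

    p[i][j] is the proportion of trials on which stimulus i was judged greater
    than stimulus j (p[i][i] is taken as 0.5). Under the Case V assumptions,
    z(p[i][j]) estimates the scale difference between i and j, so each
    stimulus's scale value is its mean z score, shifted so the lowest is 0.
    """
    z = NormalDist().inv_cdf
    raw = [mean(z(p_ij) for p_ij in row) for row in p]
    low = min(raw)
    return [value - low for value in raw]


# Invented proportions for three stimuli A, B, C.
proportions = [
    [0.5, 0.7, 0.9],
    [0.3, 0.5, 0.8],
    [0.1, 0.2, 0.5],
]
print([round(v, 2) for v in thurstone_case_v(proportions)])  # [1.31, 0.81, 0.0]
```

No physical values for A, B, or C enter the computation, which is why this kind of scaling can be applied to complex stimuli.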

An alternative way to construct scales is to use direct methods that require some type of magnitude judgment (see, e.g., Bolanowski and Gescheider, 1991, for an overview). Stevens (1975) established methods for obtaining direct magnitude judgments. The technique of magnitude estimation is the most widely used. With this procedure, the observer is either presented with a standard stimulus and told that its sensation corresponds to a particular numerical value (the modulus) or allowed to choose his or her own modulus. Stimuli of different magnitudes are then presented in random order, and the observer assigns values to them proportional to their perceived magnitudes. These values then provide a direct scale relating physical magnitude to perceived magnitude. A technique called magnitude production can also be used, in which the observer is instructed to adjust the value of a stimulus to be a particular magnitude. Variations of magnitude estimation and production have been used to measure such things as emotional stress (Holmes and Rahe, 1967) and pleasantness of voice quality for normal
