SENSATION AND PERCEPTION
4.2 Spatial Orientation
We live in a three-dimensional world and hence must be able to perceive locations in space relatively accurately
if we are to survive. Many sources of information
come into play in the perception of distance and spatial relations (Proffitt and Caudek, 2003), and the consensus view is that the perceptual system constructs the three-dimensional representation using this information as cues.
4.2.1 Visual Depth Perception
Vision is a strongly spatial sense and provides us with the most accurate information regarding spatial location. In fact, when visual cues regarding location conflict with those from the other senses, the visual sense typically wins out, a phenomenon called visual dominance. There are several areas of human factors in which we need to be concerned about visual depth cues. For example, accurate depth cues are crucial for situations in which navigation in the environment is required; misleading depth cues at a landing strip at an airfield may cause a pilot to land short of the runway.
As another example, a helmet-mounted display viewed through a monocle eliminates binocular cues and may provide information that conflicts with what the other eye sees. Finally, a simulator may need to depict three-dimensional relations relatively accurately on a two-dimensional display screen.
One distinction that can be made is between oculomotor cues and visual cues. The oculomotor cues are accommodation and vergence angle, both of which we discussed earlier in the chapter. At relatively close distances, vergence and accommodation will vary systematically as a function of the distance of the fixated object from the observer. Therefore, either the signal sent from the brain to control accommodation and vergence angle or feedback from the muscles could provide cues to depth. However, Proffitt and Caudek (2003) conclude that neither oculomotor cue is a particularly effective cue for perceiving absolute depth and both are easily overridden when other depth cues are available.
Visual cues can be partitioned into binocular and monocular cues. The binocular cue is retinal disparity, which arises from the fact that the two eyes view an object from different locations. An object that is fixated falls on corresponding points of the retinas. This object can be regarded as being located on an imaginary curved plane, called the horopter; any other object that is located on this plane will also fall on corresponding points. For objects that are not on the horopter, the images will fall on disparate locations of the retinas.
The direction of disparity, uncrossed or crossed (i.e., whether the image from the right eye is located to the right or left of the image from the left eye), is a function of whether the object is in back of or in front of the horopter, respectively, and the magnitude of disparity is a function of how far the object is from the horopter.
Thus, retinal disparity provides information with regard to the locations of objects in space with respect to the surface that is being fixated.
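This geometry can be sketched numerically. Under a small-angle approximation, the angular disparity of an object at distance d, when fixation is at distance f, is roughly I(f − d)/(fd) radians, where I is the interocular separation; the sign distinguishes crossed from uncrossed disparity. The sketch below is illustrative only, with an assumed interocular distance of 6.5 cm.

```python
# Illustrative sketch: small-angle approximation of retinal disparity.
# The interocular distance and viewing distances are assumed example values.

def angular_disparity(fixation_m: float, object_m: float,
                      iod_m: float = 0.065) -> float:
    """Angular disparity (radians) of an object relative to the fixated
    distance: positive = crossed (in front of the horopter),
    negative = uncrossed (behind the horopter)."""
    return iod_m * (fixation_m - object_m) / (fixation_m * object_m)

fixation = 1.0                              # fixating an object 1 m away
near = angular_disparity(fixation, 0.8)     # object in front of the horopter
far = angular_disparity(fixation, 1.5)      # object behind the horopter

print(f"near object: {near:+.5f} rad (crossed)")
print(f"far object:  {far:+.5f} rad (uncrossed)")
```

Note that the magnitude of the result grows with the object's separation from the horopter, consistent with disparity signaling depth relative to the fixated surface.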
The first location in the visual pathway at which neurons are sensitive to disparity differences is the primary visual cortex. However, Parker (2007) emphasizes that “generation of a full, stereoscopic depth percept is a multi-stage process that involves both dorsal and ventral cortical pathways. . . . Both pathways may contribute to perceptual judgements about stereo depth, depending on the task presented to the visual system” (p. 389).
Retinal disparity is a strong cue to depth, as witnessed by the effectiveness of three-dimensional (3D) movies and stereoscopic static pictures, which are created by presenting slightly different images to the two eyes to create disparity cues. Anyone who has seen any of the recent spate of 3D movies realizes how compelling these effects can be. They are sufficiently strong that 3D is now being incorporated into home television and entertainment systems. In addition to enhancing the perception of depth relations in displays of naturalistic scenes, stereoptic displays may be of value in assisting scientists and others in evaluating multidimensional data sets. Wickens et al. (1994) found that a three-dimensional data set could be processed faster and more accurately to answer questions that required integration of the information if the display was stereoptic than if it was not.
The fundamental problem for theories of stereopsis is that of matching. Disparity can be computed only after corresponding features at the two eyes have been identified. When viewing the natural world, each eye receives the information necessary to perceive contours and identify objects, and stereopsis could occur after monocular form recognition. However, one of the more striking findings of the past 40 years is that there do not have to be contours present in the images seen by the individual eyes in order to perceive objects in three dimensions. This phenomenon was discovered by Julesz (1971), who used random-dot stereograms in which a region of dot densities is shifted slightly in one image relative to the other. Although a form cannot be seen if only one of the two images is viewed, when each of the two images is presented to the respective eyes, a three-dimensional form emerges.
Random-dot stereograms have been popularized recently through figures that utilize the autostereogram variation of this technique, in which the disparity information is incorporated in a single, two-dimensional display.
That stereopsis can occur with random-dot stereograms suggests that matching of the two images can be based on dot densities.
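Julesz’s construction is simple enough to sketch in a few lines. In the illustrative version below (image size, region bounds, and disparity shift are all assumed example values), two images start as identical fields of random dots; a central square region in one image is then shifted horizontally, and the vacated strip is refilled with fresh random dots so that neither image alone contains a visible contour.

```python
import random

random.seed(0)
SIZE, SHIFT = 100, 4   # illustrative image size and disparity (pixels)

# Two identical fields of random black/white dots.
left = [[random.randint(0, 1) for _ in range(SIZE)] for _ in range(SIZE)]
right = [row[:] for row in left]

# In the right eye's image, shift a central square region SHIFT pixels
# rightward, then refill the uncovered strip with fresh random dots so
# that no monocular contour marks the region.
r0, r1 = 30, 70
for y in range(r0, r1):
    for x in range(r0, r1):
        right[y][x + SHIFT] = left[y][x]
    for x in range(r0, r0 + SHIFT):
        right[y][x] = random.randint(0, 1)

# Viewed monocularly, each image is featureless noise; fused binocularly,
# the shifted square carries disparity and appears to float in depth.
```

Because the square is defined only by the correlation between the two images, perceiving it requires that the visual system solve the matching problem without the aid of monocular contours.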
There are many static, or pictorial, monocular cues to depth. These cues are such that people with only one eye, and those who lack the ability to detect disparity differences, are still able to interact with the world with relatively little loss in accuracy. The monocular cues include retinal size (i.e., larger images appear to be closer) and familiar size (e.g., a small image of a car provides a cue that the car is far away). The cue of interposition arises when one object blocks part of the image of another; the occluding object is perceived as being in front. Although interposition provides information that one object is nearer than another, it does not provide information about how far apart they are. Another cue comes from shading. Because light sources such as the sun typically come from above, the location of a shadow provides a cue to depth relations. Darker shading at the bottom of a region implies that the region is elevated, whereas darker shading at the top provides a cue that it is depressed. Aerial perspective refers to the blue coloration of objects that are far away, as when viewing a mountain at a distance. Finally, the cue of linear perspective occurs when parallel lines receding into the distance, such as train tracks, converge to a point in the image.
Gibson (1950) emphasized the importance of texture gradient, which is a combination of linear perspective and relative size, in depth perception. If one looks at a textured surface such as a brick walkway, the parts of the surface (i.e., the bricks) become smaller and more densely packed in the image as they recede into the distance. The rate of this change is a function of the orientation of the surface in depth with respect to the line of sight. This texture change specifies distance on the surface, and an image of a constant size will be perceived to come from a larger object that is farther away if it occludes a larger part of the texture. Certain color gradients, such as a gradual change from red to gray, provide effective cues to depth as well (Troscianko et al., 1991).
For a stationary observer, there are plenty of cues to depth. However, cues become even richer once the observer is allowed to move. When you maintain fixation on an object and change locations, as when looking out a train window, objects in the background will move in the same direction in the image as you are moving, whereas objects in the foreground will move in the opposite direction. This cue is called motion parallax. When you move straight ahead, the optical flow pattern conveys information about how fast your position is changing with respect to objects in the environment. There are also numerous ways in which displays with motion can generate depth perception (Braunstein, 1976).
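The geometry of motion parallax admits a simple numerical illustration. For an observer translating sideways at speed v while maintaining fixation at distance f, a point at distance d moves in the image with angular velocity of roughly v(1/d − 1/f); the sign flips at the fixation distance, which is why foreground and background appear to move in opposite directions. The sketch below uses assumed example values.

```python
def parallax_rad_per_s(v_mps: float, fixation_m: float,
                       object_m: float) -> float:
    """Approximate image angular velocity of a point at object_m while
    the observer translates laterally at v_mps and fixates at fixation_m.
    The sign reverses at the fixation distance (illustrative model)."""
    return v_mps * (1.0 / object_m - 1.0 / fixation_m)

v, fix = 30.0, 20.0                              # e.g., a moving train
foreground = parallax_rad_per_s(v, fix, 5.0)     # nearer than fixation
background = parallax_rad_per_s(v, fix, 200.0)   # farther than fixation

# Opposite signs: foreground and background slide in opposite directions.
print(f"foreground: {foreground:+.2f} rad/s, background: {background:+.2f} rad/s")
```

The magnitude of the relative motion also falls off with distance, so the rate of image motion itself carries information about depth.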
Of particular concern for human factors is how the various depth cues are integrated. Bruno and Cutting (1988) varied the presence or absence of four cues: relative size, height in the projection plane, interposition, and motion parallax. They found that the four cues combined additively in one direct and two indirect scaling tasks. That is, each cue supported depth perception, and the more cues that were present, the more depth was revealed. Bruno and Cutting interpreted these results as suggesting that a separate module processes each source of depth information. Landy et al.
(1995) have developed a detailed model of this general nature, according to which interactions among depth cues occur for the purpose of establishing for each cue a map of absolute depth throughout the scene. The estimate of depth at each location is determined by taking a weighted average of the estimates provided by the individual cues.
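The weighted-average scheme can be sketched directly. In the sketch below, each cue supplies a depth estimate at a location together with a weight; the cue names, estimates, and weights are illustrative assumptions, not fitted values from the Landy et al. model.

```python
# Illustrative sketch of weighted-average depth-cue combination in the
# spirit of Landy et al. (1995). All numbers are assumed example values.

def combine_depth(cues: dict) -> float:
    """Each cue maps to (depth_estimate_m, weight); weights, often taken
    to reflect cue reliability, are normalized to sum to 1."""
    total_weight = sum(w for _, w in cues.values())
    return sum(d * w for d, w in cues.values()) / total_weight

cues_at_location = {
    "disparity":       (2.0, 0.5),   # binocular disparity: reliable up close
    "motion_parallax": (2.4, 0.3),
    "texture":         (3.0, 0.2),   # pictorial cue, weighted less here
}

print(f"combined depth estimate: {combine_depth(cues_at_location):.2f} m")
# -> combined depth estimate: 2.32 m
```

Reweighting the cues (e.g., downweighting disparity at far viewing distances) shifts the combined estimate toward the remaining cues, which is the behavior the model uses to capture cue interactions.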
Because the size of the retinal image of an object varies as a function of the distance of the object from the observer, perception of size is intimately related to perception of distance. When accurate depth cues are present, good size constancy results. That is, the perceived size of the object does not vary as a function of the changes in retinal image size that accompany changes in depth. One implication of this view is that size and shape constancy will break down and illusions will appear when depth cues are erroneous. There are numerous illusions of size, such as the Ponzo illusion (see Figure 17), in which one of two stimuli of equal physical size appears larger than the other, due at least in part to misleading depth cues. Misperceptions of size and distance also can arise when depth cues are minimal, as when flying at night.

Figure 17 Ponzo illusion. The top circle appears larger than the lower circle, due to the linear perspective cue.
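The size–distance coupling can be made concrete with the visual-angle relation: an object of physical size s at distance d subtends an angle of 2 arctan(s/2d), so a given retinal angle is consistent with many size–distance pairs, and the perceptual system must rely on depth cues to select one. The numbers below are illustrative.

```python
import math

def visual_angle_deg(size_m: float, distance_m: float) -> float:
    """Visual angle subtended by an object of size_m at distance_m."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

# Size constancy: a 1.8 m person yields very different retinal sizes...
near_angle = visual_angle_deg(1.8, 2.0)    # large angle at 2 m
far_angle = visual_angle_deg(1.8, 20.0)    # small angle at 20 m

# ...yet with accurate depth cues both are perceived as the same size.
# Conversely, the same visual angle is ambiguous without depth cues:
# a 0.18 m object at 2 m subtends the same angle as 1.8 m at 20 m.
print(f"{near_angle:.1f} deg vs. {far_angle:.1f} deg")
```

When the distance information is misleading, as in the Ponzo illusion, the system scales the constant retinal angle by the wrong distance and perceived size is distorted accordingly.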
4.2.2 Sound Localization
The cues for sound localization on the horizontal dimension involve disparities at the two ears, much as disparities of the images at the two eyes are cues to depth. Two different sources of information, interaural intensity and time differences, have been identified (Yost, 2010). Both of these cues vary systematically with respect to the position of the sound relative to the listener. At the front and back of the listener, the intensity of the sound and the time at which it reaches the ears will be equal. As the position of the sound along the azimuth (i.e., relative to the listener’s head) is moved progressively toward one side or the other, the sound will become increasingly louder at the ear closest to it relative to the ear on the opposite side, and it also will reach the ipsilateral ear first. The interaural intensity differences are due primarily to a sound shadow created by the head. Because the head produces no shadow for frequencies less than 1000 Hz, the intensity cue is most effective for relatively high frequency tones. In contrast, interaural time differences are most effective for low-frequency sounds. Localization accuracy is poorest for tones between 1200 and 2000 Hz, because neither the intensity nor time cue is very effective in this intermediate-frequency range (Yost, 2010).
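The interaural time difference is often approximated with Woodworth’s spherical-head formula, ITD ≈ (r/c)(θ + sin θ), where r is the head radius, c the speed of sound, and θ the azimuth in radians. The sketch below uses assumed values for head radius and the speed of sound; it is an approximation, not a measured head-related delay.

```python
import math

def itd_seconds(azimuth_deg: float, head_radius_m: float = 0.0875,
                speed_of_sound: float = 343.0) -> float:
    """Woodworth's spherical-head approximation of the interaural time
    difference; head radius and speed of sound are assumed values."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

# ITD is zero straight ahead and grows toward the side of the head:
for az in (0, 30, 90):
    print(f"{az:3d} deg: {itd_seconds(az) * 1e6:6.1f} microseconds")
```

The maximum difference, on the order of a few hundred microseconds at 90 degrees azimuth, is why the time cue is informative only when the auditory system can resolve such small delays, which it does best at low frequencies.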
Both the interaural intensity and time difference cues are ambiguous because the same values can be produced by stimuli in more than one location. To locate sounds in the vertical plane and to distinguish whether the sound
is in front of or behind the listener, spectral alterations in
the sound wave caused by the outer ears, head, and body (collectively called a head-related transfer function) must be relied on. Because these cues vary mainly for frequencies above 6000 Hz (Yost, 2010), front–back and vertical-location confusions of brief sounds will often occur. Confusions are relatively rare in the natural world because head movements and reflections of sound make the cues less ambiguous than they are in the typical localization experiment (e.g., Guski, 1990; Makous and Middlebrooks, 1990). As with vision, misleading cues can cause erroneous localization of sounds. Caelli and Porter (1980) illustrated this point by having listeners in a car judge the direction from which a siren came.
Localization accuracy was particularly poor when all but one window were rolled up, which would alter the normal relation between direction and the cues.
4.3 Eye Movements and Motion Perception