Psycholinguistics Meets Speech Physiology

LANGUAGE PRODUCTION

3. VOT AND THE VOICING DISTINCTION IN SPEECH DISORDERS

3.1. Psycholinguistics Meets Speech Physiology

If there is a transparent relationship between VOT and phonological voicing status for stop consonants, we should ask if a similar, straightforward mapping exists for the underlying physiology of the voicing distinction. There is no a priori reason to prefer a mapping between an acoustic, as compared to a physiological, characteristic and a phonological category. In fact, for either the so-called translation theories or gesture theories of speech production (for reviews of such models see Löfqvist, 1997; Fowler, 1985) it is easy to formulate an argument for a reasonable mapping of a physiological event or small array of events to at least some of the phonological contrasts of a language. The voicing distinction for stops is just such a case, where the underlying physiology is well understood for normal speakers and can be used to evaluate the logic of the inferences discussed above.

Figure 2 presents a schematic summary of the physiology of the voicing distinction for English stops. The figure is based on work published by Löfqvist (1980) and Yoshioka, Löfqvist, and Hirose (1981), among others. The top two panels show laryngeal – supralaryngeal events for intervocalic, prestressed stop consonants, as in common carrier-phrase utterances such as ‘Say tot again’ (upper left panel) and ‘Say dot again’ (upper right panel). The time history labeled Ag shows the opening and closing of the vocal folds throughout the final half of the vocalic /ei/ in the word ‘Say’, the closure interval and release phase of the stop consonant (/t/ in the left panel, /d/ in the right), and the vowel /ɑ/ in ‘tot’ (left) and ‘dot’ (right). Observations of the opening and closing of the vocal folds can be made in a number of ways but here we will assume they have been recorded more or less directly, using a fiberscope inserted through the nose with images sampled at a sufficient rate to permit viewing of the very rapid opening and closing phases of vocal fold motion during the voiced segments of speech (e.g., see Mergell, Herzel, & Titze, 2000; Wittenberg, 1997). Immediately below the Ag functions are event histories labeled Timing of CI, where CI = closure interval. Stop consonants are produced, in part, by creating a brief, complete obstruction in the vocal tract to the egressive air stream from the lungs. The interval over which the obstruction is maintained, usually no more than 100 ms in the kinds of carrier-phrase utterances illustrated here, is called the closure interval. In the case of /t/ and /d/ it begins when the tongue tip/blade makes firm contact with the alveolar ridge, and ends when this contact is released. In the schematic diagram the Timing of CI functions are shown as raised boxes extending across some unspecified time interval, where the onset (beginning) and offset (release) of the closure interval are the important events for the current discussion. The Ag function, reflecting a laryngeal event, and the CI function, reflecting a supralaryngeal event, are shown on the same time scale. Note also the identical durations of the closure interval for the voiceless (upper left) and voiced (upper right) stops, consistent with previously published work on stop closure durations in the intervocalic, prestressed position (Umeda, 1977; Stathopoulos & Weismer, 1983).

Figure 2. Schematic drawings of laryngeal and supralaryngeal behavior for voiceless and voiced stops in English. Intervocalic, prestressed stops are shown in upper two panels, utterance initial stops in lower two panels. Ag = glottal area function, Timing of CI = onset and offset (stop release) of stop closure interval. See text for details. [Panel titles: Intervocalic, Prestressed Voiceless Stop; Intervocalic, Prestressed Voiced Stop; Utterance-Initial Voiceless Stop; Utterance-Initial Voiced Stop. Labels: Timing of CI, Onset of Supraglottal Closure, Voicing Onset, VOT.]

Considering the voiceless stop first, the relatively rapid openings and closings of the glottis at the beginning of the Ag function are due to the motions of the vibrating vocal folds for the vocalic /ei/ preceding the stop closure. These motions, which produce phonation (voicing), are the result of aerodynamic and mechanical forces (see Broad, 1979, for an excellent review) and are essentially periodic at typical rates of approximately 120 Hz (period = 8.33 ms) for adult men and 200 Hz (period = 5 ms) for adult women.⁸ In the schematic, after six cycles of these nearly periodic motions there is a relatively long opening and closing gesture of the vocal folds. This is the laryngeal devoicing gesture (LDG), and it differs from the opening and closing gestures of phonation in some obvious and not-so-obvious ways. The most obvious difference is in the much longer duration of the LDG, which is typically between about 100 and 150 ms. A less obvious difference, but one critical to the current discussion, is that the opening and closing motions of the LDG are under muscular control (Hirose, 1976), and do not result from aerodynamic and mechanical forces as in the case of phonatory behavior. For this reason the LDG is properly referred to as an articulatory gesture of the larynx, suggesting its essential role in the segmental characteristics of not only voiceless stop consonants, but also voiceless fricatives (such as /f θ s ʃ/) and the voiceless affricate (/tʃ/). With some minor differences, the LDG is basically the same for all of the voiceless obstruents. Note also the synchrony, indicated in the figure by upward-pointing arrows, of the opening gesture of the LDG with the onset of the supralaryngeal closure for the /t/; this time-locking of the laryngeal and supralaryngeal gestures also holds true for fricatives and affricates. Finally, the release of the closure for /t/ (downward-pointing arrow) occurs well before the LDG is completed, resulting in a relatively substantial interval between the stop closure release and the onset of phonatory behavior – marked in the figure as ‘voicing onset’ – for the following vowel. As marked by the horizontal line ending in arrowheads, this is the VOT interval.

An important aspect of the LDG is its continuous and stereotyped nature. Once initiated, the gesture seems to evolve over time without the kind of conscious control over its individual parts that might characterize many other movements. Löfqvist (1980) referred to the LDG as a ballistic gesture, a term used in the speech production literature to suggest an articulatory motion that is not adjusted after its initiation but rather runs a stereotyped course; there are experimental data consistent with this idea, suggesting that speakers do not have much control over the scaling (size) of the LDG, even when given the best opportunity to change it (see Löfqvist, Baer, & Yoshioka, 1981). The ballistic nature of the LDG is an important piece of the argument that its presence signifies the phonological voicelessness of a stop consonant, and its absence the opposite; here the binary nature of the linguistic contrast is captured well by the underlying physiology.

8 Vocal fold vibration is commonly described as “quasiperiodic” to indicate that the motions and their acoustic results are not perfectly periodic; the typical values of vibratory rate given for adult men and women are subject to a host of variables other than gender, such as age, health status, emotional state, type of speech material, and personal history (e.g., history of smoking or taking certain drugs).

The top right panel of Figure 2 shows the laryngeal and supralaryngeal events for the voiced /d/ in ‘dot’. The phonatory motions for the preceding /ei/ are, not surprisingly, the same as those preceding the /t/,⁹ but at the onset of the closure interval for /d/ the periodic motions continue, gradually declining in amplitude until they disappear shortly before release of the stop. In some cases of voiced stops such as the /d/ shown here, motions of the vocal folds may continue throughout the closure; the presence of those motions during the closure interval, or how rapidly they cease after the onset of closure, depends on the time history of the pressure difference across the vibrating vocal folds.

The VOT interval is clearly much shorter for /d/, as compared to /t/, and these typical short-lag VOTs for voiced stops will occur whether or not voicing continues throughout the closure interval or is terminated before the end of the closure interval, as shown in Figure 2. What is important here, and critical to the issue of inferring phonological status from empirically measured VOT values, is that this underlying physiology can be boiled down to a fairly clear dichotomy: voiceless stops have an LDG, voiced stops do not.

When the LDG is present, at least for intervocalic, prestressed stops, VOTs will be in the long-lag range simply because the supralaryngeal closure interval is released about half-way into the LDG, when the vocal folds are maximally open (Löfqvist, 1980). This is so because it takes time to move the vocal folds back to the midline position where phonation can resume, and if the closure interval is released near the middle of the LDG or even slightly later, the VOT must be in the long-lag range. Conversely, when there is no LDG, as is the case for the upper right panel of Figure 2, periodic motions of the vocal folds can resume shortly after the release of the closure interval. Thus, in an English speaker with no neurological disease, the presence of the LDG can be assumed to be part of the voiceless specification for stops, and its absence part of its voiced specification.

Presumably, a correct phonological representation of a /t/ will trigger the LDG as part of its specification for articulatory control; a /d/ will not. Note the bottom panels of Figure 2, where the typical physiology is illustrated for /t/ (bottom left) and /d/ (bottom right) in utterance-initial position, as would be the case for isolated words with initial stops. Even though there is no voiced segment preceding the /t/, an LDG is observed. For utterance-initial /d/, especially in English, there are typically no vocal fold vibrations during the closure interval (i.e., utterance-initial voiced stops are typically devoiced in English), but the vocal folds are held in the midline position during the closure and so begin to vibrate shortly following the stop release. The binary physiological opposition of presence vs. absence of the LDG applies equally to utterance-initial stops as it does to intervocalic stops, an important point because some experiments use isolated words to study the voicing opposition in aphasia or other neurogenic speech disorders.

9 Like most generalizations of this sort, there are some qualifications; as vocal fold motions approach the closure interval of a stop consonant, details of the vibration are somewhat different for voiceless vs. voiced stops; see Ní Chasaide and Gobl (1997) for a summary of these effects.

So far, this explanation of the physiology of voiceless and voiced stop production and its relationship to the phonological voicing status of stops seems consistent with the inverted logic of inferring voicing status from VOT values. The critical question is, however, can an underlying physiological scenario be imagined wherein the LDG was present but the measured VOT was clearly in the short-lag range? If so, the logic of inferring phonological voicing status from VOT values runs into serious problems, because it makes little sense to label a stop as phonologically voiced when it could have an LDG; after all, the LDG is a physiological implementation of voicelessness. In fact, physiological data consistent with the scenario of production of an LDG with measured VOTs in the short-lag range have been described in the literature. For example, in languages having a phonemic distinction between unaspirated and aspirated voiceless stops (e.g., French and Mandarin) or in cases where different degrees of stop aspiration are elicited by varying position-in-word or syllable stress level (e.g., Swedish), fiberscopic observations have shown the presence of the LDG in all cases, but modifications of timing or size (i.e., maximum opening of the glottis) of the LDG and/or of the stop closure duration result in small amounts of aspiration – that is, a short VOT (see, e.g., Iwata & Hirose, 1976; Benguerel, Hirose, Sawashima, & Ushijima, 1978; Löfqvist, 1980).

More specifically, the underlying physiology of voicelessness for stops clearly indicates how a patient might implement the correct and crucial articulatory feature – the LDG – yet still produce a VOT in the short-lag range. First, variations in duration of the closure interval, with magnitude of the LDG opening and its synchrony with onset of supralaryngeal closure held constant, will result in variations in VOT; longer closure intervals will produce shorter VOTs, and vice versa (see Löfqvist, 1980; Weismer, 1980). Longer stop closure intervals would certainly be consistent with the generally slower speaking rates observed in patients with anterior lesions (e.g., Kent & Rosenbek, 1983; Baum et al., 1990), and could possibly explain the occurrence of some short-lag VOTs for voiceless stop targets as reported by Blumstein et al. (1977, 1980) and Baum et al. (1990). Second, the LDG could be implemented but with smaller-than-normal magnitude – perhaps one of the subtle phonetic deficits hypothesized for patients with posterior lesions by Baum et al. (1990) and Kurowski, Blumstein, and Mathison (1998) – which would also result in shorter VOTs. Third, asynchronies between the onsets of an LDG of normal magnitude, and a supralaryngeal closure interval of normal duration, would also produce variations in VOT. An LDG onset that lags onset of a supralaryngeal closure interval will result in relatively longer VOTs, and vice versa. Coordination problems involving two or more articulators are often cited as a prominent problem in anterior aphasics and persons with apraxia of speech (Baum et al., 1990; Itoh, Sasanuma, & Ushijima, 1979) and could be part of the subtle phonetic deficit proposed for posterior aphasics. It is hardly far-fetched to imagine that such asynchronies might characterize a neurologically based speech production problem (see, e.g., Kent & Adams, 1989; and review in Weismer, Yunusova, & Westbury, 2003).
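The three scenarios just described can be made concrete with a deliberately simplified timing sketch. It assumes, as a first approximation only, that voicing resumes when the LDG is completed, so that VOT is the end of the LDG minus the time of stop release. The class name, function name, and default durations (80 ms closure, 130 ms LDG) are illustrative assumptions chosen to fall within the ranges mentioned in the text; they are not measured values, and the sketch treats a smaller-than-normal LDG simply as a shorter one.

```python
from dataclasses import dataclass


@dataclass
class StopTiming:
    """Hypothetical timing parameters in ms; time zero is the onset of closure."""
    closure_ms: float = 80.0   # duration of the supralaryngeal closure interval
    ldg_ms: float = 130.0      # duration of the laryngeal devoicing gesture
    ldg_lag_ms: float = 0.0    # LDG onset minus closure onset (0 = synchronized)


def vot_ms(t: StopTiming) -> float:
    """Simplified VOT: end of the LDG (voicing can resume) minus stop release."""
    voicing_onset = t.ldg_lag_ms + t.ldg_ms
    release = t.closure_ms
    return voicing_onset - release


# Illustrative cases (all values are assumptions, not data):
typical = vot_ms(StopTiming())                        # synchronized, normal durations
long_closure = vot_ms(StopTiming(closure_ms=115.0))   # first scenario: longer closure
small_ldg = vot_ms(StopTiming(ldg_ms=95.0))           # second scenario: reduced LDG
ldg_leads = vot_ms(StopTiming(ldg_lag_ms=-35.0))      # third scenario: LDG leads closure
ldg_lags = vot_ms(StopTiming(ldg_lag_ms=20.0))        # third scenario: LDG lags closure
```

Under these assumed values the synchronized case gives a long-lag VOT, whereas a lengthened closure, a reduced LDG, or an LDG that leads the closure each bring the VOT into the short-lag range even though an LDG is present in every case.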

In the hypothetical cases described here, the presence of the LDG must be interpreted as good evidence of phonological voicelessness. Yet in each case a VOT in the short-lag region is shown to be a possibility, and that possibility has been interpreted at times as a voiced-for-voiceless error of the phonological kind. These hypothetical cases, however, are physiologically plausible and are clearly phonetic anomalies. For these reasons, when a voiceless stop target is produced with a short-lag VOT it seems ill-advised to interpret the event as evidence of a phonological error, at least in the absence of additional evidence.

The best form of such evidence would be direct viewing of the larynx during the production of voiceless stop targets, but of course this is not feasible in most clinical settings and indeed, is a specialty type of data even in research venues. Fortunately, acoustic measures and observations can be added to VOT measures to clarify the underlying physiology associated with an apparent voiced-for-voiceless substitution. For example, the possibility of a lengthened closure interval contributing to a short-lag VOT when an LDG is produced can be addressed by combining measures of the voiceless interval with measures of closure duration. Figure 3 illustrates the voiceless interval in a sequence like the one shown in Figure 2, where /ei/ precedes a voiceless stop, which is then followed by an /ɑ/ (as in ‘Say tot again’). The voiceless interval is measured from the final glottal pulse preceding the voiceless stop closure (leftmost, solid, upward-pointing arrow) to the first glottal pulse following that closure (rightmost, solid, upward-pointing arrow); as such, the voiceless interval duration is the sum of the closure interval duration plus the VOT. Moreover, the interval should correspond rather closely to the duration of the LDG. If a lengthened closure duration is contributing significantly to the measurement of short-lag VOTs, the voiceless interval duration should be close to normal (somewhere between 100 and 150 ms) and the closure duration longer than normal (>100 ms).¹⁰ Measured intervals like these would likely yield a short-lag VOT but would not be consistent with an interpretation of a phonological error. A comparison of VOT 1 to VOT 2 in Figure 3 shows clearly the effect of a lengthened stop closure interval (indicated by the dashed-line extension of the original closure interval) on a measured VOT, even with a normal LDG.

Figure 3. Schematic drawing of laryngeal and supralaryngeal behavior for voiceless stops, showing the effects of variations in stop closure duration and magnitude of the LDG on VOT measurements. All VOT intervals shown on the figure are taken between the release of the stop (the end of the indicated closure interval) and the first glottal pulse of the following vowel. See text for additional detail. [Labels: Ag, Timing of CI, VOICELESS INTERVAL, VOT 1, VOT 2, VOT 3.]

Figure 3 also shows how acoustic measures could suggest the presence of a smaller-than-normal LDG and its role in yielding a short-lag VOT. The reduced LDG and the vocal fold vibrations following it are illustrated by the dotted-line Ag function; note that the onset of this LDG is still synchronized with the onset of the closure interval. Because the LDG has a smaller-than-normal magnitude it is completed in a shorter amount of time and vocal fold vibration resumes earlier than in the case of the normal-sized gesture. With a normal closure duration (the solid-line closure interval), a short-lag VOT (VOT 3 in Figure 3) will result, but should not be taken as evidence of a phonological error. Here, the relevant acoustic measures would be the shorter-than-normal voiceless interval (indicated in Figure 3 by the interval between the solid and dotted upward-pointing arrows) and a normal closure duration.

The effects of asynchrony between the onsets of the LDG and closure interval are illustrated in Figure 4. In Example 1, the two events are synchronized, producing the expected long-lag VOT. The asynchrony in Example 2 has the onset of the closure interval lagging the LDG, which could result in a short-lag VOT. This asynchrony should also result in preaspiration, a relatively substantial interval of aperiodic energy prior to the stop closure, produced as a result of the vocal folds opening for the LDG while the vocal tract is still open. In normal, young adult speakers, preaspiration of voiceless stops occurs only occasionally, and then rarely for more than 10–15 ms; in speakers with neurogenic speech disorders preaspiration almost certainly occurs more frequently than in normal speakers, and often has durations exceeding 30 ms.¹¹ The presence and duration of preaspiration, plus measurement of a close-to-normal voiceless interval duration, would suggest that a short-lag VOT had the underlying physiology of voicelessness and therefore should not be considered as evidence of a phonological error. The opposite asynchrony, with the closure interval leading the LDG, is shown as Example 3. In this case the VOT would have an exaggerated, long-lag value; the acoustic evidence of such asynchrony would be the presence of glottal vibrations extending into the closure interval for more than approximately 20 ms, a value rarely exceeded in normal speakers (Weismer, 1997, 2004).
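The acoustic cross-checks described in the preceding paragraphs could be gathered into a simple screening routine, sketched here under stated assumptions. The function name, argument names, and the use of hard cut-offs are hypothetical; the cut-off values themselves are the approximate figures quoted in the text (a normal voiceless interval of roughly 100 to 150 ms, closures rarely above 100 ms, preaspiration rarely above 10–15 ms, voicing into closure rarely above about 20 ms). It is meant as an illustration of the reasoning, not a validated clinical procedure.

```python
# Approximate reference values taken from the text; hard thresholds are an assumption.
NORMAL_VOICELESS_INTERVAL_MS = (100.0, 150.0)
MAX_NORMAL_CLOSURE_MS = 100.0
MAX_NORMAL_PREASPIRATION_MS = 15.0
MAX_NORMAL_VOICING_INTO_CLOSURE_MS = 20.0


def phonetic_accounts(voiceless_interval_ms, closure_ms,
                      preaspiration_ms=0.0, voicing_into_closure_ms=0.0):
    """Return phonetic (LDG-present) accounts consistent with the measurements."""
    low, high = NORMAL_VOICELESS_INTERVAL_MS
    interval_normal = low <= voiceless_interval_ms <= high
    accounts = []
    # Near-normal voiceless interval with a long closure: lengthened closure, normal LDG.
    if closure_ms > MAX_NORMAL_CLOSURE_MS and interval_normal:
        accounts.append("lengthened closure, normal LDG")
    # Short voiceless interval with a normal closure: reduced LDG.
    if closure_ms <= MAX_NORMAL_CLOSURE_MS and voiceless_interval_ms < low:
        accounts.append("reduced LDG, normal closure")
    # Marked preaspiration with a near-normal voiceless interval: LDG leads closure.
    if preaspiration_ms > MAX_NORMAL_PREASPIRATION_MS and interval_normal:
        accounts.append("LDG leads closure (preaspiration)")
    # Voicing well into the closure: closure leads LDG.
    if voicing_into_closure_ms > MAX_NORMAL_VOICING_INTO_CLOSURE_MS:
        accounts.append("closure leads LDG (voicing into closure)")
    return accounts


# Hypothetical token: 120 ms voiceless interval, 110 ms closure (VOT of about 10 ms).
example = phonetic_accounts(voiceless_interval_ms=120.0, closure_ms=110.0)
```

An empty result does not by itself indicate a phonological error; it only means that none of the phonetic accounts sketched here is supported by the measurements supplied.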

10 Stop closure durations >100 ms are very unusual in normal speakers, and especially in the so-called ‘connected’ speech samples such as passage reading and spontaneous speech. In more formal types of speech, and especially in citation forms of speech (single words or words in brief carrier phrases), closure durations for bilabial stops may approach 100 ms.

11 Supporting data have not, to the author’s knowledge, been published; the claim is made based on his experience in measuring acoustic records of speakers with dysarthria and aphasia.

To summarize, short-lag VOTs may be measured even when an LDG is present. In the absence of additional acoustic measurements, it does not seem wise to interpret short-lag VOTs as mapping onto phonological voicing status in a straightforward way. In papers such as Blumstein et al. (1977, 1980), and Baum et al. (1990), short-lag VOTs for voiceless stop targets have been regarded as manifestations of such phonological voicing errors; similar reasoning has been used in Ryalls, Provost, and Arsenault (1995) and Gandour and Dardarananda (1984a) for French and Thai speakers with aphasia, although the phonetic details are somewhat different.

In the case of voiced stop targets, it is substantially more difficult to demonstrate how the underlying physiology could be consistent with phonological voicing yet yield a long-lag VOT, at least in English.¹² Examples of such errors in persons with aphasia, apraxia of speech, and right-hemisphere damage have been reported by Blumstein et al. (1977, 1980), Kent and Rosenbek (1983), Baum et al. (1990), and Kurowski, Blumstein, and Mathison (1998). If the vocal folds are close to, or at the midline when a stop is released, it does not seem possible for VOT to be much more than 20, or at the limit, 30 ms; a voiced target produced with a VOT of 50–60 ms would seem to require an LDG which, as argued above, signifies voicelessness. Perhaps, then, measurement of a long-lag VOT when the target stop is voiced should be taken as good evidence of a phonemic error. There is, however, one more piece of evidence that demonstrates how tenuous this interpretation might be.

Figure 4. Schematic drawing of laryngeal and supralaryngeal behavior for voiceless stops, showing the effects of asynchronies between the LDG and stop closure interval on VOT measurements. See text for additional details. [Labels: Example 1, Example 2, Example 3, Onset of LDG, Preaspirated, Voicing into closure.]

12 When a two- or three-way opposition involves voiced and voiceless unaspirated stops, it is not at all difficult to envision the physiology of voiced stops resulting in a short-lag VOT and thus the possible interpretation of a phonemic (voiceless for voiced) error. With no LDG but vocal folds that are prevented from vibrating because the pressures below and above the glottis are the same, as often occurs toward the end of a voiced stop closure interval, the vocal fold vibration may be delayed following the stop release by as much as 20 ms, resulting in the kind of VOT observed for voiceless unaspirated stops. See Ryalls et al. (1995) for examples from French and Gandour et al. (1992) for examples from Thai.

Dysarthria is a neurogenic speech disorder in which damage to the central or peripheral nervous system results in a problem with control of some or many of the scores of muscles involved in the production of speech (see Weismer, 1997, for a review). Although the diseases that result in dysarthria may also produce cognitive problems (such as mental retardation in cerebral palsy, dementia and depression in Parkinson disease, aphasia in stroke, to name a few examples), the speech motor control problem has never been thought to be complicated by a potential loss or modification of phonological representations.

Stated otherwise, sound segment errors and their acoustic manifestations in dysarthria have always been considered of a phonetic origin, reflecting only the control problem.

Figure 5 shows data derived from conversational speech samples produced by 22 adults with dysarthria (Weismer, 2004); none of these speakers had serious cognitive problems, and all had been seen for research purposes related solely to their speech motor control deficit. This cumulative probability graph shows VOT for voiced (solid function) and voiceless (dotted function) stop consonants produced in the prestressed position. For 97 voiced stops and 112 voiceless stops, median VOTs across speakers and place of articulation were 16 and 49 ms, respectively, with both functions steeper below, as compared to above, the medians.¹³ Mean data from Lisker and Abramson’s (1964) sentence production task, marked on the cumulative probability functions with filled squares, show the current data to be roughly comparable to those from normal speakers. Speakers with dysarthria therefore tend to maintain the VOT distinction for voiced stops in much the same way as normal speakers. There are clearly exceptions to this summary statement, however, because the cumulative probability functions show examples of voiced stops with long-lag VOT values (say, above 30 ms), and voiceless stops with short-lag VOT values (below 20 ms). The low frequency of occurrence of these apparently ‘incorrect’ VOT values is probably a negligible concern for the question of if and how speakers with dysarthria maintain the voicing distinction, but the evaluation of individual VOT exemplars has been taken as critical evidence in adjudicating a phonetic vs. phonological origin of stop voicing errors in aphasia. The data presented here are not unusual; VOT values clearly in the category opposed to the target have been reported for speakers with dysarthria by Caruso and Burton (1997), Lieberman et al. (1992), Ryalls, Hoffman-Ruddy, Vitels, and Owens (2001), Ackerman, Graber, Hertrich, and Daum (1999), and Bunton and Weismer (2002).

13 Note the absence of VOT values in the ‘lead’ (negative) range for the voiced function. In connected speech, when voiced stops are in the intervocalic position vocal fold vibration can continue throughout the closure interval, with the duration of voicing lead equal to the duration of the closure interval. For this analysis, if voicing occurred at the same time as the stop release (which will be the case, give or take several milliseconds), the VOT was recorded as ‘0’ (zero).
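For readers unfamiliar with cumulative probability functions of the kind described for Figure 5, the following sketch shows how such a function, the median, and the proportion of tokens falling in the ‘opposed’ VOT category might be computed. The VOT lists are invented for illustration and are not the Weismer (2004) data; the helper names are hypothetical, and the 30 ms and 20 ms cut-offs are the rough values mentioned in the text.

```python
from statistics import median


def ecdf(values):
    """Empirical cumulative probability: sorted (value, proportion <= value) pairs."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def proportion(values, predicate):
    """Proportion of tokens for which the predicate holds."""
    return sum(1 for v in values if predicate(v)) / len(values)


# Invented VOT tokens in ms (NOT data from the chapter).
voiced_vot = [0, 5, 10, 12, 16, 18, 22, 28, 35, 55]
voiceless_vot = [12, 18, 30, 42, 49, 52, 60, 70, 85, 110]

voiced_median = median(voiced_vot)
voiceless_median = median(voiceless_vot)

# Tokens in the category opposed to the target, using the text's rough cut-offs.
voiced_long_lag = proportion(voiced_vot, lambda v: v > 30)
voiceless_short_lag = proportion(voiceless_vot, lambda v: v < 20)

voiced_curve = ecdf(voiced_vot)
```

Plotting each list of pairs returned by `ecdf` as a step function would give a graph of the same general form as the one described for Figure 5.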

The logic of identifying long-lag VOTs for voiced targets and short-lag VOTs for voiceless targets as a result of incorrect phonological representations should not depend on the type of disorder under evaluation. The fact that such large-scale phonetic anomalies also occur in dysarthria, a group of disorders in which phonological representations are assumed to be unaffected by the disease process, highlights another problem in the use of VOT values to make the distinction between phonetic and phonological errors.

Source: Handbook of Psycholinguistics (pp. 121-130)