

In the document Discrete-Time Speech Signal Processing (pages 175-179)


4.5 Vocal Fold/Vocal Tract Interaction

4.5.1 A Model for Source/Tract Interaction

Figure 4.22 shows an electrical analog of a model for airflow through the glottis. In this model, voltage is the analog of sound pressure, and current is the analog of volume velocity. $P_{sg}$ at the left is the subglottal (below the glottis) pressure in the lungs that is the power source for speech; during voicing it is assumed fixed. $p(t)$ is the sound pressure corresponding to a single first formant¹⁹ in front of the glottis. $Z_g(t)$ is the time-varying impedance of the glottis, defined as the ratio of the transglottal pressure (across the glottis), $p_{tg}(t)$, to the glottal volume velocity (through the glottis), $u_g(t)$, i.e.,

$$Z_g(t) = \frac{p_{tg}(t)}{u_g(t)} \qquad (4.41)$$

where all quantities are complex. The $R$, $C$, and $L$ elements are the resistance, capacitance, and inductance, respectively, of an electrical model for the first formant whose nominal frequency and (3 dB attenuation) bandwidth are given by

$$\Omega_o = \frac{1}{\sqrt{LC}}, \qquad B_o = \frac{1}{RC}. \qquad (4.42)$$
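Equation (4.42) is easy to check numerically. The sketch below is a minimal illustration, assuming SI units and an arbitrarily chosen capacitance; the function names `formant_from_rlc` and `rlc_from_formant` are hypothetical. Only the products $LC$ and $RC$ fix the formant, although the absolute size of $C$ sets the impedance level of the load and therefore how strongly it interacts with the glottis. Note that $\Omega_o$ and $B_o$ come out in radians per second; dividing by $2\pi$ gives hertz.

```python
import math

def formant_from_rlc(R, L, C):
    """Eq. (4.42): nominal frequency and 3-dB bandwidth, both in rad/s."""
    omega_o = 1.0 / math.sqrt(L * C)
    b_o = 1.0 / (R * C)
    return omega_o, b_o

def rlc_from_formant(f1_hz, bw_hz, C):
    """Invert Eq. (4.42) for a chosen C, given frequency and bandwidth in Hz."""
    omega_o = 2.0 * math.pi * f1_hz
    b_o = 2.0 * math.pi * bw_hz
    L = 1.0 / (omega_o ** 2 * C)
    R = 1.0 / (b_o * C)
    return R, L, C

# Hypothetical first formant: 500 Hz with a 60 Hz bandwidth, C chosen arbitrarily.
R, L, C = rlc_from_formant(500.0, 60.0, 1e-9)
omega_o, b_o = formant_from_rlc(R, L, C)
print(round(omega_o / (2 * math.pi), 6), round(b_o / (2 * math.pi), 6))  # 500.0 60.0
```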

From Figure 4.22, we see that if $Z_g(t)$ is comparable to the impedance of the first-formant model, there will be considerable “interaction” between the source and the vocal tract, in violation of the source/filter assumption of independence. This interaction will induce an effective change in the nominal center frequency $\Omega_o$ and bandwidth $B_o$ of the first formant.

Empirically, and guided by aerodynamic theories, it has been found that $Z_g(t)$ is not only time-varying but also nonlinear. Specifically, van den Berg [31] showed that the transglottal pressure and glottal volume velocity are related by

$$p_{tg}(t) = \frac{k\rho}{2A^2(t)}\, u_g^2(t) \qquad (4.43)$$

¹⁹ Recall that the vocal tract transfer function has many complex pole pairs corresponding to speech formants.

Only the first formant is included in Figure 4.22, because for higher formants the impedance of the vocal tract is negligible compared to that at the glottis. Numerical simulations, to be described, initially included multiple formants of the vocal tract and multiple subglottal resonances. These simulations verify that formants above the first formant and subglottal resonances negligibly influence glottal flow [1].


[Figure 4.22: circuit with labels $P_{sg}$, $Z_g(t)$, $u_g(t)$, $p_{tg}(t)$, $p(t)$, $R$, $C$, $L$]

Figure 4.22 Diagram of a simple electrical model for vocal fold/vocal tract interaction. Only the first-formant model is included.

SOURCE: C.R. Jankowski, Fine Structure Features for Speaker Identification [10]. ©1996, C.R. Jankowski and the Massachusetts Institute of Technology. Used by permission.

where $\rho$ is the density of air and $A(t)$ is the smallest time-varying area of the glottal slit. (Looking down the glottis, the cross-sectional glottal area changes with depth.) The term $k = 1.1$ includes the effect of a glottis that is nonuniform with depth. Writing the nodal current equation of the circuit in Figure 4.22 and using Equation (4.43), we can show that (Exercise 4.19)

$$C\frac{dp(t)}{dt} + \frac{p(t)}{R} + \frac{1}{L}\int_0^t p(\tau)\,d\tau = A(t)\sqrt{\frac{2p_{tg}(t)}{k\rho}} \qquad (4.44)$$

where

$$p(t) + p_{tg}(t) = P_{sg}.$$

Equations (4.43) and (4.44) represent a pair of coupled nonlinear differential equations with time-varying coefficients, the time variation entering through the changing glottal slit area and the nonlinearity entering through the van den Berg glottal pressure/volume velocity relation.
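The nonlinearity of Equation (4.43) can be seen by solving it for the flow, $u_g = A\sqrt{2p_{tg}/(k\rho)}$: doubling the transglottal pressure multiplies the flow by only $\sqrt{2}$, so the glottis does not behave like a fixed resistor. The sketch below is a minimal illustration; the air density of 1.14 kg/m³ (warm air) and the 0.2 cm² area are assumed values, and the function names are hypothetical.

```python
import math

RHO_AIR = 1.14   # assumed density of warm air, kg/m^3
K_GLOTTIS = 1.1  # the text's factor k for a glottis nonuniform with depth

def transglottal_pressure(ug, area, rho=RHO_AIR, k=K_GLOTTIS):
    """Eq. (4.43): p_tg = k * rho / (2 A^2) * u_g^2."""
    return k * rho / (2.0 * area ** 2) * ug ** 2

def glottal_flow(ptg, area, rho=RHO_AIR, k=K_GLOTTIS):
    """Eq. (4.43) solved for u_g (valid for ptg >= 0): u_g = A sqrt(2 p_tg / (k rho))."""
    return area * math.sqrt(2.0 * ptg / (k * rho))

area = 2e-5                        # hypothetical glottal area, m^2 (0.2 cm^2)
u1 = glottal_flow(800.0, area)     # flow at 800 Pa across the glottis
u2 = glottal_flow(1600.0, area)    # flow at twice that pressure
print(u2 / u1)                     # about 1.414 (sqrt(2)), not 2
```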

One approach to solving Equations (4.43) and (4.44) simultaneously is numerical integration [1]. In this simulation, it was found that the time-domain skewness of the glottal flow over one cycle (we saw this skewness in Chapter 3) is due in part to the time-varying glottal area function and in part to the loading by the vocal tract first formant; pressure from the vocal tract against the glottis slows down the flow and influences its skewness. In addition to an asymmetric glottal flow, the numerical simulation also revealed an intriguing sinusoidal-like “ripple” component in the flow. The timing and amount of ripple depend on the configuration of the glottis during both the open and closed phases [1],[24],[25]. For example, with folds that open in a zipper-like fashion, the ripple may begin at a low level early in the glottal cycle and then grow as the vocal folds open more completely. We can think of the ripple as part of the fine structure of the glottal flow, superimposed on the more slowly varying asymmetric coarse structure of the glottal flow, as illustrated in Figure 4.23, which shows the two components in a schematic of a typical glottal flow derivative over one cycle. We describe the separation of coarse- and fine-structure glottal flow components in Chapter 5. Finally, the “truncation” effect over a glottal cycle, alluded to earlier, was also observed.
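A minimal sketch of numerically integrating Equations (4.43) and (4.44) follows. It is not the simulation of [1]; it assumes a made-up raised-cosine glottal area $A(t)$ (open quotient 0.5, $f_0$ = 100 Hz, peak 0.2 cm²), illustrative SI parameter values, and classical fourth-order Runge-Kutta integration. To avoid carrying the integral in (4.44), it introduces the inductor current $i_L(t) = \frac{1}{L}\int_0^t p(\tau)\,d\tau$, which turns the equation into two first-order state equations. All names and numbers are hypothetical.

```python
import math

# Illustrative, hypothetical parameter values in SI units; not fitted to any speaker.
PSG = 800.0        # subglottal pressure, Pa (about 8 cm H2O)
RHO = 1.14         # assumed density of warm air, kg/m^3
K = 1.1            # nonuniform-glottis factor k from the text
F1, B1 = 500.0, 60.0               # hypothetical first formant and bandwidth, Hz
C = 1e-9                           # chosen capacitance; L and R then follow from Eq. (4.42)
L = 1.0 / ((2.0 * math.pi * F1) ** 2 * C)
R = 1.0 / (2.0 * math.pi * B1 * C)


def glottal_area(t, f0=100.0, open_quotient=0.5, a_max=2e-5):
    """Hypothetical area function: raised-cosine pulse while open, zero while closed."""
    phase = (t * f0) % 1.0
    if phase >= open_quotient:
        return 0.0
    return 0.5 * a_max * (1.0 - math.cos(2.0 * math.pi * phase / open_quotient))


def glottal_flow(t, p):
    """Eq. (4.43) solved for u_g, with p_tg = P_sg - p clipped at zero."""
    ptg = max(PSG - p, 0.0)
    return glottal_area(t) * math.sqrt(2.0 * ptg / (K * RHO))


def derivatives(t, p, i_l):
    """Eq. (4.44) as two first-order ODEs; i_l = (1/L) * integral of p (inductor current)."""
    dp_dt = (glottal_flow(t, p) - p / R - i_l) / C
    di_l_dt = p / L
    return dp_dt, di_l_dt


def simulate(duration=0.03, fs=200_000):
    """Classical fourth-order Runge-Kutta; returns lists (times, p, u_g)."""
    dt = 1.0 / fs
    p, i_l = 0.0, 0.0
    times, pressures, flows = [], [], []
    for n in range(int(round(duration * fs))):
        t = n * dt
        times.append(t)
        pressures.append(p)
        flows.append(glottal_flow(t, p))
        k1 = derivatives(t, p, i_l)
        k2 = derivatives(t + dt / 2, p + dt / 2 * k1[0], i_l + dt / 2 * k1[1])
        k3 = derivatives(t + dt / 2, p + dt / 2 * k2[0], i_l + dt / 2 * k2[1])
        k4 = derivatives(t + dt, p + dt * k3[0], i_l + dt * k3[1])
        p += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        i_l += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return times, pressures, flows


times, p, ug = simulate()
print(f"peak u_g = {max(ug):.3e} m^3/s, peak |p| = {max(abs(x) for x in p):.1f} Pa")
```

Plotting `ug` (or its first difference, as a stand-in for the flow derivative) against `times` is one way to look for the skew and first-formant ripple; how visible they are depends entirely on these made-up values.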

[Figure 4.23: amplitude versus time (ms), with the ripple component labeled]

Figure 4.23 Glottal flow derivative waveform showing the coarse structure and the ripple component of the fine structure due to source/vocal tract interaction.

SOURCE: M.D. Plumpe, T.F. Quatieri, and D.A. Reynolds, “Modeling of the Glottal Flow Derivative Waveform with Application to Speaker Identification” [25]. ©1999, IEEE. Used by permission.

There is an alternative way of looking at this problem that leads to an equivalent representation of the vocal fold/vocal tract interaction, which gives similar observed effects but more intuition about the interaction [1]. We again simplify the problem by assuming a one-formant vocal tract load. Then, from Equation (4.44), and with $p_{tg}(t) = P_{sg} - p(t)$ as the pressure across the glottis, we can write

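$$C\frac{dp(t)}{dt} + \frac{p(t)}{R} + \frac{1}{L}\int_0^t p(\tau)\,d\tau = A(t)\sqrt{\frac{2p_{tg}(t)}{k\rho}} = A(t)\sqrt{\frac{2\,[P_{sg} - p(t)]}{k\rho}} = A(t)\sqrt{\frac{2P_{sg}}{k\rho}}\,\sqrt{1 - \frac{p(t)}{P_{sg}}}.$$

Observing from the Taylor series expansion of $\sqrt{1-x}$ that the square root function can be linearized as $\sqrt{1-x} \approx 1 - \frac{1}{2}x$ for small $x$, and assuming $p(t) \ll P_{sg}$, we have

$$C\frac{dp(t)}{dt} + \frac{p(t)}{R} + \frac{1}{L}\int_0^t p(\tau)\,d\tau = A(t)\sqrt{\frac{2P_{sg}}{k\rho}}\left[1 - \frac{p(t)}{2P_{sg}}\right].$$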
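How good is the linearization $\sqrt{1-x} \approx 1 - \frac{1}{2}x$ with $x = p(t)/P_{sg}$? The next term of the Taylor series is $-x^2/8$, so the error grows roughly quadratically with the ratio of vocal tract pressure to subglottal pressure. The short sketch below simply tabulates that error for a few illustrative ratios; the helper name is hypothetical.

```python
import math

def sqrt_linearization_error(x):
    """Absolute error of sqrt(1 - x) ~= 1 - x/2 (first-order Taylor expansion about x = 0)."""
    return abs(math.sqrt(1.0 - x) - (1.0 - 0.5 * x))

# x plays the role of p(t)/P_sg; the neglected next Taylor term is -x**2/8.
for x in (0.01, 0.05, 0.1, 0.3):
    print(f"p/Psg = {x:4.2f}  error = {sqrt_linearization_error(x):.2e}  x^2/8 = {x * x / 8:.2e}")
```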


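Then, with some algebra, we have

$$C\frac{dp(t)}{dt} + \frac{p(t)}{R} + \frac{1}{L}\int_0^t p(\tau)\,d\tau + \frac{1}{2}\,p(t)\,A(t)\sqrt{\frac{2}{k\rho P_{sg}}} = A(t)\sqrt{\frac{2P_{sg}}{k\rho}},$$

which can be rewritten as

$$C\frac{dp(t)}{dt} + \frac{p(t)}{R} + \frac{1}{L}\int_0^t p(\tau)\,d\tau + \frac{1}{2}\,p(t)\,g_o(t) = u_{sc}(t) \qquad (4.45)$$

where

$$u_{sc}(t) = A(t)\sqrt{\frac{2P_{sg}}{k\rho}}, \qquad g_o(t) = \frac{u_{sc}(t)}{P_{sg}} = A(t)\sqrt{\frac{2}{k\rho P_{sg}}}.$$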
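The definitions of $u_{sc}(t)$ and $g_o(t)$ can be checked by comparing the exact glottal flow $A\sqrt{2(P_{sg}-p)/(k\rho)}$ with its linearized form $u_{sc} - \frac{1}{2}g_o p$ from Equation (4.45). Note that $u_{sc}$ is the flow Equation (4.43) gives when $p = 0$; reading the subscript as “short circuit” is our interpretation, not something stated in the text. The sketch assumes an air density of 1.14 kg/m³ and hypothetical values for the area and subglottal pressure; the function names are also hypothetical.

```python
import math

RHO = 1.14   # assumed density of warm air, kg/m^3
K = 1.1      # the text's nonuniform-glottis factor k

def u_sc(area, psg):
    """u_sc = A sqrt(2 P_sg / (k rho)): the flow Eq. (4.43) gives when p = 0."""
    return area * math.sqrt(2.0 * psg / (K * RHO))

def g_o(area, psg):
    """g_o = u_sc / P_sg = A sqrt(2 / (k rho P_sg))."""
    return area * math.sqrt(2.0 / (K * RHO * psg))

def flow_exact(area, psg, p):
    """Nonlinear right-hand side of Eq. (4.44) with p_tg = P_sg - p."""
    return area * math.sqrt(2.0 * (psg - p) / (K * RHO))

def flow_linearized(area, psg, p):
    """Linearized flow behind Eq. (4.45): u_sc - (1/2) g_o p."""
    return u_sc(area, psg) - 0.5 * g_o(area, psg) * p

area, psg = 2e-5, 800.0            # hypothetical area (m^2) and subglottal pressure (Pa)
for p in (8.0, 40.0, 200.0):       # p/P_sg = 1%, 5%, 25%
    exact, lin = flow_exact(area, psg, p), flow_linearized(area, psg, p)
    print(f"p/Psg = {p / psg:.2f}  relative error = {abs(lin - exact) / exact:.2e}")
```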

Now, differentiating Equation (4.45) with respect to time, we can show that (Exercise 4.19)

$$C\frac{d^2p(t)}{dt^2} + \left[\frac{1}{R} + \frac{1}{2}g_o(t)\right]\frac{dp(t)}{dt} + \left[\frac{1}{L} + \frac{1}{2}\dot{g}_o(t)\right]p(t) = \dot{u}_{sc}(t) \qquad (4.46)$$

which is represented by the circuit of Figure 4.24, the Norton equivalent of the original circuit of Figure 4.22, where the equivalent time-varying resistance and inductance are given by

$$R_g(t) = \frac{2}{g_o(t)}, \qquad L_g(t) = \frac{2}{\dot{g}_o(t)} \qquad (4.47)$$

which appear in parallel with the first-formant resistance, capacitance, and inductance, $R$, $C$, and $L$. The new time-varying volume velocity source is $u_{sc}(t)$ of Equation (4.45).
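Dividing Equation (4.46) by $C$ gives a second-order equation whose damping coefficient is $[1/R + g_o(t)/2]/C = 1/(C\,R\parallel R_g)$ and whose stiffness coefficient is $[1/L + \dot{g}_o(t)/2]/C = 1/(C\,L\parallel L_g)$. Read with the coefficients frozen at one instant (a quasi-static assumption, reasonable only when $g_o$ changes slowly relative to the formant period), these give the effective first-formant bandwidth and center frequency that the interaction induces. The sketch below assumes the same hypothetical 500 Hz / 60 Hz formant, air density, area, and subglottal pressure as illustrative values; the function names are hypothetical.

```python
import math

def effective_formant(R, L, C, g_o, g_o_dot):
    """Frozen-coefficient reading of Eq. (4.46) divided by C.

    Returns (bandwidth, center frequency), both in rad/s. The frequency is
    None when the stiffness term (1/L + g_o_dot/2) / C is not positive.
    """
    bandwidth = (1.0 / R + 0.5 * g_o) / C        # 1 / (C * (R parallel R_g))
    omega_sq = (1.0 / L + 0.5 * g_o_dot) / C     # 1 / (C * (L parallel L_g))
    omega = math.sqrt(omega_sq) if omega_sq > 0.0 else None
    return bandwidth, omega

# Hypothetical values in SI units: R, L, C of a 500 Hz formant with 60 Hz bandwidth.
F1, B1, C = 500.0, 60.0, 1e-9
L = 1.0 / ((2.0 * math.pi * F1) ** 2 * C)
R = 1.0 / (2.0 * math.pi * B1 * C)
PSG, RHO, K = 800.0, 1.14, 1.1

def g_o_of_area(area):
    """g_o = A sqrt(2 / (k rho P_sg)), as defined with Eq. (4.45)."""
    return area * math.sqrt(2.0 / (K * RHO * PSG))

# Closed phase: A = 0, so g_o = g_o_dot = 0 and the formant is unchanged.
bw0, om0 = effective_formant(R, L, C, 0.0, 0.0)
# Widest opening, where dA/dt (hence g_o_dot) is momentarily zero: only the bandwidth grows.
bw1, om1 = effective_formant(R, L, C, g_o_of_area(2e-5), 0.0)
print(bw0 / (2 * math.pi), bw1 / (2 * math.pi), om1 / (2 * math.pi))   # all in Hz
```

Working with the conductances $g_o/2$ and $\dot{g}_o/2$ rather than $R_g$ and $L_g$ avoids dividing by zero: in the closed phase $g_o = \dot{g}_o = 0$, so $R_g$ and $L_g$ of Equation (4.47) are infinite (open circuits), and while the folds close $\dot{g}_o < 0$ makes $L_g$ negative.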

[Figure 4.24: circuit with labels $u_{sc}(t)$, $u_g(t)$, $R$, $R_g(t)$, $L_g(t)$, $p(t)$, $C$, $L$]

Figure 4.24 Transformed vocal fold/vocal tract first-formant interaction model that is Norton equivalent to circuit of Figure 4.22.

SOURCE: C.R. Jankowski, Fine Structure Features for Speaker Identification [10]. ©1996, C.R. Jankowski and the Massachusetts Institute of Technology. Used by permission.
