4.4 A Discrete-Time Model Based on Tube Concatenation
4.4.2 A Discrete-Time Realization
occurs at the two boundary conditions. Nevertheless, it is possible to select loss at the glottis and lips to give observed formant bandwidths. Furthermore, with this configuration Fant [7]
and Flanagan [8] have shown that with appropriate choice of section lengths and cross-sectional areas, realistic formant frequencies can be obtained for vowels. As we noted earlier, however, the resulting model is not necessarily consistent with the underlying physics. In the following section, we describe a means of converting the analog concatenated tube model to discrete time.
In this discrete-time realization, as in the analog case, all loss, and hence control of formant bandwidths, is introduced at the two boundaries.
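which can be shown to be periodic with period $2\pi/(2\tau)$ (Figure 4.17a):
$$V_a\!\left(\Omega + \frac{2\pi}{2\tau}\right) = \sum_{k=0}^{\infty} b_k e^{-j\left(\Omega + \frac{2\pi}{2\tau}\right)k2\tau} = \sum_{k=0}^{\infty} b_k e^{-j\Omega k2\tau}\, e^{-j\frac{2\pi}{2\tau}k2\tau} = V_a(\Omega).$$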
The intuition for this periodicity is that we have “discretized” the continuous-space tube with space interval $\Delta x = l/N$ and the corresponding time interval $\tau = \Delta x/c$, so we expect periodicity to appear in the transfer function representation.
From Figure 4.17a, we see that $V_a(\Omega)$ has the form of a Fourier transform of a sampled continuous waveform with sampling time interval $T = 2\tau$. We use this observation to transform the analog filtering operation to discrete-time form with the following steps illustrated in Figure 4.17:
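The periodicity can also be checked numerically. The following is a minimal sketch, assuming a truncated sum with hypothetical coefficients $b_k$ and an arbitrary per-tube delay $\tau$; neither value comes from the text.

```python
import numpy as np

def va(omega, b, tau):
    """Evaluate V_a(Omega) = sum_k b_k exp(-j*Omega*k*2*tau) for finitely many b_k."""
    k = np.arange(len(b))
    return np.sum(b * np.exp(-1j * omega * k * 2.0 * tau))

tau = 1.0 / 20000.0                      # assumed per-tube delay in seconds
b = np.array([1.0, 0.5, -0.25, 0.125])   # hypothetical coefficients, truncated to 4 terms
period = 2.0 * np.pi / (2.0 * tau)       # period claimed above, in rad/s
omega = 2.0 * np.pi * 700.0              # arbitrary test frequency, in rad/s

# Shifting Omega by one period should leave V_a unchanged up to rounding error.
periodicity_error = abs(va(omega + period, b, tau) - va(omega, b, tau))
print(periodicity_error)
```

The shift multiplies each term by $e^{-j2\pi k}$, which equals 1 for integer $k$, so the printed error should be at the level of floating-point rounding.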
S1: Using the impulse-invariance method, i.e., replacing $e^{sT}$ with the complex variable $z$ where $T = 2\tau$, we transform the system function $V_a(\Omega)$ to discrete time:
$$V_a(s) = \sum_{k=0}^{\infty} b_k \left(e^{s2\tau}\right)^{-k}$$
which, with the replacement of $e^{sT}$ by the complex variable $z$, becomes
$$V(z) = \sum_{k=0}^{\infty} b_k z^{-k}.$$
The frequency response $V(\omega) = V(z)|_{z=e^{j\omega}}$ will be designed to match desired formant resonances over the interval $[-\pi, \pi]$.
S2: Consider an excitation function $u_g(t)$ that is bandlimited with maximum frequency $\Omega_{max} = \pi/(2\tau)$ and sampled with a periodic impulse train with sampling interval $T = 2\tau$, thus meeting the Nyquist criterion to avoid aliasing. The Fourier transform of the resulting excitation is denoted by $U_g(\Omega)$. We then convert the impulse-sampled continuous-time input to discrete time. This operation is illustrated in the frequency domain in Figure 4.17b.
S3: A consequence of using the impulse-invariance method to perform filter conversion is a straightforward conversion of a continuous-time flow graph representation of the model to a discrete-time version. An example of this transformation is shown in Figure 4.18 for the two-tube case. Since the mapping of $e^{sT}$ to $z$ yields $V(z)$, the discrete-time signal flow graph can be obtained in a similar way. A delay of $\tau$ seconds corresponds to the continuous-time factor $e^{-s\tau} = e^{-s2\tau/2} = e^{-sT/2}$, which in discrete time is a half-sample delay. Therefore, in a signal flow graph we can replace the delay $\tau$ by $z^{-1/2}$. Since a half-sample delay is difficult to implement (requiring interpolation), we move all lower-branch delays to
Figure 4.17 Frequency-domain view of discretizing analog filtering by concatenated tubes with equal length: (a) conversion of impulse-sampled continuous-time impulse response of concatenated tubes to discrete time; (b) conversion of impulse-sampled continuous-time glottal input to discrete time.
the upper branch, observing that delay is preserved in any closed branch with this change, as illustrated in Figure 4.18. The resulting delay offset can be compensated at the output [28].
S4: The final step in the conversion of continuous-time filtering to discrete time is to multiply the discrete-time frequency responses of the excitation and the vocal tract impulse response to form the frequency response of the discrete-time speech output.
We now step back and view the discretization process from a different perspective. The original continuously spatial-varying vocal tract, assuming that it can be modeled as linear and time-invariant, has an impulse response that we denote by $\tilde{v}_a(t)$. By discretizing the vocal tract spatially, we have in effect sampled the impulse response temporally with sampling interval $T = 2\tau$. The constraint $\Delta x = c\tau$ indicates that spatial and time sampling are
Figure 4.18 Signal flow graph conversion to discrete time: (a) lossless two-tube model; (b) discrete-time version of (a); (c) conversion of (b) with single-sample delays.
SOURCE: L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals [28].
©1978, Pearson Education, Inc. Used by permission.
intimately connected. The resulting frequency response is generally an aliased version of the (true) frequency response $\tilde{V}_a(\Omega)$ because the sampling may not meet the Nyquist criterion.
The smaller the sub-tube length (i.e., the larger the number of tubes), the less the aliasing and the closer the discrete-time frequency response is to that of the (true) analog $\tilde{v}_a(t)$. We will see shortly that increasing the number of tubes implies that we increase the number of poles in the discrete model. The following example illustrates an interesting case:
EXAMPLE 4.4 Consider a uniform tube, itself modeled as a concatenation of uniform tubes, but with only a single tube element. The frequency response $\tilde{V}_a(\Omega)$ has an infinite number of resonances, as shown in Example 4.2. (Note that $V_a(\Omega)$ had previously denoted the true frequency response.) How can we possibly capture all of these resonances in discrete time? If we sample with $T = 2\tau = 2l/c$, then we have only “half a resonance” over the discrete-time frequency range $[0, \pi]$; therefore, we need to divide the uniform tube into two equal tubes to allow a full resonance over $[0, \pi]$, i.e., we need to sample spatially twice as fast. This is an intriguing example which gives insight into the relation of the continuous- and discrete-time representations of resonant tubes. (The reader is asked to further elaborate on this example in Exercise 4.18.) ▲
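The “half a resonance” observation in Example 4.4 can be illustrated with a short calculation. This is a sketch under stated assumptions: it takes the lowest resonance of a lossless uniform tube (closed at the glottis, open at the lips) to be the standard quarter-wavelength value $c/(4l)$, and it uses illustrative values for $l$ and $c$. The function name is hypothetical.

```python
import math

def first_resonance_omega(num_tubes, length_m=0.175, c=350.0):
    """Digital frequency (rad/sample) of the lowest uniform-tube resonance.

    Assumes the quarter-wavelength resonance f1 = c / (4 l) and the
    sampling interval T = 2*tau with tau = l / (N c), as in the text.
    """
    tau = length_m / (num_tubes * c)   # one-way delay across a single sub-tube
    T = 2.0 * tau                      # sampling interval
    f1 = c / (4.0 * length_m)          # assumed first resonance in Hz
    return 2.0 * math.pi * f1 * T

omega_one_tube = first_resonance_omega(1)   # lands at the band edge, omega = pi
omega_two_tubes = first_resonance_omega(2)  # lands inside the band, omega = pi/2
print(omega_one_tube / math.pi, omega_two_tubes / math.pi)
```

Under these assumptions the first resonance maps to $\omega = \pi/N$: with a single tube it sits at the edge of $[0, \pi]$, and with two tubes it falls at $\pi/2$, inside the band.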
Our next objective is to derive a general expression for $V(z)$ in terms of the reflection coefficients.
From the discrete-time flow graph in Figure 4.18, we know that
$$V(z) = \frac{U_L(z)}{U_g(z)} = f(r_g, r_1, r_2, \ldots, r_L).$$
To obtain the function $f(r_g, r_1, r_2, \ldots, r_L)$ from the flow graph directly is quite cumbersome.
The flow graph, however, gives a modular structure by which the transfer function can be computed [28]. The resulting transfer function can be shown to be a stable all-pole function with bandwidths determined solely by the loss due to $Z_g(\Omega)$ and $Z_r(\Omega)$, i.e., $V(z)$ is of the form
$$V(z) = \frac{A z^{-N/2}}{D(z)}$$
where
$$D(z) = 1 - \sum_{k=1}^{N} a_k z^{-k}$$
and where the poles correspond to the formants of the vocal tract. Moreover, under the condition $r_g = 1$, i.e., infinite glottal impedance ($Z_g(\Omega) = \infty$) so that no loss occurs at the glottis, a recursion from the lips to the glottis can be derived whereby the transfer function associated with each tube junction is introduced recursively [28]. The recursion for the denominator polynomial $D(z)$ is given by
$$D_0(z) = 1$$
$$D_k(z) = D_{k-1}(z) + r_k z^{-k} D_{k-1}(z^{-1}), \quad k = 1, 2, \ldots, N$$
$$D(z) = D_N(z).$$
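The recursion above can be sketched in a few lines. This is an illustrative implementation rather than the procedure of [28]: the function name and the reflection coefficient values are hypothetical, and $D_k(z)$ is stored as an array of coefficients of $z^0, z^{-1}, \ldots, z^{-k}$.

```python
import numpy as np

def denominator_from_reflection(r):
    """Build D(z) by D_k(z) = D_{k-1}(z) + r_k z^{-k} D_{k-1}(1/z), D_0(z) = 1.

    Returns the coefficients [d_0, d_1, ..., d_N] of z^0, z^-1, ..., z^-N.
    """
    d = np.array([1.0])
    for rk in r:
        # z^{-k} D_{k-1}(1/z) is D_{k-1} with its coefficients reversed and
        # shifted by one position; D_{k-1} itself is zero-padded to order k.
        d = np.concatenate((d, [0.0])) + rk * np.concatenate(([0.0], d[::-1]))
    return d

# Hypothetical reflection coefficients, all with magnitude below one.
r_lossy = [0.3, -0.5, 0.2, 0.714]
d_lossy = denominator_from_reflection(r_lossy)
# Multiplying D(z) by z^N turns it into an ordinary polynomial in z,
# so np.roots applied to [d_0, ..., d_N] returns the poles of 1/D(z).
pole_radii_lossy = np.abs(np.roots(d_lossy))

# Same junctions, but with a lossless termination (last coefficient 1).
d_lossless = denominator_from_reflection([0.3, -0.5, 0.2, 1.0])
pole_radii_lossless = np.abs(np.roots(d_lossless))

print(d_lossy, pole_radii_lossy, pole_radii_lossless)
```

In the notation of the text, $a_k = -d_k$ for $k \geq 1$. For these particular values the poles lie strictly inside the unit circle when the last coefficient is 0.714 and on the unit circle when it is 1.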
Because the vocal tract tube cross-sections (from which reflection coefficients are derived) satisfy $A_k > 0$, we can show that the poles are all inside the unit circle, i.e., the resulting system function is stable.¹⁷ As before, we can imagine the $(N+1)$st tube is infinite in length with $A_{N+1}$ selected so that $r_N = r_L$. When $Z_g(\Omega) = \infty$ and when $Z_r = 0$ so that $r_N = r_L = 1$, then there is a short circuit at the lips. Under this condition, there is no loss anywhere in the system and hence zero bandwidths arise; with $Z_g(\Omega) = \infty$, the radiation impedance $Z_r(\Omega)$ is the only source of loss in the system and controls the resonance bandwidths. The following example illustrates how to choose the number of tube elements to meet a desired bandwidth constraint.
¹⁷ When $Z_g(\Omega) = \infty$ the recursion is associated with Levinson’s recursion, which will be derived in the context of linear prediction analysis of Chapter 5. Using Levinson’s recursion, we will prove that the poles of $D(z)$ lie inside the unit circle.
EXAMPLE 4.5 Let the vocal tract length $l = 17.5$ cm and the speed of sound $c = 350$ m/s.
We want to find the number of tube sections $N$ required to cover a bandwidth of 5000 Hz, i.e., the excitation bandwidth and the vocal tract bandwidth are 5000 Hz. Recall that $\tau = \frac{l}{cN}$ and that $\frac{2\pi}{4\tau}$ is the cutoff bandwidth. Therefore, we want $\frac{1}{4\tau} = 5000$ Hz. Solving for $\tau$, the delay across a single tube, gives $\tau = \frac{1}{20000}$ s. Thus, from above we have $N = \frac{l}{c\tau} = 10$. Since $N$ is also the order of the all-pole denominator, we can model up to $N/2 = 5$ complex-conjugate pole pairs. We can also think of this as modeling one resonance per 1000 Hz. ▲
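The arithmetic of Example 4.5 can be reproduced as follows. This is a small sketch using the values given in the example; the variable names are illustrative only.

```python
# Values from Example 4.5, converted to SI units.
tract_length = 0.175    # vocal tract length l in meters (17.5 cm)
sound_speed = 350.0     # speed of sound c in m/s
bandwidth_hz = 5000.0   # desired bandwidth in Hz

# Cutoff condition 1/(4*tau) = bandwidth gives the per-tube delay tau.
tau = 1.0 / (4.0 * bandwidth_hz)

# tau = l / (c N) rearranged for the number of tube sections N.
num_tubes = tract_length / (sound_speed * tau)

# Each complex-conjugate pole pair uses two poles of the order-N denominator.
num_pole_pairs = round(num_tubes) // 2
hz_per_resonance = bandwidth_hz / num_pole_pairs

print(tau, round(num_tubes), num_pole_pairs, hz_per_resonance)
```

The value of `num_tubes` is rounded before use because the floating-point division may not return exactly 10.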
We see that the all-pole transfer function is a function of only the reflection coefficients of the original concatenated tube model, and that the reflection coefficients are a function of the cross-sectional area functions of each tube, i.e., $r_k = \frac{A_{k+1} - A_k}{A_{k+1} + A_k}$. Therefore, if we could estimate the area functions, we could then obtain the all-pole discrete-time transfer function. An example of this transition from the cross-sectional areas $A_k$ to $V(z)$ is given in the following example:
EXAMPLE 4.6 This example compares the concatenated tube method with the Portnoff numerical solution using coupled partial differential equations [26]. Because an infinite glottal impedance is assumed, the only loss in the system is at the lips via the radiation impedance. This can be introduced, as we saw above, with an infinitely long $(N+1)$th tube, depicted in Figure 4.19 with a terminating cross-sectional area selected to match the radiation impedance, according to Equation (4.35), so that $r_N = r_L$. By altering this last reflection coefficient, we can change the energy loss in the system and thus control the bandwidths. For example, we see in Figure 4.19 the two different cases of $r_N = 0.714$ (non-zero bandwidths) and $r_L = 1.0$ (zero bandwidths). This example summarizes in effect all we have seen up to now by comparing two discrete-time realizations of the vocal tract transfer function that have similar frequency responses: (1) a numerical simulation, derived with central difference approximations to partial derivatives in time and space, and (2) a (spatially) discretized concatenated tube model that maps to discretized time. ▲
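The first step of the transition from areas to $V(z)$, computing the reflection coefficients $r_k = (A_{k+1} - A_k)/(A_{k+1} + A_k)$, can be sketched as below. The area values are hypothetical and are not the area function of Figure 4.19; the function name is also an assumption.

```python
import numpy as np

def reflection_coefficients(areas):
    """Return r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) for adjacent tube sections."""
    a = np.asarray(areas, dtype=float)
    if np.any(a <= 0.0):
        raise ValueError("cross-sectional areas must be positive")
    return (a[1:] - a[:-1]) / (a[1:] + a[:-1])

# Hypothetical cross-sectional areas in cm^2, ordered as A_1, A_2, ...
areas = [1.0, 2.0, 2.0, 0.5]
r = reflection_coefficients(areas)
print(r)
```

Because every area is positive, each coefficient has magnitude strictly less than one, which is the condition the text relies on for stability. A junction between equal areas gives $r_k = 0$, i.e., no reflection.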