
Analysis of VTC evidence

In document Biswajit Dev Sarma (pages 107-110)

Different sounds in speech are produced by making different shapes of the vocal tract and exciting it with a voiced or unvoiced source. Low vowel sounds are produced with the mouth wide open.

Stop consonants and nasals are produced with complete closure of the vocal tract. Between these two extremes, some sounds are produced with very narrow or moderate constrictions in the vocal tract. Fricatives are produced with a very narrow constriction, while semivowels, laterals and high vowels are produced with a relatively moderate constriction.

Constriction in the vocal tract changes the spectrum in several ways. A number of physical mechanisms, such as viscosity, heat conduction, radiation and the vocal tract walls, cause acoustic losses in the vocal tract resonator, and each of them increases the bandwidths of the natural frequencies of the resonator. In unconstricted sounds these losses contribute most of the bandwidth to the higher formants, whereas in constricted sounds the bandwidth of the first formant increases significantly and that of the higher formants decreases [27], [71]. A reduction in the area at any point in the vocal tract produces a drop in F1. The overall amplitude of the spectrum decreases, with relatively more attenuation at the higher frequencies [27]. The F1 drop and the amplitude reduction are significant for complete closure and narrow constriction, while the effect is smaller for moderate constrictions [27]. As a result of these effects, the low-frequency component becomes stronger relative to the higher frequencies. In the case of voice bars, the dominant low frequency is around 180 to 200 Hz [27]. In nasal sounds the vocal tract is also completely closed, but due to the effect of the nasal tract the dominant low frequency shifts slightly, towards 250 Hz [147]. For low vowels, where the mouth is wide open, F1 is higher than in the constricted cases and the higher formants also carry significant energy. As a result, the very low frequency component is weaker than in constricted sounds. The VTC evidence gives a measure of the very low frequency component present in the signal and hence takes a different range of values for different types of sounds.
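The contrast described above can be illustrated with a minimal sketch. It does not implement the VTC evidence itself; it only measures, for two synthetic signals, the fraction of spectral energy below a cutoff. The function names, the 300 Hz cutoff, the sampling rate and the tone amplitudes are all illustrative assumptions, chosen so that one signal is dominated by a 200 Hz component (voice-bar-like) and the other has a higher F1 and strong higher resonances (low-vowel-like).

```python
import math

def low_frequency_fraction(signal, sample_rate, cutoff_hz):
    """Fraction of spectral energy at or below cutoff_hz (DC excluded), via a naive DFT."""
    n = len(signal)
    low = 0.0
    total = 0.0
    for k in range(1, n // 2 + 1):  # bins from just above DC up to Nyquist
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        power = re * re + im * im
        total += power
        if k * sample_rate / n <= cutoff_hz:
            low += power
    return low / total if total > 0 else 0.0

def tone_mix(components, sample_rate, n):
    """Sum of sinusoids given as (frequency_hz, amplitude) pairs."""
    return [
        sum(a * math.sin(2 * math.pi * f * i / sample_rate) for f, a in components)
        for i in range(n)
    ]

# Illustrative assumptions, not values from the thesis.
SAMPLE_RATE = 8000
N = 400          # 50 ms frame; bin spacing is 20 Hz, so every tone sits on a bin
CUTOFF_HZ = 300  # hypothetical boundary of the "very low frequency" region

# Voice-bar-like: energy concentrated near 200 Hz, very little above.
voice_bar_like = tone_mix([(200, 1.0), (2000, 0.05)], SAMPLE_RATE, N)
# Low-vowel-like: high F1 and higher resonances carrying significant energy.
low_vowel_like = tone_mix(
    [(200, 0.3), (700, 1.0), (1200, 0.8), (2500, 0.5)], SAMPLE_RATE, N
)

voice_bar_fraction = low_frequency_fraction(voice_bar_like, SAMPLE_RATE, CUTOFF_HZ)
low_vowel_fraction = low_frequency_fraction(low_vowel_like, SAMPLE_RATE, CUTOFF_HZ)
print("voice-bar-like:", round(voice_bar_fraction, 3))
print("low-vowel-like:", round(low_vowel_fraction, 3))
```

Under these assumptions the voice-bar-like signal keeps almost all of its energy below the cutoff, while the low-vowel-like signal keeps only a small share, which is the ordering the text attributes to the two extreme cases.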

4.3.1 Voiced sounds

The VTC evidence shows an increasing trend as the constriction increases across the different voiced sounds. The broad categories of voiced sounds, in decreasing order of the amount of constriction, are voice bars and nasals, voiced fricatives, semivowels and high vowels, liquids, and low vowels. The distribution of cosine kernel values for these broad categories is plotted in Figure 4.3. The entire TIMIT test set is used for finding the distribution. From the left, the distribution curves are for low vowels ([aa], [ah], [ae]), liquids ([r], [l]), high vowels ([ih], [iy], [uh], [uw]), glides ([w], [y]), voiced fricatives ([v], [hv], [hh]), nasals ([m], [n], [ng]) and voice bars ([gcl], [dcl], [bcl]). The low-frequency characteristics of the two extreme cases (low vowels and voice bars) described earlier can be seen in the plot. Some low values appear in the voice bar case because the labels [gcl], [dcl] and [bcl], which are treated as voice bars, also contain silence. Due to the intervention of the nasal tract, the dominant low frequency is higher for nasals (around 250 Hz), and hence their distribution is shifted towards the left compared with the voice bars even though the amount of constriction is the same. Voiced fricatives are produced with a very narrow constriction accompanied by vibration of the vocal folds. Due to the constriction, F1 falls at the vowel-consonant boundary [27]. The ZFF signal correlates with the speech signal, but this correlation is lower than for nasals and voice bars because of the high-frequency noise present in the spectrum. Glides are produced with a relatively moderate constriction; F1 is slightly higher than for nasals and there is some energy at the higher frequencies as well, resulting in a distribution similar to that of high vowels. Liquids are produced with a constriction that is comparatively shorter than that of glides. The vocal tract airway cannot be approximated by a simple tube; rather, the tongue is shaped in such a way that the airway bifurcates. F1 is slightly higher than for glides (around 400 Hz) and there is an additional resonance above F2 [27]. The evidence distribution for liquids is thus found to lie at lower values than that for the glides and high vowels.

[Figure 4.3 plot: relative frequency (vertical axis) against VTC evidence value (horizontal axis, −0.2 to 1), with curves for low vowels, liquids, high vowels, glides, voiced fricatives, nasals and voice bars.]

Figure 4.3: Distribution of VTC evidence for different voiced sounds.
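A distribution plot of this kind can be obtained by binning the per-frame evidence values of each class and normalising the counts. The following is a minimal sketch of that step only. The function name, the bin count and the example values are assumptions made for illustration; the numbers are made up and are not TIMIT results.

```python
def relative_frequency_histogram(values, lo, hi, num_bins):
    """Histogram of values in [lo, hi], normalised so that the bins sum to 1."""
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    kept = 0
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls in the last bin.
            idx = min(int((v - lo) / width), num_bins - 1)
            counts[idx] += 1
            kept += 1
    if kept == 0:
        return [0.0] * num_bins
    return [c / kept for c in counts]

# Hypothetical evidence values per class (made-up numbers for illustration).
evidence_by_class = {
    "low vowels": [0.05, 0.15, 0.15, 0.25, 0.15, 0.05, 0.35],
    "voice bars": [0.85, 0.95, 0.85, 0.75, 0.85, 0.65, 0.05],
}

LO, HI, NUM_BINS = -0.2, 1.0, 12  # axis range taken from the figure; bin count assumed

histograms = {
    name: relative_frequency_histogram(vals, LO, HI, NUM_BINS)
    for name, vals in evidence_by_class.items()
}
# Index of the most populated bin for each class.
peak_bins = {
    name: max(range(NUM_BINS), key=hist.__getitem__)
    for name, hist in histograms.items()
}
for name in histograms:
    print(name, "peak bin:", peak_bins[name])
```

With these made-up values the peak for the low vowels falls in a lower bin than the peak for the voice bars, mirroring the left-to-right ordering of the curves. The single low value in the voice bar list stands in for the silence frames mentioned in the text.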

4.3.2 Vowels

Figure 4.4 shows the distribution curves for vowels with different tongue positions. From the left, the distribution curves are for the low vowel [aa], the low-mid vowel [ae], the mid vowel [eh] and the high vowels [iy] and [uw]. The trend according to tongue position, or vowel height, is reflected in the distribution curves.

For high vowels, F1 is lower than for low vowels. There is also a deeper spectral valley in the frequency range below F1 for low vowels than for high vowels, whose spectra have only a narrow and shallow low-frequency dip below F1 [27]. For mid vowels, the difference F1−F0 is intermediate between that of high and low vowels [148], which is reflected in the distribution curves.

[Figure 4.4 plot: relative frequency (vertical axis) against VTC evidence value (horizontal axis, −0.2 to 1), with curves for the low vowel [aa], near-low vowel [ae], mid vowel [eh], high-back vowel [uw] and high-front vowel [iy].]

Figure 4.4: Distribution of the VTC evidence for different vowels.

4.3.3 Unvoiced sounds

Unvoiced sounds are mainly classified into two classes, unvoiced stops and unvoiced fricatives.

Unvoiced fricatives have most of their energy in the higher frequency regions, with the dominant resonant frequency above 2.5 kHz. The very low frequency component is much weaker than the higher frequencies. Unvoiced stops have two main regions, burst and aspiration. The spectral characteristics of the aspiration region are the same as those of unvoiced fricatives. The burst region, however, has an impulse-like characteristic and its energy is spread over the entire frequency range. Thus, the burst region has relatively more energy at very low frequencies than the aspiration and frication regions. As a result, the overall distribution of the evidence is shifted towards the right for unvoiced stops compared with unvoiced fricatives. The distributions are shown in Figure 4.5.

[Figure 4.5 plot: relative frequency (vertical axis) against VTC evidence value (horizontal axis, −0.4 to 1), with curves for unvoiced fricatives, unvoiced stops and voice bars.]

Figure 4.5: Distribution of the VTC evidence for unvoiced stops, unvoiced fricatives and voice bars.
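The argument that a burst carries relatively more very low frequency energy than frication can be checked on synthetic signals. The sketch below is not the VTC evidence computation; it compares the share of energy below a cutoff for a single impulse (burst-like, flat spectrum) and for first-differenced white noise (a crude stand-in for frication, with energy concentrated at high frequencies). The 16 kHz sampling rate, the 300 Hz cutoff, the frame length and all names are assumptions made for illustration.

```python
import math
import random

def power_spectrum(signal):
    """Power at DFT bins 1..n/2 (DC excluded), computed with a naive DFT."""
    n = len(signal)
    spectrum = []
    for k in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        spectrum.append(re * re + im * im)
    return spectrum

def fraction_below(signal, sample_rate, cutoff_hz):
    """Share of the (non-DC) spectral energy at or below cutoff_hz."""
    spectrum = power_spectrum(signal)
    n = len(signal)
    low = sum(
        p for k, p in enumerate(spectrum, start=1)
        if k * sample_rate / n <= cutoff_hz
    )
    return low / sum(spectrum)

# Illustrative assumptions, not values from the thesis.
SAMPLE_RATE = 16000
N = 320          # 20 ms frame; bin spacing is 50 Hz
CUTOFF_HZ = 300  # hypothetical boundary of the "very low frequency" region

# Burst-like: a single impulse, whose energy is spread evenly over all bins.
burst_like = [0.0] * N
burst_like[N // 2] = 1.0

# Frication-like: differenced white noise, which emphasises high frequencies.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(N + 1)]
frication_like = [noise[i + 1] - noise[i] for i in range(N)]

burst_fraction = fraction_below(burst_like, SAMPLE_RATE, CUTOFF_HZ)
frication_fraction = fraction_below(frication_like, SAMPLE_RATE, CUTOFF_HZ)
print("burst-like:", burst_fraction)
print("frication-like:", frication_fraction)
```

For the impulse, every bin has the same power, so the low-frequency share is simply the number of bins at or below the cutoff divided by the number of bins. The differenced noise attenuates exactly those bins, so its share is far smaller, consistent with the rightward shift described for unvoiced stops.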

4.3.4 Voiced and unvoiced sounds

The VTC evidence shows higher values for voiced consonants than for the corresponding unvoiced consonants. Distributions for voiced and unvoiced stops are shown in Figure 4.5. The two well-separated distributions for voiced and unvoiced regions show the ability of the evidence to discriminate sounds in terms of the source information present in it.

4.4 VTC evidence as a feature for recognition of non-vowel-like
