
Effect of Pitch-dependent Distortions on MFCCs

In document: PDF gyan.iitg.ernet.in (pages 99-102)

[Figure 4.6 plots: one set of three panels per pitch frequency (100 Hz, 200 Hz, 300 Hz). Left panel: impulse train, Magnitude vs. Sample No. Middle panel: pitch harmonic spectrum, Magnitude (dB) vs. Frequency (kHz). Right panel: Mel spectrum, Magnitude (dB) vs. Frequency (kHz).]

Figure 4.6: Plots showing 21-point Mel spectra (right panel) of the synthetically generated pitch harmonic spectra (middle panel) corresponding to different pitch frequencies (a) 100 Hz (b) 200 Hz (c) 300 Hz. The synthetic pitch harmonic spectra are created by taking linear DFT of impulse trains shown in corresponding left panel. Note that the slope in the Mel spectra is on account of the outputs of the Mel filters not being normalized by their corresponding areas.
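The construction described in the caption of Figure 4.6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the thesis: the 8 kHz sampling rate, the 256-sample frame, the Mel formula 2595 log10(1 + f/700), and all function names are chosen here for illustration only. The filters have unit peak height and are not divided by their areas, so the wider high-frequency filters sum over more DFT bins, which is one way to see the slope mentioned in the caption.

```python
import numpy as np

def hz_to_mel(f):
    # Common Mel-scale formula (an assumption; the thesis may use another variant).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular Mel filters with unit peak height (NOT area-normalized)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, fs / 2.0, n_bins)
    # n_filters + 2 edge frequencies, equally spaced on the Mel scale.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def impulse_train(pitch_hz, n_samples, fs):
    """Unit impulses spaced by the pitch period, rounded to whole samples."""
    period = int(round(fs / pitch_hz))
    x = np.zeros(n_samples)
    x[::period] = 1.0
    return x

def log_mel_spectrum(x, fs, n_filters=21):
    """Mel spectrum in dB: DFT magnitude weighted by the Mel filters."""
    mag = np.abs(np.fft.rfft(x))
    fb = mel_filterbank(n_filters, len(x), fs)
    return 20.0 * np.log10(fb @ mag + 1e-10)  # small offset avoids log(0)

fs = 8000.0  # assumed sampling rate
for pitch in (100.0, 200.0, 300.0):
    mel_db = log_mel_spectrum(impulse_train(pitch, 256, fs), fs)
    print(pitch, np.round(mel_db, 1))
```

Because the pitch period is rounded to a whole number of samples, the 300 Hz case is only approximately 300 Hz in this sketch, which matches the "around" wording used in the figure captions.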

[Figure 4.7 plots: one set of three panels per vowel (/AE/, /IY/). (a) Smoothed Mel spectra for 100 Hz and 300 Hz pitch, Magnitude (dB) vs. Frequency (kHz). (b) MFCCs for 100 Hz and 300 Hz pitch, Magnitude vs. Coefficient Index. (c) Relative Change vs. Coefficient Index for the 300 Hz pitch signal.]

Figure 4.7: Plots for vowels /AE/ and /IY/ having pitch values of around 100 Hz and 300 Hz (a) Smoothed Mel spectra (b) 13-dimensional truncated MFCCs excluding C0 (c) relative change in each MFCC for the 300 Hz pitch signal with respect to those for the 100 Hz pitch signal.

signals with those of the low pitch signals for the same vowels. The relative changes in the MFCCs of the high pitch signals with respect to those of the low pitch signals are shown in the bottom panel of Figure 4.7. For both vowels, the relative change in the MFCCs (C1−C12) of the high pitch signals is larger for the higher order coefficients than for the lower order ones. From a physiological point of view, higher pitch speech signals correspond to shorter vocal tract lengths, so the smoothed Mel spectra of the low and high pitch signals may also differ because of differences in the vocal tracts. Therefore, in this case the changes in the MFCCs of the high pitch signals relative to those of the low pitch signals cannot be attributed solely to the pitch-dependent distortions in the smoothed Mel spectral envelope.

Thus, to study the effect of only the pitch-dependent distortions on the MFCCs (C0−C12), 21-dimensional MFCC features are computed by taking the DCT of the 21-point Mel spectra of the synthetic pitch harmonic spectra corresponding to the different pitch frequencies shown in Figure 4.6. The 21 MFCCs of the synthetic pitch harmonic spectra for pitch frequencies of around 100 Hz, 200 Hz and 300 Hz, along with the corresponding linear DFT spectra and Mel spectra, are shown in Figure 4.8. As the pitch frequency increases, the pitch-dependent distortions in the Mel spectra increase and, correspondingly, so does the dynamic range of all coefficients in the Mel cepstra.

[Figure 4.8 plots: one set of three panels per pitch frequency (100 Hz, 200 Hz, 300 Hz). Left panel: linear DFT spectrum, Magnitude (dB) vs. Frequency (kHz). Middle panel: Mel spectrum, Magnitude (dB) vs. Frequency (kHz). Right panel: MFCCs, Magnitude vs. Coefficient Index.]

Figure 4.8: Plots of the 128-point linear DFT spectra (left panel), 21-point Mel spectra (middle panel) and their corresponding MFCCs excluding C0 (right panel) for the synthetically generated pitch harmonic spectra having pitch frequency of around (a) 100 Hz (b) 200 Hz (c) 300 Hz.
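The DCT step used to obtain the 21 MFCCs can be illustrated with a short Python sketch. The unnormalized DCT-II used here is an assumption, since the excerpt does not state which DCT scaling the thesis uses, and the two 21-point input vectors are toy examples rather than Mel spectra from the thesis. The sketch shows a general property of the transform: a flat log Mel spectrum contributes only to C0, whereas a fast ripple across the filter outputs is carried by a higher order coefficient.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II: C[k] = sum_n x[n] * cos(pi * k * (n + 0.5) / N)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(n)[:, None]   # coefficient index (rows)
    m = np.arange(n)[None, :]   # Mel filter index (columns)
    basis = np.cos(np.pi * k * (m + 0.5) / n)
    return basis @ x

n_mel = 21  # number of Mel filter outputs, as in the text

# Toy input 1: a perfectly flat log Mel spectrum at -20 dB.
flat = np.full(n_mel, -20.0)

# Toy input 2: a ripple that matches the 15th DCT basis function.
ripple = np.cos(np.pi * 15 * (np.arange(n_mel) + 0.5) / n_mel)

c_flat = dct_ii(flat)      # only C0 is non-zero
c_ripple = dct_ii(ripple)  # only C15 is non-zero

print(np.round(c_flat, 6))
print(np.round(c_ripple, 6))
```

Dropping the first element of the returned vector gives the "excluding C0" view plotted in the right panel of Figure 4.8.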

Further, the relative change in each MFCC of the synthetic pitch harmonic spectra for pitch frequencies of 200 Hz and 300 Hz, with respect to those for a pitch frequency of 100 Hz, is shown in Figure 4.9. Although all MFCCs are affected by the variation in pitch frequency, the relative change is larger in the higher order coefficients than in the lower order ones. As the pitch frequency increases, the dynamic range of the higher order MFCCs increases more than that of the lower order MFCCs. It is interesting to note that these observations also hold within the 13-D MFCC features (C0−C12), which have become the de facto standard features for all ASR systems. This observation is also consistent with the greater relative change observed earlier in the higher order coefficients than in the lower order coefficients of the 13-D truncated MFCCs (C0−C12) of vowel /IY/ having a pitch of around 300 Hz, in comparison to the MFCCs of vowel /IY/ having a pitch of around 100 Hz, extracted from real signals.

[Figure 4.9 plots: relative change, Magnitude vs. Coefficient Index, for (a) 200 Hz pitch and (b) 300 Hz pitch.]

Figure 4.9: Plots showing the relative change in each MFCC (C1−C20) for the synthetically generated pitch harmonic spectra of different pitch frequencies with respect to those for the synthetic pitch harmonic spectrum having pitch frequency of around 100 Hz (a) 200 Hz (b) 300 Hz.
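The per-coefficient comparison discussed above can be sketched as follows. This excerpt does not give the formula for relative change, so the signed ratio (new − reference) / reference used here is an assumption, and the function name and the numbers in the example are hypothetical, not values from the thesis.

```python
import numpy as np

def relative_change(c_ref, c_new, eps=1e-12):
    """Signed relative change of each coefficient with respect to a reference.

    Assumed definition: (c_new - c_ref) / c_ref, computed element-wise.
    Coefficients whose reference value is (near) zero yield NaN instead of
    a division by zero.
    """
    c_ref = np.asarray(c_ref, dtype=float)
    c_new = np.asarray(c_new, dtype=float)
    safe_ref = np.where(np.abs(c_ref) < eps, np.nan, c_ref)
    return (c_new - c_ref) / safe_ref

# Hypothetical coefficient values, for illustration only.
c_low_pitch = [2.0, -4.0, 0.5]
c_high_pitch = [3.0, -2.0, 2.0]
print(relative_change(c_low_pitch, c_high_pitch))
```

With this definition, a coefficient that is small for the reference signal can show a very large relative change even when the absolute difference is modest, which is worth keeping in mind when reading plots with a wide vertical range.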

Thus, the pitch-dependent distortions in the Mel spectral envelope of high pitch real signals affect all MFCCs, but they increase the magnitude of the higher order coefficients of the 13-D MFCC features more than that of the lower order coefficients, as noted in Figure 4.7. This accounts for the increase in the variances of the higher order coefficients of the 13-D MFCCs (C0−C12) with increasing pitch of the signals, as noted in Section 3.2.1.

It is already known that children’s speech has significantly higher pitch values than adults’ speech. Children aged 6 to 11 years have been reported to have pitch values in the range of 250-350 Hz [27]. Thus, the observations made in the above studies for the 300 Hz pitch signals (both real and synthetic) would also be valid for children’s speech.
