
Non-Uniform Filterbank based Spectral Analysis

In the PDF document from gyan.iitg.ernet.in (pages 94-99)

4.2 Effect of Uniform and Non-Uniform Filterbank on Pitch Harmonicity

4.2.2 Non-Uniform Filterbank based Spectral Analysis

[Figure 4.4 panels: for each of (a) and (b), the left panel shows the 21-point Mel spectrum, Magnitude (dB) against Frequency (kHz), and the right panel shows the cepstrum, Magnitude against Coefficient Index.]

Figure 4.4: Plots of the 21-point Mel spectra (left panel) and their corresponding cepstra (right panel) for vowel /IY/ having pitch values of around (a) 100 Hz (b) 300 Hz.

from the 21-channel Mel filterbank are used for parameterizing a speech frame of 8 kHz sampling rate.

Truncating the cepstral features from 21 to 13 dimensions results in additional smoothing of the 21-point Mel spectrum. Therefore, we further verify whether the pitch-dependent distortions observed in the Mel spectral envelope of the high pitch signal also appear in the smoothed Mel spectral envelope corresponding to the truncated 13-D MFCC features.

The smoothed Mel spectrum corresponding to the truncated 13-D Mel cepstrum is obtained by computing an inverse discrete cosine transform (IDCT) of the 13-D Mel cepstrum (C0−C12) after appending zeros to the cepstrum. To expose the details of the smoothed Mel spectrum better, 115 zeros are appended to the 13-D Mel cepstrum to obtain a 128-point smoothed Mel spectrum (referred to as ’Smoothed’). Figure 4.5 shows the plots of the signals and the 128-point smoothed Mel spectra along with their corresponding linear DFT spectra for central steady-state portions of vowel /IY/ having pitch values of around 100 Hz, 220 Hz and 300 Hz. Note that significant pitch-dependent distortions appear in the smoothed Mel spectral envelope of the 300 Hz pitch signal, particularly at the lower frequencies (below 1 kHz). These are similar to those observed in the 21-point Mel spectrum obtained at the output of the Mel filterbank for the high pitch signal in

[Figure 4.5 panels: for each of (a) 100 Hz, (b) 220 Hz and (c) 300 Hz pitch, the speech signal (Magnitude against Sample No.) and its spectra (Magnitude (dB) against Frequency (kHz); curves: Linear DFT, Smoothed).]

Figure 4.5: Plots of the signals and the 128-point smoothed Mel spectra (referred to as ’Smoothed’) along with their corresponding linear DFT spectra for vowel /IY/ having pitch values of around (a) 100 Hz (b) 220 Hz (c) 300 Hz.

Figure 4.4. On the other hand, no such pitch-dependent distortions are noted in the smoothed Mel spectral envelope of the 100 Hz signal, which appears to be sufficiently smoothed, similar to the 21-point Mel spectrum of the low pitch signal in Figure 4.4. Further, comparing the 128-point smoothed Mel spectra of the signals having pitch values of around 100 Hz, 220 Hz and 300 Hz shows that both the frequency range affected by the pitch-dependent distortions and their magnitude increase with the pitch of the signal.
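The smoothing step described above (truncate the Mel cepstrum to C0−C12, append zeros, take the IDCT) can be sketched as follows. This is a minimal illustration, not the implementation used in the thesis: the function names are hypothetical, and the unnormalized DCT-II convention and its scaling are assumptions chosen so that the smoothed curve stays on the same scale as the log Mel spectrum. Evaluating the truncated cosine series on 128 points is equivalent, up to scaling, to a 128-point IDCT of the zero-padded cepstrum.

```python
import numpy as np

def mel_cepstrum(log_mel, n_ceps=13):
    """Unnormalized DCT-II of a log Mel spectrum, keeping C0..C(n_ceps-1)."""
    log_mel = np.asarray(log_mel, dtype=float)
    n = len(log_mel)
    j = np.arange(n) + 0.5                 # filter index plus half-sample offset
    k = np.arange(n_ceps)[:, None]         # cepstral index as a column
    return np.cos(np.pi * k * j / n) @ log_mel

def smoothed_spectrum(ceps, n_filt=21, n_points=128):
    """Evaluate the truncated cosine series on n_points samples.

    Equivalent (up to scaling) to zero-padding the cepstrum to n_points
    coefficients and taking an n_points IDCT.
    """
    ceps = np.asarray(ceps, dtype=float)
    u = np.arange(n_points) + 0.5
    k = np.arange(len(ceps))[:, None]
    basis = np.cos(np.pi * k * u / n_points)
    weights = np.full(len(ceps), 2.0)
    weights[0] = 1.0                       # C0 is counted once in the inverse
    return (weights * ceps) @ basis / n_filt
```

With all 21 coefficients retained and `n_points=21`, the function reproduces the input spectrum exactly; with 13 coefficients and 128 points, it returns the interpolated, smoothed envelope of the kind labelled ’Smoothed’ in Figure 4.5.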

In a Mel filterbank, the bandwidths of the filters are chosen to approximate the critical bandwidth phenomenon observed in psychoacoustic studies of human auditory perception [2]. A comparison of the center frequencies and the critical bandwidths for human auditory perception as proposed by Zwicker [2] with those of the 21-channel Mel filterbank as per the HTK implementation for 4 kHz signal bandwidth is given in Table 4.1. The non-uniform filters in the Mel filterbank are intended to smooth out the pitch harmonics in the linear DFT spectrum so that the resulting Mel spectrum captures the spectral envelope characterizing the vocal filter. In case of low pitch signals, the Mel filterbank appears

Table 4.1: The center frequencies and the critical bandwidths for human auditory perception as proposed by Zwicker [2] and the center frequencies along with the corresponding bandwidths of all filters of a 21-channel Mel filterbank as per the HTK implementation for 4 kHz signal bandwidth.

          Critical bandwidths             21-channel Mel filterbank
          proposed by Zwicker             as per HTK implementation
Filter    Center           Bandwidth      Center           Bandwidth
No.       frequency (Hz)   (Hz)           frequency (Hz)   (Hz)
 1            50             100             63.3            132
 2           150             100            132.3            144
 3           250             100            207.6            157
 4           350             100            289.6            172
 5           450             110            379.1            187
 6           570             120            476.6            204
 7           700             140            583.0            222
 8           840             150            699.0            242
 9          1000             160            825.5            264
10          1170             190            963.4            288
11          1370             210           1113.8            314
12          1600             240           1277.8            343
13          1850             280           1456.7            374
14          2150             320           1651.6            408
15          2500             380           1864.3            444
16          2900             450           2096.1            485
17          3400             550           2348.9            528
18          4000             700           2624.6            576
19          4800             900           2925.1            628
20          5800            1100           3252.9            685
21          7000            1300           3610.3            747
22          8500            1800              -               -
23         10500            2500              -               -
24         13500            3500              -               -

to effectively smooth out the pitch harmonics, as the bandwidth of even the narrowest filter in the Mel filterbank is comparable to their pitch frequency. This can be further understood by noting the center frequencies and the bandwidths of the filters of the 21-channel Mel filterbank for 4 kHz signal bandwidth given in Table 4.1. As a result, the Mel spectrum and the Mel cepstrum of the low pitch signal are also noted to contain no significant pitch harmonicity. However, in case of high pitch signals, the pitch harmonics are farther apart, and the smoothing by the typical non-uniform Mel filterbank is not as effective because the bandwidths of some filters are smaller than the pitch of the signal. This is believed to cause the undesired pitch-dependent distortions in the Mel spectral envelope of high pitch signals. The bandwidths of the filters in the Mel filterbank increase with increasing center frequency, so the smoothing of the pitch harmonics increases with frequency across the spectrum. As a result, the pitch-dependent distortions appear predominantly at low frequencies in the smoothed Mel spectral envelope of the high pitch signals, as shown in Figure 4.5.
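The bandwidth argument above can be checked numerically with a short sketch. It assumes the commonly used HTK Mel warping, Mel(f) = 2595 log10(1 + f/700), with triangular filters whose edges are equally spaced on the Mel scale between 0 Hz and 4 kHz, and it takes a filter's bandwidth to be the distance between its lower and upper edges. The function names are hypothetical and are not taken from the source.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style Mel warping (assumed here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filter_layout(n_filt=21, f_max=4000.0):
    """Center frequencies and edge-to-edge bandwidths (Hz) of the filters."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(f_max), n_filt + 2))
    centers = edges[1:-1]
    bandwidths = edges[2:] - edges[:-2]    # upper edge minus lower edge
    return centers, bandwidths

def filters_narrower_than(pitch_hz, n_filt=21, f_max=4000.0):
    """Number of filters whose bandwidth is smaller than the pitch."""
    _, bandwidths = mel_filter_layout(n_filt, f_max)
    return int(np.sum(bandwidths < pitch_hz))
```

Under these assumptions no filter is narrower than a 100 Hz pitch, whereas the ten lowest filters (center frequencies below about 1 kHz) are narrower than a 300 Hz pitch, which is consistent with the distortions being confined to the region below 1 kHz.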

The occurrence of pitch-dependent distortions in the Mel spectral envelope of high pitch signals, due to insufficient smoothing of the pitch harmonics by the non-uniform Mel filterbank, is also validated using a synthetic example. The Mel spectra at the output of the Mel filterbank are computed for synthetically generated pitch harmonic spectra corresponding to pitch values of 100 Hz, 200 Hz and 300 Hz, without the effect of the vocal filter, and are shown in Figure 4.6. The synthetic pitch harmonic spectra are created by taking the linear DFT of impulse trains of different pitch periods after windowing them with a 200-point Hanning window (corresponding to a 25 msec speech frame). Note that pitch-dependent distortions appear in the Mel spectral envelope for the pitch harmonic spectra of higher pitch, and that they increase as the pitch increases. Since no vocal filter is involved, this verifies that the pitch-dependent distortions in the Mel spectral envelope of high pitch signals arise solely from the insufficient smoothing of the pitch harmonics by the Mel filterbank.
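The synthetic experiment can be approximated with the following sketch. Several details are assumptions rather than facts from the source: the 512-point DFT size, the HTK-style Mel warping, the use of the magnitude spectrum, and the ripple measure (peak-to-peak variation of the log Mel outputs over the ten lowest channels). Unlike Figure 4.6, the filter outputs are normalized by the filter areas here so that the slope mentioned in the caption does not mask the ripple. All function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filt=21, n_fft=512, fs=8000.0):
    """Triangular filters, equally spaced on the Mel scale from 0 to fs/2."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filt + 2))
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    fb = np.zeros((n_filt, freqs.size))
    for i in range(n_filt):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_of_impulse_train(pitch_hz, fs=8000.0, n_win=200, n_fft=512):
    """Log Mel spectrum of a Hanning-windowed impulse train (no vocal filter)."""
    period = int(round(fs / pitch_hz))     # pitch period in samples
    frame = np.zeros(n_win)
    frame[::period] = 1.0                  # one impulse per pitch period
    frame *= np.hanning(n_win)             # 200 points = 25 msec at 8 kHz
    magnitude = np.abs(np.fft.rfft(frame, n_fft))
    fb = mel_filterbank(n_fft=n_fft, fs=fs)
    # Area normalization is an addition of this sketch (see the lead-in).
    return np.log(fb @ magnitude / fb.sum(axis=1) + 1e-12)

def low_band_ripple(pitch_hz, n_low=10):
    """Peak-to-peak variation of the log Mel outputs in the lowest channels."""
    return float(np.ptp(log_mel_of_impulse_train(pitch_hz)[:n_low]))
```

Calling `low_band_ripple` for 100 Hz and 300 Hz gives a rough numerical counterpart to the visual comparison in Figure 4.6: the flat source spectrum should yield an almost flat Mel envelope at low pitch and a visibly rippled one at high pitch.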

Despite the appearance of some pitch-dependent distortions in the low frequency region of the Mel spectral envelope of the real high pitch signal, the corresponding cepstrum does not appear to contain any pitch-dependent harmonicity. This is attributed to the constant-Q type filters in the Mel filterbank.

The bandwidths of the Mel filters increase with increasing center frequency. As a result, the regularity of the pitch harmonics in the resulting spectrum is disturbed across the entire frequency range, so in the Mel filtered cepstra the pitch-related information does not get separated from the effects of the vocal filter. Following these observations, we argue that the pitch-dependent distortions occurring in the Mel spectral envelope would not show up as pitch-dependent harmonicity in the Mel cepstrum but would instead affect all cepstral coefficients. Therefore, the pitch-related information, if captured, cannot be removed completely from the Mel cepstrum by cepstral truncation.

[Figure 4.6 panels: for each of (a) 100 Hz, (b) 200 Hz and (c) 300 Hz pitch, the impulse train (Magnitude against Sample No.), its linear DFT spectrum and the 21-point Mel spectrum (both Magnitude (dB) against Frequency (kHz)).]

Figure 4.6: Plots showing 21-point Mel spectra (right panel) of the synthetically generated pitch harmonic spectra (middle panel) corresponding to different pitch frequencies (a) 100 Hz (b) 200 Hz (c) 300 Hz. The synthetic pitch harmonic spectra are created by taking linear DFT of impulse trains shown in corresponding left panel. Note that the slope in the Mel spectra is on account of the outputs of the Mel filters not being normalized by their corresponding areas.
