Vol.04,Special Issue 06, (IC-IREASM-2019) October 2019, Available Online: www.ajeee.co.in/index.php/AJEEE

OBSERVATIONS REGARDING THE USE OF MUTUAL INFORMATION BETWEEN SUB-BAND ENERGY AND PHONETIC LABELS FOR ESTIMATING AVERAGE SPEECH INTELLIGIBILITY OF VOWELS

Dr. Kalyan S. Kasturi

Associate Professor, ECE Department, Nalla Malla Reddy Engineering College, Ghatkesar, Medchal District, Telangana

Abstract - Mutual information between two variables X and Y is denoted by I(X, Y) and represents the information gained about one variable (X) by observing the other variable (Y). Mutual information indicates how closely two variables are related to each other. Recent research has used mutual information to predict speech intelligibility. Some of this work indicates that speech intelligibility is related to the mutual information between the critical-band amplitude envelopes of the clean signal and the noisy signal. Speech sounds consist predominantly of three fundamental classes: (1) vowels, (2) consonants and (3) silence. Of these three classes, vowel sounds carry most of the energy and are highly important for speech perception, since they contain useful spectral information in the form of formants. In this paper, we focus on the use of mutual information between sub-band energy and phonetic labels to estimate the speech intelligibility of vowels. The vowel sounds spoken by male and female speakers are filtered into six frequency sub-bands. In each sub-band, the mutual information between the spectral energy of the vowel sounds in that sub-band and the corresponding phonetic labels is calculated. The resulting mutual information value in each sub-band can be used to estimate the importance of that sub-band for the average speech intelligibility of vowels. It was observed that higher mutual information between sub-band energy and phonetic labels is contained in the low-frequency sub-bands, which are more important for vowel recognition.

Keywords: Speech intelligibility, mutual information, articulation index, cochlear implants, vowel perception.

I. INTRODUCTION

One of the important signals encountered in real-world scenarios is the speech signal. Most verbal communication is carried out via the speech signal. The intelligibility of a speech signal refers to how well that signal is understood by human listeners. High speech intelligibility means that the speech signal was clearly understood by the subject, and low speech intelligibility means that it was poorly understood. Clearly, proper verbal communication requires high levels of speech intelligibility.

A. Composition of Speech Signal

The speech signal basically comprises three classes: vowels, consonants and silence [1].

Vowels: These speech sounds are voiced; they are generated by the vibration of the vocal cords. Hence the vowels are relatively high-energy sounds in the speech signal. The vowels are characterized by certain spectral peaks called formants [2].

Consonants: The speech signal contains another class of sounds called consonants, which are generally low-energy sounds and many of which are unvoiced.

Silence: The speech signal contains silence regions, which occur when the speaker pauses while speaking. The silence portions do not contain any speech material.

Among the classes of speech discussed above, the vowel sounds are crucial for understanding the speech material, as they contain the highest amount of energy in the speech signal. In this work we investigate how information related to vowel perception is distributed across the various frequency sub-bands by using the concept of mutual information.

B. Mutual Information

Mutual information is an information theoretic concept that probes the dependencies between two different variables. The concept of mutual information is very closely related to the concept of entropy. The entropy of a random variable represents the uncertainty associated with that random variable.

Let us consider a discrete random variable X taking values in the set $\{x_1, x_2, \ldots, x_i, \ldots, x_N\}$ with probabilities of occurrence given by $\{P(x_1), P(x_2), \ldots, P(x_i), \ldots, P(x_N)\}$.

The entropy of the random variable X is denoted by H(X) and given by the formula:

$$H(X) = -\sum_{i=1}^{N} P(x_i)\,\log P(x_i),$$

where $P(x_i)$ represents the probability mass function [3].
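As an illustration, the entropy definition can be sketched in a few lines of Python. This is a minimal sketch, not part of the original experiment. It assumes the standard Shannon form with a leading minus sign (so that entropy is non-negative) and a base-2 logarithm (entropy in bits), since the text does not fix the base. The function name `entropy` is hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum_i P(x_i) * log2 P(x_i), in bits.

    Assumption: base-2 logarithm; the text does not state the base.
    Outcomes with zero probability are skipped (0 * log 0 is taken as 0).
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

h_coin = entropy([0.5, 0.5])      # fair coin: maximal uncertainty for 2 outcomes
h_biased = entropy([0.9, 0.1])    # biased coin: less uncertainty
h_uniform4 = entropy([0.25] * 4)  # four equally likely outcomes

print(f"fair coin:   {h_coin:.4f} bits")
print(f"biased coin: {h_biased:.4f} bits")
print(f"uniform (4): {h_uniform4:.4f} bits")
```

The biased coin has lower entropy than the fair coin, which matches the interpretation of entropy as uncertainty.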

The conditional entropy of the random variable X given another random variable Y is denoted by H(X/Y). It indicates the uncertainty that remains about X once Y is known.

Hence the difference between H(X) and H(X/Y) represents the amount by which the uncertainty about the random variable X is decreased by knowing the random variable Y. This reduction in uncertainty can be interpreted as the information that Y contains about X, and it is termed the mutual information between the random variables X and Y [4].

Hence we can define the mutual information between two random variables X and Y as follows:

I(X,Y) = H(X) – H(X/Y).
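The definition I(X,Y) = H(X) – H(X/Y) can be checked numerically on a small example. The sketch below is illustrative only. It assumes base-2 logarithms and a hypothetical 2×2 joint distribution stored as `joint[i][j] = P(x_i, y_j)`; the helper names are not from the paper.

```python
import math

def entropy_x(joint):
    """H(X) in bits, using the marginal P(x_i) = sum_j P(x_i, y_j)."""
    p_x = [sum(row) for row in joint]
    return -sum(p * math.log2(p) for p in p_x if p > 0.0)

def conditional_entropy_x_given_y(joint):
    """H(X/Y) = -sum_{i,j} P(x_i, y_j) * log2 P(x_i | y_j), in bits."""
    p_y = [sum(col) for col in zip(*joint)]
    h = 0.0
    for row in joint:
        for j, p_xy in enumerate(row):
            if p_xy > 0.0:
                h -= p_xy * math.log2(p_xy / p_y[j])
    return h

# Hypothetical joint distribution: X and Y agree 80% of the time.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

h_x = entropy_x(joint)
h_x_given_y = conditional_entropy_x_given_y(joint)
mi = h_x - h_x_given_y

print(f"H(X)   = {h_x:.4f} bits")
print(f"H(X/Y) = {h_x_given_y:.4f} bits")
print(f"I(X,Y) = {mi:.4f} bits")
```

Here observing Y removes part, but not all, of the uncertainty about X, so the mutual information lies strictly between zero and H(X).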

Properties of Mutual Information:

1. Mutual information satisfies the commutative property. Hence we can write I(X,Y) = I(Y,X). This is intuitively satisfying, since the same amount of information must be shared between the two random variables X and Y.

2. Mutual information is always a non-negative quantity, since knowing Y cannot increase the uncertainty about X: H(X/Y) is less than H(X) whenever the random variables X and Y are not independent. In the special case of two independent random variables X and Y, H(X/Y) = H(X) and the mutual information I(X,Y) is zero.

3. Mutual information can be calculated by using the joint probability distribution $P(x_i, y_j)$ and the individual probability distributions $P(x_i)$ and $P(y_j)$, as given by the following formula [5]:

$$I(X,Y) = \sum_{i}\sum_{j} P(x_i, y_j)\,\log\frac{P(x_i, y_j)}{P(x_i)\,P(y_j)}$$
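The three properties can be illustrated with a short sketch. It assumes the standard double-sum form of mutual information, I(X,Y) = Σ_i Σ_j P(x_i, y_j) log [P(x_i, y_j) / (P(x_i) P(y_j))], with base-2 logarithms. The joint distributions below are hypothetical and the function names are illustrative.

```python
import math

def mutual_information(joint):
    """I(X,Y) in bits from a joint table joint[i][j] = P(x_i, y_j)."""
    p_x = [sum(row) for row in joint]          # marginal of X
    p_y = [sum(col) for col in zip(*joint)]    # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p_xy in enumerate(row):
            if p_xy > 0.0:                     # zero cells contribute nothing
                mi += p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))
    return mi

def transpose(joint):
    """Swap the roles of X and Y."""
    return [list(col) for col in zip(*joint)]

# Hypothetical dependent pair (2 values of X, 3 values of Y).
dependent = [[0.3, 0.2, 0.0],
             [0.1, 0.1, 0.3]]

# Hypothetical independent pair: the joint is the product of the marginals.
independent = [[0.3 * 0.6, 0.3 * 0.4],
               [0.7 * 0.6, 0.7 * 0.4]]

mi_xy = mutual_information(dependent)
mi_yx = mutual_information(transpose(dependent))
mi_indep = mutual_information(independent)

print(f"I(X,Y) = {mi_xy:.4f} bits, I(Y,X) = {mi_yx:.4f} bits")
print(f"independent case: {mi_indep:.2e} bits")
```

The first line of output shows the symmetry of Property 1, and the independent case evaluates to zero up to floating-point rounding, as stated in Property 2.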

C. Predicting Intelligibility of Speech

The research paper by Yang et al. [6] discussed the use of mutual information for classification of phonemes of speech. They used speech material that consisted of three hours of telephone conversations in English language which consisted of speech spoken by about 210 different speakers. They labelled the speech material using 19 different phonemes. They computed the mutual information between the logarithmic spectral energy and phonetic labels to understand the distribution of information in time and frequency relevant for phonetic classification.

The research work by Jørgensen and Dau [7] focused on predicting the intelligibility of speech by using a speech based envelope power spectrum model. They estimated the ratio of speech envelope power and noise envelope power at the output of the modulation filter banks. They observed that the signal to noise ratio of the envelope at the output of a modulation frequency selective process is strongly related to the intelligibility of the speech material.

In [8], a similar procedure to predict speech intelligibility is described in more detail. A gammatone band-pass filter bank is used to process the speech and the noise. This is followed by envelope extraction, which is implemented by using the Hilbert transform. Each sub-band envelope generated in this way is applied to a modulation band-pass filter bank. Finally, the envelope signal-to-noise ratio is computed and converted to the probability of a correct response in order to predict speech intelligibility.

The scientific work conducted by Taghia et al. [9] discussed the use of mutual information for predicting speech intelligibility. To predict speech intelligibility, they developed two objective measures based on the estimated mutual information between clean speech and processed speech. The first objective measure is computed in the time domain and the second objective measure is computed in the sub-band domain. They reported a high correlation between the objective measures based on mutual information and the subjective intelligibility results given by the percent correct word recognition scores.

Recently, Jensen and Taal [10] used mutual information to predict speech intelligibility. They reported that speech intelligibility was related to the mutual information between the critical-band amplitude envelopes of the clean signal and the noisy signal.

II. EXPERIMENT

In this experiment we calculate the mutual information between the spectral energy of speech envelopes of various vowels and the corresponding phonetic labels to estimate the intelligibility of the vowels.

A. Material

The speech material consisted of the vowels in the words “heed, hid, hayed, head, had, hod, hud, hood, hoed, who’d, heard”. A total of 11 vowel tokens were used in the experiment, produced by 7 male speakers. It should be noted that not all speakers produced all 11 vowels. The stimuli were drawn from a set used by Hillenbrand et al. [11].

The first two formant frequencies of the vowels used in the experiment are given in Table 1.

Vowel    F1 (Hz)   F2 (Hz)
had      627       1910
hod      786       1341
head     555       1851
hayed    438       2196
heard    466       1377
hid      384       2039
heed     331       2311
hoed     500       868
hood     424       992
hud      629       1146
who’d    319       938

Table 1: First formant (F1) and second formant (F2) frequencies of the various vowels.

B. Signal Processing

There are various steps involved in the computation of the mutual information between spectral energy of the speech envelope and the corresponding labels of phonemes. The various sub-bands of frequency are created by using six band pass filters spanning the frequency range from 300 Hz to 5500 Hz in a logarithmic manner. The various frequency boundaries used in the creation of sub-bands are shown below in Table 2.
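The logarithmic spacing of the sub-band boundaries can be sketched as follows. This is a minimal sketch that assumes geometric spacing, meaning an equal frequency ratio between consecutive cut-offs. Under that assumption the computed edges agree with the cut-offs of Table 2 to within about 1 Hz. The function name `log_spaced_edges` is hypothetical.

```python
def log_spaced_edges(f_low, f_high, n_bands):
    """Return n_bands + 1 cut-off frequencies, equally spaced on a log axis.

    Assumption: consecutive cut-offs share a constant ratio
    (f_high / f_low) ** (1 / n_bands).
    """
    ratio = (f_high / f_low) ** (1.0 / n_bands)
    return [f_low * ratio ** k for k in range(n_bands + 1)]

edges = log_spaced_edges(300.0, 5500.0, 6)

for band, (lo, hi) in enumerate(zip(edges[:-1], edges[1:]), start=1):
    print(f"Sub-band {band}: {lo:7.1f} Hz to {hi:7.1f} Hz")
```

Each sub-band therefore spans the same frequency ratio (roughly 1.62), which gives finer resolution in the low-frequency region.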

Sub-band number   Lower cut-off (Hz)   Upper cut-off (Hz)
1                 300                  487
2                 487                  791
3                 791                  1284
4                 1284                 2085
5                 2085                 3388
6                 3388                 5500

Table 2: Cut-off frequencies for the various sub-bands used in the experiment.

Next we discuss the steps in the computation of the mutual information measure used to estimate the intelligibility of vowels.

Step 1: We band-pass filter the input speech material, consisting of the various vowels, into the six frequency sub-bands listed in Table 2.

Step 2: We calculate the spectral envelope in each sub-band and calculate the spectral energy of the speech envelope for every 4 ms frame.

Step 3: We then quantize the spectral envelope energy into 64 levels.

Step 4: Next we compute the probability distribution of the quantized spectral envelope energy bins for each frequency band.

Step 5: We also compute the distribution for the phonetic labels corresponding to the various vowels used in the speech material employed for the experiment.

Step 6: Next we calculate the joint distribution between the quantized spectral envelope energies of various speech frames and the phonetic labels of those speech frames.

Step 7: We calculate mutual information between quantized spectral envelope energies of various speech frames and the phonetic labels of those speech frames.
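Steps 2 to 7 can be sketched for a single sub-band as shown below. This is an illustrative sketch under stated assumptions, not the author's implementation. It assumes that the sub-band envelope samples are already available (Step 1 and the envelope extraction are not shown), that frames are non-overlapping, that the 64-level quantizer is uniform between the smallest and largest observed energy, and that logarithms are base 2. The two synthetic tones labelled "A" and "B" are hypothetical toy data, and all function names are illustrative.

```python
import math
from collections import Counter

def frame_energies(samples, fs, frame_ms=4.0):
    """Step 2 (in part): energy of consecutive non-overlapping frames."""
    n = max(1, int(round(fs * frame_ms / 1000.0)))
    return [sum(s * s for s in samples[i:i + n])
            for i in range(0, len(samples) - n + 1, n)]

def quantize(values, levels=64):
    """Step 3: uniform quantization into `levels` bins (uniformity assumed)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    step = (hi - lo) / levels
    return [min(levels - 1, int((v - lo) / step)) for v in values]

def mutual_information(bins, labels):
    """Steps 4-7: I(energy bin, label) in bits from paired frame observations."""
    n = len(bins)
    count_e = Counter(bins)                 # Step 4: energy-bin counts
    count_l = Counter(labels)               # Step 5: label counts
    count_el = Counter(zip(bins, labels))   # Step 6: joint counts
    mi = 0.0
    for (e, l), c in count_el.items():      # Step 7: sum over observed pairs
        mi += (c / n) * math.log2(c * n / (count_e[e] * count_l[l]))
    return mi

# Hypothetical toy data: two "vowels" that differ only in level in this band.
fs = 16000

def tone(amplitude, n_samples, freq_hz=500.0):
    return [amplitude * math.sin(2 * math.pi * freq_hz * k / fs)
            for k in range(n_samples)]

energies_a = frame_energies(tone(1.0, 1600), fs)
energies_b = frame_energies(tone(0.1, 1600), fs)
energies = energies_a + energies_b
labels = ["A"] * len(energies_a) + ["B"] * len(energies_b)

bins = quantize(energies)
mi = mutual_information(bins, labels)
print(f"{len(energies)} frames, I(energy, label) = {mi:.3f} bits")
```

In this toy case the energy bin identifies the label completely, so the estimate reaches the label entropy. With real vowel data the estimate would be lower, and repeating the computation for each of the six sub-bands would give one mutual information value per sub-band.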

III. RESULTS AND DISCUSSION

In this section we discuss the results of the calculation of mutual information between the energy of the spectral envelope and the phonetic labels. A higher value of the mutual information measure for a particular sub-band indicates that the sub-band is of high importance for the recognition, and hence the intelligibility, of the vowels employed in the experiment. A graph showing the values of the mutual information measure for the various frequency sub-bands, representing the importance or weight of each channel, is shown in Figure 1.

Figure 1: The channel weights using mutual information between spectral envelope energy and phonetic labels

It can be observed that a high value of the mutual information measure is obtained for sub-band 1. This is because sub-band 1 contains the first formant frequencies of several of the vowels used in the experiment. We can also observe that the values of the mutual information measure for sub-band 3 and sub-band 4 are high, which is due to the fact that sub-bands 3 and 4 contain the second formant frequencies of many vowels.
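The link between formant locations and sub-bands can be checked directly from the values in Table 1 and the cut-offs in Table 2. The sketch below only counts how many vowels have F1 or F2 in each sub-band. The convention that a frequency on a boundary belongs to the upper band is an assumption, and no value in Table 1 falls exactly on a boundary.

```python
from collections import Counter

# Cut-off frequencies from Table 2 (Hz).
BAND_EDGES_HZ = [300, 487, 791, 1284, 2085, 3388, 5500]

# (F1, F2) in Hz for each vowel, from Table 1.
FORMANTS_HZ = {
    "had": (627, 1910), "hod": (786, 1341), "head": (555, 1851),
    "hayed": (438, 2196), "heard": (466, 1377), "hid": (384, 2039),
    "heed": (331, 2311), "hoed": (500, 868), "hood": (424, 992),
    "hud": (629, 1146), "who'd": (319, 938),
}

def band_of(freq_hz):
    """Return the 1-based sub-band containing freq_hz, or None if outside."""
    pairs = zip(BAND_EDGES_HZ[:-1], BAND_EDGES_HZ[1:])
    for band, (lo, hi) in enumerate(pairs, start=1):
        if lo <= freq_hz < hi:
            return band
    return None

f1_counts = Counter(band_of(f1) for f1, _ in FORMANTS_HZ.values())
f2_counts = Counter(band_of(f2) for _, f2 in FORMANTS_HZ.values())

print("F1 per sub-band:", sorted(f1_counts.items()))  # [(1, 6), (2, 5)]
print("F2 per sub-band:", sorted(f2_counts.items()))  # [(3, 4), (4, 5), (5, 2)]
```

Six of the eleven first formants fall in sub-band 1, and nine of the eleven second formants fall in sub-bands 3 and 4, which is consistent with the observation above.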

Next we discuss the similarity between the channel weights obtained using the mutual information measure and the channel weights obtained from a listening study with normal-hearing human listeners [12]. A graph depicting this similarity is shown in Figure 2.

[Figure 1 plot: "Weights using MI" (tick marks 0.01, 0.1, 1) against "Channels" (tick marks 0 to 6); panel title "Vowels: Male".]

[Figure 2 plot: "Weights" (tick marks 0.01, 0.1, 1) against "Channels" (tick marks 0 to 6); legend: MI, LS.]


Figure 2: The similarity between channel weights using mutual information measure (MI) and channel weights obtained from listening study (LS)

It can be noted that the channel weights obtained from the listening study, in which normal-hearing listeners were tested on vowel recognition, are very similar to the channel weights obtained using the mutual information measure.

IV. CONCLUSIONS

Traditionally, the intelligibility of speech has been tested by using listening studies involving several normal-hearing human listeners. This process is very time consuming, and at least in some cases it might be desirable to estimate speech intelligibility using objective measures. In this research paper, the use of mutual information between spectral envelope energy and phonetic labels was investigated as a possible measure for estimating the intelligibility of vowels. It was observed that there is a strong similarity between the channel weights obtained using the mutual information measure and the channel weights obtained from the listening study. Hence the mutual information between the energy of the spectral envelope and the phonetic labels can be used to estimate the intelligibility of vowels.

Acknowledgement

The author would like to acknowledge the encouragement and support given by Mr. Nalla Malla Reddy, Chairman, Nalla Malla Reddy Engineering College.

REFERENCES

1. Deller, J. R., Hansen, J. H. L. and Proakis, J. G., Discrete-Time Processing of Speech Signals, Wiley-IEEE Press, USA, 1999.

2. Loizou, P. C., Speech Enhancement: Theory and Practice, CRC Press, USA, 2007.

3. Schneider, T. D., Information theory primer with an appendix on logarithms, National Cancer Institute, USA, 2007. DOI: 10.13140/2.1.2607.2000.

4. Haykin, S., Communication Systems, Wiley Publications, USA, 2006.

5. Cover, T. M. and Thomas, J. A., Elements of Information Theory, Wiley, USA, 1991.

6. Yang, H., Vuuren, S. and Hermansky, H., “Relevancy of time-frequency features for phonetic classification measured by mutual information,” ICASSP, vol. 1, pp. 225-228, 1999.

7. Jørgensen, S. and Dau, T., “Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing,” J. Acoust. Soc. Am., vol. 130, no. 3, pp. 1475-1487, 2011.

8. Jørgensen, S., “Modelling speech intelligibility based on the signal-to-noise envelope power ratio,” Ph.D. thesis, Technical University of Denmark, 2014.

9. Taghia, J., Martin, R. and Hendriks, R. C., “On mutual information as a measure for speech intelligibility,” ICASSP, vol. 1, pp. 65-68, 2012.

10. Jensen, J. and Taal, C. H., “Speech intelligibility prediction based on mutual information,” IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 22, no. 2, pp. 430-440, 2014.

11. Hillenbrand, J., Getty, L., Clark, M. and Wheeler, K., “Acoustic characteristics of American English vowels,” J. Acoust. Soc. Am., vol. 97, pp. 3099-3111, 1995.

12. Kasturi, K., Loizou, P., Dorman, M. and Spahr, T., “The intelligibility of speech with holes in the spectrum,” J. Acoust. Soc. Am., vol. 112, no. 3, pp. 1102-1111, 2002.
