Organization of the Work

Chapter 3 demonstrates the strength of Multiple Frame Size and Rate (MFSR) analysis for speaker recognition under limited data condition. First, studies are made using Multiple Frame Size (MFS) and Multiple Frame Rate (MFR) analysis. Then, the experimental results are compared with those of Single Frame Size and Rate (SFSR) analysis. Finally, to gain the advantages of both MFS and MFR, we combine them and call the combination MFSR analysis. The experimental results of MFSR analysis are also compared with those of MFS, MFR and SFSR analysis.

The combination of features for speaker recognition under limited data condition is described in Chapter 4. First, the working principles of different feature extraction techniques, which provide features like MFCC, Delta (∆) MFCC, Delta-Delta (∆∆) MFCC, Linear Prediction Residual (LPR) and Linear Prediction Residual Phase (LPRP), are studied. Then, the effectiveness of each feature extraction technique is explored experimentally and independently to determine the level of speaker information present in each feature. Finally, all the features are combined to obtain a better representation of the speaker. The experimental results of the combined features are compared with those of the individual features.

Chapter 5 proposes combined modelling techniques for speaker recognition under limited data condition. First, the pattern classification principles involved in different modelling techniques like VQ, FVQ, SOM, LVQ, GMM and GMM-UBM are studied. Then, the performance of each modelling technique is verified through experimental studies. Finally, based on the performance of the individual classifiers, we combine them at the scoring level to see their effectiveness under limited data condition. The experimental results of the different combined modelling techniques are compared with those of the individual modelling techniques.

The techniques we propose for the analysis, feature extraction and modelling stages are first demonstrated independently. That is, each proposed technique is used in its respective stage while existing techniques are kept in the remaining stages. In Chapter 6, the proposed techniques are integrated to see the effectiveness of the integrated system. The experimental results of the integrated system are compared with those of the proposed individual systems.

Integrating the techniques leads to different individual integrated systems. In Chapter 7, evidence from the different integrated systems is combined using different combination schemes at the abstract, rank and measurement levels. The results of the different combination schemes are compared with those of the proposed integrated and individual systems.
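The three combination levels named above can be illustrated with a minimal sketch. The specific rules chosen here (majority voting for the abstract level, Borda count for the rank level, and a sum of min-max normalised scores for the measurement level) are common textbook choices and are assumptions for illustration; they are not necessarily the schemes used in Chapter 7. The speaker labels and scores are made-up placeholder values.

```python
from collections import Counter


def abstract_level(decisions):
    """Abstract level: majority vote over each system's top-1 speaker label."""
    return Counter(decisions).most_common(1)[0][0]


def rank_level(rankings):
    """Rank level: Borda count, where a speaker at position p in a ranking
    of n speakers receives n - p points from that system."""
    points = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, speaker in enumerate(ranking):
            points[speaker] += n - position
    return max(points, key=points.get)


def measurement_level(score_tables):
    """Measurement level: sum of min-max normalised scores per speaker
    (higher score is assumed to mean a better match)."""
    totals = Counter()
    for scores in score_tables:
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero if all scores are equal
        for speaker, score in scores.items():
            totals[speaker] += (score - lo) / span
    return max(totals, key=totals.get)


# Hypothetical scores from three integrated systems for speakers A, B and C.
score_tables = [
    {"A": 0.9, "B": 0.5, "C": 0.1},
    {"A": 0.4, "B": 0.6, "C": 0.2},
    {"A": 0.7, "B": 0.3, "C": 0.5},
]
# Each system's ranking (best first) and its top-1 decision.
rankings = [sorted(s, key=s.get, reverse=True) for s in score_tables]
decisions = [r[0] for r in rankings]

abstract_choice = abstract_level(decisions)
rank_choice = rank_level(rankings)
measurement_choice = measurement_level(score_tables)
```

The abstract level uses only the final decisions, the rank level uses the ordering of speakers, and the measurement level uses the scores themselves, so each successive level draws on more information from the individual systems.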

In Chapter 8, a summary of the present work is given first. Then, the major contributions of the work in developing approaches for speaker recognition under limited data condition are mentioned. Finally, some possible future directions are outlined.

TH-797_05610204

2. Speaker Recognition - A review

3

MFSR Analysis of Speech for Limited Data Speaker Recognition

3.1 Introduction
3.2 MFSR Analysis of Speech
3.3 Limited Data Speaker Recognition using MFSR Analysis
3.4 Experimental Results and Discussions
3.5 Summary



State-of-the-art speaker recognition systems assume the availability of sufficient data for modelling and testing. Because of this, speech signals are analyzed with a fixed frame size and rate, which may be termed Single Frame Size and Rate (SFSR) analysis. Under limited data condition, the available training and testing data are small. If we use SFSR analysis, it may not provide sufficient feature vectors to train and test the speaker. Further, insufficient feature vectors lead to poor speaker modelling during training and may not yield a reliable decision during testing. In this chapter, as part of the analysis stage, we demonstrate the use of Multiple Frame Size (MFS), Multiple Frame Rate (MFR) and Multiple Frame Size and Rate (MFSR) analysis techniques for speaker recognition under limited data condition. These techniques produce relatively more feature vectors from the same speech, which helps in better modelling and testing under limited data condition. The experimental results show that the use of MFS, MFR and MFSR analysis improves the performance significantly compared to SFSR analysis.
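The following is a minimal sketch of how the analysis techniques described above differ in the number of frames they produce. The sampling rate, the frame sizes and shifts, and the function names are assumptions chosen for illustration, and MFSR is assumed here to pool every pairing of frame size and shift; the values actually used in this work are given in the later sections of the chapter. The sketch stops at framing; in practice each frame would be converted into a feature vector such as MFCC.

```python
def frame_signal(samples, frame_size, frame_shift):
    """Split a sequence of samples into overlapping full-length frames."""
    last_start = len(samples) - frame_size
    return [samples[start:start + frame_size]
            for start in range(0, last_start + 1, frame_shift)]


def multi_frame_analysis(samples, sample_rate, sizes_ms, shifts_ms):
    """Pool the frames obtained from every (frame size, frame shift) pair.

    One size and one shift gives SFSR, several sizes with one shift gives MFS,
    one size with several shifts gives MFR, and several of both gives MFSR.
    """
    pooled = []
    for size_ms in sizes_ms:
        for shift_ms in shifts_ms:
            size = sample_rate * size_ms // 1000    # frame size in samples
            shift = sample_rate * shift_ms // 1000  # frame shift in samples
            pooled.extend(frame_signal(samples, size, shift))
    return pooled


# Assumed setup: 3 s of placeholder speech sampled at 8 kHz.
sample_rate = 8000
samples = [0.0] * (3 * sample_rate)

sfsr = multi_frame_analysis(samples, sample_rate, [20], [10])
mfs = multi_frame_analysis(samples, sample_rate, [10, 20, 30], [10])
mfr = multi_frame_analysis(samples, sample_rate, [20], [5, 10])
mfsr = multi_frame_analysis(samples, sample_rate, [10, 20, 30], [5, 10])

for name, frames in [("SFSR", sfsr), ("MFS", mfs), ("MFR", mfr), ("MFSR", mfsr)]:
    print(name, len(frames))
```

With these assumed settings, the pooled MFS, MFR and MFSR frame sets are each several times larger than the SFSR set for the same three seconds of speech, which is the effect the chapter relies on.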

3.1 Introduction

This chapter focuses on exploring alternative speech analysis techniques to extract vocal tract information from the speech signal. In existing speaker recognition systems, the analysis stage uses a frame size and shift in the range of 10-30 ms. If only a few seconds of data are available, then the 10-30 ms choice provides only a few feature vectors. This leads to poor speaker modelling and may also prevent the speakers from being tested reliably [110]. One approach to mitigate this problem is to artificially increase the number of feature vectors. In existing speaker recognition systems, once the frame size and shift are chosen, they are kept constant throughout the experiment, and hence this may be termed Single Frame Size and Rate (SFSR) analysis.
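As a worked illustration of why a few seconds of speech yields few feature vectors, the number of full frames can be computed directly from the duration, frame size and frame shift. The function name and the example values (3 s of speech, 20 ms frames, 10 ms shift) are assumptions for illustration, chosen from within the 10-30 ms range mentioned above.

```python
def num_frames(duration_s, frame_size_ms, frame_shift_ms):
    """Number of full frames that fit in a signal of the given duration."""
    duration_ms = round(duration_s * 1000)
    if duration_ms < frame_size_ms:
        return 0
    return 1 + (duration_ms - frame_size_ms) // frame_shift_ms


# Assumed example: 3 s of speech, 20 ms frame size, 10 ms frame shift.
frames_short = num_frames(3.0, 20, 10)

# The same analysis settings applied to 60 s of speech, for comparison.
frames_long = num_frames(60.0, 20, 10)

print(frames_short, frames_long)
```

Under these assumed settings the 3 s utterance gives roughly one twentieth of the feature vectors that 60 s of speech would, before any non-speech frames are discarded.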

In this work, the same speech is analyzed using different frame sizes and rates, and hence it is termed Multiple Frame Size and Rate (MFSR) analysis. The motivation behind varying the frame size is to perform a multi-resolution analysis of the same speech data. It is observed that the feature vectors representing the vocal tract information extracted from the same speech signal by multi-resolution analysis are considerably different [110, 111]. Further, the speaking rate as well as the pitch are different for different speakers and also for the same speaker depending
