3.4 Experimental Results and Discussions
3.4.3 Limited Training and Test Data
3. MFSR Analysis of Speech for Limited Data Speaker Recognition
[Figure 3.7 plots: four panels for codebook sizes N=16, N=32, N=64 and N=128; x-axis: amount of testing data in sec; y-axis: performance (%); curves: SFSR, MFS, MFR, MFSR.]
Figure 3.7: Performance of the speaker recognition system based on SFSR and MFSR analysis techniques for different sizes of testing data for the first 30 speakers taken from the YOHO database. The amount of training data is 25 sec in all cases.
[Figure 3.8 plot: single panel for codebook size N=128; x-axis: amount of testing data in sec; y-axis: performance (%); curves: SFSR, MFS, MFR, MFSR.]
Figure 3.8: Performance of the speaker recognition system based on SFSR and MFSR analysis techniques for 138 speakers taken from the YOHO database for different sizes of testing data. The amount of training data is 25 sec.
• MFS trained models with SFSR and MFSR methods of testing
• MFR trained models with SFSR and MFSR methods of testing
• MFSR trained models with SFSR and MFSR methods of testing
In the first experiment, speaker models are trained using SFSR based analysis. The trained models are tested separately using the SFSR and MFSR methods. The experimental results for 30 speakers are shown in Figure 3.9. The figure shows that, although the models are poorly trained with SFSR analysis, MFSR analysis during testing significantly improves the performance. A recognition performance of 83% is achieved for 3 sec of data using MFSR with a codebook of size 128. This is higher than the 70% that SFSR provides with a codebook of size 64. The corresponding differences in performance between SFSR and MFSR for the other data sizes of 6 and 12 sec can also be seen in the figure. The figure also shows that the performances of all analysis methods converge at 24 sec of data.
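The train-then-test procedure described above can be sketched as follows. This is only an illustrative sketch: it assumes a vector quantization classifier in which each speaker is represented by a codebook and the test utterance is assigned to the speaker with the lowest average distortion, which is suggested by the codebook sizes quoted in the text but not spelled out in this section. The function names and the squared Euclidean distortion are assumptions made for illustration.

```python
import numpy as np

def avg_distortion(features, codebook):
    """Mean squared Euclidean distance from each feature vector to its nearest codeword."""
    # features has shape (T, D); codebook has shape (N, D)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def identify(features, codebooks):
    """Index of the speaker whose codebook gives the lowest average distortion."""
    return int(np.argmin([avg_distortion(features, cb) for cb in codebooks]))

def recognition_performance(test_sets, true_ids, codebooks):
    """Percentage of test utterances assigned to the correct speaker."""
    hits = sum(identify(f, codebooks) == s for f, s in zip(test_sets, true_ids))
    return 100.0 * hits / len(true_ids)
```

Under this reading, the training analysis (SFSR, MFS, MFR or MFSR) determines which feature vectors build each codebook, and the testing analysis determines which feature vectors are passed to `identify`.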
TH-797_05610204
[Figure 3.9 plots: four panels for codebook sizes N=16, N=32, N=64 and N=128; x-axis: amount of testing data in sec; y-axis: performance (%); curves: SFSR, MFS, MFR, MFSR.]
Figure 3.9: Performance of the speaker recognition system based on SFSR and MFSR analysis techniques for different sizes of testing data for the first 30 speakers taken from the YOHO database. SFSR trained model is used for testing.
In the second experiment, the speaker models are trained using the MFS based analysis technique. The trained models are tested using the SFSR and MFSR methods of analysis, and the results are shown in Figure 3.10. A recognition performance of 83% is achieved for 3 sec of data using MFSR with a codebook of size 128. This is higher than the 67% that SFSR provides with a codebook of size 64. These results are almost the same as those of the previous experiment. Further, we can infer that when both training and testing data are limited, the MFS method alone does not improve the performance. For the other data sizes of 6 and 12 sec, the performance is also almost the same as in the previous experiment.
[Figure 3.10 plots: four panels for codebook sizes N=16, N=32, N=64 and N=128; x-axis: amount of testing data in sec; y-axis: performance (%); curves: SFSR, MFS, MFR, MFSR.]
Figure 3.10: Performance of the speaker recognition system based on SFSR and MFSR analysis techniques for different sizes of testing data for 30 speakers taken from the YOHO database. MFS trained model is used for testing.
In the third experiment, speaker models are trained using the MFR based analysis technique. The trained models are tested using the SFSR and MFSR analysis methods. The experimental results for 30 speakers are shown in Figure 3.11. A recognition performance of 87% is achieved for 3 sec of data with a codebook of size 128. This is higher than both the 83% that SFSR provides with a codebook of size 128 and the performance of the previous experiment (83%). Further, it can be observed that MFR does improve the performance when training and testing data are limited. For the other data sizes of 6 and 12 sec, the figure also shows about 7% improvement in MFSR performance over SFSR.
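One way to see why MFR analysis may help with limited data is to count the frames available for feature extraction. The sketch below assumes that MFR denotes analysis at multiple frame rates (frame shifts), as the acronym suggests; the 20 ms frame size and the 10 ms and 5 ms shifts are hypothetical values chosen for illustration and are not taken from this section.

```python
def num_frames(duration_ms, frame_ms, shift_ms):
    """Number of complete frames of length frame_ms taken every shift_ms."""
    if duration_ms < frame_ms:
        return 0
    return 1 + (duration_ms - frame_ms) // shift_ms

def total_frames(duration_ms, frame_sizes_ms, shifts_ms):
    """Pool the frames over every (frame size, frame shift) combination."""
    return sum(num_frames(duration_ms, f, s)
               for f in frame_sizes_ms for s in shifts_ms)

# Hypothetical settings for 3 sec of speech (not from the source).
single = total_frames(3000, [20], [10])         # one frame size, one frame shift
multi_rate = total_frames(3000, [20], [10, 5])  # same frame size, two frame shifts
print(single, multi_rate)
```

With these assumed settings the pooled analysis yields roughly three times as many feature vectors from the same 3 sec of speech, which is the kind of increase that could benefit both codebook training and testing.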
[Figure 3.11 plots: four panels for codebook sizes N=16, N=32, N=64 and N=128; x-axis: amount of testing data in sec; y-axis: performance (%); curves: SFSR, MFS, MFR, MFSR.]
Figure 3.11: Performance of the speaker recognition system based on SFSR and MFSR analysis techniques for different sizes of testing data for 30 speakers taken from the YOHO database. MFR trained model is used for testing.
In the preceding experiments, the performance improves when testing is done with MFSR. In the fourth experiment, speaker models are trained using the MFSR based analysis technique and testing is done using the SFSR and MFSR methods. The experimental results for 30 speakers are shown in Figure 3.12. A recognition performance of 90% is achieved for 3 sec of training and testing data using MFSR with a codebook of size 128. This is higher than both the 80% that SFSR provides with a codebook of size 64 and the performance of the previous experiment (87%). For the other data sizes of 6 and 12 sec, MFSR analysis also gives better results than SFSR. Further, the results show that MFSR analysis at both training and testing improves the performance compared to using MFSR only at testing, as in the previous experiments.
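The 3 sec figures quoted for the four experiments can be collected in one place to compare the gain from MFSR testing under each training method. The snippet below only restates the numbers given in the text for the 30-speaker YOHO subset. Note that the SFSR and MFSR figures are quoted at different codebook sizes in some experiments, and that the 87% in the third experiment is assumed here to refer to MFSR testing, so the differences are indicative rather than a like-for-like comparison.

```python
# Training analysis -> (SFSR testing %, MFSR testing %), 3 sec of data,
# as quoted in the text for the 30-speaker YOHO subset.
reported = {
    "SFSR": (70, 83),
    "MFS": (67, 83),
    "MFR": (83, 87),
    "MFSR": (80, 90),
}

# Absolute gain (percentage points) from testing with MFSR instead of SFSR.
gains = {train: mfsr - sfsr for train, (sfsr, mfsr) in reported.items()}

# Training method with the highest MFSR-tested performance.
best_train = max(reported, key=lambda t: reported[t][1])

for train, (sfsr, mfsr) in reported.items():
    print(f"{train:>4} trained: SFSR test {sfsr}%, MFSR test {mfsr}%, "
          f"gain {gains[train]} points")
print("Best training analysis with MFSR testing:", best_train)
```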
[Figure 3.12 plots: four panels for codebook sizes N=16, N=32, N=64 and N=128; x-axis: amount of testing data in sec; y-axis: performance (%); curves: SFSR, MFS, MFR, MFSR.]
Figure 3.12: Performance of the speaker recognition system based on SFSR and MFSR analysis techniques for different sizes of testing data for the first 30 speakers taken from the YOHO database. MFSR trained model is used for testing.
All of the aforementioned experiments are also conducted for the whole database of 138 speakers to verify the recognition performance under limited data conditions. The results are shown in Figure 3.13. The plots demonstrate that, for a large population also, the MFSR analysis methods result in improved recognition performance compared to SFSR. Hence, MFSR methods can also be used for large databases having limited training and test data. From this study we can conclude that when both training and testing data are small, MFSR analysis of the training and testing data improves the recognition performance compared to SFSR.
[Figure 3.13 plots: four panels (a), (b), (c) and (d), each for codebook size N=128; x-axis: amount of testing data in sec; y-axis: performance (%); curves: SFSR, MFS, MFR, MFSR.]
Figure 3.13: Performance of the speaker recognition system based on SFSR and MFSR analysis techniques for different sizes of testing data for 138 speakers. (a) SFSR trained model is used for testing (b) MFS trained model is used for testing (c) MFR trained model is used for testing and (d) MFSR trained model is used for testing.
To verify the effectiveness of the proposed MFSR analysis, we have also conducted the experiments on the TIMIT database. As already mentioned, in practice both training and testing data may be limited. Therefore, in this case SFSR and MFSR analysis are studied only for the condition in which both training and testing data are limited. The experimental results are shown in Figure 3.14 and Figure 3.15 for sets of the first 30 and 138 speakers, respectively. The experimental results for the TIMIT database resemble those for the YOHO database irrespective of speaker population and amount of data. Hence, we can suggest that MFSR analysis can be used for improving the speaker recognition performance when both training and testing data are limited.
[Figure 3.14 plots: four panels for codebook sizes N=16, N=32, N=64 and N=128; x-axis: amount of testing data in sec; y-axis: performance (%); curves: SFSR, MFSR.]
Figure 3.14: Performance of the speaker recognition system based on SFSR and MFSR analysis techniques for different sizes of testing data for the first 30 speakers taken from the TIMIT database.
[Figure 3.15 plot: single panel for codebook size N=128; x-axis: amount of testing data in sec; y-axis: performance (%); curves: SFSR, MFSR.]
Figure 3.15: Performance of the speaker recognition system based on SFSR and MFSR analysis techniques for different sizes of testing data for the first 138 speakers taken from the TIMIT database.