
Evaluation of the CV recognition system

From the document Biswajit Dev Sarma (pages 148-151)

5.6.1 Consonant recognition

Initially, obstruent consonant recognition is evaluated using HMM-GMM models to study the performance for different obstruent sound categories, such as unaspirated stops, aspirated stops, fricatives and affricates. Then, the performance of DNN-based and SGMM-based acoustic models is shown.

Results using VTR and CoR: VOPs are detected using the FA based method, and features are extracted in three different ways: 1) with an 80 ms fixed duration, 2) with CoR and 3) with VTR, as described in the previous section. Phone models are built for these three sets of features using the HMM-GMM acoustic modeling technique, and performance is evaluated in terms of phone recognition accuracy.

Figure 5.10 shows the performance for unaspirated stops using VTR and fixed duration features. The non-linear mapping function used in VTR depends on two parameters, q and Tmx. Different combinations of these two parameters are tried, and the best recognition rate is found for q = 6.5 and Tmx = 50 ms. Recognition accuracies for the different obstruent sound categories are shown in Table 5.4.

When VTR is used, the performance of both unaspirated and aspirated stops increases, but the performance of fricatives and affricates decreases. However, there is a small overall improvement. When evaluated with CoR, the performance of all sound categories improves, markedly so for aspirated stops, fricatives and affricates, but only slightly for unaspirated stops. Overall, a 4.26 % absolute improvement is achieved. When VTR and CoR are performed together, the absolute improvement is reduced to 3.57 % (VTR and CoR in Table 5.4).

Table 5.4: Recognition accuracy (% Acc) of consonants in CV units using consonant onset refinement (CoR) and variable transition region (VTR) duration for q = 6.5 and Tmx = 50 ms. Cond. refers to the conditional use of VTR and CoR.

| Method | Unaspirated stops | Aspirated stops | Fricatives & Affricates | Overall |
|---|---|---|---|---|
| Fixed duration | 61.19 | 39.82 | 63.06 | 59.69 |
| VTR | 62.04 | 40.95 | 62.78 | 60.20 |
| CoR | 62.50 | 46.15 | 71.95 | 63.95 |
| VTR and CoR | 62.62 | 43.44 | 70.43 | 63.26 |
| Cond. VTR and CoR | 64.16 | 50.90 | 70.64 | 64.95 |
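The absolute improvements quoted for Table 5.4 can be re-derived from its Overall column. The following is a minimal sketch for checking that arithmetic; the dictionary and the function name `absolute_gains` are illustrative and not part of the original work.

```python
# Overall accuracies (% Acc) copied from the Overall column of Table 5.4.
overall_acc = {
    "Fixed duration": 59.69,
    "VTR": 60.20,
    "CoR": 63.95,
    "VTR and CoR": 63.26,
    "Cond. VTR and CoR": 64.95,
}


def absolute_gains(acc, baseline="Fixed duration"):
    """Return each method's absolute gain (in percentage points) over the baseline."""
    base = acc[baseline]
    return {m: round(a - base, 2) for m, a in acc.items() if m != baseline}


gains = absolute_gains(overall_acc)
for method, gain in gains.items():
    print(f"{method}: {gain:+.2f}")
```

The values for CoR, VTR and CoR, and Cond. VTR and CoR correspond to the 4.26 %, 3.57 % and 5.26 % figures given in the text.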

Figure 5.10: Unaspirated stop consonant recognition performance (% Acc) comparison between fixed and variable duration transition region with different q and Tmx. The arrow shows the maximum performance. [Plot of accuracy (%) against q, with curves for Tmax = 40 ms, Tmax = 50 ms, Tmax = 60 ms and fixed duration.]

Results with conditional use of VTR and CoR: From Table 5.4 it is observed that although the use of CoR improves the performance of all obstruent sounds, the improvement is much smaller for unaspirated stops than for the other obstruents. This can be explained as follows.

Unaspirated stops have a shorter burst region and VOT, sometimes only 5-10 ms. Thus, the VOT is included in the 40 ms region and no additional refinement is required to capture it. On the other hand, the VOT in aspirated stops is longer, sometimes 90-100 ms in duration. The typical duration of fricatives and affricates is also longer than 40 ms. Therefore, the 40 ms region before the VOP does not include the entire consonant region, and consonant onset refinement is required. Similarly, the use of VTR improves the performance of stops, but not of fricatives and affricates. Since the consonant region is longer for these sounds, the useful information is present in the consonant region rather than in the transition region. Even though the transition region contains information, it seems to become redundant for automatic recognition.

Table 5.5: Recognition accuracy (% Acc) of obstruents in CV units for the fixed duration method and the proposed method using different VOP detection and acoustic modeling techniques.

| Acoustic models | FA based VOP, fixed duration | FA based VOP, proposed | SP based VOP, fixed duration | SP based VOP, proposed |
|---|---|---|---|---|
| HMM-GMM | 59.69 | 64.95 | 55.30 | 61.23 |
| HMM-DNN | 65.95 | 70.73 | 61.27 | 64.34 |
| HMM-SGMM | 72.62 | 76.85 | 68.36 | 72.26 |

Table 5.6: CV unit recognition performance (% Acc) for the fixed duration method and the proposed method using different VOP detection and acoustic modeling techniques. Results are shown for CV units containing an obstruent.

| Acoustic models for obstruents | FA based VOP, fixed duration | FA based VOP, proposed | SP based VOP, fixed duration | SP based VOP, proposed |
|---|---|---|---|---|
| HMM-GMM | 46.02 | 49.86 | 43.03 | 47.60 |
| HMM-DNN | 51.19 | 54.21 | 47.31 | 49.92 |
| HMM-SGMM | 56.42 | 59.43 | 52.58 | 56.26 |

Based on these observations, it can be concluded that a conditional use of CoR and VTR information may be more helpful in obtaining the benefits of both methods. If an obstruent onset is detected within the 40-100 ms region before the VOP, then CoR is performed; otherwise, VTR is performed.

The conditional combination improves on the fixed duration baseline in all obstruent sound categories, and the overall absolute improvement increases to 5.26 % (Cond. VTR and CoR in Table 5.4).
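The conditional rule stated above (CoR when an obstruent onset is found 40-100 ms before the VOP, VTR otherwise) can be sketched as a small decision function. This is an illustration only: the function name `choose_refinement`, the representation of times in milliseconds, the use of `None` for "no onset detected", and the inclusive window boundaries are all assumptions, since the text does not specify them.

```python
from typing import Optional

# Window before the VOP, in ms, taken from the text. Treating both
# boundaries as inclusive is an assumption.
COR_WINDOW_MS = (40.0, 100.0)


def choose_refinement(vop_ms: float, onset_ms: Optional[float]) -> str:
    """Pick "CoR" or "VTR" for one CV unit (hypothetical helper).

    vop_ms   : detected vowel onset point, in ms
    onset_ms : detected obstruent onset, in ms, or None if none was found
    """
    if onset_ms is not None:
        lead = vop_ms - onset_ms  # how far the onset precedes the VOP
        if COR_WINDOW_MS[0] <= lead <= COR_WINDOW_MS[1]:
            return "CoR"
    return "VTR"


print(choose_refinement(500.0, 430.0))  # onset 70 ms before the VOP
print(choose_refinement(500.0, 492.0))  # onset 8 ms before the VOP
print(choose_refinement(500.0, None))   # no onset detected
```

Under these assumptions, an onset 70 ms before the VOP selects CoR, while an onset only 8 ms before the VOP (typical of an unaspirated stop) or a missing onset falls back to VTR.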

Table 5.5 shows the obstruent recognition performance using two different VOP detection techniques and three different acoustic modeling techniques. The HMM-SGMM system performs best among the three modeling approaches, and FA based VOP detection is found to give better consonant recognition than the SP based method. In all cases, the proposed method with conditional use of CoR and VTR performs better than the baseline method using fixed duration features.
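The three comparisons drawn from Table 5.5 can be checked mechanically against the tabulated numbers. The sketch below does only that; the data layout and variable names are illustrative choices, not part of the original work.

```python
# Obstruent accuracies (% Acc) copied from Table 5.5.
# Keys: (acoustic model, VOP detector); values: (fixed duration, proposed).
table_5_5 = {
    ("HMM-GMM", "FA"): (59.69, 64.95),
    ("HMM-GMM", "SP"): (55.30, 61.23),
    ("HMM-DNN", "FA"): (65.95, 70.73),
    ("HMM-DNN", "SP"): (61.27, 64.34),
    ("HMM-SGMM", "FA"): (72.62, 76.85),
    ("HMM-SGMM", "SP"): (68.36, 72.26),
}
models = ["HMM-GMM", "HMM-DNN", "HMM-SGMM"]

# Check 1: the proposed method beats fixed duration in every cell.
proposed_always_better = all(p > f for f, p in table_5_5.values())

# Check 2: FA based VOP detection beats SP based for every model,
# for both the fixed duration (index 0) and proposed (index 1) features.
fa_always_better = all(
    table_5_5[(m, "FA")][i] > table_5_5[(m, "SP")][i]
    for m in models
    for i in (0, 1)
)

# Check 3: HMM-SGMM has the highest accuracy in every column.
best_model_per_column = {
    (vop, i): max(models, key=lambda m: table_5_5[(m, vop)][i])
    for vop in ("FA", "SP")
    for i in (0, 1)
}
sgmm_always_best = set(best_model_per_column.values()) == {"HMM-SGMM"}

print(proposed_always_better, fa_always_better, sgmm_always_best)
```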

5.6.2 CV unit recognition

To see the effect of improved obstruent recognition on CV unit recognition performance, vowels are recognized separately. Vowel features are extracted from the region between the VOP and the end of the CV segment. It is found that the HMM-GMM acoustic model gives the best performance (with a phone error rate of 23.5 %) among the three modeling techniques when 40 dimensional time-spliced LDA + MLLT + fMLLR transformed features are used for building the systems. CV unit recognition performance with vowels decoded by the HMM-GMM system and consonants decoded by each of the three systems is shown in Table 5.6. The baseline CV unit recognition system (with fixed duration) is compared with the proposed system. In each case, an absolute improvement of approximately 3 % (between 2.61 % and 4.57 %) is achieved with the proposed method over the baseline system.
