3.6 Performance comparison

3.6.1 Session/channel compensation

Session/channel compensation methods form an integral part of all modern speaker verification systems and are critical for making them effective in practical conditions. For the proposed LD-SR system, the compensation can be applied either in the supervector domain or in the sparse vector domain. For session/channel variability compensation in the supervector domain, the simplified JFA approach described in Section 2.3.1 is used. Once the speaker and channel factors are estimated for a given utterance using the JFA process, the session/channel compensated GMM supervector is computed by multiplying the corresponding speaker factor vector with the speaker subspace matrix. This compensated GMM supervector then represents the speaker utterance in the subsequent sparse coding stage. This approach of session/channel compensation through JFA based processing of the supervectors prior to the sparse coding is referred to as ‘pre-JFA’ in this work. For session/channel compensation in the sparse vector domain, LDA followed by WCCN is used. In the case of the CDS based and the XD-SR systems, LDA followed by WCCN is used for the i-vector representations and JFA is used for the supervector representations.

Table 3.2: Performance of the proposed and contrast speaker verification systems with suitable session/channel compensation on the NIST 2003 SRE dataset. Note that, with proper compensation, both the simple and the discriminative dictionary based systems outperform the i-vector based system.

SV system             Representation   Channel compensation   % EER   min-C_DET-03
Contrast   CDS        i-vector         LDA & WCCN             2.24    0.037
Contrast   CDS        supervector      pre-JFA                3.61    0.066
Contrast   XD-SR      i-vector         LDA & WCCN             5.42    0.102
Contrast   XD-SR      supervector      pre-JFA                4.01    0.069
Proposed   T-LD-SR    supervector      LDA & WCCN             3.43    0.063
Proposed   LD-SR      supervector      LDA & WCCN             3.61    0.065
Proposed   DLD-SR     supervector      LDA & WCCN             1.98    0.036
Proposed   LD-SR      supervector      pre-JFA                1.56    0.031
Proposed   DLD-SR     supervector      pre-JFA                1.53    0.028
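The pre-JFA step described in this section reduces, at its core, to a matrix-vector product between the speaker subspace matrix and the estimated speaker factor vector. The following is a minimal sketch of that single step under stated assumptions: the names `compensated_supervector`, `V` and `y` are illustrative and do not come from this work, the toy values are invented, and the estimation of the factors themselves (Section 2.3.1) is not shown. Whether a UBM mean offset is added back to the product is not stated here, so the sketch leaves it out.

```python
def compensated_supervector(V, y):
    """Multiply the speaker subspace matrix V by the speaker factor vector y.

    V : list of D rows, each with R entries (assumed to be already trained)
    y : list of R speaker factors estimated for one utterance
    Returns the D-dimensional compensated supervector as a list.
    """
    if any(len(row) != len(y) for row in V):
        raise ValueError("every row of V must have len(y) entries")
    # Each output coordinate is the inner product of one row of V with y
    return [sum(v * f for v, f in zip(row, y)) for row in V]


# Toy values for illustration only (D = 3, R = 2)
V = [[1.0, 0.0],
     [0.5, 2.0],
     [0.0, -1.0]]
y = [2.0, 0.5]
s = compensated_supervector(V, y)
print(s)  # [2.0, 2.0, -0.5]
```

Because the channel factors are simply not used in the product, the resulting vector carries only the speaker-dependent part of the model, which is what is passed on to the sparse coding stage.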

The performance of the various SV systems evaluated on the NIST 2003 SRE dataset with the appropriate kind(s) of session/channel compensation applied is given in Table 3.2. On comparing Table 3.1 and Table 3.2, we note that the relative ordering of the systems remains the same with and without session/channel variability compensation, except in the case of the LD-SR system. With session/channel compensation using pre-JFA, the LD-SR system performs better than the i-vector CDS system. In addition, pre-JFA based compensation is more effective than post-processing with LDA and WCCN for the LD-SR systems. The three best performing systems after session/channel variability compensation are the LD-SR, DLD-SR and i-vector CDS based systems, with EERs of 1.56 %, 1.53 % and 2.24 %, respectively.

[Figure 3.9 panels: (a) i-vector CDS, (b) LD-SR system, (c) panel title not recovered, (d) LD-SR system with pre-JFA. Each panel plots normalized count against score for true trials and false trials.]

Figure 3.9: Histograms of the scores generated by the i-vector CDS and LD-SR systems. Sub-figures (a) and (b) correspond to the base systems without session/channel compensation, while sub-figures (c) and (d) correspond to the systems with appropriate session/channel compensation.
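For readers who want to relate the quoted EER values to trial scores, the sketch below shows one simple way to approximate an equal error rate from lists of true-trial and false-trial scores. It is an illustration only, not the scoring tool used in this work: the function name `equal_error_rate` and the toy scores are assumptions, and the threshold sweep over observed scores is a coarse approximation without interpolation.

```python
def equal_error_rate(true_scores, false_scores):
    """Approximate the EER (in percent) by sweeping a threshold over all observed scores."""
    best_gap, eer = None, None
    for t in sorted(set(true_scores) | set(false_scores)):
        # Miss: a true trial scoring below the threshold
        p_miss = sum(s < t for s in true_scores) / len(true_scores)
        # False alarm: a false trial scoring at or above the threshold
        p_fa = sum(s >= t for s in false_scores) / len(false_scores)
        gap = abs(p_miss - p_fa)
        # Keep the operating point where the two error rates are closest
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, 100.0 * (p_miss + p_fa) / 2.0
    return eer


# Toy scores for illustration only
toy_true = [0.9, 0.8, 0.7, 0.2]
toy_false = [0.1, 0.3, -0.2, -0.5]
toy_eer = equal_error_rate(toy_true, toy_false)
print(toy_eer)  # 25.0
```

With these toy scores, one true trial falls below and one false trial falls above the threshold at 0.3, so both error rates equal 25 %.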

Why the LD-SR system compares differently with the i-vector CDS system with and without session/channel compensation is not straightforward. To analyze the performance of the systems further, the histograms of the scores for the true and false trials of the LD-SR and the i-vector CDS based systems, with and without session/channel compensation, are shown in Figure 3.9. On comparing the histograms of the systems without compensation, it can be noted that the distribution of the false trial scores for the LD-SR system is much narrower than that of the i-vector CDS system. In addition, the mean of the true trial scores is shifted further to the right for the LD-SR system than for the i-vector CDS system. Though these two aspects of the LD-SR system are favorable, the spread of the distribution of the true trial scores is much larger for the LD-SR system than for the i-vector CDS system. As a result, with no session/channel compensation applied, the LD-SR system performs worse than the i-vector CDS system. With proper session/channel compensation, the mode of the distribution of the true trial scores for the LD-SR system moves significantly away from the origin, and the distribution of the false trial scores becomes narrower than in the uncompensated case. For the i-vector CDS system, the separation between the distributions of the true and false trial scores increases, but at the same time the spread of the distribution of the false trial scores also increases. As a result, with suitable session/channel compensation, the LD-SR system performs significantly better than the i-vector CDS system.
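The qualitative comparison above rests on three quantities per system: the location of the true and false score distributions, their spreads, and the separation between them. The sketch below shows one way such quantities and a normalized histogram could be computed from raw scores. It is a hedged illustration: the function names, the toy scores, the bin count, and the reading of "normalized count" as count divided by the number of trials are all assumptions, not details taken from this work.

```python
from statistics import mean, pstdev


def score_summary(true_scores, false_scores):
    """Summarize location, spread and separation of the two score distributions."""
    return {
        "true_mean": mean(true_scores),
        "true_spread": pstdev(true_scores),
        "false_mean": mean(false_scores),
        "false_spread": pstdev(false_scores),
        # Distance between the centers of the true and false distributions
        "separation": mean(true_scores) - mean(false_scores),
    }


def normalized_histogram(scores, lo=-1.0, hi=1.0, bins=20):
    """Per-bin counts divided by the number of scores (assumed normalization)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in scores:
        if lo <= s <= hi:
            # Clamp so that a score equal to hi falls in the last bin
            idx = min(int((s - lo) / width), bins - 1)
            counts[idx] += 1
    return [c / len(scores) for c in counts]


# Toy scores for illustration only
demo_true = [0.2, 0.4, 0.6]
demo_false = [-0.1, 0.0, 0.1]
summary = score_summary(demo_true, demo_false)
hist_true = normalized_histogram(demo_true)
```

A wider `true_spread` at a fixed `separation` increases the overlap with the false-trial scores, which is the effect invoked above to explain the weaker uncompensated LD-SR result.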
