
B.4 Online dictionary learning

The online dictionary learning [159] approach requires one element (or a small subset) of the training set at a time, whereas batch-mode learning approaches (e.g., K-SVD) access the whole training set at each iteration. Thus, the online dictionary learning approach can efficiently handle large training sets or dynamic training data that change over time.

The steps involved in online dictionary learning using one element of the training set are summarized in Algorithm 6. The dictionary update stage of the online learning procedure is described in Algorithm 7.

Algorithm 6 Online dictionary learning

Inputs: $\mathbf{y} \in \mathbb{R}^m \sim p(\mathbf{y})$ (assuming the training set is composed of i.i.d. samples of a distribution $p(\mathbf{y})$), initial dictionary $\mathbf{D}_0 \in \mathbb{R}^{m \times K}$, number of iterations $T$, regularization parameter $\lambda_1 \in \mathbb{R}$.

(i) $\mathbf{A}_0 = \mathbf{0}$ and $\mathbf{B}_0 = \mathbf{0}$ (reset the past information)
(ii) for $t = 1$ to $T$ do
(iii) Draw $\mathbf{y}_t$
(iv) Sparse coding (computed using LARS):
$$\mathbf{x}_t = \arg\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y}_t - \mathbf{D}_{t-1}\mathbf{x}\|_2^2 + \lambda_1\|\mathbf{x}\|_1 \qquad (B.9)$$
(v) $\mathbf{A}_t = \mathbf{A}_{t-1} + \mathbf{x}_t\mathbf{x}_t^{\top}$
(vi) $\mathbf{B}_t = \mathbf{B}_{t-1} + \mathbf{y}_t\mathbf{x}_t^{\top}$
(vii) Dictionary update (use Algorithm 7, the block coordinate descent with $\mathbf{D}_{t-1}$ as warm restart):
$$\mathbf{D}_t = \arg\min_{\mathbf{D}} \frac{1}{t}\sum_{i=1}^{t}\left(\frac{1}{2}\|\mathbf{y}_i - \mathbf{D}\mathbf{x}_i\|_2^2 + \lambda_1\|\mathbf{x}_i\|_1\right) \qquad (B.10)$$
(viii) end for
(ix) Output: learned dictionary $\mathbf{D}_T$.
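For concreteness, here is a minimal NumPy sketch of Algorithm 6 together with the dictionary update of Algorithm 7 (described next). This is a sketch under stated assumptions, not the implementation of [159]: ISTA replaces the LARS solver of step (iv), a fixed number of sweeps approximates "until convergence" in Algorithm 7, the initial dictionary is sampled from the training data, and all function names are illustrative.

```python
import numpy as np

def sparse_code_ista(D, y, lam, n_iter=200):
    """Step (iv), Eq. (B.9): ISTA stands in for the LARS solver of [159]."""
    L = np.linalg.norm(D, 2) ** 2               # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - D.T @ (D @ x - y) / L           # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return x

def dictionary_update(D, A, B, n_sweeps=5, eps=1e-10):
    """Algorithm 7: block coordinate descent over columns (Eqs. B.11-B.12)."""
    for _ in range(n_sweeps):                   # approximates "repeat ... until convergence"
        for j in range(D.shape[1]):
            u = (B[:, j] - D @ A[:, j]) / (A[j, j] + eps) + D[:, j]  # Eq. (B.11)
            D[:, j] = u / max(np.linalg.norm(u), 1.0)                # Eq. (B.12)
    return D

def online_dictionary_learning(Y, K, lam, T, seed=0):
    """Algorithm 6, one training example per iteration; Y holds samples as columns."""
    rng = np.random.default_rng(seed)
    m, N = Y.shape
    D = Y[:, rng.choice(N, K, replace=False)].copy()   # D_0 initialized from data
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-10
    A, B = np.zeros((K, K)), np.zeros((m, K))
    for _ in range(T):
        y = Y[:, rng.integers(N)]               # step (iii): draw y_t
        x = sparse_code_ista(D, y, lam)         # step (iv): sparse coding
        A += np.outer(x, x)                     # step (v)
        B += np.outer(y, x)                     # step (vi)
        D = dictionary_update(D, A, B)          # step (vii): Algorithm 7
    return D                                    # step (ix): learned dictionary D_T
```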

In practice, the convergence speed of the online algorithm can be increased by drawing more than one training example (i.e., using a mini-batch approach) at each iteration. Let us denote by $\mathbf{y}_{t,1}, \ldots, \mathbf{y}_{t,P}$

Algorithm 7 Dictionary Update

Inputs: input dictionary $\mathbf{D} = [\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_K] \in \mathbb{R}^{m \times K}$, $\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_K] \in \mathbb{R}^{K \times K} = \sum_{i=1}^{t}\mathbf{x}_i\mathbf{x}_i^{\top}$, $\mathbf{B} = [\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_K] \in \mathbb{R}^{m \times K} = \sum_{i=1}^{t}\mathbf{y}_i\mathbf{x}_i^{\top}$

(i) repeat
(ii) for $j = 1$ to $K$ do
(iii) update the $j$-th column to optimize Equation B.10:
$$\mathbf{u}_j = \frac{1}{\mathbf{A}_{jj}}(\mathbf{b}_j - \mathbf{D}\mathbf{a}_j) + \mathbf{d}_j \qquad (B.11)$$
$$\mathbf{d}_j = \frac{1}{\max(\|\mathbf{u}_j\|_2, 1)}\mathbf{u}_j \qquad (B.12)$$
(iv) end for
(v) until convergence
(vi) Output: updated dictionary $\mathbf{D}$.

the training examples drawn at iteration $t$. Then, lines (v) and (vi) of Algorithm 6 are replaced by
$$\mathbf{A}_t = \beta\mathbf{A}_{t-1} + \sum_{i=1}^{P}\mathbf{x}_{t,i}\mathbf{x}_{t,i}^{\top} \qquad (B.13)$$
$$\mathbf{B}_t = \beta\mathbf{B}_{t-1} + \sum_{i=1}^{P}\mathbf{y}_{t,i}\mathbf{x}_{t,i}^{\top} \qquad (B.14)$$
where $\beta = \frac{\theta + 1 - P}{\theta + 1}$, with $\theta = tP$ if $t < P$ and $\theta = P^2 + t - P$ if $t \geq P$.
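As a concrete illustration of Equations B.13 and B.14, the mini-batch replacement of steps (v) and (vi) in the sketch above would read as follows (assuming the mini-batch samples and their codes are collected column-wise; Y_batch and X_batch are illustrative names):

```python
# Mini-batch update of the sufficient statistics (Eqs. B.13-B.14).
# Y_batch: (m, P) columns y_{t,1..P}; X_batch: (K, P) their sparse codes.
theta = t * P if t < P else P ** 2 + t - P
beta = (theta + 1 - P) / (theta + 1)
A = beta * A + X_batch @ X_batch.T   # Eq. (B.13)
B = beta * B + Y_batch @ X_batch.T   # Eq. (B.14)
```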

C

Parameter Tuning in K-SVD/OL Learned Dictionary based LR

Contents

C.1 Parameter tuning in learned dictionary based LR with OMP and Lasso
C.2 Parameter tuning in learned dictionary based LR with ENet

C.1 Parameter tuning in learned dictionary based LR with OMP and Lasso

The optimal performance of LR using OMP/Lasso-based sparse representation over a K-SVD/OL learned dictionary mainly depends upon three parameters: (i) the size of the dictionary, (ii) the number of atoms selected while learning the dictionary (dictionary sparsity), and (iii) the number of atoms selected while representing the target data (decomposition sparsity). These parameters are tuned to obtain the best results in terms of Cavg and IDR. Figure C.1 shows the performance of LR using OMP-based sparse coding of the WCCN compensated i-vector over a K-SVD learned dictionary with the tuning parameters: dictionary size, dictionary sparsity (γ′) and decomposition sparsity (γ). The CDS classifier is applied on the training and test s-vectors. The score calibration is done by the GB+MLR technique employing the multi-class FoCal toolkit [153] prior to the final decision. It is observed that the best performance is achieved with a dictionary size of 28, γ′ = 20 and γ = 15 for language detection, while in language identification the optimal parameters noted are γ′ = 20 and γ = 20. The performance of the LR system using Lasso-based sparse coding of the WCCN compensated i-vector over a K-SVD learned dictionary is shown in Figure C.2. With a fixed dictionary size of 28, the best performance in terms of Cavg for language detection is noted with dictionary sparsity λ′1 = 0.01 and decomposition sparsity λ1 = 0.001, while in language identification the optimal parameters noted are λ′1 = 0.02 and λ1 = 0.001.
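The grid search itself is straightforward. The sketch below illustrates the idea with scikit-learn's MiniBatchDictionaryLearning and Lasso coding; it is only an approximation of the actual pipeline under stated assumptions: it is not K-SVD, the cosine scoring is a bare-bones stand-in for the full CDS classifier, there is no GB+MLR calibration, and idr_for and the synthetic data are illustrative.

```python
import numpy as np
from itertools import product
from sklearn.decomposition import MiniBatchDictionaryLearning

def idr_for(train_X, train_lang, test_X, test_lang, K, lam_dict, lam_dec):
    """Learn a dictionary, Lasso-code both sets, score with cosine similarity."""
    dl = MiniBatchDictionaryLearning(
        n_components=K,                   # dictionary size
        alpha=lam_dict,                   # plays the role of lambda'_1 (learning)
        transform_algorithm="lasso_lars",
        transform_alpha=lam_dec,          # plays the role of lambda_1 (coding)
        random_state=0,
    )
    S_tr = dl.fit(train_X).transform(train_X)   # training s-vectors
    S_te = dl.transform(test_X)                 # test s-vectors
    langs = np.unique(train_lang)
    # Per-language model = unit-normalized mean training s-vector (CDS-style).
    M = np.stack([S_tr[train_lang == l].mean(axis=0) for l in langs])
    M /= np.linalg.norm(M, axis=1, keepdims=True) + 1e-12
    S_te = S_te / (np.linalg.norm(S_te, axis=1, keepdims=True) + 1e-12)
    pred = langs[(S_te @ M.T).argmax(axis=1)]
    return 100.0 * np.mean(pred == test_lang)   # IDR (%)

# Synthetic stand-ins; real WCCN compensated i-vectors would go here.
rng = np.random.default_rng(0)
train_X = rng.normal(size=(280, 100)); train_lang = np.repeat(np.arange(14), 20)
test_X = rng.normal(size=(140, 100)); test_lang = np.repeat(np.arange(14), 10)

# Grid over dictionary size, dictionary sparsity and decomposition sparsity.
for K, ld, dd in product([28, 56], [0.01, 0.02, 0.05], [0.001, 0.01, 0.02]):
    idr = idr_for(train_X, train_lang, test_X, test_lang, K, ld, dd)
    print(f"K={K:3d}  lambda'_1={ld:.3f}  lambda_1={dd:.3f}  IDR={idr:.1f}%")
```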

In addition to the i-vector utterance representation, the sparse representation of the JFA compensated GMM mean supervector over the K-SVD learned dictionary employing the OMP/Lasso sparse coding algorithms is investigated, and the results are shown in Figures C.3 and C.4. It can be observed that, using OMP, the best performance is achieved with a dictionary size of 28, dictionary sparsity γ′ = 20 and decomposition sparsity γ = 15, while using Lasso the optimum performance is achieved with dictionary sparsity λ′1 = 0.05, decomposition sparsity λ1 = 0.001 and a dictionary size of 28. Compared with OMP, the Lasso-based sparse coding algorithm provides slightly better performance.

The work is further extended to the OL dictionary employing OMP and Lasso based sparse coding, considering both the WCCN compensated i-vector and the JFA-supervector. The results are reported in Figures C.5, C.6, C.7 and C.8. Using OL employing OMP and the i-vector representation, the best language detection performance is achieved with dictionary sparsity γ′ = 15 and decomposition sparsity γ = 20, as shown in Figure C.5 (a), while for the language identification task the sparsity constraints γ′ = 20 and γ = 20 give the best result, as shown in Figure C.5 (b). The corresponding results using the OL dictionary employing Lasso-based sparse coding are shown in Figures C.6 and C.8.


[Plot: (a) 100×Cavg and (b) IDR (%) versus dictionary size (28 to 420); one curve per (γ′, γ) pair with γ′, γ ∈ {5, 10, 15, 20}.]

Figure C.1: Performance of the LR system using the sparse representation of WCCN compensated i-vector over K-SVD learned dictionary with the tuning of dictionary size, dictionary sparsity (γ′) and decomposition sparsity (γ). The sparse coding is performed using the OMP algorithm. (a) Cavg (b) IDR.

[Plot: (a) 100×Cavg and (b) IDR (%) versus dictionary size (28 to 448); one curve per (λ′1, λ1) pair with λ′1 ∈ {0.010, 0.015, 0.020, 0.050} and λ1 ∈ {0.001, 0.010, 0.020}.]

Figure C.2: Performance of the LR system using the sparse representation of WCCN compensated i-vector over K-SVD learned dictionary with tuning of dictionary size, dictionary sparsity (λ′1) and decomposition sparsity (λ1). The sparse coding is performed using the Lasso algorithm. (a) Cavg (b) IDR.

[Plot: (a) 100×Cavg and (b) IDR (%) versus dictionary size (28 to 392); one curve per (γ′, γ) pair with γ′, γ ∈ {5, 10, 15, 20}.]

Figure C.3: Performance of the LR system using the sparse representation of JFA-supervector over K-SVD learned dictionary with tuning of dictionary size, dictionary sparsity (γ′) and decomposition sparsity (γ). The sparse coding is performed using the OMP algorithm. (a) Cavg (b) IDR.

[Plot: (a) 100×Cavg and (b) IDR (%) versus dictionary size (28 to 364); one curve per (λ′1, λ1) pair with λ′1 ∈ {0.010, 0.015, 0.020, 0.050} and λ1 ∈ {0.001, 0.010, 0.020}.]

Figure C.4: Performance of the LR system using the sparse representation of JFA-supervector over K-SVD learned dictionary with tuning of dictionary size, dictionary sparsity (λ′1) and decomposition sparsity (λ1). The sparse coding is performed using the Lasso algorithm. (a) Cavg (b) IDR.


[Plot: (a) 100×Cavg and (b) IDR (%) versus dictionary sparsity γ′ ∈ {5, 10, 15, 20}; one curve per decomposition sparsity γ ∈ {5, 10, 15, 20}.]

Figure C.5: Performance of the LR system using the sparse representation of WCCN compensated i-vector over OL learned dictionary (28 dictionary atoms) with tuning of dictionary sparsity (γ′) and decomposition sparsity (γ). The sparse coding is performed using the OMP algorithm. (a) Cavg (b) IDR.

[Plot: (a) 100×Cavg and (b) IDR (%) versus dictionary sparsity λ′1 ∈ [0, 0.1]; one curve per decomposition sparsity λ1 ∈ {0.001, 0.005, 0.010, 0.015, 0.020}.]

Figure C.6: Performance of the LR system using the sparse representation of WCCN compensated i-vector over OL learned dictionary (28 dictionary atoms) with tuning of dictionary sparsity (λ′1) and decomposition sparsity (λ1). The sparse coding is performed using the Lasso algorithm. (a) Cavg (b) IDR.

[Plot: (a) 100×Cavg and (b) IDR (%) versus dictionary sparsity γ′ ∈ {5, 10, 15, 20}; one curve per decomposition sparsity γ ∈ {5, 10, 15, 20}.]

Figure C.7: Performance of the LR system using the sparse representation of JFA-supervector over OL learned dictionary (28 dictionary atoms) with tuning of dictionary sparsity (γ′) and decomposition sparsity (γ). The sparse coding is performed using the OMP algorithm. (a) Cavg (b) IDR.

[Plot: (a) 100×Cavg and (b) IDR (%) versus dictionary sparsity λ′1 ∈ [0, 0.1]; one curve per decomposition sparsity λ1 ∈ {0.001, 0.010, 0.020}.]

Figure C.8: Performance of the LR system using the sparse representation of JFA-supervector over OL learned dictionary (28 dictionary atoms) with tuning of dictionary sparsity (λ′1) and decomposition sparsity (λ1). The sparse coding is performed using the Lasso algorithm. (a) Cavg (b) IDR.