3.6 Summary
4.1.4 Separability (ε)
The separability measure is well known from statistics [138]. It quantifies the discriminative power of a feature set for a classification task, i.e. the quality of a particular feature set for a given classification problem. The measure is calculated from a labelled set of training data: for each feature vector in the set, the corresponding class must be known. Let $\Xi_i$ denote the set of feature vectors $x$ assigned to the i-th class, and let $N_{\Xi_i} = |\Xi_i|$ be the number of feature vectors in that set. Let $N_s$ be the total number of classes for the classification task and let $N_m$ be the total number of feature vectors in the training data (from all classes).
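From the training data, the within-class covariance matrix
$$V_x = \frac{1}{N_m} \sum_{i=1}^{N_s} \sum_{x \in \Xi_i} (x - \mu_i)(x - \mu_i)^T \tag{4.5}$$
and the between-class covariance matrix
$$B_x = \sum_{i=1}^{N_s} \frac{N_{\Xi_i}}{N_m} (\mu_i - \mu)(\mu_i - \mu)^T \tag{4.6}$$
are calculated, where
$$\mu_i = \frac{1}{N_{\Xi_i}} \sum_{x \in \Xi_i} x \quad \text{and} \quad \mu = \sum_{i=1}^{N_s} \frac{N_{\Xi_i}}{N_m} \mu_i \tag{4.7}$$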
The separability measure should become larger if the between-class covariance gets larger or if the within-class covariance gets smaller. Accordingly, the separability measure is empirically defined by the term
$$J_x = V_x^{-1} B_x \tag{4.8}$$
To obtain a scalar measure for the separability of the classes, a trace criterion is used [138]:
$$\varepsilon(x) = \mathrm{tr}(J_x) = \mathrm{tr}(V_x^{-1} B_x) \tag{4.9}$$
The separability depends on the definition of the classes. When ε(x) is compared for different feature vectors x with the same class definitions, a larger value indicates that the corresponding feature vector is better suited for classification and estimation.
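As an illustration, Eqs. (4.5) to (4.9) can be sketched in a few lines of NumPy. This is a minimal sketch and not the implementation used for the reported results; the function name `separability` and the array layout (one feature vector per row, one class label per row) are assumptions made for the example.

```python
import numpy as np

def separability(features, labels):
    """Trace criterion eps(x) = tr(V_x^{-1} B_x) for a labelled feature set."""
    features = np.asarray(features, dtype=float)   # shape (N_m, dimension)
    labels = np.asarray(labels)
    n_total, dim = features.shape
    # The count-weighted mean of the class means equals the overall mean.
    global_mean = features.mean(axis=0)

    within = np.zeros((dim, dim))    # V_x, Eq. (4.5)
    between = np.zeros((dim, dim))   # B_x, Eq. (4.6)
    for cls in np.unique(labels):
        members = features[labels == cls]          # feature vectors of one class
        class_mean = members.mean(axis=0)          # mu_i, Eq. (4.7)
        centered = members - class_mean
        within += centered.T @ centered / n_total
        diff = (class_mean - global_mean)[:, None]
        between += (len(members) / n_total) * (diff @ diff.T)

    # Solve V_x J = B_x rather than forming the inverse explicitly.
    return float(np.trace(np.linalg.solve(within, between)))

# Toy one-dimensional example: V_x = 1 and B_x = 25, so eps(x) = 25.0
toy_value = separability([[0.0], [2.0], [10.0], [12.0]], [0, 0, 1, 1])
```

The within-class matrix must be non-singular for the solve step to succeed, so each class needs enough feature vectors relative to the feature dimension.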
Table 4.1 lists the mutual information between NB and HB, the HB entropy, and their ratio, computed for different classes (digits) for both children's and adults' speech. These measures were computed on the speech signals after different types of ABWE transformation (global and class-specific). The transformations are learnt from training data that is separate from the test data; to compute the measures, the test data is passed through the learnt transformations.
It can be observed that I, H and R_IH vary across classes. For children's speech, the averaged R_IH increases from 3.03% for the global transformation to 11.64% for the class-specific transformation. This demonstrates the significance of exploiting class-specific information for ABWE. It is also interesting to note that the increase in the averaged R_IH is smaller for children's speech than for adults' speech (3.05% to 13.59%). This trend may be attributed to the loss of spectral information for children in the narrowband case and also to the higher variability in vocal tract length. The separability measure is also lower for children's speech (4.70) than for adults' speech (6.22), which may be attributed to significant overlap of class-specific information in the feature space. However, since R_IH increases significantly, an ABWE method can be developed using class (digit) specific information.
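The ratio column of Table 4.1 can be reproduced from the other two columns under the assumption that R_IH is the mutual information divided by the high-band entropy, expressed in percent. This definition is not restated in this section, and because the tabulated inputs are rounded, the recomputed values agree with the table only up to rounding. The helper name `ratio_percent` is hypothetical.

```python
def ratio_percent(mutual_information_bits, highband_entropy_bits):
    """Assumed definition: R_IH = 100 * I(X;Y) / H(Y)."""
    return 100.0 * mutual_information_bits / highband_entropy_bits

# Children's speech, class "two" (values taken from Table 4.1)
global_ratio = ratio_percent(1.66, 48.77)          # table lists 3.40 %
class_specific_ratio = ratio_percent(6.09, 50.50)  # table lists 12.06 %
```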
TH-1705_08610211
4.1 Comparison of Children’s and Adults’ Speech using Statistical Measures
Table 4.1: The mutual information (Î(X;Y)), high-band entropy (Ĥ(Y)), and their ratio (R̂_IH) for children's and adults' speech with application of global and class-specific ABWE transforms. Separability ε(x) for children's and adults' speech with application of the global ABWE transform.
Children
Class    Global transformation     Class-specific transformation
         Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH
         [bits]  [bits]     [%]    [bits]  [bits]     [%]
one        0.96   47.65    2.02      5.35   50.46   10.61
two        1.66   48.77    3.40      6.09   50.50   12.06
three      1.39   48.74    2.85      4.81   49.47    9.72
four       1.25   48.05    2.60      5.01   50.64    9.89
five       1.57   47.52    3.31      8.65   52.89   16.36
six        3.05   50.02    6.09      5.41   50.10   10.60
seven      1.86   48.67    3.83      6.71   50.55   13.28
eight      2.26   48.31    4.69      6.77   51.50   13.15
nine       0.75   47.86    1.58      6.10   50.35   12.11
zero       0.49   48.99    1.00      3.59   47.97    7.48
oh         0.86   48.33    1.78      6.48   50.23   12.89
Avg.       1.47   48.47    3.03      5.90   50.50   11.64

Adults
Class    Global transformation     Class-specific transformation
         Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH
         [bits]  [bits]     [%]    [bits]  [bits]     [%]
one        0.86   48.78    1.76      6.23   51.94   11.99
two        1.13   48.27    2.34      6.78   52.10   13.02
three      1.05   47.99    2.20      6.23   51.69   12.07
four       1.60   47.48    3.36      6.48   52.69   12.30
five       1.98   47.86    4.13      9.51   52.80   18.02
six        3.14   50.42    6.22      7.43   51.85   14.32
seven      1.62   48.91    3.31      9.59   52.17   18.39
eight      1.32   48.46    2.73      7.03   51.63   13.61
nine       0.97   47.53    2.04      7.82   51.70   15.13
zero       0.84   48.61    1.73      3.71   48.98    7.57
oh         1.50   47.95    3.12      6.59   51.17   12.88
Avg.       1.48   48.42    3.05      7.05   51.68   13.59

Separability ε(x)
         Children                  Adults
         Global transformation     Global transformation
Avg.     4.70                      6.22
Table 4.2 shows I, H, R_IH and ε computed by exploiting age-specific information. Exploiting age-specific information does not increase R_IH: the average falls slightly from 17.88% to 17.20%. However, the separability increases from 7.62 to 8.18, which demonstrates that the feature set obtained with age-specific information is more discriminative than the one obtained with the global transform. Therefore, age-specific information can also be exploited for ABWE.
Table 4.2: The mutual information (Î(X;Y)), the high-band entropies (Ĥ(Y)), and their ratios (R̂_IH) for children's speech with application of global and age-specific ABWE transforms. Separability ε(x) for children's speech with application of global and age-specific ABWE transforms.
Children
Age      Global transformation     Age-specific transform
[years]  Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH
         [bits]  [bits]     [%]    [bits]  [bits]     [%]
06         7.43   32.85   22.61      7.44   33.19   22.43
07         6.05   32.76   18.47      5.92   32.89   18.01
08         6.12   32.78   18.68      6.06   32.99   18.38
09         6.05   32.84   18.41      5.97   33.10   18.04
10         5.83   32.89   17.73      5.74   33.12   17.33
11         5.82   32.76   17.76      5.70   33.00   17.27
12         5.46   32.85   16.61      4.93   33.00   14.95
13         5.19   32.79   15.83      4.31   32.84   13.14
14         5.22   32.86   15.89      4.29   32.88   13.04
15         5.14   32.80   15.67      4.24   32.82   12.91
Avg.       5.87   32.81   17.88      5.68   33.02   17.20

Separability ε(x)
         Global transformation     Age-specific transform
Avg.     7.62                      8.18
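Table 4.3 shows the I, H, R_IH and ε values computed using the delta (∆) features. The delta features measure the change in the feature sequence, that is, how the features vary with time. Using delta features increases both the ratio and the separability: relative to Table 4.2, the averaged R_IH rises from 17.88% to 27.96% for the global transformation and from 17.20% to 33.80% for the age-specific transform, while the separability rises from 7.62 to 7.86 and from 8.18 to 8.57, respectively. This suggests that the delta (∆) features can be exploited for ABWE.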
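The text does not state how ∆ is computed from the half window size Θ. A common choice is the linear-regression formula over 2Θ + 1 frames, and the sketch below assumes that formula together with replication of the edge frames at the utterance boundaries. Both choices are assumptions, as are the names `delta_features` and `half_window`.

```python
import numpy as np

def delta_features(static, half_window):
    """Regression-based delta features over 2 * half_window + 1 frames."""
    static = np.asarray(static, dtype=float)       # shape (frames, coefficients)
    n_frames = static.shape[0]
    # Repeat the first and last frame so that every frame has a full window.
    padded = np.pad(static, ((half_window, half_window), (0, 0)), mode="edge")
    offsets = np.arange(1, half_window + 1)
    denominator = 2.0 * np.sum(offsets ** 2)
    delta = np.zeros_like(static)
    for k in offsets:
        ahead = padded[half_window + k : half_window + k + n_frames]
        behind = padded[half_window - k : half_window - k + n_frames]
        delta += k * (ahead - behind)
    return delta / denominator

# A feature that grows by 2 per frame has a delta of 2 away from the edges.
ramp = 2.0 * np.arange(20)[:, None]
ramp_delta = delta_features(ramp, half_window=3)
```

Under this assumed formula, a larger half window averages the slope over more frames, so the delta captures slower changes in the feature sequence.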
Table 4.3: The mutual information (Î(X;Y)), the high-band entropies (Ĥ(Y)), and their ratios (R̂_IH) for children's speech with application of global and age-specific ABWE transforms with ∆. Separability ε(x) for children's speech with application of global and age-specific ABWE transforms with ∆. A half window size of Θ = 13 is selected to compute ∆.
Children
Age      Global transformation     Age-specific transform
[years]  with ∆                    with ∆
         Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH
         [bits]  [bits]     [%]    [bits]  [bits]     [%]
06        16.37   47.01   34.83     27.36   51.64   52.98
07        12.79   45.10   28.36     15.21   44.29   34.33
08        12.92   45.49   28.39     15.38   44.41   34.62
09        13.21   45.72   28.90     14.05   43.42   32.35
10        12.62   45.42   27.79     14.28   43.48   32.84
11        12.46   45.06   27.65     14.75   44.20   33.37
12        12.12   45.27   26.78     15.06   44.37   33.94
13        11.91   45.28   26.31     15.01   44.49   33.74
14        11.95   45.38   26.33     15.19   44.67   34.00
15        11.81   45.21   26.13     15.15   44.61   33.97
Avg.      12.68   45.36   27.96     14.92   44.14   33.80

Separability ε(x)
         Global transformation     Age-specific transform
         with ∆                    with ∆
Avg.     7.86                      8.57
Table 4.4 shows I, H and R_IH for different half window sizes Θ used to compute ∆ in the global transformation case, in which no age information is used. As can be observed, the computed values are sensitive to Θ. As Θ increases from 1 to 15, the average H rises steadily from 34.56 to 45.81 bits and the average I rises, with some fluctuation, from 10.33 to 12.22 bits, whereas the average R_IH does not increase (29.90% at Θ = 1 versus 26.67% at Θ = 15). This shows the significance of the ∆ feature.
Table 4.4: The mutual information (Î(X;Y)), the high-band entropies (Ĥ(Y)), and their ratios (R̂_IH) for children's speech with application of the global ABWE transform with ∆. The half window size Θ used to compute ∆ is selected in the range 1 to 15.
ABWE-GT+∆

Age      Half window size Θ
[years]  Θ = 1                     Θ = 2                     Θ = 3                     Θ = 4                     Θ = 5
         Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH
         [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]
06        11.66   34.87   33.44     12.31   37.99   32.40     12.26   39.13   31.32     13.34   41.35   32.27     13.05   42.44   30.75
07        10.42   34.40   30.29     11.11   37.44   29.66     10.87   38.53   28.20     11.65   40.51   28.76     11.34   41.43   27.38
08        10.48   34.70   30.19     11.46   37.81   30.30     11.05   38.87   28.42     11.93   40.89   29.18     11.65   41.83   27.84
09        10.54   34.71   30.38     11.59   37.88   30.60     11.15   38.92   28.64     12.04   40.97   29.40     11.72   41.92   27.95
10        10.37   34.63   29.96     11.25   37.81   29.74     10.81   38.85   27.82     11.69   40.88   28.59     11.42   41.81   27.32
11        10.21   34.32   29.75     11.15   37.43   29.78     10.67   38.50   27.71     11.36   40.44   28.10     11.18   41.40   27.01
12        10.06   34.61   29.06     11.01   37.68   29.23     10.45   38.79   26.95     11.31   40.69   27.79     11.06   41.64   26.56
13         9.93   34.64   28.69     10.90   37.69   28.92     10.29   38.80   26.53     11.15   40.69   27.41     10.88   41.63   26.13
14         9.93   34.69   28.63     10.95   37.76   28.99     10.32   38.86   26.56     11.23   40.77   27.53     10.93   41.71   26.20
15         9.90   34.61   28.61     10.86   37.66   28.82     10.26   38.78   26.47     11.12   40.65   27.35     10.84   41.59   26.07
Avg.      10.33   34.56   29.90     11.27   37.68   29.90     10.82   38.74   27.92     11.64   40.73   28.59     11.38   41.67   27.32

Age      Half window size Θ
[years]  Θ = 6                     Θ = 7                     Θ = 8                     Θ = 9                     Θ = 10
         Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH
         [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]
06        13.45   43.29   31.08     13.76   44.22   31.11     13.68   44.34   30.86     13.47   44.96   29.96     14.55   45.58   31.91
07        11.63   42.23   27.53     11.67   42.95   27.17     11.61   43.13   26.92     11.29   43.66   25.86     12.10   44.21   27.36
08        12.12   42.64   28.41     12.15   43.40   27.99     11.97   43.54   27.49     11.62   44.04   26.38     12.39   44.57   27.79
09        12.15   42.72   28.45     12.14   43.46   27.93     12.00   43.60   27.53     11.74   44.14   26.60     12.67   44.74   28.32
10        11.76   42.61   27.61     11.71   43.33   27.03     11.55   43.45   26.57     11.27   43.98   25.62     12.11   44.53   27.19
11        11.66   42.20   27.63     11.59   42.92   27.01     11.47   43.11   26.61     11.18   43.62   25.63     12.02   44.23   27.19
12        11.44   42.46   26.96     11.27   43.14   26.12     11.23   43.33   25.91     10.87   43.81   24.81     11.76   44.41   26.48
13        11.30   42.45   26.62     11.08   43.14   25.69     11.06   43.32   25.52     10.70   43.80   24.42     11.57   44.40   26.05
14        11.41   42.54   26.83     11.16   43.22   25.83     11.13   43.40   25.64     10.79   43.89   24.58     11.63   44.49   26.15
15        11.23   42.41   26.49     11.00   43.09   25.53     11.00   43.27   25.42     10.62   43.75   24.27     11.53   44.35   26.00
Avg.      11.80   42.48   27.79     11.75   43.20   27.21     11.64   43.37   26.83     11.33   43.88   25.82     12.18   44.46    27.4

Age      Half window size Θ
[years]  Θ = 11                    Θ = 12                    Θ = 13                    Θ = 14                    Θ = 15
         Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH
         [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]
06        14.03   46.06   30.45     15.27   46.56   32.81     16.37   47.01   34.83     15.91   47.49   33.50     15.55   47.18   32.97
07        11.59   44.62   25.97     12.46   44.88   27.77     12.79   45.10   28.36     12.35   45.52   27.13     12.22   45.51   26.86
08        11.91   44.98   26.48     12.60   45.25   27.85     12.92   45.49   28.39     12.49   45.93   27.19     12.37   45.92   26.95
09        12.19   45.19   26.98     12.85   45.45   28.27     13.21   45.72   28.90     12.87   46.19   27.86     12.72   46.17   27.55
10        11.62   44.92   25.86     12.24   45.18   27.10     12.62   45.42   27.79     12.28   45.87   26.77     12.20   45.89   26.58
11        11.43   44.60   25.64     12.04   44.84   26.84     12.46   45.06   27.65     12.11   45.50   26.61     11.99   45.54   26.34
12        11.24   44.79   25.09     11.72   45.02   26.03     12.12   45.27   26.78     11.85   45.71   25.93     11.80   45.75   25.79
13        11.06   44.79   24.70     11.52   45.02   25.59     11.91   45.28   26.31     11.68   45.72   25.55     11.62   45.77   25.40
14        11.12   44.88   24.77     11.56   45.11   25.63     11.95   45.38   26.33     11.73   45.82   25.60     11.66   45.87   25.41
15        11.02   44.74   24.63     11.44   44.95   25.45     11.81   45.21   26.13     11.58   45.65   25.36     11.52   45.69   25.21
Avg.      11.66   44.86   25.99     12.30   45.11   27.26     12.68   45.36   27.96     12.33   45.80   26.91     12.22   45.81   26.67
Table 4.5 shows I, H and R_IH for different half window sizes Θ used to compute ∆ in the age-specific case. As in the global transform case, the computed values are sensitive to Θ: on average, I and H tend to increase as Θ increases. In addition, the use of age-specific information reflects speaking-rate information in the form of higher values for these parameters. For the youngest children, the values are high for larger Θ (for age 06, R_IH rises from 47.74% at Θ = 1 to 55.16% at Θ = 15). Conversely, for the older age groups the ratio is higher for smaller Θ (for age 12, 35.09% at Θ = 1 versus 30.70% at Θ = 15). This implies that the variability associated with children's speaking rate is indeed reflected in the estimates of I, H and R_IH obtained using age-specific information.
Table 4.5: The mutual information (Î(X;Y)), the high-band entropies (Ĥ(Y)), and their ratios (R̂_IH) for children's speech with application of age-specific ABWE transforms with ∆. The half window size Θ used to compute ∆ is selected in the range 1 to 15.
ABWE-AG+∆

Age      Half window size Θ
[years]  Θ = 1                     Θ = 2                     Θ = 3                     Θ = 4                     Θ = 5
         Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH
         [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]
06        18.68   39.13   47.74     18.77   42.64   44.02     21.63   45.38   47.67     23.08   47.21   48.89     24.12   48.58   49.66
07        11.44   35.07   32.61     12.29   38.27   32.12     12.80   39.67   32.26     13.91   41.35   33.64     14.69   42.14   34.86
08        12.06   34.82   34.63     12.54   37.02   33.87     12.76   38.59   33.08     13.14   40.22   32.67     14.19   41.70   34.03
09        12.07   34.06   35.44     12.11   37.32   32.44     12.73   38.57   33.01     13.08   40.27   32.49     12.99   40.93   31.74
10        10.97   34.21   32.09     12.27   36.89   33.26     12.46   38.28   32.55     12.88   39.98   32.22     13.94   41.20   33.84
11        12.38   35.14   35.22     12.45   38.16   32.62     13.51   39.95   33.81     13.71   41.23   33.25     14.77   42.39   34.86
12        12.29   35.03   35.09     12.58   38.07   33.06     13.80   39.96   34.54     13.93   41.07   33.92     14.47   42.18   34.31
13        12.43   35.19   35.31     12.81   38.18   33.56     13.88   40.16   34.56     14.23   41.39   34.39     14.75   42.45   34.74
14        12.55   35.26   35.60     12.91   38.29   33.72     14.00   40.27   34.76     14.25   41.46   34.38     14.85   42.57   34.89
15        12.41   35.26   35.21     12.78   38.29   33.37     13.95   40.21   34.69     14.24   41.43   34.38     14.77   42.55   34.71
Avg.      12.06   34.81   34.65     12.48   37.70   33.11     13.20   39.31   33.59     13.58   40.80   33.28     14.34   41.90   34.22

Age      Half window size Θ
[years]  Θ = 6                     Θ = 7                     Θ = 8                     Θ = 9                     Θ = 10
         Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH
         [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]
06        25.13   49.75   50.51     26.35   50.62   52.06     27.36   51.64   52.98     27.99   52.53   53.28     26.58   52.51   50.63
07        15.01   43.02   34.88     15.56   43.74   35.57     15.21   44.29   34.33     15.01   44.59   33.66     14.90   45.11   33.04
08        14.77   42.79   34.52     15.29   43.71   34.98     15.38   44.41   34.62     15.56   44.95   34.62     15.43   45.21   34.13
09        13.80   42.19   32.72     14.20   42.87   33.12     14.05   43.42   32.35     14.19   43.95   32.29     14.28   44.44   32.14
10        14.11   42.03   33.58     14.30   42.92   33.32     14.28   43.48   32.84     14.13   43.87   32.21     13.60   44.09   30.85
11        15.42   43.40   35.52     14.97   43.73   34.23     14.75   44.20   33.37     15.24   44.83   33.99     15.97   45.45   35.14
12        14.20   42.88   33.11     14.44   43.55   33.16     15.06   44.37   33.94     15.16   44.82   33.84     14.90   45.06   33.06
13        14.34   43.05   33.31     14.62   43.76   33.40     15.01   44.49   33.74     15.16   44.96   33.73     15.11   45.28   33.36
14        14.54   43.21   33.65     14.75   43.92   33.59     15.19   44.67   34.00     15.33   45.13   33.97     15.31   45.45   33.68
15        14.45   43.17   33.48     14.71   43.86   33.54     15.15   44.61   33.97     15.38   45.12   34.09     15.26   45.42   33.59
Avg.      14.74   42.87   34.38     14.91   43.54   34.24     14.92   44.14   33.80     15.09   44.65   33.80     15.13   45.05   33.58

Age      Half window size Θ
[years]  Θ = 11                    Θ = 12                    Θ = 13                    Θ = 14                    Θ = 15
         Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH    Î(X;Y)    Ĥ(Y)    R̂_IH
         [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]    [bits]  [bits]     [%]
06        28.83   53.74   53.64     27.83   53.77   51.75     27.97   54.13   51.67     30.18   55.25   54.62     30.67   55.61   55.16
07        14.71   45.61   32.25     14.98   46.19   32.42     15.05   46.50   32.36     15.72   46.99   33.46     16.44   47.67   34.49
08        15.79   45.72   34.53     15.92   46.17   34.47     16.09   46.61   34.53     16.35   47.04   34.76     15.88   47.47   33.46
09        13.91   44.82   31.03     14.92   45.59   32.73     15.16   45.87   33.04     15.31   46.25   33.11     15.45   46.69   33.10
10        13.49   44.46   30.34     14.05   45.30   31.03     13.44   45.28   29.68     13.67   45.71   29.90     13.82   46.12   29.97
11        16.35   45.87   35.64     15.44   46.09   33.49     15.79   46.36   34.07     14.94   46.56   32.09     15.12   46.95   32.21
12        15.19   45.44   33.43     14.88   45.69   32.57     14.71   46.12   31.90     14.29   46.34   30.84     14.34   46.71   30.70
13        15.41   45.66   33.75     15.05   45.86   32.81     14.99   46.30   32.37     14.57   46.53   31.32     14.72   46.97   31.34
14        15.51   45.79   33.88     15.21   46.03   33.04     15.16   46.46   32.62     14.75   46.70   31.59     14.86   47.10   31.55
15        15.48   45.78   33.82     15.15   46.00   32.93     15.10   46.44   32.51     14.70   46.70   31.47     14.80   47.10   31.43
Avg.      15.27   45.48   33.57     15.26   45.95   33.20     15.31   46.24   33.11     15.20   46.58   32.63     15.29   47.01   32.53
Motivated by earlier work [90], we also calculated the mutual information between the bands with inclusion of ∆ by combining static and delta features as [C0−C4, ∆C0−∆C4] and [C0−C2, ∆C0−∆C2] for LB and HB, respectively. Thus the feature length is kept identical with and without delta features. Fig. 4.1 shows the plots for both the global and the age-specific ABWE cases, with and without delta features. Note that without deltas in the MFCC features, the ratios for global and age-specific ABWE are very close. With delta features, however, the relative increase is much larger for age-specific ABWE than for global ABWE.
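The feature layout described above can be illustrated as follows. This is only a sketch of the stacking step: it assumes that static and delta coefficients are already available as arrays with one frame per row, the helper name `stack_static_and_delta` is hypothetical, and the random arrays with 13 coefficients are placeholders rather than real MFCC data.

```python
import numpy as np

def stack_static_and_delta(static, delta, n_coeffs):
    """Return [C0..C(n-1), dC0..dC(n-1)] for every frame."""
    static = np.asarray(static, dtype=float)
    delta = np.asarray(delta, dtype=float)
    return np.hstack([static[:, :n_coeffs], delta[:, :n_coeffs]])

# Placeholder arrays standing in for band-wise cepstra and their deltas.
rng = np.random.default_rng(0)
lb_static, lb_delta = rng.normal(size=(100, 13)), rng.normal(size=(100, 13))
hb_static, hb_delta = rng.normal(size=(100, 13)), rng.normal(size=(100, 13))

lb_features = stack_static_and_delta(lb_static, lb_delta, 5)  # [C0-C4, dC0-dC4]
hb_features = stack_static_and_delta(hb_static, hb_delta, 3)  # [C0-C2, dC0-dC2]
```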
Figure 4.1: (a) Plot of ratios (R̂_IH) for global (ABWE-GT) and age-specific (ABWE-AG) models with and without ∆. (b) Plot of separability (ε(x)) for global (ABWE-GT) and age-specific (ABWE-AG) models with and without ∆. The half window size Θ used to compute ∆ is selected in the range 1 to 15.
As demonstrated in the preceding tables and figure, the use of class-specific, age-specific and delta information indeed improves the ratio and separability values. Therefore, these can be exploited for developing ABWE methods. The development of different ABWE methods using this information is explained in the following sections.