Non-linear Subspace Modeling Using Polynomial Function

dimensionL= 12,L= 13,L= 12andL= 12, the average values of correlation coefficients are decreased significantly over the average values of conventional correlation coefficients, respectively, as shown in Figure 3.9. The average values of correlation coefficients are decreased by 37.08% (0.23 to 0.15), 16.20% (0.20 to 0.17), 33.57% (0.23 to 0.15) and 33.74% (0.24 to 0.16) in comparison to the average value of correlation coefficients determined using the conventional cases for angry, sad, lombard and happy speech, respectively. These experimental results demonstrate that, the HLDA- based low-rank subspace projection in MLLT-based semi-tied adaptation technique is very effective for the subspace projection of features and mode parameters onto the specific lower dimensional decorrelated subspace. The reduced value of correlation for the neutral speech features, normalized stressed speech features and the model parameters have resulted in the effective representation onto the feature and the model-space, respectively. The proposed linear subspace modelling technique by exploring the orthogonal projection approaches along with the HLDA-based low-rank subspace projection technique have significantly reduced the acoustic mismatch between the neutral and the stressed speech and developed the effective stress normalization method.

3.4 Non-linear Subspace Modeling Using Polynomial Function

Stressed speech

Neutral speech

Framing

Feature extraction

Feature

extraction Non-linear transformation

Estimation of transformation

matrix

Transformation matrix

Normalize features

Figure 3.10: The proposed non-linear subspace modelling technique for stress normalization.

of speech production system under stress condition [6, 22, 33, 41–45]. Therefore, the analysis of stressed speech onto the non-linear subspace may help in developing the robust and the computa- tionally efficient stress normalization algorithms.

In this Section, the stress normalization technique is designed by studying the variance mismatch between the neutral and the stressed speech onto the non-linear subspace. To reduce the acoustic variabilities, both the neutral and the stressed speech are projected onto a common subspace. The set of bases for this common subspace is estimated using the neutral speech data as it comprises the speech-specific attributes and it helps in spanning the speech information. This common subspace is generally referred to as the speech subspace. Consequently, the subspace projection of neutral and stressed speech onto the speech subspace can alleviate the effect of stress as well as retain the speech information. The proposed approach has explored the subspace projection through the non-linear transformation technique to model the non-linear subspace. In this work, the non-linear transformation has been accomplished onto the framework of polynomial function. The non-linearity between the speech- and the stress-specific attributes is investigated onto this non-linear subspace and it helps in separating them. The details involved in the proposed stress normalization algorithm is depicted in the block diagram shown in Figure 3.10. From this figure, it is evident that, the efficacy of non-linear subspace modelling depends on the estimation of an effective transformation matrix. The estimation of transformation matrix, which constitutes the acoustic properties of neutral speech data

can help in subspace projection of neutral and stressed speech onto the speech subspace. In addition to this, the effectiveness of the proposed non-linear transformation is measured by comparing it with the linear transformation-based subspace projection approach. In the following Subsections, we have described the methodology of the non-linear transformation to develop the non-linear subspace for normalizing the stress information. This is followed by the learning mechanism of transformation matrix. A brief discussion on the low-rank subspace projection method is also presented.

3.4.1 Non-linear Transformation onto the Speech Subspace

The primary objective of modelling of non-linear subspace is to separate the speech- and the stress- specific attributes. The non-linear subspace is modeled by exploiting the non-linear transformation technique. The feature vectorxof speech signal (neutral or stressed speech) are projected onto the speech subspace through the non-linear transformation [142] as follows

y=Pf(x) (3.9)

where,yrepresents the non-linearly transformed feature vector corresponding to the neutral/ stressed speech utterances onto the speech subspace using the transformation matrixP. The non-linear transformation is derived using the polynomial functionf(.)[142] given as

f(x) =

pn=0

x^pⁿ (3.10)

The orderp_nof polynomial function is varied from 0 ≤p_n ≤ P_n. Using this polynomial function, the non-linearly transformed feature vectorycan be expressed as

y=Pf(x)

pn=0

x^pⁿ

=Pi+Px+Px²+Px³+· · ·+Px^Pⁿ (3.11) where, the vectoriis referred to as the identity vector. The subspace projection of feature vectorx through the polynomial function yields the features constituting the summation of bases of speech subspace Pi, the linearly transformed featuresPx, the linearly transformed features i.e., Px², . . ., Px^Pⁿ corresponding to the feature vector of higher order (x², . . ., x^Pⁿ) through the same transfor-

3.4 Non-linear Subspace Modeling Using Polynomial Function

Feature extraction

Features Neutral

speech GMM

training Mean vectors

Singular value decomposition

Transformation matrix

Figure 3.11: The proposed learning mechanism of transformation matrix.

mation matrix P. The non-linearly transformed feature vectors corresponding to the neutral and the stressed speech utterances are considered as the features having normalize stress information and constituting the speech-specific attributes. These non-linearly transformed features carry the similar acoustic characteristics and develop a non-linear subspace. As illustrated in Eq. (3.11), the efficacy of non-linear subspace modelling technique depends on the creation of transformation matrixP. In the following Subsection, we have described the learning mechanism of transformation matrix.

3.4.2 Learning Mechanism of Transformation Matrix

The proposed stress normalization method employs the subspace projection of neutral and stressed speech onto a common subspace using the non-linear transformation to reduce the acoustic mismatch between them. The common subspace, which comprises the speech-specific attributes, called as the speech subspace can assess the normalization of stress information with preserving the speech information. As discussed earlier, both the neutral and the stressed speech constitute the speech information. Therefore, the subspace projection of neutral and stressed speech through the transformation matrix, whose columns comprise the acoustic properties of neutral speech will help in projecting them onto the speech subspace. In order to address this, the transformation matrix is learned using neutral speech utterances. The learning mechanism incorporates the singular value decomposition (SVD) [142] on the mean parameter of Gaussian mixture model (GMM) [112] trained using the neutral speech as depicted in block diagram shown in Figure 3.11. The columns of transformation matrixPare considered as the bases of speech subspace and are estimated as follows.

Step I: A GMM, as the weighted sum of G-mixture Gaussian densities is trained using the features of neutral speech utterances (training database) belong to word w. Where w, (1 ≤ w≤W)is the index for the number of words present in the training database. The set of mean

vectorsM^w =

m^w1 m^w2 .... m^w_G

corresponding to Gaussian density for each word captures the acoustic properties of the neutral speech and are arranged in M =

M¹ M² .... M^W

. The size of matrixMbecomesK×W G, whereK is the dimension of feature vector.

Step II: In the next step, the SVD is performed over the matrix M to achieve the orthogonal characteristics as follows,

M=USV^′ (3.12)

where,UandVare orthonormal matrices having sizeK×KandW G×W G, respectively. Sis aK×W Gdiagonal matrix. The column vectors ofUspan the column space ofM[142]. They are a set of independent vectors since they are mutually orthogonal [142]. These orthonormal vectors inUare utilized as the columns of the transformation matrixP.

Using the estimated transformation matrix P, both the neutral and the stressed speech are projected onto the speech subspace through the non-linear transformation by exploiting the polynomial function as the non-linear function. The resulting non-linearly transformed features of neutral and stressed speech create the non-linear subspace and are considered as the features constituting the similar acoustic properties. In addition to this, the subspace projection is also derived using the linear transformation through the same transformation matrixP. In this work, the proposed stress normalization algorithm employing the non-linear subspace modelling approach is contrasted with the stress normalization technique accomplished using the linear transformation method.

3.4.3 Decorrelation of Feature- and Model-Space

The adaptation of feature- and model-space onto the another common subspace constituting the decorrelated characteristics and dimension lower than that of the default dimension is reported to be very effective in enhancing the robustness of the proposed stress normalization method using the linear subspace modelling technique as discussed in Section 3.3. The subspace projection of neutral and normalized stressed speech features onto the specific lower dimensional decorrelated subspace has resulted in the significant improvement in the performances of stressed speech recognition, when compared to the conventional cases over all the studied features and stress classes as summarized in Figure 3.5 and Figure 3.6. Moreover, the relative decrement in the KL divergence values between the Gaussian-subspaces, developed using the specific lower dimensional decorrelated features of

Dalam dokumen Speech subspace modelling with speaker adaptation for stress normalization (Halaman 118-123)