3.2 Proposed Framework
3.2.1 Proposed informative region extraction (IRE) model
3.2.1.2 Projection analysis
This is the basis of our proposed approach. Let $X$ and $Y$ be two $d$-dimensional vectors as shown in Figure 3.3, and let the projection of vector $X$ on $Y$ be $\alpha Y$. The objective is then to find the $\alpha$ that minimizes the projection error $\|X - \alpha Y\|^2$. The magnitude of the error vector is minimum iff the error vector is perpendicular to the data vector, i.e., the inner product between the error vector and the data vector must be zero. Mathematically, this can be expressed as follows:
\[
(X - \alpha Y) \perp Y \;\Leftrightarrow\; \langle (X - \alpha Y), Y \rangle = 0 \tag{3.3}
\]
\[
\Rightarrow \alpha = \frac{\langle X, Y \rangle}{\langle Y, Y \rangle} \tag{3.4}
\]
where $\langle \cdot, \cdot \rangle$ denotes the inner product. Thus, the minimum error magnitude is given by Eqn. (3.5).
\[
\text{Minimum error magnitude} = \left\| X - \frac{\langle X, Y \rangle}{\langle Y, Y \rangle}\, Y \right\|^2 \tag{3.5}
\]
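As a minimal numerical sketch of this projection analysis (using NumPy; the function name is illustrative), the optimal $\alpha$ of Eqn. (3.4) and the residual magnitude of Eqn. (3.5) can be computed as:

```python
import numpy as np

def projection_error(x, y):
    """Squared magnitude of the residual after projecting x onto y (Eqn. 3.5)."""
    alpha = np.dot(x, y) / np.dot(y, y)   # optimal coefficient, Eqn. (3.4)
    residual = x - alpha * y              # orthogonal to y, Eqn. (3.3)
    return np.dot(residual, residual)

x = np.array([3.0, 4.0])
y = np.array([1.0, 0.0])
# alpha = 3, residual = (0, 4), so the squared error is 16
print(projection_error(x, y))
```

Note that the residual is, by construction, orthogonal to $Y$, which is exactly the condition stated in Eqn. (3.3).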
Let $A = \{Y_i \mid i = 1, 2, \ldots, N\}$ and $B = \{X^k_{ij} \mid i = 1, 2, \ldots, N;\ j = 1, 2, \ldots, l;\ k = 1, 2, \ldots, C\}$ be the sets of neutral and expressive facial images, respectively, where $Y_i$ is the $i$th neutral image and $X^k_{ij}$ represents the $k$th class and $j$th level of expressive image belonging to the $i$th subject in the training dataset. Each image in the training dataset is divided into $\lambda$ sub-regions. We define the parameter $\lambda$ as:
\[
\text{Number of sub-regions } (\lambda) = \frac{\text{Size of the image}}{\text{Size of the block}} \tag{3.6}
\]
In our case, if the image size is $P \times Q$, then the corresponding block size will be $0.1P \times 0.125Q$. This choice ensures that the number of sub-regions always remains 80, regardless of the size of the image.
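The fixed $10 \times 8$ grid implied by Eqn. (3.6) can be sketched as follows (a hypothetical helper, assuming $P$ and $Q$ are divisible by 10 and 8, respectively):

```python
import numpy as np

def split_into_blocks(image):
    """Divide a P x Q image into lambda = 80 sub-regions (Eqn. 3.6).

    The block size is 0.1P x 0.125Q, i.e. a fixed 10 x 8 grid of blocks
    regardless of the image size (P, Q assumed divisible by 10 and 8).
    """
    P, Q = image.shape
    bh, bw = P // 10, Q // 8          # block height and width
    return [image[r:r + bh, c:c + bw]
            for r in range(0, P, bh)
            for c in range(0, Q, bw)]

blocks = split_into_blocks(np.zeros((100, 80)))
# 80 blocks of size 10 x 10 for a 100 x 80 image
```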
Let us consider a particular facial expression belonging to class $k$ and a particular subject $i$, for which the corresponding reference image is available. Let $x^k_{ijm}$ be the LBP features of the $m$th sub-region, $m \in \{1, 2, \ldots, \lambda\}$, of the $j$th level of expressive image of the $i$th subject belonging to the $k$th class. Also, $y_{im}$ is the LBP features of the $m$th sub-region of the $i$th neutral image. As explained earlier, any expressive image results from the movements of different facial sub-regions of a neutral face image. Human facial muscles undergo changes to show different facial expressions, and these changes emerge as variations in the texture patterns of different sub-regions of a face. The texture variations of different sub-regions with respect to the texture pattern of a neutral face image indirectly indicate the corresponding facial expression. So, there will be a change in the distribution of LBP features between the blocks of an expressive face image and the corresponding blocks of a neutral face image.
\[
e^k_{ijm} = \left\| x^k_{ijm} - \operatorname{proj}_{y_{im}} x^k_{ijm} \right\|^2 = \left\| x^k_{ijm} - \frac{\langle x^k_{ijm}, y_{im} \rangle}{\langle y_{im}, y_{im} \rangle}\, y_{im} \right\|^2 \tag{3.7}
\]
Thus, the $m$th sub-region of an expressive image and the corresponding $m$th sub-region of a reference image have a projection error, as shown in Figure 3.3 [Right], and it is given by Eqn. (3.7). Similarly, Eqn. (3.7) can be used to obtain the projection errors between all other blocks of an expressive image and the corresponding blocks of a reference image by varying $m = 1, 2, \ldots, \lambda$. The projection errors are directly related to the importance of the different sub-regions, i.e., the larger the projection error, the more important the facial sub-region, and vice versa. In other words, a sub-region conveys more information if it yields a larger projection error. This analysis thus paves the way for extracting the informative facial regions from a number of pre-defined sub-regions.
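A sketch of how Eqn. (3.7) could be applied block-wise (the function and argument names are illustrative; each row is assumed to hold the LBP feature vector of one sub-region):

```python
import numpy as np

def block_projection_errors(expr_feats, neutral_feats):
    """Projection error e_m for each of the lambda sub-regions (Eqn. 3.7).

    expr_feats, neutral_feats: arrays of shape (lambda, d), one LBP
    feature vector per block of the expressive / neutral image.
    """
    errors = []
    for x, y in zip(expr_feats, neutral_feats):
        alpha = np.dot(x, y) / np.dot(y, y)   # Eqn. (3.4) per block
        r = x - alpha * y                     # residual vector
        errors.append(np.dot(r, r))           # squared error, Eqn. (3.7)
    return np.array(errors)

neutral = np.ones((4, 3))        # toy example with lambda = 4 blocks
expr = neutral.copy()
expr[2] = np.array([1.0, 2.0, 3.0])   # only block 2 changes texture
errs = block_projection_errors(expr, neutral)
# blocks identical to the reference give zero error; block 2 does not
```

Blocks whose texture is unchanged relative to the neutral reference produce zero error, so only genuinely deformed sub-regions contribute.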
Let $E$ be an error matrix whose rows consist of the elements $e^k_{ijm}$, where $e^k_{ijm}$ represents the projection error of the $m$th block of an expressive image belonging to the $k$th expression with respect to the corresponding block of a neutral image. Now, suppose that each row vector $e$ of the error matrix $E$ is drawn independently from a Gaussian distribution with mean $w_p$ and covariance matrix $\Sigma$. Then, the joint distribution of all the observations of the error matrix can be written as the product of the marginal distributions, and hence expressed as:
\[
p(E \mid w_p, \Sigma) = \prod_{n=1}^{T} p(e_n \mid w_p, \Sigma) \tag{3.8}
\]
where $T$ represents the total number of samples used in this modeling, obtained by multiplying $C$ (number of expression classes), $l$ (number of levels per expression), and $N$ (total number of subjects). Here, our main objective is to estimate the mean of the Gaussian distribution, which can be obtained by maximizing the log-likelihood of Eqn. (3.8) with respect to the mean vector $w_p$. The optimal likelihood solution for the mean vector is given as follows:
\[
(w_p)_{\text{opt}} = \frac{1}{T} \sum_{n=1}^{T} e_n \tag{3.9}
\]
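The maximum-likelihood solution of Eqn. (3.9) is simply the row-wise sample mean of the error matrix; a minimal sketch (function name is illustrative):

```python
import numpy as np

def optimal_mean(E):
    """ML estimate of the Gaussian mean w_p from the T x lambda error
    matrix E (Eqn. 3.9), where T = C * l * N rows."""
    T = E.shape[0]
    return E.sum(axis=0) / T   # equivalent to E.mean(axis=0)

E = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])     # toy error matrix, T = 3, lambda = 2
# column-wise mean: [3.0, 4.0]
```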
Let $(w_p)_{\text{opt}} = \begin{bmatrix} w_1 & w_2 & \cdots & w_\lambda \end{bmatrix}$ be the mean row vector, where each $w_m$, $\forall m \in [1, \lambda]$,
represents the average importance of the $m$th block across all the facial expressions. The average mean projection error is higher for a block that conveys more information for all the considered facial expressions. Thus, $(w_p)_{\text{opt}}$ gives the distribution of the importance of all the sub-regions. As stated earlier, different local muscular regions of a face undergo gradual changes to show a facial expression. An expression generally starts from a neutral state and, after successive affine transformations of some facial sub-regions caused by muscular deformations, reaches its final peak level. So, all the different levels of an expression, starting from mid to peak, contribute to the error distribution of the proposed model, i.e., all the levels of an expression influence the error distribution. That is why modeling the informative regions of a face using only peak-level expressions may not capture the intermediate deformations of the facial sub-regions. Hence, it is important to consider different levels of expressions in our proposed model.
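Although the selection rule itself is not specified here, one natural way to use $(w_p)_{\text{opt}}$ is to rank the sub-regions by their mean projection error; the following sketch (with an illustrative cutoff `top_k`, not a value fixed by the model) shows such a ranking:

```python
import numpy as np

def informative_regions(w_opt, top_k):
    """Indices of the top_k sub-regions by average projection error.

    w_opt: the lambda-dimensional mean vector (w_p)_opt; top_k is an
    illustrative cutoff for selecting informative regions.
    """
    return np.argsort(w_opt)[::-1][:top_k]   # descending importance

w = np.array([0.1, 0.9, 0.4, 0.7])
# blocks 1 and 3 carry the largest average error, hence most information
```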