Experimental results on publicly available databases demonstrate the effectiveness of the proposed approaches for facial expression recognition. This chapter provides an overview of the problem of facial expression recognition, its applications, and the main challenges in this research.
Facial Expression Recognition
- Human-behaviour recognition
- Non-verbal communication
- Human-computer interaction (HCI)
- Security
- Medical
Therefore, it is essential to integrate non-manual parameters such as facial expressions to improve non-verbal communication. Models of interpersonal behaviour, such as facial expressions, can also be used to interact with computers [22].
General Paradigm of FER System
In such cases, a person's facial expressions and behaviour convey more than verbal communication. Moreover, not all areas of the face contribute equally to different facial expressions.
Facial Expression Generation System
Therefore, one important research challenge is to extract features only from the informative regions of a face. Changes in the limbic system ultimately alter the appearance of a face by deforming the facial muscle regions, and thereby different facial expressions are perceived.
Facial Expression Parametrization
Facial action coding system
Each AU defines the movement of a specific region of a face with respect to its neutral position. For example, AU1 and AU2 represent "inner part of the eyebrows raised" and "outer part of the eyebrows raised", respectively.
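To make the coding concrete, the sketch below represents a few AUs and prototypical AU combinations as plain Python data structures. The specific combinations follow common EMFACS-style usage (e.g., happiness as AU6 + AU12) and are illustrative, not taken from the thesis:

```python
# Illustrative only: a minimal mapping from a few FACS action units to
# prototypical expressions (AU combinations follow common EMFACS usage).
ACTION_UNITS = {
    1: "inner brow raiser",
    2: "outer brow raiser",
    4: "brow lowerer",
    5: "upper lid raiser",
    6: "cheek raiser",
    12: "lip corner puller",
    15: "lip corner depressor",
}

# Prototypical expressions encoded as sets of active AUs.
EXPRESSION_AUS = {
    "happiness": {6, 12},
    "surprise": {1, 2, 5},
    "sadness": {1, 4, 15},
}

def match_expression(active_aus: set) -> str:
    """Return the prototypical expression whose AU set best overlaps."""
    return max(EXPRESSION_AUS, key=lambda e: len(EXPRESSION_AUS[e] & active_aus))

print(match_expression({1, 2, 5}))  # -> "surprise"
```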
Facial animation parameters
It should be noted that each AU is defined based on the muscle movements of a particular sub-region of the face relative to the state of that muscle in a neutral face. Therefore, most existing works use texture features extracted only from a set of localized facial points/regions of a face.
Multi-view/View-invariant FER
It is also observed that texture features can give better performance than geometric features [34]. However, texture features extracted from all areas of a face may not be discriminative, as not all areas of a face contribute to different facial expressions.
Major Challenges in Recognizing Facial Expressions
- Face localization/tracking
- Occlusion
- Feature extraction
- Recognition of non-basic expressions
In addition, as discussed in Section 1.2, another important aspect is the localization of informative face regions to extract the most distinctive facial features. In this method, the entire face is first divided into several sub-regions/sub-blocks, and then features are extracted from each of the sub-blocks.
Organization of the Thesis
This chapter provides a brief overview of state-of-the-art methods developed for facial expression recognition based on frontal faces and multi-view facial images. Current research in automatic facial expression recognition mainly focuses on recognizing expressions from multi-view/view-invariant facial images.
Overview of Different Methods of FER
Multi-view and/or view-invariant FER
Recognition of facial expressions from multi-view/view-invariant facial images is an important research direction of FER. Multi-view and/or view-invariant FER methods can be broadly classified into three categories based on how the underlying methods are implemented for expression recognition.
Facial Features Extraction Methods
Face-shape-free-based methods
- Global-based methods
- Local face-shape-free-based methods
These methods initially divide a face image into a number of sub-blocks and then extract features from each of the sub-blocks for FER. Finally, a face image is represented by the concatenated features extracted from each of the local regions.
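A minimal sketch of this divide-and-concatenate pipeline is given below, assuming a grayscale face image and uniform LBP descriptors; the 10×8 grid and LBP parameters are illustrative choices, not the thesis's exact configuration:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def block_lbp_features(gray, grid=(10, 8), n_points=8, radius=1):
    """Divide a face image into grid sub-blocks, compute a uniform-LBP
    histogram per block, and concatenate the histograms into one vector."""
    h, w = gray.shape
    bh, bw = h // grid[0], w // grid[1]
    lbp = local_binary_pattern(gray, n_points, radius, method="uniform")
    n_bins = n_points + 2  # uniform patterns plus one non-uniform bin
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = lbp[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            hist, _ = np.histogram(block, bins=n_bins,
                                   range=(0, n_bins), density=True)
            feats.append(hist)
    return np.concatenate(feats)  # length = grid[0] * grid[1] * n_bins
```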
Face-shape-based methods
- Geometric and texture features
Similar to [92], a few other attempts have been made in [91, 97], where the authors extracted features such as LBP, histogram of oriented gradients (HOG) [98], and SIFT at each of the landmarks. Hence, one alternative is to represent a face with both geometric and appearance (texture) features.
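The sketch below illustrates landmark-centred feature extraction in the spirit of [91, 97], using HOG as the descriptor. The patch size and HOG parameters are assumptions for illustration; `landmarks` is a hypothetical list of (x, y) pixel coordinates:

```python
import numpy as np
from skimage.feature import hog

def landmark_hog_features(gray, landmarks, patch=32):
    """Extract a HOG descriptor from a square patch centred at each
    facial landmark and concatenate the descriptors."""
    half = patch // 2
    feats = []
    for (x, y) in landmarks:
        x, y = int(x), int(y)
        roi = gray[max(y - half, 0):y + half, max(x - half, 0):x + half]
        if roi.shape != (patch, patch):  # pad patches clipped at the border
            roi = np.pad(roi, ((0, patch - roi.shape[0]),
                               (0, patch - roi.shape[1])))
        feats.append(hog(roi, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)))
    return np.concatenate(feats)
```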
Salient-region-based feature extraction methods
These methods show promising results when features are extracted only from the salient regions of a face. It is observed that most of the common salient regions (green spots) are close to the mouth, and hence the facial regions around the mouth capture the most discriminative information for FER [41].
Review on Multi-view/View-invariant FER
View-wise/pose-wise multi-view FER
The general paradigm of multi-view FER is shown in Figure 2.7. It mainly consists of three steps: 1) preprocessing an input face image, 2) feature extraction, and 3) recognition. One of the main limitations of multi-view FER methods is that they do not take into account the correlations that exist between different views of facial expressions [33, 104].
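The three steps can be written as a minimal pipeline skeleton; `detect_face`, `extract_features`, and `classifier` are hypothetical stand-ins for concrete components, not named in the thesis:

```python
def recognize_expression(image, detect_face, extract_features, classifier):
    """Three-step multi-view FER paradigm (cf. Figure 2.7):
    1) preprocess: localize and normalize the face,
    2) extract features from the normalized face,
    3) recognize the expression with a trained classifier."""
    face = detect_face(image)              # step 1: preprocessing
    feats = extract_features(face)         # step 2: feature extraction
    return classifier.predict([feats])[0]  # step 3: recognition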
View-normalization for multi-view FER
However, it has been noted by previous studies (Section 2.4.1) that the frontal view may not be the optimal view for expression recognition [91, 97]. Also, distributing the weights in this way can directly affect multi-view feature synthesis, as some unwanted features may be added to the observation spaces.
Learning optimal canonical view for multi-view FER
DS-GPLVM is a nonlinear generative approach that finds an optimal nonlinear discriminative space; it is a state-of-the-art multi-view FER approach that outperforms existing linear and nonlinear learning-based methods [109-111].
Summary
Hence, a 1-NN (kNN with k = 1) classifier is used on the learned shared space for multi-view FER. Generally, the frontal view is considered the canonical view, which may not be an optimal view for multi-view FER.
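A minimal, runnable sketch of 1-NN classification on a learned shared space follows; the arrays are random toy stand-ins for the latent embeddings (e.g., from DS-GPLVM), with illustrative dimensions:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-ins for latent shared-space embeddings of face images.
rng = np.random.default_rng(0)
Y_train = rng.normal(size=(60, 5))          # 60 training points, 5-D latent space
labels_train = rng.integers(0, 6, size=60)  # 6 basic expressions
Y_test = rng.normal(size=(10, 5))

# kNN with k = 1 on the learned shared space.
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(Y_train, labels_train)
print(clf.predict(Y_test))
```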
Motivation of the Thesis
For view normalization, non-frontal face images are first transformed into a canonical view, and then a classifier is trained on this canonical space to recognize expressions from multi-view face images. Our proposed multi-view method is an extension of this framework, motivated by three observations: i) features extracted from the salient face regions significantly improve FER performance over the existing global and local feature extraction methods; ii) there is a need to develop a more versatile face model for extracting both geometric and texture features from a face; iii) multi-view and/or view-invariant facial expressions can be better discriminated in a common discriminative space [1, 109].
Objectives
Standard Databases Used for FER
One of the important issues in facial expression recognition is the extraction of discriminative features of a face. Most of the existing FER methods extract features from all regions of a face and subsequently stack them. However, sequential concatenation of the local feature vectors of the whole face image leads to inter-correlated features among different classes of expressions.
Proposed Framework
- Proposed informative region extraction (IRE) model
- Local Binary Pattern
- Projection analysis
- Extraction of a common reference image
- Weighted projection-based LBP (WPLBP)
- Expression specific sub-region analysis
- Feature selection and weight allocation
- Facial expression classification with a reduced number of features
- Experiments on MUG database
- Experiments on JAFFE and CK+ databases
The block sizes (16×16, 12×12, 8×8, and 6×6 for the 160×128, 120×96, 80×64, and 60×48 images, respectively) are deliberately chosen to keep the same number of sub-regions across different image resolutions. This ensures that the number of sub-regions always remains 80, regardless of image size. The significance of a few selected sub-regions marked in Figure 3.6 (left) is shown on the right side of the same figure.
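The block-size arithmetic can be verified directly: a fixed 10×8 grid yields 80 sub-regions at every supported resolution. The grid shape is inferred from the four resolutions and block sizes above:

```python
# Keep a fixed 10 x 8 grid (80 sub-regions) at every resolution:
# block size = image size / grid size, per supported resolution.
GRID = (10, 8)  # rows x cols -> 80 sub-regions

for h, w in [(160, 128), (120, 96), (80, 64), (60, 48)]:
    bh, bw = h // GRID[0], w // GRID[1]
    assert bh * GRID[0] == h and bw * GRID[1] == w
    print(f"{h}x{w} image -> {bh}x{bw} blocks, {GRID[0] * GRID[1]} sub-regions")
# 160x128 -> 16x16, 120x96 -> 12x12, 80x64 -> 8x8, 60x48 -> 6x6
```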
Conclusion
The face models available in the literature are not suitable for extracting features from all the informative regions of a face. The disadvantages of existing face models are highlighted in this chapter, and subsequently the informative regions extracted in the previous chapter are deployed to generate an efficient face model. Finally, we proposed a more efficient face model based on the informative regions of a face.
Face model-based FER
No standard analysis is available in the literature for selecting a suitable face model for FER, nor are there convincing analysis techniques for identifying the best possible one. In view of this, we propose to extract a more generic face model using the informative regions of a face.
Analysis of Existing Face Models
Proposed face model
As discussed in the previous chapter, the distribution of informative regions of a face can be estimated from the projection errors, and this distribution is illustrated in Figure 4.2. The importance of different informative regions is assessed by comparing the magnitude of the projection errors with a threshold. In addition, our model retains five non-informative regions that are nevertheless important for evaluating the geometric features.
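The thresholding step can be sketched as below; the default rule (mean plus one standard deviation) is an illustrative assumption, since the thesis only states that the errors are compared with a threshold:

```python
import numpy as np

def informative_region_mask(proj_errors, thresh=None):
    """Mark a sub-region as informative when its projection error exceeds
    a threshold (default rule is illustrative: mean + one std. dev.)."""
    e = np.asarray(proj_errors, dtype=float)
    if thresh is None:
        thresh = e.mean() + e.std()
    return e > thresh  # boolean mask, one entry per sub-region
```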
Feature Extraction
Geometrical features
With this in mind, the following geometric features are extracted using the proposed face model, as shown in Figure 4.4. For example, e1 can be obtained by solving the equations of the lines connecting points 2, 5 and points 3, 6 of the face model. All geometric features f1-f16 are normalized by the distance d(1,7) to reduce the effect of shape variations across different subjects.
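Both operations have compact closed forms. The sketch below computes a line-line intersection (such as e1) via homogeneous coordinates and a reference-normalized distance; the landmark numbering follows Figure 4.4 and the point values passed in would come from a detected face model:

```python
import numpy as np

def line_intersection(p1, p2, p3, p4):
    """Intersection of line p1-p2 with line p3-p4, using homogeneous
    coordinates (assumes the two lines are not parallel)."""
    l1 = np.cross([*p1, 1.0], [*p2, 1.0])  # line through p1 and p2
    l2 = np.cross([*p3, 1.0], [*p4, 1.0])  # line through p3 and p4
    x = np.cross(l1, l2)                   # intersection in homogeneous form
    return x[:2] / x[2]                    # back to Cartesian coordinates

def normalized_distance(pa, pb, p_ref1, p_ref2):
    """d(pa, pb) divided by the reference distance d(p_ref1, p_ref2),
    e.g. d(1,7), to reduce inter-subject shape variation."""
    return (np.linalg.norm(np.subtract(pa, pb))
            / np.linalg.norm(np.subtract(p_ref1, p_ref2)))
```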
Proposed texture feature extraction
Then, the facial regions are separated into five groups based on the locations of the landmark points. In Table 4.2, Ri represents the i-th region, which consists of the sub-regions shown in the second column of Table 4.2. Finally, the feature vector for each frame of a video is projected onto the MDA basis vectors to obtain reduced discriminative features for each frame.
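As a sketch of the projection step, multi-class linear discriminant analysis (closely related to MDA) from scikit-learn is used below as a stand-in; the feature dimensions and labels are random toy data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Stand-in for the MDA projection: LDA basis vectors reduce each frame's
# feature vector to a low-dimensional discriminative representation.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))    # per-frame feature vectors (toy data)
y = rng.integers(0, 6, size=200)  # expression labels for 6 classes

mda = LinearDiscriminantAnalysis(n_components=5)  # at most n_classes - 1
Z = mda.fit_transform(X, y)       # projected per-frame features
print(Z.shape)                    # -> (200, 5)
```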
Experimental Results
Case study: recognition of sign gestures using only facial expressions
- Sequence classification using Hidden Conditional Random Field
As a case study, the proposed face model is used to recognize some selected signs (sign language gestures) using only facial expressions. This is done according to the changes in the latent states of the models. State-of-the-art methods extract a discriminant joint space for multi-view FER (MvFER).
Proposed Method
Proposed UMvDLPP
The elements of Ak (the similarity matrix for the k-th view) are obtained by applying the RBF kernel to the i-th and j-th samples of the original data space Xk. In our proposed approach, instead of classifying directly in the correlated common space (CCS) Yccs, we first transform its features into the uncorrelated common space (UCS) Yucs, and then perform classification. The class label of a test sample xktest is obtained based on the labels of the k closest samples yktest in the common uncorrelated discriminant space.
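A minimal sketch of building such a similarity matrix follows, assuming Xk is an (n × d) matrix of view-k samples; the bandwidth `gamma` is an illustrative parameter:

```python
import numpy as np

def rbf_similarity(Xk, gamma=1.0):
    """Similarity matrix A_k for view k:
    A_k[i, j] = exp(-gamma * ||x_i - x_j||^2), the RBF kernel applied to
    pairs of samples in the original data space X_k."""
    sq = np.sum(Xk ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Xk @ Xk.T  # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0.0))       # clamp tiny negatives
```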
Experimental Results
In our first experiment, we analyzed how the classification accuracy depends on the dimensionality of the shared space. In the second experiment, we examined the within-class and between-class variations of the samples in the uncorrelated common space. Finally, the fourth experiment compares the proposed method with existing learning-based methods.
Conclusion
In MvFER, the discriminative shared Gaussian process latent variable model (DS-GPLVM) gives better performance than the existing linear and nonlinear multi-view learning methods. To address the above issues, we proposed the uncorrelated discriminative shared Gaussian process latent variable model (UDSGPLVM), and then extended it to a hierarchical framework termed multi-level UDSGPLVM (ML-UDSGPLVM) for multi-view FER. To validate the proposed method, four experiments were conducted, and finally the average accuracy is used to compare the proposed method with existing multi-view learning approaches.
Proposed Methodology
Proposed ML-UDSGPLVM
- DS-GPLVM
- Effect of priors on Gaussian process
- Proposed ML-UDSGPLVM model
- Uncorrelated latent space
More specifically, it attempts to learn the covariance function k(yi, yj) of the shared manifold. The learning of the shared manifold is achieved by minimizing the negative log-likelihood of the posterior distribution given in Eqn. Therefore, it maximizes the between-class scatter (Sb) and minimizes the within-class scatter (Sw) of the latent space.
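A hedged sketch of this objective, with X the observed features, Y the latent points, and K the kernel matrix on Y; the prior form shown follows D-GPLVM, and the exact prior used in the thesis may differ:

```latex
% GP-LVM negative log-likelihood (up to constants) plus a discriminative prior.
\mathcal{L}(\mathbf{Y}) =
\underbrace{\tfrac{D}{2}\ln\lvert\mathbf{K}\rvert
+ \tfrac{1}{2}\operatorname{tr}\!\left(\mathbf{K}^{-1}\mathbf{X}\mathbf{X}^{\top}\right)}_{\text{GP-LVM negative log-likelihood}}
\;-\; \ln p(\mathbf{Y}),
\qquad K_{ij} = k(\mathbf{y}_i, \mathbf{y}_j),
```

```latex
% One common discriminative prior (as in D-GPLVM): minimizing -ln p(Y)
% maximizes the between-class scatter S_b relative to the within-class S_w.
p(\mathbf{Y}) \propto
\exp\!\left\{-\tfrac{1}{\sigma_d^{2}}
\operatorname{tr}\!\left(\mathbf{S}_w^{-1}\mathbf{S}_b\right)^{-1}\right\}.
```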
Experimental Validation
Stage-2 models are evaluated using LBP + PCA + ML-UDSGPLVM and LBP + LPP + ML-UDSGPLVM features, and the expression recognition rates (RR, in %) are reported. The proposed ML-UDSGPLVM with LBP + LPP based features gives better performance than DS-GPLVM.
Hierarchical-UMvDLPP vs ML-UDSGPLVM
The slight decrease in accuracy at the last/second level of the proposed method is due to misclassifications at that level. In Table 6.9, we compare our proposed method with state-of-the-art nonlinear methods, e.g., D-GPLVM [154]. In summary, our proposed H-UMvDLPP based method outperforms the state-of-the-art linear and nonlinear multi-view facial expression recognition methods.
Conclusion
This thesis described our proposed facial expression recognition algorithms for both frontal and multi-view face images. An uncorrelated multi-view discriminant locality preserving projection (UMvDLPP) based approach is proposed to recognize expressions from a set of multi-view face images. In summary, our proposed schemes (ML-UDSGPLVM and H-UMvDLPP) outperform the state-of-the-art linear and nonlinear multi-view learning techniques.
Future work
Next, a 2-UDSGPLVM is learned for each of the component expressions present in each of the categories. The performance of ML-UDSGPLVM is evaluated on the BU3DFE database and compared with state-of-the-art linear and nonlinear learning-based methods. Since our proposed face model is derived from the movements of different areas of the face, it can be used to recognize action units as defined by Ekman et al. [2].
Uncorrelated Discriminant Locality Preserving Projection Analysis (UDLPP)
General paradigm of facial expression recognition (FER) system
Emotion generation process
Generation of expressive face images from a neutral face image
Representation of few action units (AUs): AU1, AU2, AU4, and AU5 of a face
Localization of 84 landmark points on a sketched neutral face image defined in
Example of multi-view happy face images from BU3DFE dataset [37] (top) and
State-of-the-art techniques for facial feature extraction
Overall classification of existing facial features
Classification of multi-view/view-invariant methods
Facial Animation Parameters (FAPs) [31]
Salient regions highlighted by psycho-visual experiment presented in [13]
Specifications of different databases used in our experiments
Confusion matrix for 160 × 128 MUG dataset images
Confusion matrix for 120 × 96 MUG dataset images
Confusion matrix for 80 × 64 MUG dataset images
Confusion matrix for 60 × 48 MUG dataset images
Performance analysis of our proposed method on MUG, JAFFE, and CK+ databases
Performance of the state-of-the-art methods
Geometrical features extracted from the proposed face model
Grouping of the sub-regions on the basis of the landmark points
Performance evaluation of different face models using texture features
Average accuracies in recognizing different sign language facial gestures
Confusion matrix showing gesture recognition rates. All the sequences of the
Confusion matrix for sign gesture recognition. All the sequences of the signs are