
Activation Operation: this module carries out the predictions based on the scores generated from the feature map of each expression image. We use the ReLU activation function at each convolution layer of the network and at the dense layer. We employ the sigmoid activation function in (3) for the final prediction at the fully connected layer, and the binary cross-entropy loss in (4) is used to compute the model loss.

\[
\sigma(S) = \frac{1}{1 + e^{-S}} \tag{3}
\]

\[
L(y, \hat{y}) = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right) \tag{4}
\]

In (4), $y$ is the actual class label while $\hat{y}$ is the predicted label from the model.

Max Pooling Operation: this operation is conducted on the output of the convolution layer. The feature map is downsampled using a $2 \times 2$ kernel with a stride of two.
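As an illustration, (3) and (4) can be written directly in NumPy; this is a minimal sketch, and the function and variable names are ours rather than the paper's:

```python
import numpy as np

def sigmoid(s):
    """Equation (3): squashes a score s into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-s))

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Equation (4): mean binary cross-entropy over all samples.

    y_true holds the actual 0/1 labels, y_pred the sigmoid outputs;
    eps avoids taking log(0).
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred)
                    + (1.0 - y_true) * np.log(1.0 - y_pred))

# Toy example: two multi-label targets scored by the model.
scores = np.array([[2.0, -1.0, 0.5], [-0.3, 1.2, -2.0]])
y_true = np.array([[1, 0, 1], [0, 1, 0]])
print(binary_cross_entropy(y_true, sigmoid(scores)))
```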

Optimization Operation and Regularization: stochastic gradient descent (SGD) with a learning rate of 0.0001 is used as the model optimizer. Dropout and data augmentation are the regularization techniques employed for the model.
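The operations above could be assembled, for example, in Keras as sketched below; the number of blocks, filter counts, dropout rates, and the label count are our assumptions, not the paper's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_LABELS = 9  # hypothetical: e.g. 6 basic emotions plus 3 intensity levels

model = models.Sequential([
    layers.Input(shape=(96, 96, 1)),
    # Convolution with ReLU activation, then 2x2 max pooling with stride 2.
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Dropout(0.25),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    # Sigmoid output for independent multi-label prediction, as in (3).
    layers.Dense(NUM_LABELS, activation="sigmoid"),
])

# SGD optimizer and binary cross-entropy loss, as in (4).
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4),
              loss="binary_crossentropy")
```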

CK+ is sequence data collected in a controlled environment. To make CK+ conformable to our proposed method, we adopted the general annotation of onset, peak, and offset frames to categorise the data into Low, Normal, and High degrees of emotional intensity.
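A possible sketch of this frame-to-intensity mapping is given below; CK+ sequences run from a neutral onset towards the expression peak, and the 1/3 and 2/3 cut points are purely our assumption for illustration:

```python
def intensity_label(frame_idx, num_frames):
    """Map a frame's position in a CK+ sequence to an intensity class.

    Early frames (near the neutral onset) are taken as Low, middle
    frames as Normal, and frames near the expression peak as High.
    """
    position = frame_idx / max(num_frames - 1, 1)
    if position < 1 / 3:
        return "Low"
    if position < 2 / 3:
        return "Normal"
    return "High"
```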

5.2 Data Pre-processing and Experiment Discussion

This work considered two major pre-processing techniques to improve the performance of the system. We used a facial landmarking algorithm to detect the region of the face and to remove background information that could subject the system to unnecessary computation. To make the system robust against over-fitting, given the small quantity of data available for the CNN to learn from, we employed data augmentation. Data augmentation was not applied on the fly, in order to ensure data balancing, especially in the CK+ dataset.
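One way to realise this offline (rather than on-the-fly) augmentation is sketched below with Keras' ImageDataGenerator; the transform ranges and the balancing strategy are our assumptions:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(rotation_range=15,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               zoom_range=0.1,
                               horizontal_flip=True)

def augment_to_count(images, target_count):
    """Pre-generate augmented copies until a class reaches target_count,
    so every class is balanced before training begins."""
    augmented = list(images)
    flow = augmenter.flow(images, batch_size=1, shuffle=True)
    while len(augmented) < target_count:
        augmented.append(next(flow)[0])
    return np.stack(augmented)
```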

Fig. 3. This figure describes the hierarchical ordinal annotation of BU-3DFE and provides information on the six basic emotion states and their respective intensity estimation using relative values.

The experiment evaluated the proposed ML-CNN model on both the BU-3DFE and CK+ datasets. After preprocessing, each raw image was scaled to a uniform size of 96 × 96. The datasets were first partitioned into a training set (70%), a validation set (20%), and a testing set (10%).
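The 70/20/10 partition can be obtained with two successive splits, for instance with scikit-learn; this is a sketch, and `images` and `labels` are assumed NumPy arrays:

```python
from sklearn.model_selection import train_test_split

# First split off 70% for training, then divide the remaining 30%
# into validation (20% of the total) and test (10% of the total).
x_train, x_rest, y_train, y_rest = train_test_split(
    images, labels, train_size=0.70, random_state=42)
x_val, x_test, y_val, y_test = train_test_split(
    x_rest, y_rest, train_size=2 / 3, random_state=42)
```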

The training dataset was augmented, and the pixel values were divided by 255 for data-scale normalization. We trained the proposed multi-label CNN on the training datasets. At each convolution layer of the network, we prevented over-fitting by using batch-normalization and dropout regularization. We used stochastic gradient descent with an initial learning rate of 0.0003 for network optimization, and validation followed immediately using the validation dataset. We carried out several investigations in the experiments: we observed the performance of our model on the raw data without augmentation; we likewise checked the performance when face localization and augmentation were applied; and lastly, we observed how well the model generalised to unseen data samples by training the system with BU-3DFE and validating with CK+ data.
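In outline, the training step and the cross-dataset generalisation check could look as follows; `model` is the network sketched earlier, `ck_images` is a hypothetical array of preprocessed CK+ samples, and the batch size is our assumption:

```python
# Pixel-scale normalization (division by 255), then training with
# validation monitored after every epoch.
x_train, x_val = x_train / 255.0, x_val / 255.0
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=25, batch_size=32)  # batch size is an assumption

# Generalisation check: the model trained on BU-3DFE is applied
# directly to CK+ samples it has never seen.
ck_scores = model.predict(ck_images / 255.0)  # ck_images is hypothetical
```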

For each of our investigations, we evaluated the model on the test sets. Our chosen performance measures are multi-label evaluation metrics: the hamming loss, coverage, ranking loss, and label ranking average precision. A brief discussion of these metrics is given in the next section.

5.3 Evaluation Metrics

Hamming loss computes the loss between the binary string of the actual label and the binary string of the predicted label using the XOR operation for every instance of the test data. The average over all samples is taken, as given in (5) below.

\[
H = \frac{1}{|N| \cdot |L|} \sum_{i=1}^{|N|} \sum_{j=1}^{|L|} \mathrm{XOR}(y_{i,j}, \hat{y}_{i,j}) \tag{5}
\]

Ranking Loss: this metric computes the average fraction of label pairs that are incorrectly ordered. The smaller the ranking loss, within the closed range between 0 and 1, the better the model performance. (6) is the mathematical illustration of the ranking loss.

\[
\mathrm{RankLoss}(y, \hat{f}) = \frac{1}{N} \sum_{i=0}^{N-1} \frac{|Z|}{\|y_i\|_0 \left(k - \|y_i\|_0\right)} \tag{6}
\]

where $k$ is the number of labels and $Z = \{(m, n) : \hat{f}_{i,m} \le \hat{f}_{i,n},\ y_{i,m} = 1,\ y_{i,n} = 0\}$.

Average Precision: this metric evaluates, for each ground-truth label, the fraction of higher-ranked labels that are also true labels. The higher the value, within the closed range between 0 and 1, the better the performance of the model. The mathematical illustration of label ranking average precision is given in (7) below.

\[
\mathrm{LRAP} = \frac{1}{N} \sum_{i=0}^{N-1} \frac{1}{\|y_i\|_0} \sum_{j: y_{i,j}=1} \frac{|L_{i,j}|}{R_{i,j}} \tag{7}
\]

where $L_{i,j} = \{k : y_{i,k} = 1,\ \hat{f}_{i,k} \ge \hat{f}_{i,j}\}$, $R_{i,j} = |\{k : \hat{f}_{i,k} \ge \hat{f}_{i,j}\}|$ is the rank of label $j$, and $|\cdot|$ is the cardinality of the set.

Coverage Error: this metric evaluates, on average, how many of the best-ranked labels must be included so that all the true labels are covered by the final prediction. The smaller its value, the better the model performance. The mathematical illustration is shown in (8).

\[
\mathrm{Coverage}(y, \hat{f}) = \frac{1}{N} \sum_{i=0}^{N-1} \max_{j: y_{i,j}=1} \mathrm{rank}_{i,j} \tag{8}
\]

where $\mathrm{rank}_{i,j} = |\{k : \hat{f}_{i,k} \ge \hat{f}_{i,j}\}|$.
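All four metrics are available in scikit-learn, so an evaluation sketch needs only the binary ground-truth matrix, the thresholded predictions, and the continuous scores; the toy data below is ours:

```python
import numpy as np
from sklearn.metrics import (hamming_loss, label_ranking_loss,
                             label_ranking_average_precision_score,
                             coverage_error)

# Toy multi-label data: 2 samples, 3 labels.
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_score = np.array([[0.8, 0.2, 0.6], [0.3, 0.9, 0.1]])  # sigmoid outputs
y_pred = (y_score >= 0.5).astype(int)                    # thresholded labels

print("Hamming loss (5):", hamming_loss(y_true, y_pred))
print("Ranking loss (6):", label_ranking_loss(y_true, y_score))
print("LRAP (7):", label_ranking_average_precision_score(y_true, y_score))
print("Coverage (8):", coverage_error(y_true, y_score))
```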


5.4 Results and Discussion

We present a summary of the results obtained from the four categories of experiments in Table 1. The first experiment evaluated the proposed method on BU-3DFE without augmentation; the data contain 2400 samples. At evaluation, we observed overfitting when the number of epochs was 25 and, as shown in Table 1, the coverage error of 4.512 is high. The overfitting is traceable to insufficient data for the model to learn representative features. In the other experiment, which applied augmentation to the training samples of BU-3DFE, no overfitting was observed, and the values obtained for the metrics are promising. The result obtained from CK+ is comparable to that of BU-3DFE + Augmentation, which indicates the adaptability of the proposed method. The proposed method also generalised well when trained with BU-3DFE and tested with CK+: the hamming loss obtained is 0.1668, the ranking loss is 0.1925, the average precision is 0.7322, and the coverage error is 1.4810.

Table 1. Summary of the model performance evaluation using four multi-label metrics: ↑ indicates that a higher value means better model performance, and ↓ indicates that a lower value means better performance.

Database         Hamming loss ↓   Ranking loss ↓   Average precision ↑   Coverage ↓
BU-3DFE          0.1521           0.4773           0.7353                4.512
BU-3DFE + AUG    0.0694           0.1911           0.8931                1.9457
CK+              0.1426           0.0581           0.8993                1.6473
Generalization   0.1668           0.1925           0.7322                1.4910