4.6 Multi-users’ Multi-level Attention (MUMLA) on Images

4.6.3 MUMLA Prediction with Experiment-III

We demonstrate the performance of the proposed approach on the open-source dataset [134] consisting of Text-rich webpages, Mixed webpages, and Pictorial-rich webpages (see the sample webpages in Figure 3.6). The rest of the section describes the data processing and preparation, followed by the multi-user multi-level attention characteristics and prediction according to the proposed approach.

Data Preparation, Description & Characteristics

Table 4.8: Data Characteristics of the Three Categories

Characteristic                                                      Text     Mixed    Pictorial
Number of Webpages                                                  50       49       50
Number of Users                                                     11       11       11
Number of unique fixations                                          1369     1866     2527
Median Attention-level                                              6        7        7
Label Cardinality, LCard(D) = (1/d) Σ_{i=1}^{d} |L_i|, at θ = 7     2.0228   1.947    2.1426
Label Density, LDens(D) = LCard(D)/θ, at θ = 7                      0.289    0.2781   0.3061
Label Diversity, LDiv(D) = |{L | ∃ I : (I, L) ∈ D}|, at θ = 7       71       72       80
Proportion of Label Diversity, PLDiv(D) = LDiv(D)/d, at θ = 7       0.2023   0.1589   0.1541

The frequency distribution of FIs for the three categories is shown in Figure 3.7. Evidently, web images on Pictorial-category webpages consistently received more fixations, followed by the Mixed and Text categories. This is attributed to the larger number of image elements on Pictorial-category webpages than on their counterparts. However, across the categories, the frequency of FIs consistently reduced with increasing FI, indicating a reduction in salient image elements.

θ Selection: The median FI of the three categories is 6, 7, and 7 for the Text, Mixed, and Pictorial categories respectively. To segregate the prominent FIs from the non-prominent FIs, and to enable comparison across categories, we consider the maximum of the median fixation-indices, i.e., 7, as the representative median saliency threshold (median FI) θ. Accordingly, FIs 1, 2, . . . , 6 are prominent, being the initial allocations, and FI 7 corresponds to the non-prominent latter attention allocations.
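This thresholding step can be sketched as follows; the per-category fixation-index streams, variable names, and example values are illustrative stand-ins, not the original pipeline's data:

```python
from statistics import median

# Hypothetical per-category fixation-index (FI) streams; in practice these
# are extracted from the eye-tracking logs of the dataset.
fi_records = {
    "Text":      [1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10],
    "Mixed":     [1, 2, 3, 4, 6, 7, 7, 8, 9, 10, 12],
    "Pictorial": [1, 2, 3, 5, 6, 7, 7, 8, 9, 11, 13],
}

# Median FI per category; the representative threshold is their maximum
# (6, 7, 7 -> theta = 7 in the reported data).
medians = {cat: median(fis) for cat, fis in fi_records.items()}
theta = max(int(m) for m in medians.values())

def binarize(fi: int, theta: int) -> int:
    """FIs 1..theta-1 keep their index (prominent initial allocations);
    FIs >= theta collapse into the single non-prominent class theta."""
    return fi if fi < theta else theta

labels = {cat: [binarize(fi, theta) for fi in fis]
          for cat, fis in fi_records.items()}
print(medians, theta)
```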

MUMLA Data Characteristics: Considering each FI as a class label, the MUMLA data characteristics [13, 166] are summarized in Table 4.8. Label Cardinality, the average number of FIs associated with each image element, highlights the prominence of multi-label approaches. On average, image elements received approximately two attention-levels across categories. Thus, it may not be a good idea to apply the existing single-label assignment approaches, as the two attention-levels may be similar but different allocations, or diverse attention allocations.


(a) Co-occurrence graph: (from left to right) Text, Mixed, Pictorial

(b) Multilabel data characteristics: (from left to right) Cardinality, Density, Diversity, Proportional Diversity

Figure 4.20: Data characteristics: (a) Co-occurrence graph of MUMLA at median FI thresholding. The bubble size indicates the frequency of FI and the edge width indicates the co-occurring frequency of the connected fixation-indices. The first two FIs and the median FI (7) are highlighted for better interpretation; (b) Multilabel data characteristics with variation in the sparse threshold θ.

Analogously, the Label Density, i.e., the normalized Label Cardinality, followed a similar trend. The Label Diversity, the number of distinct FI sets (where each L_i is an attention-level set) in D, further strengthened the necessity of multi-label approaches. At θ = 7, 2^7 − 1 = 127 unique FI sets (each ⊆ L) are possible, of which 55.91%, 56.69%, and 62.99% appear in the Text, Mixed, and Pictorial categories respectively. Further, Text-category images have a higher Proportion of Label Diversity, i.e., the Label Diversity normalized by the number of instances, than the remaining two categories. The relatively lower LDiv value and the relatively higher PLDiv value of the Text category indicate that images in this category are associated with more multi-labelness even though fewer image elements are present in the category. The converse holds for the Pictorial category.
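The four characteristics in Table 4.8 follow directly from their definitions; a minimal sketch, assuming each image element's attention-levels are stored as a frozenset (the toy dataset below is illustrative):

```python
from typing import FrozenSet, List, Tuple

def multilabel_characteristics(label_sets: List[FrozenSet[int]],
                               theta: int) -> Tuple[float, float, int, float]:
    """Compute the four MUMLA data characteristics of Table 4.8.
    Each element of `label_sets` is the attention-level set L_i received
    by one image element; d = |D| is the number of instances."""
    d = len(label_sets)
    lcard = sum(len(L) for L in label_sets) / d    # LCard(D): mean |L_i|
    ldens = lcard / theta                          # LDens(D): normalized LCard
    ldiv = len(set(label_sets))                    # LDiv(D): distinct FI sets
    pldiv = ldiv / d                               # PLDiv(D): LDiv per instance
    return lcard, ldens, ldiv, pldiv

# Toy dataset: three image elements with their attention-level sets.
D = [frozenset({1, 2}), frozenset({1, 7}), frozenset({2})]
print(multilabel_characteristics(D, theta=7))
```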

To further understand the co-occurring characteristic of attention-levels, the co-occurrence graph [34] (one attention-level occurring along with another in L) is plotted at the median saliency threshold, as shown in Figure 4.20a. The Text-category images, followed by the Pictorial-category images, received relatively more 'similar but different' and 'diverse' levels of attention, as indicated by the thicker connecting lines among the first, second, and seventh attention-levels. In contrast, the Mixed-category images received relatively more 'diverse' levels of attention allocations than 'similar but different' allocations, as the line connecting the first and second attention-levels is relatively thinner.
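The underlying counting can be sketched as below, assuming the same frozenset representation as above; node and edge weights correspond to the bubble sizes and edge widths of Figure 4.20a:

```python
from collections import Counter
from itertools import combinations

def cooccurrence(label_sets):
    """Node and edge weights for a co-occurrence graph like Figure 4.20a:
    node weight = frequency of each attention-level; edge weight = how often
    two attention-levels appear together in the same set."""
    nodes, edges = Counter(), Counter()
    for L in label_sets:
        nodes.update(L)
        edges.update(combinations(sorted(L), 2))
    return nodes, edges

# Illustrative attention-level sets.
D = [frozenset({1, 2}), frozenset({1, 2, 7}), frozenset({1, 7})]
nodes, edges = cooccurrence(D)
print(nodes)            # Counter({1: 3, 2: 2, 7: 2})
print(edges[(1, 2)])    # 'similar but different' co-occurrences
print(edges[(1, 7)])    # 'diverse' co-occurrences
```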

MUMLA Data Characteristics with θ Variation: To understand the influence of θ selection on multi-labelness, the four characteristics are computed with increasing θ, as shown in Figure 4.20b. The increase in Label Cardinality and the corresponding decrease in Label Density with θ indicate that the latter attention-levels are sparse and co-occur with the initial attention-levels, which also contributed to the increase in Label Diversity and Proportion of Label Diversity. Overall, the Pictorial category demonstrated relatively more multi-labelness than its counterparts.
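The sweep can be sketched by re-applying the thresholding and the characteristic formulas at each θ; the raw (un-thresholded) FI sets below are illustrative:

```python
# Re-binarize the raw FI sets at each candidate theta and recompute the
# four characteristics, as in Figure 4.20b.
raw_sets = [frozenset({1, 2, 9}), frozenset({1, 7, 11}), frozenset({2, 8})]

for theta in range(5, 12):
    clipped = [frozenset(min(fi, theta) for fi in L) for L in raw_sets]
    d = len(clipped)
    lcard = sum(len(L) for L in clipped) / d       # LCard(D)
    ldens = lcard / theta                          # LDens(D)
    ldiv = len(set(clipped))                       # LDiv(D)
    print(theta, round(lcard, 4), round(ldens, 4), ldiv, round(ldiv / d, 4))
```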

Prediction Performance

Table 4.9: Prediction performance at median saliency-thresholding (θ = 7). Boldface: best performance; underlined: second-best performance among the three categories.

Category    Metric            Predicted   Baseline   Outperformance (%)
Text        Hamming loss      0.1925      0.3172     64.78
            Subset 0/1 loss   0.6286      0.9077     44.40
            Accuracy          0.5476      0.3479     57.40
            F1-score          0.6223      0.4548     36.83
            PPV               0.8272      0.5052     63.74
            TPR               0.5537      0.5470     1.22
Mixed       Hamming loss      0.1728      0.2866     65.86
            Subset 0/1 loss   0.5546      0.8825     59.12
            Accuracy          0.6057      0.3862     56.84
            F1-score          0.6749      0.4958     36.12
            PPV               0.8641      0.5455     58.41
            TPR               0.6141      0.5966     2.93
Pictorial   Hamming loss      0.1895      0.3120     64.64
            Subset 0/1 loss   0.5838      0.8856     51.70
            Accuracy          0.6004      0.3878     54.82
            F1-score          0.6792      0.5029     35.06
            PPV               0.8912      0.5539     60.90
            TPR               0.6101      0.6027     1.23

Without loss of generality, the MUMLA dataset D in each category is randomly split into train and test sets in an 80:20 ratio. θ binary classifiers are constructed using the training data, while the test data is used to compute the prediction performance. Accordingly, 5-fold cross-validation is performed with 10 iterations (to mitigate the possible occurrence of special structures during the random split). The performance metrics averaged across the iterations are reported as the overall performance.
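A sketch of this evaluation protocol using scikit-learn is given below; the feature matrix, the θ-column label indicator matrix, and the base learner (logistic regression) are illustrative stand-ins, since this passage does not fix them:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import hamming_loss
from sklearn.model_selection import RepeatedKFold
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
theta = 7

# Stand-ins: X holds per-image visual features; Y is the theta-column
# binary indicator matrix (column k-1 means "received FI k").
X = rng.random((200, 16))
Y = (rng.random((200, theta)) < 0.3).astype(int)

# Binary relevance: one independent binary classifier per attention-level.
model = MultiOutputClassifier(LogisticRegression(max_iter=1000))

# 5-fold cross-validation repeated 10 times: each fold is an 80:20
# train/test split, and repeating guards against special structures
# arising from a single random split.
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
losses = []
for train_idx, test_idx in cv.split(X):
    model.fit(X[train_idx], Y[train_idx])
    losses.append(hamming_loss(Y[test_idx], model.predict(X[test_idx])))

print(f"mean Hamming loss over 50 folds: {np.mean(losses):.4f}")
```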

The prediction performance at the median saliency threshold is shown in Table 4.9. To understand the quality of prediction, the performance is compared with a random prediction model [107], which randomly assigns the FIs while overlooking the constituting visual features. In each category, all the metrics outperformed the random prediction metrics, demonstrating the efficacy of the proposed approach and the considered image visual features.


(a) Hamming Loss (b) Subset 0/1 Loss (c) Accuracy

(d) F1-score (e) PPV (f) TPR

Figure 4.21: MUMLA prediction performance with variation in saliency-threshold θ

Notably, the True Positive Rate (TPR; also called recall) of the baseline is comparable to the predicted performance. However, the Positive Predictive Value (PPV; also called precision) and the F1-score of the baseline are significantly lower than the predicted values, highlighting the bias associated with random prediction. Overall, the MUMLA prediction on web images from the Mixed and Pictorial categories performed better than on the Text category. However, the relative performance with respect to the random prediction is better in the Text and Mixed categories than in the Pictorial category.
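The reported metrics can be computed per instance and averaged; a minimal sketch, assuming the standard example-based definitions (accuracy as the per-instance Jaccard index) and a uniform random baseline:

```python
import numpy as np

def multilabel_metrics(Y_true: np.ndarray, Y_pred: np.ndarray) -> dict:
    """Example-based multilabel metrics over binary indicator matrices.
    Assumed definitions: PPV/TPR/F1 are precision/recall/F1 averaged over
    instances; a tiny epsilon guards against empty label sets."""
    eps = 1e-12
    inter = (Y_true & Y_pred).sum(axis=1)
    true_sz, pred_sz = Y_true.sum(axis=1), Y_pred.sum(axis=1)
    return {
        "Hamming loss":    float((Y_true != Y_pred).mean()),
        "Subset 0/1 loss": float((Y_true != Y_pred).any(axis=1).mean()),
        "Accuracy":        float((inter / ((Y_true | Y_pred).sum(axis=1) + eps)).mean()),
        "PPV":             float((inter / (pred_sz + eps)).mean()),
        "TPR":             float((inter / (true_sz + eps)).mean()),
        "F1-score":        float((2 * inter / (true_sz + pred_sz + eps)).mean()),
    }

# Random-prediction baseline: assign FIs at random, ignoring visual features.
rng = np.random.default_rng(1)
Y_true = (rng.random((100, 7)) < 0.3).astype(int)
Y_rand = (rng.random((100, 7)) < 0.3).astype(int)
print(multilabel_metrics(Y_true, Y_rand))
```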

Prediction Performance with θ Variation: To further analyze the influence of θ on prediction, the performance metrics are computed for each increment in the θ value for the three categories. The progression of the performance is shown in Figure 4.21. Among all the metrics, the variation in θ has little influence on the Hamming Loss, which stays consistently around 0.19 as shown in Figure 4.21a. Overall, the prediction performance for the Pictorial and Mixed categories is comparable, and both are relatively better than the prediction for the Text category (see Figures 4.21c, 4.21d, 4.21e, 4.21f). For all three categories, the prediction performance reduced with increasing θ, indicating the influence of other factors (such as the reduction in salient elements and the possible influence of semantic features) which may not be explained by the considered visual features. However, the prediction performance consistently outperformed the baseline throughout the variation in θ, as shown in Figure 4.22.

(a) Text

(b) Mixed

(c) Pictorial

Figure 4.22: Multilabel prediction performance comparison with baseline for the three categories: (from left to right) Hamming loss, Subset 0/1 loss, Accuracy, F1-score, PPV, and TPR.


Table 4.10: Prediction performance at median fixation-index

Metric            Our Model (%)   Baseline (%)   Improvement (%)
Subset 0/1 loss   66.50           94.30          29.48 ↓
Hamming loss      23.62           41.08          42.50 ↓
Accuracy          67.49           47.24          42.87 ↑
Precision         84.14           66.30          26.91 ↑
Recall            78.91           71.92          9.72 ↑
F1-score          76.57           61.07          25.38 ↑