
4.5 Weighted Voting based Attention Prediction on Images

4.5.3 Prediction Performance

Number of Informative Features Vs. θ

To further understand the informative features, the number of informative visual features was computed for varying θ, as shown in Figure 4.13c. The number of informative features for the uniform, linear, and inverse proportional weightings decreased significantly with an increase in θ. On the contrary, the number of informative features remained relatively stable for proportional weighting up to θ = 9, after which it followed the reduction pattern of the remaining weighting strategies. Beyond θ = 12, the number of informative visual features approaches zero, as the attention is influenced by other factors that cannot be described by visual features.
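A minimal sketch of how the count in Figure 4.13c could be reproduced is given below. The selection criterion (mutual information above a cutoff `MI_CUTOFF`) and the variable names `X` and `effective_fi` are assumptions for illustration; the exact informative-feature test is the one defined earlier in the chapter.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

MI_CUTOFF = 0.01  # assumed relevance cutoff; the actual criterion may differ

def count_informative_features(X, effective_fi, theta):
    """Count visual features whose mutual information with the effective
    fixation-index labels (restricted to FIs 1..theta) exceeds MI_CUTOFF.
    X: (n_samples, n_features) visual feature matrix."""
    mask = effective_fi <= theta              # keep samples with effective FI in {1, ..., theta}
    mi = mutual_info_classif(X[mask], effective_fi[mask], random_state=0)
    return int(np.sum(mi > MI_CUTOFF))

# Example sweep over theta, mirroring the x-axis of Figure 4.13c:
# counts = [count_informative_features(X, y, t) for t in range(5, 14)]
```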

Entropy Vs. θ

To understand the influence of θ on the effective-FI assignment (ground truth), the entropy was computed for the four weighting strategies, as shown in Figure 4.13d. At the median θ, the linear weighting achieved the highest entropy among all strategies and remained the highest until θ = 11. However, the increment in entropy gradually reduced because the latter FIs are less frequent and were weighted even lower. The inverse proportional weighting achieved the second highest entropy up to θ = 11 but outperformed the linear weighting beyond that; the highest entropy at the latter θ is attributed to the higher weights assigned to the less frequent latter FIs. The uniform and proportional weightings followed a similar entropy pattern; however, the latter strategy achieved relatively lower entropy owing to its frequency-based weighting of the FIs.
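For concreteness, a sketch of how the weighting strategies could produce the effective-FI ground truth whose entropy is plotted in Figure 4.13d is given below. The weight formulas and the per-region voting step are illustrative assumptions inferred from the strategy names, not the exact definitions used in this chapter.

```python
import numpy as np

def weight_vector(theta, fi_freq, strategy):
    """Illustrative weights for fixation-indices 1..theta.
    fi_freq: empirical relative frequency of each FI (length theta)."""
    fi = np.arange(1, theta + 1, dtype=float)
    if strategy == "uniform":
        w = np.ones(theta)
    elif strategy == "linear":
        w = theta - fi + 1                        # later FIs receive linearly smaller weights
    elif strategy == "proportional":
        w = np.asarray(fi_freq, dtype=float)      # frequent FIs receive larger weights
    elif strategy == "inverse_proportional":
        w = 1.0 / np.maximum(fi_freq, 1e-9)       # rare (later) FIs receive larger weights
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return w / w.sum()

def effective_fi(vote_counts, w):
    """Weighted voting for one image region: vote_counts[i] is the number of
    observers whose fixation-index at this region is i + 1."""
    return int(np.argmax(np.asarray(vote_counts, dtype=float) * w)) + 1

def label_entropy(effective_fis):
    """Shannon entropy (bits) of the resulting effective-FI ground truth."""
    _, counts = np.unique(effective_fis, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```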

Observing the variation in the number of informative visual features (Figure 4.13c) and the entropy (Figure 4.13d), we conclude that the distribution of effective fixation-indices varies significantly (higher entropy with increase in θ) in a way that cannot be explained by the visual features (fewer informative features with increase in θ). This also indicates that, among the four strategies, the proportional weighting strategy is a better candidate for modeling the effective visual attention with image visual features. However, to quantify and compare the prediction performance, metrics were computed for all four strategies.

Table 4.7: Prediction performance at the median FI. Boldface indicates the best performance and underlined text indicates the second best performance.

Weighting                 Average Accuracy                     micro F1-score
                          Predicted   Baseline   ↑ (%)         Predicted   Baseline   ↑ (%)
Uniform                   91.10       69.22      31.61         77.75       23.04      237.46
Linear                    84.26       68.43      23.13         60.65       21.08      187.71
Proportional              86.52       67.25      28.65         66.30       18.14      265.49
Inverse Proportional      89.14       67.31      32.43         72.85       18.28      298.52

To quantify the performance, the standard multi-class classification metrics, Average Accuracy and micro F1-score, were computed.
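A possible computation of the two metrics with scikit-learn is sketched below, assuming `y_true` and `y_pred` hold the ground-truth and predicted effective fixation-indices. Average Accuracy is taken here as the mean of the per-class one-vs-rest accuracies; for single-label multi-class prediction, micro F1 coincides with micro precision and micro recall.

```python
import numpy as np
from sklearn.metrics import f1_score, multilabel_confusion_matrix

def average_accuracy(y_true, y_pred):
    """Mean of the per-class one-vs-rest accuracies (TP + TN) / (TP + TN + FP + FN)."""
    cms = multilabel_confusion_matrix(y_true, y_pred)  # one [[TN, FP], [FN, TP]] per class
    per_class = [(cm[0, 0] + cm[1, 1]) / cm.sum() for cm in cms]
    return float(np.mean(per_class))

def micro_f1(y_true, y_pred):
    """Micro-averaged F1-score; equals micro precision and micro recall here."""
    return f1_score(y_true, y_pred, average="micro")
```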

The performance metrics of the four weighting strategies are shown in Table 4.7. The visual features performed best at predicting the uniformly weighted effective visual attention, followed by inverse proportional, proportional, and linear weighting. The best performance of the uniform weighting indicates that all the FIs (up to the median FI) are equally prominent and that the visual features can computationally predict the corresponding effective visual attention. This is supported by the relatively poorer prediction of the linearly weighted (weight decreasing with increasing FI) effective visual attention. Further support is provided by the second best performing inverse proportional weighting: the weights introduced by the inverse proportional strategy are comparable to the uniform weights (as shown in Figure 4.11) and differ significantly from the linear weights. The prediction performance of the proportionally weighted effective visual attention is closer to that of the uniform and inverse proportional weighted strategies than to the linearly weighted strategy.

Figure 4.14: Average Accuracy and micro F1-score with variation in θ. (a) Average Accuracy (%); (b) micro F1-score (%), where micro F1 = micro Precision = micro Recall. Both panels plot the metric against the saliency threshold θ for the Uniform, Linear, Proportional, and Inverse Proportional weighting strategies.

Figure 4.15: Average Accuracy of the four weighting strategies compared with the baseline for varying θ: (a) Uniform, (b) Linear, (c) Proportional, (d) Inverse Proportional. Each panel plots Average Accuracy (%) against the saliency threshold θ for the proposed model and the baseline.

Figure 4.16: Micro F1-score of the four weighting strategies compared with the baseline for varying θ: (a) Uniform, (b) Linear, (c) Proportional, (d) Inverse Proportional. Each panel plots micro F1-score (%) against the saliency threshold θ for the proposed model and the baseline.

Comparison with Baseline

The majority of the existing approaches are dichotomous (salient or not) in nature and are limited in predicting multi-level (FI) attention. As no baseline algorithm exists for comparison with the proposed method, we follow the strategy proposed in [32, 118] and employ a random prediction model. That is, to demonstrate the efficacy of visual features in predicting the effective visual attention, the performance was compared with a Random Prediction (RP) model, which randomly predicts an effective fixation-index from {1, . . . , θ} [107]. The comparison of the RP model with our proposed model is shown in Table 4.7. Evidently, all weightings outperformed the baseline across all metrics. The inverse proportional weighting outperformed the baseline with an improvement of 32.43% in Average Accuracy and an improvement of 298.52% in micro F1-score. The second best improvements are 31.61% in Average Accuracy (for uniform weighting) and 265.49% in micro F1-score (for proportional weighting).
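The RP baseline can be sketched as below; uniform sampling over {1, . . . , θ} and the fixed seed are assumptions, as [107] may specify a different sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # assumed seed for reproducibility

def random_prediction(n_samples, theta):
    """Random Prediction (RP) baseline: draw an effective fixation-index
    uniformly at random from {1, ..., theta} for each test sample."""
    return rng.integers(low=1, high=theta + 1, size=n_samples)

# The baseline columns of Table 4.7 would then follow from scoring these
# predictions with the same Average Accuracy and micro F1-score routines.
```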

Performance with Variation in Saliency Threshold θ

To understand the influence of θ, the performance was measured for θ varying beyond the median, θ = 5, 6, . . ., as shown in Figure 4.14. The average accuracy of the four weighting strategies consistently remained high, above 84%, with linear weighting exhibiting relatively poorer accuracy. On the contrary, the differences in prediction performance were more evident in the micro F1-score, as shown in Figure 4.14b. The proportional