Figure 1.2: Organization of the contributions. (Figure: a tree rooted at the thesis work, branching into separate attention analysis on text and images, comprising position-based and visual-feature-based attention analyses; unification of text and image visual features; and attention prediction on individual web elements and on whole webpages, each carried out under webpage-oriented grouping and webpage-and-user-oriented grouping.)
can be estimated, as opposed to the region-based predictions in the state of the art. To elaborate, existing approaches may determine some regions of an element (say, text) as salient and the remainder as non-salient, whereas web designers assign visual characteristics to the element as a whole. Additionally, element-based attention predictions are useful for applications such as user-preferential webpage rendering [88, 115] and the incorporation of dynamic (motion-related) characteristics.
1.3 Objectives and Contributions
The overall objective of the thesis is to analyze, establish, and predict the association between web elements’ visual features and free-viewing user attention. Among the multiple possible modalities, only text and images are perceived through visual inspection alone on webpages.
Accordingly, the formulated objectives and the corresponding contributions are as follows.
See Figure 1.2 for the organization of the contributions.
I. Position-based and Modality-specific Attention Analyses: As the majority of the earlier attentional analyses are position-based, more fine-grained analyses are performed to answer the following research questions.
R1: How do users allocate free-viewing ordinal attention to text and image elements positioned in 3×3 webpage regions?
R2: Which intrinsic visual features are informative in explaining the free-viewing ordinal attention on web elements?
R3: How do the informative intrinsic visual features perform in predicting the free-viewing ordinal visual attention?
The attention analyses on real-world webpages (described in Chapter 4) resulted in the following key findings.
1. Though users predominantly allocate the initial attention to Middle and Top regions, the elements in Right and Bottom regions are not completely ignored.
2. The textual elements’ Space- and Font-Size-determining intrinsic visual features and the image elements’ mid-level Color Histogram intrinsic visual features are informative, while position and size are informative for both modalities.
3. The informative visual features predict the ordinal visual attention on an element with 90% average accuracy and 70% micro-F1 score.
4. The analyses concerning the image elements revealed that the image visual features outperform the random baseline in predicting the free-viewing user attention.
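The reported 90% average accuracy and 70% micro-F1 score suggest the ordinal attention ranks are evaluated as multi-class labels. A minimal sketch of such an evaluation, with purely hypothetical rank labels (1 = attended first), might look like the following; the metric choices mirror the text, but the data and setup are assumptions, not the thesis's actual pipeline.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical ordinal attention ranks for five web elements:
# ground-truth ranks from eye-tracking vs. ranks predicted from visual features.
y_true = [1, 2, 3, 1, 2]
y_pred = [1, 2, 2, 1, 3]

# Accuracy: fraction of elements whose predicted rank matches exactly.
print(accuracy_score(y_true, y_pred))

# Micro-F1: precision/recall pooled over all rank classes before averaging.
print(f1_score(y_true, y_pred, average="micro"))
```

For single-label multi-class problems like this, micro-F1 coincides with accuracy; the two metrics diverge in the multi-label or class-subset settings.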
II. Unification of Text and Images: As text and images influence the user attention allocation on bi-modal webpages, a Canonical Correlation Analysis (CCA) based computational approach is presented to unify the cross-modal visual features of text and images. Through this analysis, we tried to answer the following research questions.
R1: Are the text and image visual features correlated based on the free-viewing user attention allocation on bi-modal webpages? Do the user idiosyncrasies and the interface idiosyncrasies affect such correlations?
R2: Which cross-modal visual features are comparable with each other based on the free-viewing user attention allocation on bi-modal webpages? Do the user idiosyncrasies and the interface idiosyncrasies affect such comparisons?
R3: Can the text visual features delineate the free-viewing user attention on image visual features, and vice versa, for the bi-modal webpages? Do the user idiosyncrasies and interface idiosyncrasies affect such delineations?
The CCA-based computational approach (described in Chapter 5) resulted in the following findings.
1. Cross-modal text and image visual features are correlated when the interface idiosyncrasies, alone or along with the user idiosyncrasies, are constrained.
2. The font-families of text are comparable to the color histogram visual features of images in drawing the users’ attention.
3. Text visual features and image visual features can delineate each other’s free-viewing attention-drawing ability.
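The core of the unification step, projecting paired text and image feature sets into a shared space via CCA, can be sketched as follows. This is a minimal illustration with random data: the feature dimensions, the aggregation per webpage, and the use of scikit-learn's CCA are all assumptions, not the thesis's exact formulation.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Hypothetical paired observations: for each of 50 webpages, aggregated
# text-element features (e.g., font size, spacing) and image-element
# features (e.g., color-histogram bins). Dimensions are illustrative.
X_text = rng.normal(size=(50, 6))
X_image = rng.normal(size=(50, 10))

# Fit CCA and project both modalities into a shared 2-component space.
cca = CCA(n_components=2)
T, I = cca.fit_transform(X_text, X_image)

# Canonical correlation per component: the correlation between the paired
# projections, quantifying how strongly the modalities co-vary.
corrs = [float(np.corrcoef(T[:, k], I[:, k])[0, 1]) for k in range(2)]
print(corrs)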
III. Element Attention Prediction: The unification achieved through Webpage-oriented Grouping (WG), which considers all users’ attention on each webpage separately, and Webpage-and-User-oriented Grouping (WUG), which considers each user’s attention on each webpage separately, is utilized for element-granular attention prediction. Through this analysis, we tried to answer the following research questions concerning both groupings.
R1: Can attention on elements be predicted if all the elements are unified into a text modality?
R2: Can attention on elements be predicted if all the elements are unified into an image modality?
R3: How well does the achieved unification perform in predicting the attention on unseen data for both the above research questions?
The multi-class classification based computational approach (described in Chapter 6) resulted in the following findings.
1. The element attention prediction outperforms the random baseline when all the elements are unified into the text modality.
2. The element attention prediction outperforms the random baseline when all the elements are unified into the image modality.
3. For both WG and WUG, the element attention prediction outperforms the random baseline on unseen webpage data, and both achieve comparable predictive performance.
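A multi-class element attention predictor compared against a random baseline, as in the findings above, might be set up as sketched below. The synthetic features, the rank labels, and the choice of a random-forest classifier are all illustrative assumptions; the thesis's actual features come from the unified text/image representation.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Hypothetical element features (position, size, unified visual features)
# for 300 elements, with ordinal attention ranks 1-3 as class labels.
# Here the rank is driven by one feature so the classifier has signal to learn.
X = rng.normal(size=(300, 8))
y = np.digitize(X[:, 0], bins=[-0.5, 0.5]) + 1

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Trained model vs. a uniformly random baseline on held-out ("unseen") data.
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
baseline = DummyClassifier(strategy="uniform", random_state=0).fit(X_tr, y_tr)

print(model.score(X_te, y_te), baseline.score(X_te, y_te))
```

The held-out split stands in for the unseen-webpage evaluation; under WG vs. WUG, only the grouping used to build the training labels would differ.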
IV. Scanpath Prediction: The unification achieved through WG and WUG is utilized to predict the prominent scanpath on webpages. For this, the element attention prediction model is extended to incorporate the users’ positional bias. The following research questions are investigated concerning both groupings.
R1: Can scanpath be predicted if all the elements are unified using WG?
R2: Can scanpath be predicted if all the elements are unified using WUG?
R3: How well does the achieved unification perform in predicting the scanpath on unseen data for both the above research questions?
Following are the scanpath prediction findings (described in Chapter 7).
1. The scanpath prediction outperforms the random baseline for WG.
2. The scanpath prediction outperforms the random baseline for WUG.
3. For both WG and WUG, the scanpath prediction outperforms the random baseline on unseen webpage data, and both achieve comparable predictive performance.
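One simple way to turn element attention scores plus a positional bias into a scanpath is a greedy ordering, sketched below. The element names, scores, positions, and the specific bias term (penalizing distance from the top-left corner) are all hypothetical; the thesis's exact positional-bias formulation is not reproduced here.

```python
# Hypothetical elements: a predicted attention score plus a normalized (x, y)
# position on the page, with (0, 0) the top-left corner.
elements = {
    "header":  {"score": 0.9, "pos": (0.5, 0.1)},
    "hero":    {"score": 0.8, "pos": (0.5, 0.4)},
    "sidebar": {"score": 0.4, "pos": (0.9, 0.5)},
    "footer":  {"score": 0.6, "pos": (0.5, 0.95)},
}

def scanpath(elements, bias=0.5):
    """Greedy scanpath: repeatedly pick the element with the highest
    position-biased score, never revisiting an element."""
    remaining = dict(elements)
    path = []
    while remaining:
        def biased(name):
            e = remaining[name]
            x, y = e["pos"]
            dist = (x ** 2 + y ** 2) ** 0.5  # distance from the top-left corner
            return e["score"] - bias * dist   # bias favors top-left elements
        nxt = max(remaining, key=biased)
        path.append(nxt)
        del remaining[nxt]
    return path

print(scanpath(elements))
```

With these sample values, the bias pushes the low-positioned footer behind the hero despite their raw scores; varying `bias` shifts how strongly position dominates the predicted order.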