Views of Attention and Grouping - Computational Modeling of Free-viewing Attention on Multimodal Webpages

Figure 5.2: Example webpages (G1 and G2) and users (U1 and U2) considered for illustrating our proposed grouping criteria.

5.1 Terminology

Consider an interface consisting of heterogeneous text and image elements. Each element fixated during the gaze session, referred to as a data-of-interest (DOI), is associated with data features, visual features, user features, and perceptual features. The data features describe the identity of the DOI through its modality and the interface it belongs to. The visual features (described in Section 3.5) represent the characteristics of the rendered elements. The user features describe the identity of the user. The perceptual features describe the fixation-related characteristics, including the fixation-position, the fixation-index (the ordinal number of a fixation in a scanpath), and the modality-fixation-index (the modality-specific relative fixation-index on the DOI). Note that although fixation-index is standard terminology, we introduce the modality-fixation-index to capture the relative prominence of modalities within a scanpath.

For example, in Figure 5.2, the text DOI on webpage G1 with fixation-index 3 from user U2 receives a modality-fixation-index of two, as it is the second text-modality element fixated by U2 on G1. Similarly, the image DOI on webpage G1 with fixation-index 2 from user U2 receives a modality-fixation-index of one, as it is the first image-modality element fixated by U2 on G1.
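The relationship between the two indices can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the function name `modality_fixation_indices` and the representation of a scanpath as a list of modality labels are assumptions made for the example. The sample scanpath is the one described for user U2 on webpage G1 (text, image, text).

```python
from collections import defaultdict


def modality_fixation_indices(scanpath):
    """Annotate each fixation with its fixation-index and modality-fixation-index.

    `scanpath` is a list of modality labels in fixation order (hypothetical
    representation). Both indices are 1-based.
    """
    seen = defaultdict(int)  # running count of fixations per modality
    annotated = []
    for fixation_index, modality in enumerate(scanpath, start=1):
        seen[modality] += 1
        annotated.append((fixation_index, modality, seen[modality]))
    return annotated


# Scanpath of user U2 on webpage G1, as described in the text.
u2_on_g1 = ["text", "image", "text"]
indexed = modality_fixation_indices(u2_on_g1)
print(indexed)
# [(1, 'text', 1), (2, 'image', 1), (3, 'text', 2)]
```

The third fixation has fixation-index 3 but modality-fixation-index 2, because it is the second text element in the scanpath.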

5.2 Views of Attention and Grouping

Figure 5.3: Pairings from each grouping for the attention shown in Figure 5.2. Panels: (a) Webpage-oriented (WG); (b) User-oriented (UG); (c) Webpage-and-User-oriented (WUG).

corresponding to text and another corresponding to the image) of the same perceptual characteristic (analogous to views of the semantic concept in cross-modal information retrieval [32]). However, perception is affected by two prominent factors, namely user idiosyncrasies and interface idiosyncrasies. Thus, we introduce three attention grouping strategies to constrain and marginalize their influence on user perception. Subsequently, text and image DOIs with a similar attention-drawing ability are paired towards unification.

Webpage-oriented Grouping (WG)

The interface's idiosyncrasies influence the users' attention allocation [16, 164]. For example, users' attention allocation on text-rich webpages differs from that on image-rich webpages [110]. Consequently, we propose to constrain the influence of interface idiosyncrasies while marginalizing the user idiosyncrasies. That is, all users' attention allocation on each interface is considered a group. Within the group, each text DOI is paired with every image DOI sharing the same fixation-index. Example pairs obtained using the WG strategy are shown in Figure 5.3a.

On webpage G1, the text element with FI=1 (obtained from user U2) is paired with the image element with FI=1 (obtained from user U1), as both elements drew the same level of user attention (FI=1) on the given webpage, G1. Similarly, the text elements with FI=2 and FI=3 are paired with the image elements with FI=2 and FI=3, respectively. The same approach applied to G2 results in the pairs shown in Figure 5.3a. Note that the work in [154] represents WG-based pairing.
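As a rough sketch of the WG rule, the following Python groups fixations by webpage and fixation-index, then pairs every text DOI with every image DOI in the same group. The tuple layout, the name `webpage_oriented_pairs`, and the identification of a DOI by `(user, fixation_index)` are assumptions made for illustration. The sample data contains only the G1 fixations stated explicitly in the text, so it does not reproduce every pair in Figure 5.3a.

```python
from collections import defaultdict
from itertools import product

# (webpage, user, modality, fixation_index); only fixations named in the text.
FIXATIONS = [
    ("G1", "U1", "image", 1), ("G1", "U1", "text", 2),
    ("G1", "U2", "text", 1), ("G1", "U2", "image", 2), ("G1", "U2", "text", 3),
]


def webpage_oriented_pairs(fixations):
    """Pair text and image DOIs that share a webpage and a fixation-index."""
    groups = defaultdict(lambda: {"text": [], "image": []})
    for webpage, user, modality, fi in fixations:
        # Users are pooled (marginalized); the webpage is held fixed.
        groups[(webpage, fi)][modality].append((user, fi))
    pairs = []
    for (webpage, _fi), dois in sorted(groups.items()):
        for text_doi, image_doi in product(dois["text"], dois["image"]):
            pairs.append((webpage, text_doi, image_doi))
    return pairs


wg_pairs = webpage_oriented_pairs(FIXATIONS)
print(wg_pairs)
# [('G1', ('U2', 1), ('U1', 1)), ('G1', ('U1', 2), ('U2', 2))]
```

The first pair matches the example above: the text element fixated first by U2 is paired with the image element fixated first by U1. In this reduced data set the text DOI with FI=3 has no image counterpart, so it stays unpaired.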

User-oriented Grouping (UG)

Analogous to interface idiosyncrasies, user idiosyncrasies also determine the attention allocation on interfaces [23, 72, 108]. Consequently, we propose to constrain the influence of user idiosyncrasies while marginalizing the interface idiosyncrasies. That is, each user's attention allocation on all interfaces is considered a group. Within the group, each text DOI is paired with every image DOI sharing the same fixation-index. Example pairs obtained using the UG strategy are shown in Figure 5.3b. For user U1, the text element with FI=1 (obtained on webpage G2) is paired with the image element with FI=1 (obtained on webpage G1), as both elements drew the same level of user attention (FI=1) from the given user, U1. Similarly, the text elements with FI=2 and FI=3 are paired with the image elements with FI=2 and FI=3, respectively. The same approach is applied to user U2, whose resulting pairs are shown in Figure 5.3b.
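The UG rule differs from WG only in the grouping key: fixations are pooled across webpages and held fixed per user. The sketch below is illustrative only. The tuple layout, the name `user_oriented_pairs`, and the identification of a DOI by `(webpage, fixation_index)` are assumptions, and the sample data contains only the U1 fixations stated explicitly in the text.

```python
from collections import defaultdict
from itertools import product

# (webpage, user, modality, fixation_index); only fixations named in the text.
FIXATIONS = [
    ("G1", "U1", "image", 1), ("G1", "U1", "text", 2),
    ("G2", "U1", "text", 1), ("G2", "U1", "image", 2),
]


def user_oriented_pairs(fixations):
    """Pair text and image DOIs that share a user and a fixation-index."""
    groups = defaultdict(lambda: {"text": [], "image": []})
    for webpage, user, modality, fi in fixations:
        # Webpages are pooled (marginalized); the user is held fixed.
        groups[(user, fi)][modality].append((webpage, fi))
    pairs = []
    for (user, _fi), dois in sorted(groups.items()):
        for text_doi, image_doi in product(dois["text"], dois["image"]):
            pairs.append((user, text_doi, image_doi))
    return pairs


ug_pairs = user_oriented_pairs(FIXATIONS)
print(ug_pairs)
# [('U1', ('G2', 1), ('G1', 1)), ('U1', ('G1', 2), ('G2', 2))]
```

The first pair corresponds to the example for user U1: the text element fixated first on G2 is paired with the image element fixated first on G1.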

Webpage-and-User-oriented Grouping (WUG)

To account for both interface idiosyncrasies and user idiosyncrasies, we propose to constrain them simultaneously. That is, each user's attention allocation on each interface, i.e., each scanpath, is considered a group. However, unlike in WG and UG, no two DOIs are associated with the same fixation-index within a group. Thus, we use the modality-fixation-index to pair the text and image DOIs. Within the group, the first fixated text is paired with the first fixated image, the second fixated text with the second fixated image, and so on. Example pairs obtained using the WUG strategy are shown in Figure 5.3c. For user U1's attention on webpage G1, the text element with FI=2 (modality-FI=1) is paired with the image element with FI=1 (modality-FI=1), as both elements were the quickest to draw the user's attention among the web elements of their respective modality. Similarly, for user U1's attention on webpage G2, the text element with FI=1 (modality-FI=1) is paired with the image element with FI=2 (modality-FI=1). The remaining pairs obtained through the same approach are shown in Figure 5.3c.
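A minimal Python sketch of the WUG rule is given below. It treats each (webpage, user) scanpath as its own group and aligns text and image fixations by their order within the modality, which is equivalent to matching on modality-fixation-index. The dictionary layout and the name `webpage_user_oriented_pairs` are assumptions made for illustration, and the scanpaths contain only the fixations stated explicitly in the text. Leaving surplus DOIs of one modality unpaired is also an assumption, since the text does not say how they are handled.

```python
# (webpage, user) -> modality labels in fixation order; only fixations named in the text.
SCANPATHS = {
    ("G1", "U1"): ["image", "text"],
    ("G2", "U1"): ["text", "image"],
    ("G1", "U2"): ["text", "image", "text"],
}


def webpage_user_oriented_pairs(scanpaths):
    """Pair the k-th fixated text with the k-th fixated image in each scanpath."""
    pairs = []
    for (webpage, user), modalities in scanpaths.items():
        # List position + 1 is the modality-fixation-index; values are fixation-indices.
        text_fis = [fi for fi, m in enumerate(modalities, start=1) if m == "text"]
        image_fis = [fi for fi, m in enumerate(modalities, start=1) if m == "image"]
        # zip stops at the shorter list, so surplus DOIs remain unpaired (assumption).
        for text_fi, image_fi in zip(text_fis, image_fis):
            pairs.append((webpage, user, text_fi, image_fi))
    return pairs


wug_pairs = webpage_user_oriented_pairs(SCANPATHS)
print(wug_pairs)
# [('G1', 'U1', 2, 1), ('G2', 'U1', 1, 2), ('G1', 'U2', 1, 2)]
```

The first two tuples reproduce the examples for U1: text FI=2 with image FI=1 on G1, and text FI=1 with image FI=2 on G2.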

Source document: Computational Modeling of Free-viewing Attention on Multimodal Webpages - A Machine Learning Approach (pages 117-120)