Urn Model - Modeling and Predicting Object Attention in Natural Scenes


3.1 Urn Model

We model the naming of objects in an image as drawing marbles from an urn without replacement (see Figure 3.1). The urn contains one marble for each object category appearing in the image. The marbles differ in size, and larger marbles are more likely to be chosen. Thus, a marble’s size represents the importance of the corresponding object. We represent multiple viewers by refilling the urn with the same set of marbles and sampling again.

[Figure 3.1 image: a photograph, the urn model with marbles labeled by object name and draw order (1st to 5th), and the object lists produced by Viewers 1 to 5.]

Figure 3.1: A photograph and corresponding lists generated by five viewers. Words are color coded to facilitate perception of word order. The urn models how humans name sequences of objects. An image contains many object categories with varied importance in that image. A viewer names objects one by one until ten are named. Similarly, an urn is filled with marbles of different sizes, where larger marbles are more likely to be drawn. Ten marbles are removed from the urn, creating a sequence.
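The drawing process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `draw_sequence`, the example importances, and the use of three draws instead of ten are all hypothetical and are not taken from the experiments.

```python
import random

def draw_sequence(importance, n_draws=10, rng=random):
    """Draw object names without replacement; larger marbles are likelier."""
    remaining = dict(importance)  # copy, so every viewer starts from the same urn
    sequence = []
    for _ in range(min(n_draws, len(remaining))):
        names = list(remaining)
        weights = [remaining[name] for name in names]
        # Weighted choice among the marbles still in the urn.
        chosen = rng.choices(names, weights=weights, k=1)[0]
        sequence.append(chosen)
        del remaining[chosen]  # a drawn marble leaves the urn
    return sequence

# Hypothetical importances for illustration only (not measured values).
urn = {"car": 0.5, "house": 0.2, "grass": 0.1, "door": 0.1, "tree": 0.1}
rng = random.Random(0)
# Five simulated viewers, each refilling the urn and naming three objects.
viewers = [draw_sequence(urn, n_draws=3, rng=rng) for _ in range(5)]
```

Because `random.choices` renormalizes the weights it is given, removing a marble automatically rescales the chances of the marbles that remain.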

This model is based on several assumptions. First, the draws are independent; this is reasonable because very few object pairs are dependent (Chapter 2.5). Second, everyone starts with the same urn; we do not see clusters of different viewer behavior in our data, as we discuss in Chapter 3.6. Third, marbles can only be removed from the urn by being drawn. The third assumption is violated for some images: as discussed in Chapter 2.5, we find that obvious objects are either named early or left unnamed. To model this, we develop a variant of the urn model, which we call the forgetful urn. In this model, viewers draw marbles as before, but the first marble may go unreported with some probability.¹
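A minimal sketch of the forgetful urn follows. It assumes one possible reading of the description: the first marble drawn is removed from the urn but is omitted from the reported list with probability `p_forget`, and the viewer keeps drawing until `n_report` names are reported or the urn is empty. The names `draw_forgetful_sequence`, `p_forget`, and `n_report`, as well as the example importances, are hypothetical.

```python
import random

def draw_forgetful_sequence(importance, p_forget, n_report=10, rng=random):
    """Forgetful-urn sketch: the first marble drawn may go unreported."""
    remaining = dict(importance)  # copy, so the caller's urn is untouched
    reported = []
    first_draw = True
    while remaining and len(reported) < n_report:
        names = list(remaining)
        weights = [remaining[name] for name in names]
        chosen = rng.choices(names, weights=weights, k=1)[0]
        del remaining[chosen]  # the marble leaves the urn either way
        if first_draw and rng.random() < p_forget:
            first_draw = False
            continue  # drawn first, but never reported
        first_draw = False
        reported.append(chosen)
    return reported

# Hypothetical importances for illustration only (not measured values).
example_urn = {"car": 0.5, "house": 0.2, "grass": 0.1, "door": 0.1, "tree": 0.1}
example_list = draw_forgetful_sequence(example_urn, p_forget=0.3, n_report=3,
                                       rng=random.Random(0))
```

With `p_forget=0` this reduces to the plain urn, which matches the observation that both models should agree when the obvious object is rarely skipped.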

Figure 3.2 shows importance measured through maximum likelihood (ML), maximizing the likelihood of observing our data with the importance values as parameters (Chapter 3.1). The forgetful urn and the urn produce similar estimates of importance when the most obvious object is not often overlooked, but the forgetful urn’s estimates are more realistic than the urn’s when the obvious object is frequently skipped.

One possibility is that certain objects are named earlier and more often because they are more frequent in human speech and hence easier to access. Figure 3.3 compares object importance and naming frequency in our experiments with lexical frequency from the British National Corpus [73]. There is no pattern showing that lexical frequency is responsible for the observed naming behavior.

In the urn model that we just described, the probabilities of being drawn are what we are trying to measure from the data. Previous work on this problem uses complex numerical methods [40] or requires many marbles of the same type (we have only one) [82]. Instead of using these approaches, we measure importance by maximizing the likelihood or probability of observing a set of sequences given the object importances π_i.

Each sequence consists of 10 marbles w_i^m, where w_i^m denotes the ith marble drawn in the mth sequence and is a variable that takes values 1, ..., N corresponding to object names. The w_i^m are drawn independently without replacement (out of N marbles,

¹ A rigorous definition of importance is the probability that a marble is drawn first, regardless of whether it is skipped.

[Figure 3.2 image: for each example image, the columns show Naming Statistics (scatter plot with axes frequency and order), Urn: ML Importance, Forgetful Urn: ML Importance, and MC Importance, each listing per-object importance values.]

Figure 3.2: Measured importance. Scatter plot of the frequency with which an object appears on lists against its mean order over lists for an image (2nd column). Comparing the mean order and frequency of each object (dot) shows that in some images the obvious object (red) is sometimes not named at all. This violates our urn model, but we can compensate for this behavior: in these cases the Forgetful Urn (4th column) improves the importance measurement over the Urn (3rd column). In the cases where the obvious object is not skipped, the two importance measurements are similar. The Markov chain (5th column) arrives at similar results through a different approach.

[Figure 3.3 image: two scatter plots against Lexical Frequency (logarithmic scale, 10¹ to 10³): panel A shows Importance, panel B shows Naming Frequency.]

Figure 3.3: (A) Scatter plot of object lexical frequency and importance. Each dot represents an object in a particular image. (B) An object’s lexical frequency and within-image naming frequency.

where N >> 10), so the probability of drawing a particular sequence of marbles (w_1^m, ..., w_10^m) is

$$p(w_1^m, \ldots, w_{10}^m) = \prod_{n=1}^{10} p(w_n^m \mid w_{n-1}^m, \ldots, w_1^m). \tag{3.1}$$

However, we are drawing marbles without replacement, so this equation is constrained by w_i^m = w_j^m ⟹ i = j. When we draw the nth marble of a sequence, n−1 marbles have already been removed from the urn, so we need to normalize the remaining importance to 1. The probability that the marble labeled w_n^m is the nth marble drawn is

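$$p(w_n^m \mid w_{n-1}^m, \ldots, w_1^m) =
\begin{cases}
0 & \text{if } \exists\, i \in [1, n-1] : w_i^m = w_n^m \\[6pt]
\dfrac{\pi_{w_n^m}}{1 - \sum_{i=1}^{n-1} \pi_{w_i^m}} & \text{otherwise,}
\end{cases} \tag{3.2}$$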

where π_i is the probability that marble i is drawn first (from a fresh urn) and Σ_i π_i = 1. The first case simply asserts that we are drawing marbles without replacement, so a marble cannot be drawn twice. If we assume that our data are valid, then we are only concerned with the second case.
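To make Equations 3.1 and 3.2 concrete, the sketch below evaluates the log-probability of a sequence for given importances and sums it over sequences to obtain the log-likelihood that the ML estimate would maximize. The function names and the three-object example are hypothetical, the importances are assumed to be strictly positive and to sum to 1, and the maximization step itself is not shown.

```python
import math

def sequence_log_prob(sequence, importance):
    """Log of Eq. 3.1 for one sequence, with each factor given by Eq. 3.2."""
    if len(set(sequence)) != len(sequence):
        return float("-inf")  # first case of Eq. 3.2: a marble cannot repeat
    log_prob = 0.0
    removed = 0.0  # total importance already taken out of the urn
    for name in sequence:
        # Second case of Eq. 3.2: renormalize over the remaining importance.
        log_prob += math.log(importance[name] / (1.0 - removed))
        removed += importance[name]
    return log_prob

def log_likelihood(sequences, importance):
    """Sum of sequence log-probabilities over all viewers' sequences."""
    return sum(sequence_log_prob(seq, importance) for seq in sequences)

# Hypothetical importances for illustration only (not measured values).
pi = {"car": 0.5, "house": 0.3, "grass": 0.2}
# Probability of naming house, then car, then grass:
# 0.3 * (0.5 / 0.7) * (0.2 / 0.2)
p = math.exp(sequence_log_prob(["house", "car", "grass"], pi))
```

Working in log space keeps the product in Equation 3.1 numerically stable when many sequences are combined.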
