Chapter 1 Introduction
3.1 Urn Model
We model the naming of objects in an image as drawing marbles from an urn without replacement (see Figure 3.1). The urn contains one marble for each object category appearing in the image. The marbles differ in size, and a larger marble is more likely to be chosen. Thus, a marble’s size represents the importance of the corresponding object. We represent multiple viewers by refilling the urn with the same set of marbles and sampling again.
[Figure 3.1 panels: Urn model; Object lists (Viewer 1 to Viewer 5)]
Figure 3.1: A photograph and corresponding lists generated by five viewers. Words are color coded to facilitate perception of word order. The urn models how humans name sequences of objects. An image contains many object categories with varied importance in that image. A viewer names objects one by one until ten are named.
Similarly, an urn is filled with marbles of different sizes, where larger marbles are more likely drawn. Ten marbles are removed from the urn, creating a sequence.
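The sampling process described above can be illustrated with a minimal sketch. The object names, importance values, and the function name `draw_sequence` are hypothetical and chosen only for illustration; the sketch assumes that each draw picks a remaining marble with probability proportional to its importance.

```python
import random

def draw_sequence(importance, k, rng):
    """Draw up to k distinct objects without replacement.

    Each draw picks one of the remaining marbles with probability
    proportional to its importance (its "size").
    """
    remaining = dict(importance)
    sequence = []
    for _ in range(min(k, len(remaining))):
        names = list(remaining)
        weights = [remaining[name] for name in names]
        chosen = rng.choices(names, weights=weights, k=1)[0]
        sequence.append(chosen)
        del remaining[chosen]  # the marble leaves the urn
    return sequence

# Hypothetical importances for one image (they sum to 1).
importance = {"car": 0.5, "house": 0.2, "grass": 0.15, "tree": 0.1, "door": 0.05}

rng = random.Random(0)
# Each viewer starts from a refilled urn with the same marbles.
viewers = [draw_sequence(importance, 3, rng) for _ in range(5)]
```

Refilling the urn for each viewer corresponds to copying `importance` at the start of every call, so the sequences share the same parameters but are sampled separately.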
This model is based on several assumptions. First, the draws are independent; this is reasonable because very few object pairs are dependent (Chapter 2.5). Second, everyone starts with the same urn; we don’t see clusters of different viewer behavior in our data, as we discuss in Chapter 3.6. Third, marbles can only be removed from the urn by being drawn. The third assumption is violated for some images. As discussed in Chapter 2.5, we find that obvious objects are named early or left unnamed. To model this, we develop a variant of the urn model, which we call the forgetful urn. In this model, viewers draw marbles as before, but the first marble may go unreported with some probability.¹
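One way to picture the forgetful urn is the sketch below. It is an illustration under stated assumptions, not the model's definition: it assumes that an unreported first marble still leaves the urn and that the viewer keeps drawing until `k` objects are reported. The names `draw_forgetful_sequence` and `skip_prob` are hypothetical.

```python
import random

def draw_forgetful_sequence(importance, k, skip_prob, rng):
    """Sample a reported list of up to k objects from a forgetful urn.

    Assumption: the first marble drawn goes unreported with
    probability skip_prob, and it is still removed from the urn.
    """
    remaining = dict(importance)
    reported = []
    is_first_draw = True
    while remaining and len(reported) < k:
        names = list(remaining)
        weights = [remaining[name] for name in names]
        chosen = rng.choices(names, weights=weights, k=1)[0]
        del remaining[chosen]
        skipped = is_first_draw and rng.random() < skip_prob
        is_first_draw = False
        if not skipped:
            reported.append(chosen)
    return reported

# Hypothetical image with one very obvious object.
importance = {"car": 0.7, "house": 0.15, "grass": 0.1, "tree": 0.05}
reported = draw_forgetful_sequence(importance, 3, 0.3, random.Random(0))
```

With `skip_prob` set to 0 the sketch reduces to the plain urn, which matches the observation that the two models agree when the obvious object is rarely overlooked.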
Figure 3.2 shows importance measured through maximum likelihood (ML), maximizing the likelihood of observing our data with the importance values as parameters (Chapter 3.1). The forgetful urn and the urn produce similar estimates of importance when the most obvious object is not often overlooked, but the forgetful urn’s estimates are more realistic than the urn’s when the obvious object is frequently skipped.
One possibility is that certain objects are named earlier and more often because they are more frequent in human speech and hence easier to access. Figure 3.3 compares object importance and naming frequency in our experiments with lexical frequency from the British National Corpus [73]. There is no pattern showing that lexical frequency is responsible for the observed naming behavior.
In the urn model that we just described, the probabilities of being drawn are what we are trying to measure from the data. Previous work on this problem uses complex numerical methods [40] or requires many marbles of the same type (we have only one) [82]. Instead of using these approaches, we measure importance by maximizing the likelihood, or probability, of observing a set of sequences given the object importances $\pi_i$.
Each sequence consists of 10 marbles $w_i^m$, where $w_i^m$ denotes the $i$th marble drawn in the $m$th sequence and is a variable that takes values $1, \ldots, N$ corresponding to object names. The $w_i^m$ are drawn independently without replacement (out of $N$ marbles,
¹ A rigorous definition of importance is the probability that a marble is drawn first, regardless of whether it is skipped.
[Figure 3.2 panels: Naming Statistics (axes: frequency, order); Urn: ML Importance; Forgetful Urn: ML Importance; MC Importance]
Figure 3.2: Measured Importance. Scatter plot of the frequency with which an object appears on lists against its mean order over lists for an image (2nd column). Comparing the mean order and frequency of each object (dot) shows that in some images the obvious object (red) is sometimes not named at all. This violates our urn model, but we can compensate for this behavior and see an improvement in importance measurement in these cases for the Forgetful Urn (4th column) over the Urn (3rd column). In the cases where the obvious object is not skipped, the importance measurement is similar. The Markov chain (5th column) arrives at similar results through a different approach.
[Figure 3.3 panels: A (axes: Lexical Frequency, Importance); B (axes: Lexical Frequency, Naming Frequency)]
Figure 3.3: (A) Scatter plot of object lexical frequency and importance. Each dot represents an object in a particular image. (B) An object’s lexical frequency and within-image naming frequency.
where $N \gg 10$), so the probability of drawing a particular sequence of marbles $(w_1^m, \ldots, w_{10}^m)$ is

$$\prod_{n=1}^{10} p(w_n^m \mid w_{n-1}^m, \ldots, w_1^m). \tag{3.1}$$

However, we are drawing marbles without replacement, so this equation is constrained by $w_i^m = w_j^m \implies i = j$. When we draw the $n$th marble of a sequence, $n-1$ marbles have already been removed from the urn, so we need to normalize the remaining importance to 1. The probability that the marble labeled $w_n^m$ is the $n$th marble drawn is
$$p(w_n^m \mid w_{n-1}^m, \ldots, w_1^m) =
\begin{cases}
0 & \text{if } \exists\, i \in [1, n-1] : w_i^m = w_n^m \\[4pt]
\dfrac{\pi_{w_n^m}}{1 - \sum_{i=1}^{n-1} \pi_{w_i^m}} & \text{otherwise,}
\end{cases} \tag{3.2}$$

where $\pi_i$ is the probability that marble $i$ is drawn first (from a fresh urn) and $\sum_i \pi_i = 1$. The first case simply asserts that we are drawing marbles without replacement, so a marble cannot be drawn twice. If we assume that our data are valid then we are only concerned with the second case.
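To make Equations 3.1 and 3.2 concrete, the following sketch evaluates the log-likelihood of observed sequences for given importances. The function names and the example values are hypothetical. Summing over sequences assumes that viewers draw from separately refilled urns, so that sequence probabilities multiply.

```python
import math

def sequence_log_likelihood(sequence, pi):
    """Log of Equation 3.1 for one sequence, using Equation 3.2 per draw."""
    drawn_mass = 0.0  # total importance already removed from the urn
    seen = set()
    log_p = 0.0
    for w in sequence:
        if w in seen:
            # First case of Equation 3.2: a marble cannot be drawn twice.
            return float("-inf")
        # Second case: renormalize by the importance still in the urn.
        log_p += math.log(pi[w] / (1.0 - drawn_mass))
        drawn_mass += pi[w]
        seen.add(w)
    return log_p

def total_log_likelihood(sequences, pi):
    """Sum of per-sequence log-likelihoods (viewers assumed independent)."""
    return sum(sequence_log_likelihood(s, pi) for s in sequences)

# Hypothetical importances: 0.5 * (0.3 / (1 - 0.5)) = 0.3 for this sequence.
pi = {"car": 0.5, "house": 0.3, "grass": 0.2}
p_car_house = math.exp(sequence_log_likelihood(["car", "house"], pi))
```

A maximum likelihood estimate would then search for the values of `pi` (non-negative, summing to 1) that maximize `total_log_likelihood` over the observed lists; the choice of optimizer is left open here.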