Object Tracking in Video

(1)

Object Tracking in W11

Video

1

Ref: http://www.robots.ox.ac.uk/~misard/condensation.html

(2)

Video Analysis

•  Detection of interesting objects

•  Tracking such object from frame to frame

•  Analyzing the objects tracks

(3)

Tracking Methods

•   Bottom-up

–  Segmentation-based

•  Top-down

–  Model-based

3

(4)

Tracking a Leaf

(5)

Girl Dancing Vigorously

http://www.robots.ox.ac.uk/~misard/condensation.html

5

(6)

Tracking Finger of Violin Player

(7)

Robust Density Comparison for Visual Tracking (BMVC 2009)

Tracking in Difficult Scenarios

7

(8)

Can you use object

detection for tracking?

(9)

Additional Difficulties in Object Tracking

•  Image noise/clutter

•   Illumination changes

•   Complex object motion

•  Partial and full occlusion

9

(10)

Main Steps

•  Object Representation

•  Feature Selection

•  Object Description

•  Correspondence across frames

(11)

Image: Yilmaz, A., Javed, O., and Shah, M. 2006. Object tracking: A survey. ACM Comput. Surv. 38, 4, Article 13 (Dec. 2006), 45 pages. DOI = 10.1145/1177352.1177355 http://doi.acm.org/10.1145/1177352.1177355

Object Representation

4 A. Yilmaz et al.

Fig. 1. Object representations. (a) Centroid, (b) multiple points, (c) rectangular patch, (d) elliptical patch, (e) part-based multiple patches, (f) object skeleton, (g) complete object contour, (h) control points on object contour, (i) object silhouette.

—Articulated shape models. Articulated objects are composed of body parts that are held together with joints. For example, the human body is an articulated object with torso, legs, hands, head, and feet connected by joints. The relationship between the parts are governed by kinematic motion models, for example, joint angle, etc. In order to represent an articulated object, one can model the constituent parts using cylinders or ellipses as shown in Figure 1(e).

—Skeletal models. Object skeleton can be extracted by applying medial axis trans- form to the object silhouette [Ballard and Brown 1982, Chap. 8]. This model is com- monly used as a shape representation for recognizing objects [Ali and Aggarwal 2001].

Skeleton representation can be used to model both articulated and rigid objects (see Figure 1(f).

There are a number of ways to represent the appearance features of objects. Note that shape representations can also be combined with the appearance representations [Cootes et al. 2001] for tracking. Some common appearance representations in the context of object tracking are:

—Probability densities of object appearance. The probability density estimates of the object appearance can either be parametric, such as Gaussian [Zhu and Yuille 1996]

and a mixture of Gaussians [Paragios and Deriche 2002], or nonparametric, such as Parzen windows [Elgammal et al. 2002] and histograms [Comaniciu et al. 2003]. The probability densities of object appearance features (color, texture) can be computed from the image regions specified by the shape models (interior region of an ellipse or a contour).

—Templates. Templates are formed using simple geometric shapes or silhouettes [Fieguth and Terzopoulos 1997]. An advantage of a template is that it carries both spatial and appearance information. Templates, however, only encode the object appearance generated from a single view. Thus, they are only suitable for tracking objects whose poses do not vary considerably during the course of tracking.

ACM Computing Surveys, Vol. 38, No. 4, Article 13, Publication date: December 2006.

11

(12)

Feature Selection

•  Color

•  Edges

•  HoG

•   And more…

(13)

Can we use HoG to build object template?

1.   HoG is for a class of objects not a specific object

2.   Colors are more specific to the objects

13

(14)

Object Description

1.   Object is specified in the first frame of video (e.g. rectangle)

2.   Object is detected in video (e.g.

humans)

3.   Obtain a template of the object (e.g.

color histogram)

(15)

Objective: Given object

location in the fist frame, find object locations in the

subsequent frames!

15

(16)

Correspondence across frames

•   Kalman filter

•   Mean-shift filter

•   Particle filter

(17)

Particle Filters

•  Also known as Condensation Algorithm

•  A kind of genetic algorithm

17

(18)

Genetic algorithms follow

“survival of the fittest”

•  Individuals compete for resources and mates

•  The strong ones will succeed and produce more offspring

•  The good “genes” propagate from parents to the child

•  Over time, generations become more

(19)

Particles are position and shape

Example: (X, Y, Width, Length)

19

(20)

Particle filter has three main phases:

1.   Sample

2.   Predict

3.   Update

(21)

Sample step

a new set of particles is chosen so that each particle survives in

proportion to its weight

21

(22)

p(x | Z )_t _t

t−1 t−1

p(x | Z )

measure p(x | Z )_t _t−1

t

s(n)

t t

p(z | x )

predict

t−1 t−1π s ,(i) (i)

(23)

Predict step

take each particle and add a random sample from the motion model

23

(24)

Sample-set

representation

(25)

Update step

calculate weight of each particle as probability of observing that

particle given the template

25

(26)

Particles with Weights

(27)

Current State

(Location and Shape)

1.   Highest weight particle

2.   Weighted sum of particles

27

(28)

Current State

(29)

Task: Use Particle Filter method to track middle

finger of violin player!

29

(30)

Particles

) ,

, ,

,

( L W X Y

S = θ

Y X = = θ = =

W L = Length of rectangle Width

Angle

Row Coordinate

Column Coordinate

(31)

Initialization

•  Initialize all the samples ( particles) by actual location and shape of the finger

•  Assign weight π t n = 1 to all particles

•  Obtain color histogram of the finger

31

(32)

Re-sampling

–  Select 100 samples from the set of samples with probability

•  Normalize all the weights by dividing each weight sum of weights

•  Calculate the cumulative mass function from sample weights

S

_t^'₋₁

π

_tⁿ

c

_t⁰

= 0

c

_tⁿ

= c

_tⁿ⁻¹

+ π

_t^'ⁿ

π

_t^'ⁿ

= π

_tⁿ

π

_tⁿ

i=1

∑

100

1

r

Sampling Step in t th Frame

(33)

Prediction Step in t th Frame

•  Propagate each sample by adding Gaussian Noise

S t ' − n 1 = S t n − 1 + W t − n 1

33

(34)

Update by Observing Current Pixels

•  Assign weights to the samples based on histogram matching

•  Calculate Bhattacharya coefficient and weight

2 1

∑ .

= h

_itk

h

_imk

bc

(35)

Current State

•  Estimate the current state by taking weighted average of the particles

∑ =

= 100 1 ]

[ S it n it n s it n

E π

35

(36)

One Finger

(37)

Not so good

37

(38)

Four Fingers

(39)

Particle Filter Basics

•  The basic idea of particle filters is that any pdf can be represented as a set of

samples (particles).

•  The density of your samples in one area of the state space represents the probability of that region.

•  This method can represent any arbitrary distribution, making it good for non-

Gaussian, multi-modal pdfs.

39

Object Tracking in Video

Object Tracking in W11

Tracking a Leaf

• Image noise/clutter

Object Representation

Feature Selection

2. Object is detected in video (e.g.

• Also known as Condensation Algorithm

• Over time, generations become more

take each particle and add a random sample from the motion model

finger of violin player!

• Normalize all the weights by dividing each weight sum of weights

• Calculate Bhattacharya coefficient and weight

• The density of your samples in one area of the state space represents the probability of that region.

•  Image noise/clutter

2.   Object is detected in video (e.g.

•  Also known as Condensation Algorithm

•  Over time, generations become more

•  Normalize all the weights by dividing each weight sum of weights

•  Calculate Bhattacharya coefficient and weight

•  The density of your samples in one area of the state space represents the probability of that region.