JOURNAL OF SCIENCE & TECHNOLOGY • No. 83B-2011
RELEVANCE FEEDBACK METHODS FOR SURVEILLANCE VIDEO RETRIEVAL
Le Thi Lan
Hanoi University of Science and Technology

ABSTRACT
Object retrieval plays an increasingly important role as the number of video surveillance systems and the amount of stored data increase drastically. In this paper we address one specific part of the problem of retrieving objects of interest within surveillance video sequences: relevance feedback. To allow users to interact with the retrieval system, we propose two relevance feedback methods at object level. These methods take into account both the appearance and the temporal aspects of moving objects in surveillance video sequences, which is the main difference between our work and previous works. Experimental results on real surveillance video sequences captured in a metro station demonstrate the performance of the two proposed methods.
SUMMARY
Object retrieval in surveillance videos is becoming increasingly necessary as the number of surveillance systems and the amount of stored data keep growing. In this paper we address a specific problem in object retrieval systems for surveillance videos: retrieval with feedback from the user. To allow users to interact with the system, we propose two feedback methods at object level. The proposed methods make it possible to use both the appearance information and the temporal information of objects. This is the strength of the proposed methods compared with previous methods. Experimental results on videos captured directly in a subway station demonstrate the effectiveness of the proposed methods.
I. INTRODUCTION
We address in this paper the problem of retrieving objects of interest within surveillance video sequences. Nowadays, such solutions for content-based retrieval of surveillance data are increasingly required as the number of video surveillance systems and the amount of stored data increase drastically. The main reasons for searching video surveillance data are forensic, i.e., to look for video evidence after an incident has occurred. Figure 1 shows the position of surveillance indexing and retrieval in a complete video surveillance system. Videos coming from the cameras are interpreted by the video analysis module. This module is divided into two levels: the object level (consisting in general of object detection, object classification and object tracking) and the event level (consisting of event recognition). There are two modes for using the output of the analysis module. In the first mode, the corresponding alarms are sent to the security staff to inform them about the situation, while in the second one the analyzed results are stored in order to be used in the future. Video indexing and retrieval relate to the second mode. Corresponding to the two levels of the video analysis module, surveillance video indexing and retrieval approaches can be grouped into two categories: the object level category and the event level one. The reader is invited to refer to [1] for more information. We have previously proposed a general framework [2], [3] for video surveillance indexing and retrieval. This framework is based on the hypothesis that videos are partially indexed thanks to existing work in video analysis and video surveillance, such as object tracking and event recognition.
The proposed framework allows the retrieval of both objects and events of interest. However, relevance feedback is not considered in the previous version of this framework. In this paper, we propose two new relevance feedback methods for surveillance video indexing and retrieval at object level.
The rest of this paper is organized as follows. In section II, we analyze existing works on relevance feedback for surveillance video indexing and retrieval. Section III describes the proposed approach. Experimental results on real surveillance video sequences are presented and analyzed in section IV. We conclude and discuss future work in section V.
Figure 1. Position of surveillance video indexing and retrieval in a video surveillance system.
II. RELATED WORKS
Relevance feedback is a technique that allows the system to learn from user feedback and adapt to better meet the user's requests. While many works based on relevance feedback have been proposed for information and image retrieval [4], very few works have been done for surveillance video retrieval [1] because of the difficulty of modeling the spatio-temporal aspects of objects. Up to now, two main significant works have been presented. Meessen et al. [4] consider relevance feedback as a multiple instance problem. According to the authors, this solution is suitable for surveillance video because, in surveillance video, the presence of several objects in a frame may be required to define target surveillance events. The advantage of this work is that the video surveillance indexing and retrieval module does not require much effort from the video analysis module, which needs to provide only object detection. However, this work takes into account neither the temporal aspect of objects nor object tracking results. While Meessen et al. [4] work with relevance feedback at object level, Chen et al. [5] introduce a method at event level. The objects are tracked, and the corresponding trajectories are modeled and recorded in the database. Several spatio-temporal event models, such as single vehicle events, are then constructed. By using user feedback for each video sequence, the neural network for time series prediction is adapted to fit the specific needs of event identification. The authors show their experiments on live transportation surveillance videos. In this work, the video analysis module has to support object detection, classification and tracking. The main drawback of this work is that object appearance is not considered. As explained above, previous works on relevance feedback for surveillance video retrieval do not take into account a special characteristic of moving objects in surveillance video sequences: the variation of object appearance (appearance aspect) over a certain time (temporal aspect).
III. PROPOSED APPROACH
Before describing our proposed approach, we introduce several definitions.
Definition 1: An object region is the region delimited by the minimal bounding box of an object in a frame where the object is detected.
Objects in video surveillance are physical objects (e.g., people, vehicles) that are present in the scene at a certain time. In general, they are detected and tracked over a large number of frames. Consequently, an object is represented by a set of object regions. Because of errors in object detection, using all these object regions for object indexing and retrieval is inappropriate.
Moreover, it is redundant, because many object regions have similar content. The object representation therefore requires a method that selects the most relevant and representative object regions. We call this method the representative object region detection method.
In previous work [3], we proposed two representative object region detection methods based on K-means and agglomerative clustering.
In this paper, we use the results of these methods to represent moving objects.
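The representative object region detection methods of [3] are only referenced here, not specified. As an illustration only, the sketch below shows one plausible K-means variant. It assumes that each object region has already been summarized by a fixed-length descriptor, that the representative of a cluster is the real region closest to its centroid, and that the weight of that region is the fraction of the object's regions falling into its cluster. The function name representative_regions and these rules are hypothetical, not the exact method of [3].

```python
import numpy as np


def _sq_dists(x: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between every row of x and every row of c."""
    return ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)


def representative_regions(features: np.ndarray, k: int, n_iter: int = 50):
    """Hypothetical K-means selection of representative regions for one object.

    features: (n_regions, d) array with one descriptor per tracked object region.
    Returns (indices, weights): for each non-empty cluster, the index of the
    region nearest to the centroid and the fraction of regions in that cluster.
    """
    features = np.asarray(features, dtype=float)
    n = len(features)
    k = min(k, n)
    # Deterministic initialisation: k regions evenly spaced along the track.
    centroids = features[np.linspace(0, n - 1, k).astype(int)]
    for _ in range(n_iter):
        labels = _sq_dists(features, centroids).argmin(axis=1)
        new_centroids = np.array([
            features[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    d2 = _sq_dists(features, centroids)
    labels = d2.argmin(axis=1)
    indices, weights = [], []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue  # an empty cluster yields no representative region
        # Keep a real region (the one nearest the centroid), not the centroid itself.
        indices.append(int(members[d2[members, c].argmin()]))
        weights.append(members.size / n)
    return indices, weights
```

Choosing an actual region rather than the centroid keeps the representation made of observed object regions, from which any appearance feature can later be extracted.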
Definition 2: Moving object representation. A moving object $O$ in a surveillance video sequence is represented as
$O = \{(B_j, w_j) \mid j = 1, \dots, N_O\}$
where $B_j$ is an object region determined by the representative object region detection method, $w_j$ is its weight, and $N_O$ is the number of representative object regions of $O$.
The greater the weight, the more important the role of the object region. With this representation, we can take into account both the temporal and the appearance aspects of objects, because all possible appearance features, such as color and texture, can be extracted in these object regions.
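Definition 2 maps naturally onto a small data structure. The sketch below is a minimal illustration; the class and field names (ObjectRegion, MovingObject, bbox, feature) are hypothetical, and normalizing the weights so that they sum to 1 is our assumption, not something stated in the text.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ObjectRegion:
    """Definition 1: the minimal bounding box of an object in one frame."""
    frame: int                          # frame index where the object was detected
    bbox: Tuple[int, int, int, int]     # (x, y, width, height) of the bounding box
    feature: Sequence[float]            # appearance descriptor (hypothetical)


@dataclass
class MovingObject:
    """Definition 2: O = {(B_j, w_j)}, representative regions with weights."""
    regions: List[ObjectRegion]
    weights: List[float]

    def __post_init__(self) -> None:
        if len(self.regions) != len(self.weights) or not self.regions:
            raise ValueError("need one weight per region and at least one region")
        total = sum(self.weights)
        if total <= 0:
            raise ValueError("weights must have a positive sum")
        # Assumption: weights are normalised to sum to 1.
        self.weights = [w / total for w in self.weights]

    def most_important_region(self) -> ObjectRegion:
        """The region with the greatest weight plays the most important role."""
        best = max(range(len(self.weights)), key=self.weights.__getitem__)
        return self.regions[best]
```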
Definition 3: Moving object matching is an operator that computes the similarity between two given objects.
In [3], we proposed a moving object matching method based on the EMD (Earth Mover's Distance) and the covariance matrix. In this work, we use this method to calculate object similarity.
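The exact formulation of the matching method of [3] is not repeated here. Under stated assumptions, the sketch below shows how an EMD between two weighted sets of object regions can be computed when the ground distance between two regions is a distance between their covariance matrices. We assume the Förstner metric (the square root of the sum of squared logarithms of the generalized eigenvalues of the two matrices), which is commonly paired with covariance descriptors, and we solve the transportation problem with scipy.optimize.linprog. The weights of each object are normalized to sum to 1, so all mass must be moved. All function names are hypothetical. Note that the EMD is a distance: the lower it is, the more similar the two objects are, so a similarity can be derived from it by any decreasing function.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.optimize import linprog


def covariance_descriptor(pixel_features: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Covariance of per-pixel features (one row per pixel) inside a region.

    A small ridge eps*I keeps the matrix positive definite (assumption).
    """
    cov = np.cov(pixel_features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])


def forstner_distance(c1: np.ndarray, c2: np.ndarray) -> float:
    """sqrt(sum(log(lambda_k)^2)), lambda_k generalised eigenvalues of (c1, c2)."""
    lambdas = eigh(c1, c2, eigvals_only=True)
    return float(np.sqrt(np.sum(np.log(lambdas) ** 2)))


def emd(w1, w2, cost: np.ndarray) -> float:
    """Earth Mover's Distance between two weight vectors, normalised to sum to 1."""
    w1 = np.asarray(w1, dtype=float) / np.sum(w1)
    w2 = np.asarray(w2, dtype=float) / np.sum(w2)
    m, n = cost.shape
    # Flow variables f[i, j] are flattened row-major into a vector of length m*n.
    a_eq = np.zeros((m + n, m * n))
    for i in range(m):
        a_eq[i, i * n:(i + 1) * n] = 1.0   # all the mass of region i leaves it
    for j in range(n):
        a_eq[m + j, j::n] = 1.0            # region j receives exactly its weight
    b_eq = np.concatenate([w1, w2])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun)


def object_distance(covs1, w1, covs2, w2) -> float:
    """EMD between two moving objects, regions compared with forstner_distance."""
    cost = np.array([[forstner_distance(a, b) for b in covs2] for a in covs1])
    return emd(w1, w2, cost)
```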
We use the following notation. $Q = \{O_i \mid i = 1, \dots, N_{obj}\}$: the set of objects in the database; $R^t \subseteq Q$: retrieval results at the $t$-th iteration; $R^0 \subseteq Q$: retrieval results at the 0-th iteration, i.e., retrieval results without relevance feedback. The results in $R^t$ are sorted in descending order of their similarity with the query object. In general, a retrieval system shows only the first $M$ results of $R^t$; we use $R^t(M)$ to denote the first $M$ results in $R^t$. $O_{+}$: an object judged by the user as positive. The performance of a relevance feedback method is evaluated by two factors: (1) the number of relevant objects in $R^t(M)$ must be greater than that in $R^{t-1}(M)$; (2) the number of iterations must be as small as possible.
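The notation can be made concrete with a few lines of code. The sketch below is illustrative only: it assumes that similarities are already available as a dictionary from object identifiers to scores (higher means more similar), and the helper names top_m, count_relevant and feedback_improved are hypothetical. It builds $R^t(M)$ and checks the first evaluation factor.

```python
from typing import Dict, Hashable, Iterable, List, Set


def top_m(similarities: Dict[Hashable, float], m: int) -> List[Hashable]:
    """R^t(M): the M database objects most similar to the query, best first."""
    return sorted(similarities, key=similarities.get, reverse=True)[:m]


def count_relevant(results: Iterable[Hashable], relevant: Set[Hashable]) -> int:
    """Number of relevant objects among the shown results."""
    return sum(1 for oid in results if oid in relevant)


def feedback_improved(prev: List[Hashable], curr: List[Hashable],
                      relevant: Set[Hashable]) -> bool:
    """Factor (1): R^t(M) contains more relevant objects than R^{t-1}(M)."""
    return count_relevant(curr, relevant) > count_relevant(prev, relevant)
```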
We propose two relevance feedback methods: a method based on object representation and a method based on SVM.
The main idea of the first method is that we can create a new query object from a set of positive object regions selected by the user. This query object can be a good representation of what the user is looking for. The main idea of the second method is to modify the similarity values between the objects in the database and the query. In other words, the first method is based on query modification and the second one on similarity measure modification. Both methods have the same input and output. The input can be an object or an image given as an example, and the output is the iteration at which the user decides to finish the interaction ($t_{final}$) together with the corresponding results $R^{t_{final}}(M)$. The pseudo code of these methods is shown in Table 1. The main difference between the two methods lies in steps 3 and 4. In the method based on object representation, we use the user's judgment to create an intermediate query object $O_q = \{(B_i, w_i) \mid i = 1, \dots, M_p\}$ with $w_i = 1/M_p$, where $B_i$ are the positive object regions and $M_p$ is the number of positive results. In step 4, the system computes the similarity between this intermediate object and the objects in the database. In the method based on SVM, in step 3 we create a training set for a one-class SVM from the $M_p$ positive object regions by extracting the covariance matrix feature [3] in each object region. After training, the one-class SVM assigns a new sample $x$ the label $f(x) = \operatorname{sgn}(\langle w, \Phi(x)\rangle - \rho)$. In step 4, the similarity of object $O_j$ with the query is therefore set to the value inside the sign function, $\langle w, \Phi(O_j)\rangle - \rho$.
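As a hedged illustration of steps 3 and 4 of the second method, the sketch below uses scikit-learn's OneClassSVM. Its decision_function returns the signed distance to the separating hyperplane, i.e., the value inside the sign function. Two simplifications are our own assumptions and are not part of the text. First, each covariance matrix is flattened into a vector by keeping its upper triangle, which ignores the non-Euclidean geometry of covariance matrices. Second, a database object with several regions is scored by the weighted mean of its regions' decision values. The kernel, nu and gamma values are placeholders, and the function names are hypothetical.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def vectorize(cov: np.ndarray) -> np.ndarray:
    """Upper triangle of a symmetric covariance matrix as a flat vector."""
    return cov[np.triu_indices(cov.shape[0])]


def train_feedback_svm(positive_covs, nu: float = 0.5) -> OneClassSVM:
    """Step 3: fit a one-class SVM on the covariance features of the positive regions."""
    x = np.array([vectorize(c) for c in positive_covs])
    return OneClassSVM(kernel="rbf", nu=nu, gamma="scale").fit(x)


def svm_similarity(model: OneClassSVM, region_covs, weights) -> float:
    """Step 4: weighted mean of <w, phi(x)> - rho over an object's regions."""
    x = np.array([vectorize(c) for c in region_covs])
    scores = model.decision_function(x)
    return float(np.average(scores, weights=weights))
```

With few positives, as discussed in section IV, such a model is trained on very little data, so the scores can be unstable.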
Table 1. Pseudo code of the proposed methods

Method based on object representation
Begin
    Step 1: do object matching to obtain $R^0(M)$
    repeat
        Step 2: ask the user to judge the results, giving $R_{+}^{t-1} = \{O_{+}^{t-1}\}$
        Step 3: modify the query object
        Step 4: do object matching to obtain $R^t(M)$
    until the user decides to finish
End

Method based on SVM
Begin
    Step 1: do object matching to obtain $R^0(M)$
    repeat
        Step 2: ask the user to judge the results, giving $R_{+}^{t-1} = \{O_{+}^{t-1}\}$
        Step 3: train the one-class SVM
        Step 4: recalculate object similarity to obtain $R^t(M)$
    until the user decides to finish
End
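Table 1 can be turned into a runnable loop. The sketch below implements only the first method (query modification), under several assumptions. Objects are lists of (region, weight) pairs. The similarity is any caller-supplied function where higher means more similar. The user is simulated by a judge callback. One region per positive object, chosen by a pick_region callback (for example, its most heavily weighted region), enters the intermediate query. All names are hypothetical; only the weighting $w_i = 1/M_p$ comes from the text. For simplicity, the session also stops when the user marks no result as positive.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Region = Hashable                            # hypothetical handle on an object region
WeightedObject = List[Tuple[Region, float]]  # [(B_j, w_j), ...]


def intermediate_query(positive_regions: Sequence[Region]) -> WeightedObject:
    """Step 3: O_q = {(B_i, w_i)} with w_i = 1 / M_p."""
    m_p = len(positive_regions)
    if m_p == 0:
        raise ValueError("at least one positive region is required")
    return [(b, 1.0 / m_p) for b in positive_regions]


def feedback_session(
    query: WeightedObject,
    database: Dict[Hashable, WeightedObject],
    similarity: Callable[[WeightedObject, WeightedObject], float],
    judge: Callable[[List[Hashable]], Set[Hashable]],
    pick_region: Callable[[WeightedObject], Region],
    m: int,
    max_iterations: int,
) -> List[List[Hashable]]:
    """Return [R^0(M), R^1(M), ...] for the query-modification method."""

    def rank(q: WeightedObject) -> List[Hashable]:
        scores = {oid: similarity(q, obj) for oid, obj in database.items()}
        return sorted(scores, key=scores.get, reverse=True)[:m]

    results = rank(query)                     # Step 1: R^0(M)
    history = [results]
    for _ in range(max_iterations):
        judged = judge(results)               # Step 2: the user marks positives
        positives = [oid for oid in results if oid in judged]
        if not positives:
            break                             # nothing to learn from: stop
        query = intermediate_query([pick_region(database[oid]) for oid in positives])
        results = rank(query)                 # Step 4: R^t(M) for the new query
        history.append(results)
    return history
```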
IV. EXPERIMENTAL RESULTS AND DISCUSSIONS
We can evaluate the performance of each relevance feedback approach. However, it is difficult to compare the performance of different approaches, even if they work on the same database and the same set of queries. Since retrieval methods based on relevance feedback interact with the user, their performance at each iteration depends heavily on two factors: the results returned at this iteration and the number of results judged by the user at this iteration. In some cases, we can fix the number of judged results at each iteration. However, the first M results of different approaches consist of different elements. For this reason, in this paper we analyze the experimental results of our proposed methods only. We adopt as evaluation measure the Average Normalized Rank (ANR), defined as follows:
$ANR = \frac{1}{N \, N_{rel}} \left( \sum_{i=1}^{N_{rel}} R_i - \frac{N_{rel}(N_{rel}+1)}{2} \right)$
where $N_{rel}$ is the number of relevant results for a particular query, $N$ is the size of the tested set, and $R_i$ is the rank of the $i$-th relevant result. The ANR measure ranges from 0 (good retrieval) to 1 (bad retrieval).
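The ANR formula, $ANR = (\sum_i R_i - N_{rel}(N_{rel}+1)/2) / (N \, N_{rel})$, translates directly into code. The sketch below assumes 1-based ranks over the whole tested set of $N$ objects. With this convention a perfect ranking gives exactly 0. The function names are hypothetical.

```python
from typing import Hashable, List, Set


def relevant_ranks(ranking: List[Hashable], relevant: Set[Hashable]) -> List[int]:
    """1-based ranks R_i of the relevant objects in a full ranking of the tested set."""
    return [pos for pos, oid in enumerate(ranking, start=1) if oid in relevant]


def average_normalized_rank(ranks: List[int], n: int) -> float:
    """ANR = (sum(R_i) - N_rel * (N_rel + 1) / 2) / (N * N_rel)."""
    n_rel = len(ranks)
    if n_rel == 0 or n <= 0:
        raise ValueError("need at least one relevant result and a non-empty tested set")
    return (sum(ranks) - n_rel * (n_rel + 1) / 2) / (n * n_rel)
```

When the relevant objects occupy the last positions, the value is $1 - N_{rel}/N$, which approaches 1 when relevant objects are rare.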
To evaluate the performance of the proposed approach, we employ two surveillance video sequences named CARE_1 and CARE_2, which come from the CARETAKER (Content Analysis and REtrieval Technologies to Apply Extraction to massive Recording) project. These videos depict human activity in a metro station. They are captured from the same scene by different cameras. After applying the VSIP platform of the PULSAR team for video analysis [2], … and 810 people are detected and tracked in CARE_1 and CARE_2, respectively. In this experiment, we take objects of CARE_1 as query objects and objects of CARE_2 as objects in the database. We propose a method for performing and evaluating relevance feedback automatically. First, we fix the number of returned results (in this paper, M is set to 100) and the maximum number of iterations (set to 5 in this paper).
At each iteration, the system automatically chooses relevant objects as positive results based on label information. A relevance feedback retrieval session finishes when it meets one of two criteria: (1) the number of relevant results in the current iteration is smaller than that in the previous iteration (the retrieval quality does not improve); (2) the number of iterations exceeds the maximum number (the system asks for too much user feedback).
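The automatic evaluation protocol can be sketched as follows. This is a minimal illustration under assumptions. The user's judgment is simulated with ground-truth labels. refine is a hypothetical hook that stands for one iteration of either proposed method. Criterion (2) is read as "do not start an iteration beyond the maximum". Criterion (1) is applied literally, so a session continues when the number of relevant results stays equal.

```python
from typing import Callable, Hashable, List, Set, Tuple

Results = List[Hashable]


def should_stop(relevant_counts: List[int], max_iterations: int) -> bool:
    """relevant_counts[t] = number of relevant objects in R^t(M), t = 0, 1, ..."""
    t = len(relevant_counts) - 1
    if t >= 1 and relevant_counts[t] < relevant_counts[t - 1]:
        return True                      # criterion (1): quality got worse
    return t >= max_iterations           # criterion (2): iteration budget used up


def simulate_session(
    first_results: Results,
    refine: Callable[[Results, Results], Results],
    relevant: Set[Hashable],
    max_iterations: int = 5,
) -> Tuple[List[Results], List[int]]:
    """Run relevance feedback with positives chosen automatically from labels."""
    history = [list(first_results)]
    counts = [sum(o in relevant for o in first_results)]
    while not should_stop(counts, max_iterations):
        positives = [o for o in history[-1] if o in relevant]   # simulated user
        if not positives:
            break
        new_results = list(refine(history[-1], positives))
        history.append(new_results)
        counts.append(sum(o in relevant for o in new_results))
    return history, counts
```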
Figure 2. Retrieval results obtained without relevance feedback (the first row), after one iteration (the second row) and after two iterations (the third row).
Figure 3. ANR obtained with the first method for 50 query objects. The vertical axis is the ANR, while the horizontal axis is the iteration.
Figure 2 illustrates retrieval results without relevance feedback and after one and two iterations. The ranks of relevant results decrease after each iteration, i.e., relevant objects move toward the top of the result list. Figure 3 shows the ANR obtained at each iteration for 50 queries using the first method, while Figure 4 shows the ANR obtained with the second one.
As we can see, the obtained ANR generally decreases after each iteration, which means that the retrieval quality improves. Our experimental results for the two methods on the same dataset show that the method based on SVM outperforms the method based on object representation, because the ANR of the former decreases faster than that of the latter. However, the number of relevant results, and therefore the number of judged results, at each iteration is sometimes small. This can lead to a lack of training samples for the SVM.
Figure 4. ANR obtained with the second method for 50 query objects. The vertical axis is the ANR, while the horizontal axis indicates the iteration.
V. CONCLUSIONS
In this paper we proposed two relevance feedback methods at object level for surveillance video. As analyzed in section IV, results with relevance feedback are better than those without relevance feedback. However, the relevance feedback methods presented in this paper are short-term relevance feedback: the retrieval system forgets the knowledge learned in each search session. This work can be extended by taking into account long-term relevance feedback.
Acknowledgments
The research leading to this paper was supported by the National Project DTDL.2009G/42 "Study, design and develop smart robots to exploit multimedia information". We would like to thank the project and the people involved in it.
REFERENCES
1. T-L. Le, A. Boucher, M. Thonnat, F. Bremond, "Surveillance video retrieval: what we have already done?", ICCE, Nha Trang, Ha Noi, 2010.
2. T-L. Le, A. Boucher, M. Thonnat, F. Bremond, "Surveillance video indexing and retrieval using object features and semantic events", IJPRAI, Special issue on Visual Analysis and Understanding for Surveillance Applications, Vol. 23, No. 7, pp. 1-37, 2009.
3. T-L. Le, A. Boucher, M. Thonnat, F. Bremond, "Appearance based retrieval for tracked objects in surveillance videos", In Proceedings of the ACM International Conference on Image and Video Retrieval, Santorini, Fira, Greece, July 8-10, 2009, pp. 1-8.
4. J. Meessen, X. Desurmont, J. F. Delaigle, C. De Vleeschouwer, B. Macq, "Progressive Learning for Interactive Surveillance Scenes Retrieval", IEEE International Workshop on Visual Surveillance (VS'07), 2007, pp. 1-8.
5. X. Chen, C. Zhang, "An Interactive Semantic Video Mining and Retrieval Platform - Application in Transportation Surveillance Video for Incident Detection", Sixth International Conference on Data Mining (ICDM'06), Dec. 2006, pp. 129-138.
Author's address: Le Thi Lan, Tel: (+84) 904.412.844, Email: Thi-Lan.Le@mica.edu.vn, MICA, Hanoi University of Science and Technology
No. 1, Dai Co Viet Str., Hanoi, Vietnam.