EVALUATION OF EXTRACTED FEATURES FOR OBJECT RETRIEVAL IN CAMERA-BASED SURVEILLANCE SYSTEMS
Le Thi Lan, Monique Thonnat, Alain Boucher, Dr. Francois Bremond
(a) MICA International Research Center, Hanoi University of Science and Technology
(b) PULSAR team, National Institute for Research in Computer Science and Automation (INRIA), France
(c) Institut de la Francophonie pour l'Informatique (IFI), Hanoi, Vietnam
Abstract:
Object retrieval plays an important role in camera-based surveillance systems. Two main factors affect the performance of object retrieval: the choice of visual descriptor and the matching strategy. In our previous work [6], [7], [8], [9] we proposed an object matching method based on the EMD (Earth Mover's Distance) and showed that it outperforms earlier work in the state of the art. In this paper, we present a quantitative evaluation of visual descriptors for surveillance video indexing and retrieval. The results obtained give several hints for choosing relevant descriptors for a given application.
I. INTRODUCTION
With the development of digital devices, information retrieval in general, and image and video retrieval in particular, has received a great deal of attention in recent years.
While video retrieval for sports video and broadcast news has been studied and developed and has produced initial results [1], video retrieval for remote surveillance systems is still new and has two distinctive characteristics. First, the content of sports and news video is known in advance, whereas the content of surveillance video is unexpected, unannounced and unfolds in real time. Second, sports and news videos are divided beforehand into segments, each delimited by a boundary transition, and retrieval usually stops at finding the segments that deal with the same topic (for example, a fire). Surveillance video has no segment boundaries, and the queries are more detailed: for example, find a person dressed in pink who appears in the station between time t1 and time t2. To retrieve video in remote surveillance systems, the video acquired from the cameras must therefore be processed by a video analysis module before being stored in the database. Video analysis usually comprises four basic steps: object detection, object tracking, object classification, and recognition of the objects' activities and behavior (event recognition).
II. STATE OF THE ART
Several studies and proposals already address video retrieval in remote surveillance systems. They include the IBM system [2], the proposals of Ma [3], [4] and the proposals of Calderara [5]. The IBM retrieval system is a relatively complete system that covers both video analysis and video retrieval. Objects are retrieved by color, while events are retrieved by event name, event time, and the object (object class) that performs the event. However, objects and events are retrieved through keywords rather than through the visual features of the objects. Moreover, for each object the system uses only a single instance (the first one detected) to compare with the query object (image).
Starting from this observation, Ma and colleagues [3], [4] proposed a new method for indexing and retrieving objects in video. In the indexing step, from the set of instances of an object detected in all frames, the authors extract a set of representative instances by grouping instances that are similar in terms of the covariance matrix feature. In the retrieval step, two objects (each represented by a set of representative instances) are compared using the Hausdorff distance computed over the distances between covariance matrices. This method overcomes the limitation of the method proposed by IBM. However, the Hausdorff distance is not always appropriate. We analyze this point in detail in [9].
In [5], Calderara and colleagues introduce an algorithm for comparing objects observed by multiple cameras. The features of the objects are described by a mixture of Gaussians. The Gaussians are initialized and updated using all instances of the object. This method exploits the full diversity of the object's instances and represents their importance. However, with this approach the Gaussians are determined by the last instances if these differ strongly from the initial ones. If object detection and object tracking are inaccurate in the last frames, the method is no longer accurate or effective.
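To make the set-to-set comparison of Ma et al. [3], [4] concrete, the following minimal Python sketch computes the Hausdorff distance between two sets of representative instances. It is an illustration under stated assumptions, not the authors' implementation: each instance is reduced to a plain numeric vector and compared with the Euclidean distance, whereas [3], [4] compare covariance matrices, and all function names and data are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def directed_hausdorff(set_a, set_b, dist=euclidean):
    """Largest distance from an element of set_a to its nearest element of set_b."""
    return max(min(dist(a, b) for b in set_b) for a in set_a)


def hausdorff(set_a, set_b, dist=euclidean):
    """Symmetric Hausdorff distance between two non-empty sets of instances."""
    if not set_a or not set_b:
        raise ValueError("both sets must contain at least one instance")
    return max(directed_hausdorff(set_a, set_b, dist),
               directed_hausdorff(set_b, set_a, dist))


# Hypothetical toy data: each tuple stands for one representative instance.
object_a = [(0.0, 0.0), (1.0, 0.0)]
object_b = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]  # same as A plus one distant instance

print(hausdorff(object_a, object_b))  # 9.0
```

In this toy example the two objects share two identical instances, yet the single distant instance in the second set determines the result, because the Hausdorff distance depends only on the worst-matched element.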
Although the proposed solutions have produced initial results, video indexing and retrieval for camera-based remote surveillance systems still face the following problems.
First, the proposed methods all assume that the algorithms of the video analysis step work well, that is, that all objects and events are detected and tracked with high accuracy. However, evaluations (for example in the ETISEO project, http://www-sop.inria.fr/orion/ETISEO/) have shown that these algorithms are effective only for particular kinds of video (lighting conditions, level of interaction between the objects in the video). Second, the proposed algorithms all rely on a single fixed feature chosen in advance. As analyzed above, the effectiveness of a feature depends on the query image and on the objects in the database. In Section 3 we present the indexing and retrieval system we propose for camera-based remote surveillance systems. This system addresses the two problems above of current retrieval systems.
III. PROPOSED SOLUTION
3.1 Proposed video indexing and retrieval system
Based on the analysis in the state-of-the-art section, we have proposed a video indexing and retrieval system for camera-based remote surveillance systems [6], [7], [8], [9]. Figure 1 shows the architecture of the proposed system. In this section we give only an overview; more detailed descriptions can be found in [6], [7], [8], [9]. The system differs from earlier ones in two respects: the method for selecting representative instances of each object, which has been shown to be more effective than the method proposed by Ma and colleagues [9], and the use of the EMD (Earth Mover's Distance) [8] to compare objects. This measure mitigates the weaknesses of the Hausdorff distance [3].
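The EMD used to compare objects can be illustrated on a deliberately simplified case. The sketch below assumes that both objects have the same number of representative instances and that all instances carry equal weight; under that assumption the EMD reduces to the average cost of the cheapest one-to-one matching, which is found here by brute force. The general EMD with unequal weights requires solving a transportation problem and is not shown. The Euclidean ground distance, the function names and the data are illustrative assumptions, not the implementation of [8].

```python
import math
from itertools import permutations


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def emd_equal_weights(instances_a, instances_b, dist=euclidean):
    """EMD between two equally sized sets whose instances all weigh 1/n.

    With uniform weights the optimal flow is a one-to-one matching, so the
    EMD is the minimum total matching cost divided by n (brute force here).
    """
    n = len(instances_a)
    if n == 0 or n != len(instances_b):
        raise ValueError("this sketch needs two non-empty sets of equal size")
    best_total = min(
        sum(dist(instances_a[i], instances_b[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best_total / n


# Hypothetical toy data: each tuple stands for one representative instance.
object_a = [(0.0, 0.0), (1.0, 0.0)]
object_b = [(0.0, 0.0), (10.0, 0.0)]

print(emd_equal_weights(object_a, object_b))  # 4.5
```

Brute force is only practical for a handful of instances per object; it is used here to keep the example free of external dependencies. Unlike a worst-case measure, the result averages over all matched instances, so the identical pair contributes a cost of zero.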
Figure 1. Diagram of the indexing and retrieval system for camera-based remote surveillance systems
3.2 Features extracted from objects
Among the many features that can be extracted from object images, we chose four types for evaluation (dominant color, edge histogram, interest points and regions, and covariance matrix) because together they capture most of the characteristics of an image. While the dominant color and the covariance matrix deal with the colors in the image and their variation, the edge histogram deals with the edges of the objects in the image.
The dominant color, the edge histogram and the covariance matrix are global features. Interest points and regions are local features that reflect the important points in the image. Because of the length limit of this paper, we do not describe feature extraction in detail. Readers may refer to [10] for dominant color extraction, [11], [12] for edge feature extraction, [13], [14] for the DoG (Difference of Gaussians), Harris and MSER (Maximally Stable Extremal Regions) interest points and regions, and [3] for the covariance matrix.
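As an illustration of a global descriptor, the sketch below computes a covariance matrix over all pixels of an object image. It assumes, for simplicity, that each pixel is described by the feature vector (x, y, R, G, B); the feature set actually used in [3] may differ, and the function name and test image are hypothetical.

```python
def covariance_descriptor(image):
    """Sample covariance matrix of per-pixel features (x, y, R, G, B).

    `image` is a list of rows; each row is a list of (R, G, B) tuples.
    Returns a 5 x 5 matrix as a list of lists.
    """
    # One feature vector per pixel: position plus color (assumed feature set).
    features = [
        (float(x), float(y), float(r), float(g), float(b))
        for y, row in enumerate(image)
        for x, (r, g, b) in enumerate(row)
    ]
    n = len(features)
    if n < 2:
        raise ValueError("at least two pixels are needed")
    d = len(features[0])
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    # Unbiased sample covariance: sum of centered products divided by n - 1.
    return [
        [
            sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in features) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


# Hypothetical 2 x 2 image: black on the left column, red on the right column.
tiny_image = [
    [(0, 0, 0), (255, 0, 0)],
    [(0, 0, 0), (255, 0, 0)],
]
for row in covariance_descriptor(tiny_image):
    print(row)
```

The matrix has a fixed size (here 5 x 5) whatever the size of the object image, and its off-diagonal entries record how position and color vary together; in the toy image the red channel varies with x but not with y.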
IV. EXPERIMENTS AND EVALUATION
4.1 Databases and evaluation measure
To evaluate the proposed solution, we use video from two European research projects: CARETAKER and CAVIAR. The CARETAKER project (Content Analysis and REtrieval Technologies to Apply Knowledge Extraction to massive Recording) can be accessed at http://www.ist-caretaker.org/. In this project, cameras were installed in metro stations in Italy. The video acquired from the cameras is analyzed automatically using the algorithms and software of the PULSAR team (INRIA, France) [15]. In this paper, we use one video of 7 minutes, corresponding to 3965 frames. From this video, 2311 objects (people) are detected and tracked. The CAVIAR project can be accessed at http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/. Its purpose is to provide a benchmark database for evaluating video analysis algorithms for video-based remote surveillance. In this paper we use 10 CAVIAR videos acquired by cameras placed in shopping centers in Spain. From these 10 videos, 53 objects (people) are detected and tracked.
To evaluate performance, we use the average normalized rank (ANR) [16]. ANR is defined as follows:
ANR = \frac{1}{|N| \, |N_{rel}|} \left( \sum_{i=1}^{|N_{rel}|} R_i - \frac{|N_{rel}| \, (|N_{rel}| + 1)}{2} \right)
For each query, |N_rel| is the number of relevant results, |N| is the total number of results and R_i is the rank at which the i-th relevant result is returned. From the formula, ANR takes values between 0 and 1. The smaller the ANR, the more effective the algorithm, because the relevant results are returned early (at small ranks). In the following sections we analyze the retrieval results using this measure.
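The measure can be computed directly from the ranks of the relevant results. The sketch below is a minimal Python version that assumes ranks start at 1 and subtracts the best achievable rank sum, |N_rel|(|N_rel| + 1)/2, so that a perfect ranking yields 0; the function name and this normalization convention are assumptions made for illustration.

```python
def average_normalized_rank(relevant_ranks, n_total):
    """ANR of one query.

    relevant_ranks: 1-based ranks at which the relevant results were returned.
    n_total: total number of results returned for the query (|N|).
    """
    n_rel = len(relevant_ranks)
    if n_rel == 0 or n_total < n_rel:
        raise ValueError("need at least one relevant result and n_total >= n_rel")
    # Rank sum of a perfect answer, where the relevant results occupy ranks 1..n_rel.
    best_sum = n_rel * (n_rel + 1) / 2
    return (sum(relevant_ranks) - best_sum) / (n_total * n_rel)


# Hypothetical query with 10 results, 3 of which are relevant.
print(average_normalized_rank([1, 2, 3], 10))   # relevant results first: 0.0
print(average_normalized_rank([8, 9, 10], 10))  # relevant results last: 0.7
```

With this convention the value for a single query ranges from 0 (all relevant results returned first) towards 1 as the relevant results move to the end of the list.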
4.2 Experimental results and analysis
a. Evaluation of the edge histogram types
Figure 2.a shows the results (as ANR) for four types of edge histogram (semi-local, global, local, and the combination of all three) for 23 query images over the 2311 objects of the CARETAKER video. The combined histogram is very stable because it can compensate for the weaknesses of the local and global histograms. The semi-local histogram gives good results in most cases. The global histogram is robust to changes in the position of the object within its bounding box. However, if a large part of the background is visible or the objects are occluded, this histogram is no longer effective. From this analysis we observe that two histograms (semi-local and combined) perform almost equally well, but the combined histogram is twice the size of the semi-local one.
Figure 2. Retrieval results for 23 query images on the video from the CARETAKER project using (a) edge histograms (semi-local, global, local and combined), (b) dominant color and covariance matrix
b. Evaluation of all features
Figure 3 shows the retrieval results for the videos of the two projects, CAVIAR and CARETAKER, with the following features: DoG and SIFT, dominant color, edge histogram and covariance matrix. In most cases the covariance matrix gives good results. Interestingly, when the covariance matrix performs poorly, the global features (dominant color and edge histogram) perform rather well. From this observation, the dominant color or the edge histogram can be combined with interest points and regions.
Figure 3. Retrieval results using all features for (a) 23 query images on the video from the CARETAKER project and (b) 15 query images on the videos from the CAVIAR project
V. CONCLUSION AND FUTURE WORK
In this paper we have: (1) presented the proposed solution for video indexing and retrieval in remote surveillance systems; (2) analyzed the features extracted for the objects in the video; (3) analyzed and evaluated the effectiveness of each feature on different databases. These analyses show that while the covariance matrix uses all the pixels of the object image, interest points use the information of only some pixels. The dominant color and the edge histogram approximate the color and edge information of the object. A combination of the covariance matrix with the dominant color, with the edge histogram, or with interest points is a reasonable choice when the user does not specify which feature to use.
Acknowledgments
The work presented in this paper was carried out within the independent national-level project No. 42/2009G/HD-DTDL, "Research, design and integration of intelligent robots applicable to the exploitation of multimedia information". The authors sincerely thank the project members for their help during this research.
REFERENCES
[1] Zhou X.S., Tian Q., Rui Y., Xiong Z. and T.S. Huang, Semantic Retrieval of Video [Review of research on video retrieval in meetings, movies and broadcast news, and sports]. IEEE Signal Processing Magazine, March 2006.
[2] Arun Hampapur, Lisa Brown, Rogerio Feris, Andrew Senior, Chiao-Fe Shu, Yingli Tian, Yun Zhai and Max Lu, Searching surveillance video. In IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS'07), pages 75-80, 5-7 Sept 2007.
[3] Isaac Cohen, Yunqian Ma and Ben Miller, Associating Moving Objects Across Non-overlapping Cameras: A Query-by-Example Approach. In IEEE Conference on Technologies for Homeland Security (HST'08), pages 566-571, Waltham, MA, USA, 2008.
[4] Yunqian Ma, Ben Miller and Isaac Cohen, Video Sequence Querying Using Clustering of Objects' Appearance Models. In International Symposium on Visual Computing (ISVC'07), pages 328-339, 2007.
[5] Simone Calderara, Rita Cucchiara and Andrea Prati, Multimedia surveillance: content-based retrieval with multicamera people tracking. In Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks (VSSN'06), pages 95-100, New York, NY, USA, 2006.
[6] Thi-Lan Le, Alain Boucher, Monique Thonnat and François Bremond, A Framework for Surveillance Video Indexing and Retrieval. Sixth International Workshop on Content-Based Multimedia Indexing (CBMI), London (UK), pages 338-345, June 2008.
[7] Le Thi Lan, Monique Thonnat, Alain Boucher and François Bremond, A Query Language Combining Object Features and Semantic Events for Surveillance Video Retrieval. The 14th International MultiMedia Modeling Conference (MMM'08), Lecture Notes in Computer Science Volume 4903/2008, pages 307-317, Kyoto, Japan, 2008.
[8] Thi-Lan Le, Alain Boucher, Monique Thonnat and François Bremond, Surveillance video indexing and retrieval using object features and semantic events. International Journal of Pattern Recognition and Artificial Intelligence, Visual Analysis and Understanding for Surveillance Applications.
[9] Thi Lan Le, Monique Thonnat, Alain Boucher and François Bremond, Appearance based retrieval for tracked objects in surveillance videos. International Conference on Image and Video Retrieval, Santorini, Greece, 8-10 July 2009.
[10] J. Annesley, J. Orwell and J.-P. Renno, Evaluation of MPEG7 color descriptors for visual surveillance retrieval. In Proceedings of the 14th International Conference on Computer Communications and Networks (ICCCN'05), pages 105-112, Washington, DC, USA, 2005.
[11] Chee Sun Won, Dong Kwon Park and S.-J. Park, Efficient use of MPEG-7 edge histogram descriptor. ETRI Journal, vol. 24, pages 23-30, 2002.
[12] Chee Sun Won, Feature Extraction and Evaluation Using Edge Histogram Descriptor in MPEG-7. In Pacific-Rim Conference on Multimedia (PCM'04), pages 583-590, 2004.
[13] D. Lowe, Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV), vol. 60, no. 2, pages 91-110, 2004.
[14] J. Matas, O. Chum, M. Urban and T. Pajdla, Robust wide baseline stereo from maximally stable extremal regions. In British Machine Vision Conference (BMVC'02), pages 384-393, 2002.
[15] J.L. Patino, H. Benhadda, E. Corvee, F. Bremond and M. Thonnat, Extraction of activity patterns on large video recordings. IET Computer Vision, vol. 2, pages 108-128, 2008.
[16] H. Müller, S. Marchand-Maillet and T. Pun, The truth about Corel: evaluation in image retrieval. In Proc. of International Conference on Image and Video Retrieval (CIVR'02), pages 28-49, London, United Kingdom, July 2002.