CHAPTER 3. RESULTS
3.2.4. Experimental results
The videos in the 1st group were each run through the experiment many times, and the best result was selected. Table 3.3 compares the experimental results of the two approaches on these videos. With the proposed method, the average precision over the four videos is 80%, the average recall is 84%, and the F1 score is 81.95122. With Poppe's method, the average precision is 81%, the average recall is 83%, and the F1 score is 81.9878. We can see that the performance of our method is equivalent to that of Poppe's method when applied to a low-resolution video.
Table 3.3. The performance of the two approaches with Pedestrians, PETS2006, Highway, and Office

Video         Our approach                              Poppe's approach
              Precision (%)  Recall (%)  F1             Precision (%)  Recall (%)  F1
pedestrians   84             95          89.16201       80             90          84.70588
PETS2006      87             80          83.35329       88             78          82.6988
Highway       77             81          78.94937       78             80          78.98734
Office        72             82          76.67532       75             83          78.79747
Average       80             84          81.95122       81             83          81.9878
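The average F1 scores in Table 3.3 are the harmonic mean of the average precision and average recall. A minimal check, with the values taken directly from the table:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)

# Average precision/recall from Table 3.3
ours = f1_score(80, 84)    # proposed method
poppe = f1_score(81, 83)   # Poppe's method

print(round(ours, 5))   # 81.95122
print(round(poppe, 4))  # 81.9878
```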
With the 2nd video group, the high-resolution videos, the proposed method was run many times with different values of the Ts parameter, and the 4 best results were selected. Table 3.4 shows the experimental results of Poppe's approach, and Table 3.5 shows the experimental results of the proposed method on these videos.
The results show that the recall values of Poppe's approach are usually smaller than those of the proposed method, meaning that Poppe's approach misses more moving objects than the proposed method does. This happens because there are many “skip_mode” MBs in a frame of a high-resolution video.
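As a toy illustration of this effect (synthetic data only, not the thesis algorithm): if a detector discards skip-coded MBs outright, every moving-object MB that happens to be skip-coded becomes a miss, which directly lowers recall.

```python
# Synthetic frame: each MB is (is_moving_ground_truth, is_skip_coded).
# In high-resolution video, many MBs of a moving object are skip-coded.
frame = [(True, False), (True, True), (True, True), (False, False),
         (False, True), (True, False), (False, False), (True, True)]

def recall(detections, truth):
    hits = sum(1 for d, t in zip(detections, truth) if d and t)
    return hits / sum(truth)

truth = [moving for moving, _ in frame]

# Policy A: ignore skip-coded MBs entirely, so skip-coded movers are missed.
detect_a = [moving and not skip for moving, skip in frame]
# Policy B (idealized): skip-coded MBs are also recoverable as moving.
detect_b = [moving for moving, _ in frame]

print(recall(detect_a, truth))  # 0.4
print(recall(detect_b, truth))  # 1.0
```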
Table 3.4. The experimental results of Poppe's approach on the 2nd group

Video         Precision  Recall  F1
HMI_WetRoad   0.4954     0.8943  0.6376
HMI_OutDoor   0.5145     0.7711  0.6172
GVO2_0308     0.6821     0.6016  0.6393
NEM1_131      0.6055     0.7602  0.6741
DNG8_1708     0.8777     0.7489  0.8082
CuaHang_01    0.7468     0.8339  0.788
TrongNha_02   0.8341     0.7247  0.7756
In addition, the experimental results in Table 3.5 show that the videos with good results are those with less noise and a clear distinction between the background and the moving objects. The results do not depend on whether the videos are captured by outdoor or indoor cameras. As the table shows, the best result belongs to the TrongNha_02 video (Fig. 3.3a), with an F1 score of 0.8771. This video was recorded in a working room (namely a police station) under good environmental conditions with low noise. The moving object is a person who stands out clearly against the floor. The shirt of the moving object has only one color, but it is not uniform due to many wrinkles.
The worst video is NEM1_131 (Fig. 3.3d), with an F1 score of 0.6235. Although this video was recorded indoors, it has an outward-facing view, and the entrance of the room is made of glass, which easily reflects the moving objects. The video was recorded in the evening, so the light from outside the room easily produces noise.
Table 3.5. The experimental results of the proposed method on the 2nd group

Video         Ts   Precision  Recall  F1
HMI_WetRoad   90   0.7409     0.8644  0.7979
              100  0.734      0.8935  0.8059
              110  0.736      0.8197  0.7756
              120  0.7461     0.9453  0.834
HMI_OutDoor   70   0.6916     0.8681  0.7699
              80   0.641      0.8656  0.7366
              90   0.7055     0.8962  0.7895
              100  0.7195     0.9151  0.8056
GVO2_0308     70   0.5926     0.8018  0.6815
              80   0.577      0.8653  0.6923
              90   0.5376     0.836   0.6543
              100  0.5821     0.916   0.7118
NEM1_131      90   0.4762     0.8183  0.602
              100  0.4655     0.9333  0.6211
              110  0.4847     0.8737  0.6235
              120  0.4855     0.8702  0.6233
DNG8_1708     60   0.7612     0.8164  0.7878
              65   0.7889     0.9217  0.8501
              70   0.7843     0.9157  0.8449
              75   0.777      0.8789  0.8248
CuaHang_01    75   0.7498     0.8796  0.8095
              80   0.7676     0.9302  0.8411
              85   0.7372     0.8598  0.7938
              90   0.6828     0.9339  0.7889
TrongNha_02   50   0.8283     0.9319  0.8771
              55   0.8139     0.9095  0.859
              60   0.8248     0.9261  0.8725
              65   0.8254     0.9247  0.8722
The experimental results also show that choosing the threshold Ts is quite difficult. This is a limitation of the proposed method. Normally, the less noise a video has, the smaller its threshold value Ts will be compared with the Ts of a noisier video.
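In practice, Ts can be chosen by a simple grid search over candidate values, keeping the one with the best F1 score. A minimal sketch, using the TrongNha_02 rows of Table 3.5 as the candidate set:

```python
# (Ts, F1) pairs for TrongNha_02, taken from Table 3.5.
results = [(50, 0.8771), (55, 0.859), (60, 0.8725), (65, 0.8722)]

# Keep the threshold with the highest F1 score.
best_ts, best_f1 = max(results, key=lambda r: r[1])
print(best_ts, best_f1)  # 50 0.8771
```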
Under the system conditions described above, the processing speed is between 17 and 23 fps. When the program is installed on a Raspberry Pi 2 device, the processing speed is between 22 and 27 fps, depending on the amount of motion in each frame of the video. This speed fully meets the real-time requirements of the problem.
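Frame rates such as these can be measured with a wall-clock timer around the per-frame processing loop. A minimal sketch, where process_frame is only a placeholder for the actual detector:

```python
import time

def process_frame(frame):
    # Placeholder for the actual moving-object detector.
    return sum(frame) % 2

frames = [[i, i + 1, i + 2] for i in range(1000)]  # dummy frame data

start = time.perf_counter()
for frame in frames:
    process_frame(frame)
elapsed = time.perf_counter() - start

fps = len(frames) / elapsed
print(f"{fps:.1f} fps")
```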
Chapter Summary
This chapter presents the experimental results of the thesis. The experimental datasets are taken from the database of the Change Detection Workshop 2014 and from more than 100 actual surveillance cameras installed in Hanoi City and Da Nang City, provided by VP9, covering both indoor and outdoor scenes. These videos were captured without scripting or prior arrangement. The results show that the proposed method can accurately determine moving objects in the benchmark videos of the Change Detection Workshop 2014. In addition, on high-resolution videos, the proposed method can perform in real time better than the related works. This may be due to the appearance of many “skip_mode” MBs in a frame of a high-resolution video. The proposed method has also been used to build a moving object detection application for industrial use.
CONCLUSIONS
The thesis proposes a new moving object detection approach in the H.264/AVC compressed domain for high-resolution video surveillance that exploits not only the size of MBs but also the characteristics of the MV fields of moving objects to identify the moving objects of interest. The method can quickly detect most regions that contain moving objects, even objects of uniform color.
The thesis is the result of a real project of a company, so its applicability in practice is very high. The application using the proposed method can help people search for and detect the moments when movement happens more effectively, saving a lot of time and effort.
However, the proposed method still needs empirical thresholds in order to accurately detect the moving objects of interest. In some scenes, noisy motion such as swaying tree branches cannot be removed, because the motion value of the tree branches is high. For future work, we will focus on making the system self-tune the thresholds by using machine learning to get the best results.
List of the author's publications related to the thesis
1. Minh Hoa Nguyen, Tung Long Vuong, Dinh Nam Nguyen, Do Van Nguyen, Thanh Ha Le and Thi Thuy Nguyen, “Moving Object Detection in Compressed Domain for High Resolution Videos,” SoICT ’17, pp. 364-369, 2017.
2. Nguyễn Đình Nam, Nguyễn Thị Thủy, Nguyễn Đỗ Văn, Nguyễn Minh Hòa, Vương Tùng Long, Lê Thanh Hà, “Method for analyzing and storing motion description information in video content, and data storage medium for aggregated motion descriptions in video content” (in Vietnamese). Patent pending, filed 03/05/2017.