feature extractor model had the same NME scores. However, the original feature extractor model had a slightly lower NME score across the entire test set. Fig. 5-1 shows that the model predicted facial landmarks better on the lab portion of the test set but, as with the bounding boxes, the difference between the predicted and ground-truth facial landmarks was small.
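The NME comparison above can be made concrete with a short sketch. The excerpt does not state which normalization the thesis uses, so this is a minimal illustration assuming inter-ocular normalization (distance between the two eye landmarks, here assumed to sit at indices 0 and 1); bounding-box-based normalization is another common convention.

```python
import numpy as np

def nme(pred, gt):
    """Normalized Mean Error between predicted and ground-truth landmarks.

    pred, gt: arrays of shape (N, 2) holding N landmark (x, y) coordinates.
    The mean per-landmark Euclidean error is normalized by the
    inter-ocular distance (landmarks 0 and 1 assumed to be the eyes).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    interocular = np.linalg.norm(gt[0] - gt[1])  # normalization factor
    per_point = np.linalg.norm(pred - gt, axis=1)  # error per landmark
    return per_point.mean() / interocular
```

A lower NME means predictions lie closer to the annotations relative to face scale, which is why small NME differences between models correspond to visually similar landmark predictions.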
5.3 Inference Time and Real-time Detection Experiments
These experiments demonstrated the possibility of using the models for real-time face detection.
Chapter 6 Conclusion
The research presented in this thesis focused on the area of computer vision in robotics, with the introduction of event-based cameras as a solution to the limitations of conventional frame-based cameras. Event-based cameras, a new type of retinomorphic sensor, have shown significant advantages over conventional cameras and have found various applications in computer vision. However, due to the relative novelty of event-based cameras, there has been a shortage of datasets and models for applying deep learning to event streams.
To address this gap, this thesis introduced the FES dataset, the first large and diverse event-based camera dataset for face and facial landmark detection. The dataset contains 689 minutes of raw event streams recorded in both wild and controlled environments, with multiple face poses and distances. Additionally, the dataset provides accurately annotated bounding box and facial landmark coordinates, making it an ideal resource for machine learning and other algorithms. Furthermore, 12 different models were created to detect faces and facial landmarks in real time, using modified feature extractor architectures combined with an SSD detection head. These models were able to detect faces and facial landmarks with high accuracy.
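Because SSD-style detectors expect frame-like tensors, the raw event streams must first be aggregated into a dense representation. The excerpt does not specify the exact representation the thesis uses, so the sketch below shows one common scheme as an assumption: counting positive and negative events per pixel over a time window into a two-channel frame. The function name and event tuple layout are hypothetical.

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate a window of events into a 2-channel count frame.

    events: iterable of (x, y, timestamp, polarity) tuples, with
    polarity in {0, 1} (0 = brightness decrease, 1 = increase).
    Returns an array of shape (2, height, width) where each channel
    counts events of one polarity per pixel -- a frame-like tensor
    that a conventional detector such as SSD can consume.
    """
    frame = np.zeros((2, height, width), dtype=np.float32)
    for x, y, t, p in events:
        frame[int(p), int(y), int(x)] += 1.0
    return frame
```

In practice such frames are built over short sliding windows (e.g. tens of milliseconds) so that the detector can run at a high, latency-bounded rate on the event stream.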
To encourage further research in this area, the dataset, code, and trained models are shared at https://github.com/IS2AI/faces-in-event-streams under the MIT license. As the detection of faces and facial landmarks is only the first step towards the detection of advanced face attributes such as emotion, gender, age, and identity, future work could focus on using these models to build software that extracts such attributes directly from event-based output. One potential direction is to use the created models to identify the key points of the face and infer facial characteristics from them, such as emotion classification and person identification. Another direction is to improve the accuracy of face detection from event streams, for example by studying other neural network architectures and the use of attention mechanisms.
These models can also be applied to autonomous driving: by combining the low-latency event-based output with our models, a system could respond quickly to a driver falling asleep at the wheel. The developed models can likewise be used in robotics, where event cameras would allow robots to respond rapidly to human emotions and actions, or to identify a person in poor lighting conditions.
As a result, this research has contributed significantly to the development of computer vision and robotics, and the future of event-based cameras looks promising with the availability of resources such as the FES dataset and the models for face and facial landmark detection.