
Developing a Framework for Weapon and Mask Detection in Surveillance Systems

Academic year: 2023


Figure 41: Object detection model could not find the gun due to the low resolution of the image.
Figure 44: Object detection model could not find the gun due to the low resolution of the image.

Background

Intelligent monitoring systems are installed to identify unusual activities such as fighting, mobile phone use, or fainting (Amrutha, Jyotsna & Amudha 2020). Another use of intelligent CCTV systems is the detection of workers in potentially hazardous areas, such as near excavators operating in construction zones (Luo et al. 2020).

Problem Definition

The model is trained using a variety of images from the internet with different colors, sizes, textures, and shapes. Transfer learning is a strategy that freezes most layers of a pre-trained model to speed up training and reduce generalization error, as sketched below.
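As a concrete illustration of the freezing strategy described above, the following is a minimal sketch assuming a TensorFlow/Keras workflow with an ImageNet-pretrained MobileNetV2 backbone; the backbone, input size, and classification head are illustrative choices, not the thesis's exact configuration.

```python
# Minimal transfer-learning sketch: freeze a pre-trained backbone and train
# only a new head. Framework (Keras), backbone, and class list are assumptions.
import tensorflow as tf

# Load a backbone pre-trained on ImageNet, without its classifier head.
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
backbone.trainable = False  # freeze all backbone layers

# Attach a small trainable head for the new task.
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),  # e.g. pistol / knife / mask / none
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Because only the head's weights are updated, each training step is cheaper and the frozen features act as a regularizer on the comparatively small weapon dataset.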

Proposed Solution

GoogLeNet served as an inspiration for the architecture of YOLO, which consists of 24 convolutional layers followed by fully connected layers at the end (Redmon et al. 2016). One consideration is that the weapon detector model must be able to handle low-quality inputs and small-caliber firearms.

Dataset Development

Pre-Processing

Color transformation

Model Learning

The loss function of the SSD model combines two types of loss: localization loss and confidence loss. The YOLO model calculates three types of loss: classification loss, confidence loss, and localization loss.
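For reference, the original YOLO loss from Redmon et al. (2016) combines these terms as reproduced below; the YOLOv4 and YOLOv5 models trained later use refined variants (for example IoU-based box losses), so this is the canonical formulation rather than the exact loss they minimize. Here lambda_coord and lambda_noobj are weighting factors (5 and 0.5 in the original paper).

```latex
\begin{aligned}
\mathcal{L} ={}& \underbrace{\lambda_{\mathrm{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\mathrm{obj}}\Big[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2
    +(\sqrt{w_i}-\sqrt{\hat{w}_i})^2+(\sqrt{h_i}-\sqrt{\hat{h}_i})^2\Big]}_{\text{localization loss}} \\
 &+ \underbrace{\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\mathrm{obj}}(C_i-\hat{C}_i)^2
    + \lambda_{\mathrm{noobj}}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\mathrm{noobj}}(C_i-\hat{C}_i)^2}_{\text{confidence loss}} \\
 &+ \underbrace{\sum_{i=0}^{S^2}\mathbb{1}_{i}^{\mathrm{obj}}\sum_{c\in\mathrm{classes}}\big(p_i(c)-\hat{p}_i(c)\big)^2}_{\text{classification loss}}
\end{aligned}
```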

Best Model Selection

This chapter examines some of the latest weapon detection methods in video surveillance systems. This study investigates the applications of artificial intelligence (deep learning techniques) in video surveillance systems to detect robberies and intruders.

Method

The search terms used in academic search engines were ("Weapon Detection" AND "Artificial Intelligence") OR ("Gun Detection" AND "Deep Learning") OR ("Robbery Detection" AND "Deep Learning"). Inclusion and exclusion criteria were then applied to filter the academic research papers, resulting in a final set of 21 papers.

Quality Assessment

An overall quality rating above 75% indicates that all selected papers are quality papers according to the checklist and measurements identified for the systematic review.

Analysing Research Study and Data Analysis

Literature Review

The model in this study also took poor-quality images and low-resolution frames into account. Another reviewed study designed and used the Hawk-Eye threat detector for real-time video surveillance. A further study provided a real-time, frame-based computer vision model for effective fire and gun detection with a high accuracy metric.

This study compares the SSD and Faster R-CNN algorithms in terms of speed and accuracy for weapon detection models.

SLR Results

Finally, the proposed model is deployed on CCTV cameras in the designated area to be protected. Moving to the second SLR question, "What are the most advanced techniques used for weapon detection?". Turning to the third SLR question, "How does the proposed model work with low-resolution frames?".

Finally, moving to the fourth SLR question, "Is the proposed weapon detection model recommended in the banking area?".

Research Methodology

However, the proposed models are trained using tagged images, where the target objects in each image are annotated by humans to teach the model what kind of weapon is present and where it is located. Several solutions can be applied to address poor image quality: training the proposed model on a low-resolution dataset, resizing the images to an appropriate size, or using super-resolution techniques such as the VDSR network (Nasrollahi & Moeslund 2014) to improve image quality. Reviewing all the models in the articles selected in Table 7, some models are applicable in the banking area because they were trained on a custom dataset.
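As a simple illustration of the resizing option mentioned above, the sketch below upscales a low-resolution CCTV frame with bicubic interpolation in OpenCV; it stands in for a full super-resolution step such as VDSR, which would require a trained network. The file names and target size are assumptions.

```python
# Minimal sketch of preparing low-resolution CCTV frames before detection.
# Bicubic upscaling is a simple stand-in for super-resolution (e.g. VDSR);
# the target size is an assumption, not the thesis's exact configuration.
import cv2

def upscale_frame(frame, target_size=(640, 640)):
    """Resize a low-resolution frame to the detector's expected input size."""
    return cv2.resize(frame, target_size, interpolation=cv2.INTER_CUBIC)

frame = cv2.imread("cctv_frame.jpg")  # hypothetical low-resolution frame
if frame is not None:
    enhanced = upscale_frame(frame)
    cv2.imwrite("cctv_frame_upscaled.jpg", enhanced)
```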

Many object detection methods and deep transfer learning approaches have been used in various applications, as previously mentioned in the problem solution section.

Labelling

As already mentioned, this process is critical and sensitive for achieving excellent results. Therefore, to produce a dataset with high-quality labels, it is very important to spend time early in the project on error definitions and label validation. Annotating images can be time-consuming, and in some scenarios, such as detecting a brain tumor, it requires domain experts such as neurologists and cannot be done by annotators without that knowledge.

In our case, weapons and other related objects can be labeled by humans without the need for any specific experience or knowledge in any domain.

OpenCV

OpenCV can be used for many tasks, such as face recognition, object detection, recognizing individual actions in videos, creating 3D models of objects, creating 3D point clouds from stereo cameras (Chaurasia & Mozar 2022), searching for similar images in an image database, tracking eye movement, improving image quality, improving the visibility of blurry images and video, and placing markers for augmented reality overlays, among others.

Object Recognition

  • Image Classification
  • Object Localization
  • Object Detection

As shown in Figure 7, object detection is a subcategory of object recognition where the intended object is not only identified but also located in the image. One of the challenges of using object detection is that the bounding boxes are always rectangular. Additionally, some metrics such as object area and volume cannot be reliably estimated using object detection (Object Detection vs. Object Recognition vs. Image Segmentation - GeeksforGeeks n.d.).

In image analysis, convolutional neural networks have been shown to improve performance, especially in the areas of object detection and tracking (Cheong & Park 2017).

Classification and Detection Approach

Selective search is a commonly used method in object detection for generating object proposals (Uijlings et al. 2013). The region proposal stage generates many candidate bounding boxes and assesses the probability that each contains an object or interesting information, depending on the type of object detector (Ma et al. 2017). Ross Girshick proposed an extension of R-CNN that addresses its speed issues by releasing Fast R-CNN (Ren et al. 2017).

Selective Search algorithms are quite slow and therefore have an impact on the overall speed of the model (Ren et al. 2017).
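The sketch below illustrates how selective search produces region proposals, using the implementation shipped with opencv-contrib-python; it is an illustration of the technique discussed above, not code from the reviewed papers, and the image path and proposal cut-off are arbitrary.

```python
# Illustrative region-proposal sketch using OpenCV's selective search
# (requires opencv-contrib-python); parameters are arbitrary choices.
import cv2

image = cv2.imread("scene.jpg")  # hypothetical input image
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()   # trade proposal quality for speed

rects = ss.process()               # (x, y, w, h) candidate boxes
print(f"{len(rects)} region proposals generated")

# Keep only the first few hundred proposals for a downstream classifier,
# mirroring how R-CNN-style pipelines score a subset of candidate regions.
proposals = rects[:500]
```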

Backbone Networks

  • VGG16
  • Inception V2
  • MobileNet
  • ResNet50 V1
  • Darknet

The idea is to use kernel filters of different sizes within the CNN and concatenate their outputs into a single output layer instead of stacking them sequentially (Szegedy et al. 2015). Since a 5x5 filter is about 2.78 times more expensive to compute than a 3x3 filter, replacing it with two stacked 3x3 convolutional filters results in a performance improvement (Szegedy et al. 2016). Inception V3 uses RMSprop optimization, batch normalization, factorized 7x7 convolution filters, and label smoothing regularization (Szegedy et al. 2016).

MobileNet's core module is the depthwise separable convolution, which is used to reduce the number of parameters compared to normal convolutional networks and improve accuracy (Howard et al. 2017).
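A small sketch makes the parameter saving concrete: for the same (hypothetical) 64-channel input and 128 output filters, a depthwise separable convolution needs far fewer parameters than a standard 3x3 convolution. The layer sizes below are arbitrary examples, not MobileNet's actual layers.

```python
# Parameter-count comparison between a standard convolution and a
# depthwise separable convolution, illustrating why MobileNet is lighter.
import tensorflow as tf

inputs = tf.keras.Input(shape=(112, 112, 64))

standard = tf.keras.layers.Conv2D(128, kernel_size=3, padding="same")(inputs)
separable = tf.keras.layers.SeparableConv2D(128, kernel_size=3, padding="same")(inputs)

print(tf.keras.Model(inputs, standard).count_params())    # 64*3*3*128 + 128 = 73,856
print(tf.keras.Model(inputs, separable).count_params())   # 64*3*3 + 64*128 + 128 = 8,896
```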

Object Detection Algorithms

  • SSD
  • YOLO

As mentioned in the third section of the first chapter, YOLO stands for You Only Look Once and is a one-stage object detector. It is considered one of the most advanced real-time object detection techniques currently available. YOLOv4 uses Bag of Freebies, a set of techniques that improves network performance without increasing inference time.

YOLOv4 also uses the "Bag of Specials": modules and post-processing methods that add only a small inference cost but can drastically increase the accuracy of an object detector.

Dataset Construction

Cleaning, standardizing, processing, filtering, scaling, and feature selection for traditional machine learning models are all part of data preprocessing (Data Preprocessing for Supervised Learning n.d.). Pre-processing steps performed on the acquired images led to the creation of the final training dataset. The width, height, x, and y coordinates of each bounding box are saved in XML format for the SSD model or as a text file for the YOLO models.
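To make the two annotation formats concrete, the following sketch converts one Pascal VOC-style box (pixel corner coordinates, as stored in the XML files) into the normalized YOLO text line; the class index, box coordinates, and image size are hypothetical.

```python
# Sketch of converting one Pascal VOC-style bounding box (as used for the
# SSD annotations) into a YOLO-format text line. Values are hypothetical.
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h, class_id):
    """Return a YOLO label line: class x_center y_center width height (all normalized)."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: a pistol (class 0) annotated in a 1920x1080 frame.
print(voc_to_yolo(850, 400, 1010, 520, 1920, 1080, class_id=0))
```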

One of these challenges is Background Clutter, where the target object can blend into the environment, making it difficult to identify.

Models

Metrics

In the Pascal VOC challenge, average precision (AP) was calculated by interpolating precision at 11 recall points, and only an intersection over union (IoU) threshold of 0.5 is considered. Meanwhile, the Google Open Images Challenge uses a mean average precision metric to evaluate object detection models that predict tight bounding boxes around target objects from 500 classes (Open Images Evaluation Protocols n.d.). In contrast, the COCO challenge (Lin et al. 2014) uses all-point interpolation and calculates mean average precision over various IoU thresholds, ranging from 0.5 to 0.95.
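A short sketch of the Pascal VOC 11-point interpolation described above is given below; the precision and recall values are invented for illustration and would in practice come from ranking detections by confidence at IoU >= 0.5.

```python
# Sketch of Pascal VOC 11-point interpolated average precision.
import numpy as np

def voc_ap_11point(recall, precision):
    """Interpolate precision at recall levels 0.0, 0.1, ..., 1.0 and average."""
    ap = 0.0
    for t in [i / 10.0 for i in range(11)]:
        mask = recall >= t
        p = precision[mask].max() if mask.any() else 0.0
        ap += p / 11.0
    return ap

# Hypothetical precision/recall pairs from a ranked list of detections.
recall = np.array([0.1, 0.2, 0.4, 0.6, 0.8])
precision = np.array([1.0, 0.9, 0.8, 0.6, 0.5])
print(f"AP (11-point) = {voc_ap_11point(recall, precision):.3f}")
```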

The detector produces more false positives when the confidence score threshold is low, resulting in low precision and high recall.

The area of the target objects is calculated as the number of pixels in the segmentation image. As the confidence score threshold increases, more weapon items are missed by the detector, meaning more false negatives and thus low recall and high precision. In the YOLO loss function, 1_ij^noobj is the complement of 1_ij^obj and applies when no target object is assigned to box j of grid cell i.
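For completeness, a minimal IoU computation for two axis-aligned boxes is sketched below; a prediction whose IoU with a ground-truth box reaches the chosen threshold (0.5 here) counts as a true positive under the AP@IoU=0.5 criterion. The example boxes are hypothetical.

```python
# Minimal intersection-over-union (IoU) sketch for two axis-aligned boxes
# given as (xmin, ymin, xmax, ymax).
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# A predicted gun box vs. its ground-truth box (hypothetical coordinates).
print(iou((100, 100, 200, 200), (120, 110, 210, 205)))  # about 0.63, a true positive at IoU >= 0.5
```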

The results of training and evaluation of weapon detection models are presented in this chapter.

Training Results

The total-loss behavior of the models on both dataset1 and dataset2 is quite similar. After 10,000 training steps, the total loss of the SSD MobileNet model on dataset1 is about 0.16 and the learning rate has decayed to zero. When configuring a neural network, the most important hyperparameter is the learning rate, which determines how much the model changes in response to the estimated error each time the model weights are updated.
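The decay of the learning rate to zero described above is consistent with a cosine-style schedule; the sketch below shows one such schedule in Keras, with an assumed base rate, and is not the exact configuration used to train the SSD MobileNet model.

```python
# Hedged sketch of a cosine-decay learning-rate schedule that falls to zero
# over the run; the base rate is an assumption, the step count matches the
# 10,000 training steps mentioned above.
import tensorflow as tf

schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=0.08,   # assumed base rate
    decay_steps=10_000)

optimizer = tf.keras.optimizers.SGD(learning_rate=schedule, momentum=0.9)

# Inspect the rate at the start, midpoint, and end of training.
for step in (0, 5_000, 10_000):
    print(step, float(schedule(step)))
```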

Total loss is a combination of classification loss, localization loss, and regularization loss in SSD models.

Evaluation Results

  • Overview
  • Confusion Matrix
  • Recall x Precision
  • F1 x Confidence

However, the average precision at an IoU threshold of 0.5 (AP@IoU=0.5) is used to evaluate the weapon detection models based on the YOLO algorithms. The confusion matrix shows where the weapon detection model becomes confused when making predictions. Since the YOLO models outperform the SSD models, we present the weapon detection confusion matrices for the YOLOv5 model on dataset1 and dataset2 in Figure 32 and Figure 33, respectively.

According to the confusion matrix in Figure 32, the most successful weapon detection rates were obtained for the mask theft class at 0.84, while the least successful rates were for the knife class at 0.4.

Detection Results

The following figure shows the results of weapon detection on surveillance camera footage in real scenarios.
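A sketch of how such footage can be processed frame by frame is shown below; the torch.hub YOLOv5 entry point, the custom weights path, and the video file name are assumptions for illustration rather than the thesis's deployment code.

```python
# Illustrative sketch of running a trained detector over surveillance
# footage frame by frame; weights path and video file are hypothetical.
import cv2
import torch

# Load a YOLOv5 model with custom weapon/mask weights.
model = torch.hub.load("ultralytics/yolov5", "custom", path="weapon_detector.pt")

cap = cv2.VideoCapture("cctv_footage.mp4")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame[:, :, ::-1])        # BGR -> RGB before inference
    for *box, conf, cls in results.xyxy[0].tolist():
        x1, y1, x2, y2 = map(int, box)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
    cv2.imshow("weapon detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```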

Discussion

Choice of detector

Future Work

Conclusion

