Single Object Tracking with Minimum False Positive using YOLOv4, VGG16, and Cosine Distance

(1)

Single Object Tracking with Minimum False Positive using YOLOv4, VGG16, and Cosine Distance

Galuh Ramaditya^*, Wikky Fawwaz Al Maki

School Of Computing, Informatics, Telkom University, Bandung, Indonesia

Email: ^1,*[email protected], ²[email protected] Email Penulis Korespondensi: [email protected]

Abstrak−Siamese network is a solution to the problem of single object tracking. Siamese network is a comparison method that uses a neural network in it. In this case, the siamese network extracts several images with the same weight, then compares them. There have been many studies that use the siamese network for single object detection. However, many still have high false positives when the target object is in and out of the frame or is entirely blocked by something so that the target data stored has a high level of damage. This study aims to create a method to track an object (single object tracking), even though the object is blocked by something, temporarily exits the frame with minimum false positive to keep the target data clean. The authors developed the method based on YOLOv4, VGG16, and cosine distance. Furthermore, researchers combine these methods to solve these problems with the concept of a siamese network. The result is that the system can track a person even if the target is entirely blocked by something or even in and out of the frame and reappears in a different location with minimum false positive.

Kata Kunci: Single Object Tracking; Siamese Network; YOLOv4; VGG16; Cosine Distance; False Positive

1. INTRODUCTION

Nowadays, technology is growing very fast. Almost every day, there is always the latest innovation made. Start with technology in the health sector to technology in the security sector. We can feel it, where before 2000, technology was very different from today. For example, in 2000, the development of cell phones was still new.

However, almost all cell phones already have touch screen features ten years later.

Current technological growth is closely related to artificial intelligence systems, where almost all aspects of technology have artificial intelligence systems. Artificial intelligence is a system that can behave like humans, such as being able to learn by itself and others. With this artificial intelligence system, humans can easily do their jobs.

One part of artificial intelligence is computer vision. Computer vision is a sector that discusses how computers can understand images and videos more deeply [1]. The goal is that computers can gain knowledge from images or videos like human vision. For example, when humans see, humans gain knowledge about the existence of objects around them, likewise with computer vision which can also detect objects around it.

Currently, computer vision has been widely applied in everyday life. For example, in the humanitarian sector, we can make computers able to detect objects around us. This ability is beneficial for people with problems with their eyesight. Computers will notify objects and even activities around them. Therefore, computer vision is essential because it can provide enormous benefits.

Computer vision can also be applied to the security sector. In this sector, we can make computers able to detect theft, traffic violations, and even detect someone who needs help just from their movements. In addition, we can also make a computer track someone on a video or even a CCTV network for the sake of investigation or searching for someone.

There are two types of object tracking in computer vision midwifery. There is single object tracking, an object tracking system where only one object is tracked [2][3]. Furthermore, multi-object tracking is an object tracking system that can track objects in bulk [4][5]. Currently, many methods can be used to develop object tracking systems. Start with CSRT and MIL, which is a method commonly used for single object detection [6][7], until Deep SORT, which is commonly used for multi-object detection [8].

Especially in the method used for single object detection, there are several weaknesses related to the problem of obstacles. For example, the system wrongly tracks the target if another object blocks the tracked object.

Moreover, suppose the system to be built is a system that uses many videos. In that case, it is not very easy to determine the target in the following video if the target disappears in the previous video.

Due to such problems, the siamese network method is the solution. The Siamese network is a comparison method that uses a neural network in it [9][10][11][12][13]. In this case, the siamese network extracts images with the same weight, then compares them with other images. Because this method compares objects and targets detected previously, this concept makes it possible to detect targets correctly.

One of the best siamese network methods is SiamRPN which is a siamese network based on Region Proposal Network [14][15]. The siamRPN method can handle the above problem, but because the system always detects the target every frame, even when the target is not in the frame, the system still detects it. This method causes a high rate of false positives, which can corrupt the stored target data. Therefore, the author examines a method to develop the concept of a conjoined network.

(2)

The author examines a new method with a low false positive rate because this method can be the basis of a single object tracking system that relies on object recognition. One implementation of single object tracking that relies on object recognition is single object tracking on multi-cameras. the system will track the target on another camera if the target leaves one camera. One method that can implement the system is the object recognition method which collects the dataset from the target data tracked in the first camera. The dataset must be clean so that object recognition can run well. If the dataset contains corrupted data, one of which is caused by incorrect data (high false positive rate), then the object recognition system's failure rate is very high.

Authors researched a method that aims to track an object even if the object is blocked by something, temporarily exits the frame or maybe changes videos with minimal false positives. The researcher developed the method based on YOLOv4, VGG16, and cosine distance. With the concept of a siamese network, authors combine these methods so that they can be a solution to the problems above.

2. RESEARCH METHODOLOGY

2.1 Research Stages

This research begins by exploring the existing concepts in single object tracking and the methods so that the authors determine that the siamese network concept is the most likely to solve the problems above. However, the methods on this concept have a high false positive, so the author assembles the algorithm with various sub-methods such as object detection, pre-trained models, and image similarity measurements, each of which is explored to obtain the most suitable sub-method. After the most suitable sub-method is determined, the author arranges the algorithm until it finds the best one to solve the problem above. After getting the best one, the author tests the algorithm on public datasets with different scenarios. For a visualization of the research stages, see Figure 1.

Figure 1. Visualization of the research stages 2.2 Method

Before explaining the research method in more detail, several constants must be considered first because these constants are very important for this method. For an explanation of each constant, it can be seen in Table 1.

Table 1. Descriptions of constants

Constants Descriptions

PERSON_SIZE The object size will be resized according to this constant before being extracted.

INCREASE_PX_DEFAULT

The number of a pixel will be added to the coordinates of the last target object that forms the search box so that the object detector will only detect objects that are at these coordinates.

INCREASE_PX_DELTA A number to enlarge the search box if the target is not tracked in the next frame.

COMPARING_TOTAL Total targets that have been tracked will be compared to candidate objects.

COMPARING_DELTA Step data array to be compared with candidate objects.

COSINE_ THRESHOLD The maximum limit of the cosine distance indicates that the candidate object is the target.

2.2.1 Image Similarity Measurement

The system built is a person tracking system on CCTV video. In this system, the target object will be compared with the candidate object (the candidate object is the object detected in the search box). The system will consider

(3)

the candidate object as the real target if the comparison value with the cosine distance is less than the initialized threshold. Cosine distance is the development of cosine similarity. This measurement method measures the cosine of the angle between two vectors [16][17]. The cosine distance equation is as Equation (1). Authors use the cosine distance because this equation produces results that are easier to implement in this method.

𝐶𝑜𝑠𝑖𝑛𝑒 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒(𝐴, 𝐵) = 1 − ^𝐴∙𝐵

||𝐴|| ||𝐵|| (1)

2.2.2 Pre-trained Model

Measurements with the cosine distance are carried out after the extracted image data features using VGG16.

VGG16 is a simple CNN model which has 50 layers deep [18][19][20][21]. VGG16 was chosen because the distance between the measurement results of the same object equation and the different objects was far, so the threshold can be more easily applied to this method.

The approach taken by the Authors researched a method that aims to track an object even if the object is blocked by something, temporarily exits the frame or maybe changes videos. The method that the researcher developed is based on YOLOv4 and VGG16. With the concept of a siamese network, authors combine these methods so that they can be a solution to the problems above.

Comparing objects is to extract the features of objects resized with VGG16. Then the extract results are flattened so that they become a 1-dimensional array divided by 255. After that, the data in the array is compared with the results of extracting other objects using the cosine distance. For an illustration of this approach, see Figure 2.

Figure 2. Object comparison approach 2.2.3 Object Detection

In this method, the system will first run the video one by one frame and display which objects are detected as humans by applying YOLOv4 as the object detector in each frame. An example of an image of this can be seen in Figure 3. YOLOv4 is used as the object detector method in this method. YOLOv4 is the latest generation of YOLO, which is a regression-based object detector that can detect objects in real-time [22][23][24]. This study uses YOLOv4 because YOLOv4 is easier to implement this method. In addition, YOLOv4 has good accuracy and can touch more than 20 fps (using CPU) in this method.

Figure 3. The frame before the target is selected 2.2.4 Target Selection

While the system is running the video, the user can press a pre-set button to pause the system temporarily (pause).

After the system with the current frame is paused, the user can select one of the objects detected as human to be the tracking target. After that, the object is resized according to the PERSON_SIZE constant and extracted with VGG16. After the process, the extracted results and object coordinates are entered into an array. For illustration of target selection flow, see Figure 4.

Figure 4. Target selection flow

(4)

2.2.5 Search Box

After the target is selected, the system frame will continue to the next frame. In this frame, the system creates a search box where this search box is the coordinates that will be the place for the object detection process to be carried out. Suppose an object is detected in the box. In that case, an object is a candidate object for which the comparison process will be carried out with the target data. Figure 5 shows the system detecting the target and marking it with a red-green box. A description of each box can be seen in Figure 6.

Figure 5. Search box

Figure 6. Box description

This search box is obtained from the coordinates of the last target enlarged by the constant INCREASE_PX_DEFAULT pixels. If the target is not detected, then in the next frame, the search box will be enlarged by the constant INCREASE_PX_DELTA pixels. The search box enlargement process will stop when the search box reaches the frame size limit. If in the next frame the target is detected, the search box moves to the coordinates of the detected target. It is again enlarged by the INCREASE_PX_DEFAULT constant pixels. The equation is as Equation (2-5):

𝑆𝐶_{𝑙𝑒𝑓𝑡} = {𝑇𝐶_{𝑙𝑒𝑓𝑡}− 𝐼𝐷 − I∆, 𝑇𝐶_{𝑙𝑒𝑓𝑡}− 𝐼𝐷 − I∆> 0

0, 𝑇𝐶 − 𝐼𝐷 − I∆≤ 0 (2)

𝑆𝐶_𝑡𝑜𝑝 = {𝑇𝐶_𝑡𝑜𝑝− 𝐼𝐷 − I∆, 𝑇𝐶_𝑡𝑜𝑝− 𝐼𝐷 − I∆> 0

0, 𝑇𝐶_𝑡𝑜𝑝− 𝐼𝐷 − I∆≤ 0 (3)

𝑆𝐶_{𝑟𝑖𝑔ℎ𝑡} = {𝑇𝐶_{𝑟𝑖𝑔ℎ𝑡}+ 𝐼𝐷 + I∆, 𝑇𝐶_{𝑟𝑖𝑔ℎ𝑡}− 𝐼𝐷 − I∆< 𝐹𝑟𝑎𝑚𝑒 𝑤𝑖𝑑𝑡ℎ

𝐹𝑟𝑎𝑚𝑒 𝑤𝑖𝑑𝑡ℎ, 𝑇𝐶_{𝑟𝑖𝑔ℎ𝑡}− 𝐼𝐷 − I∆≥ 𝐹𝑟𝑎𝑚𝑒 𝑤𝑖𝑑𝑡ℎ (4)

𝑆𝐶_{𝑏𝑜𝑡𝑡𝑜𝑚}= {𝑇𝐶_{𝑏𝑜𝑡𝑡𝑜𝑚}+ 𝐼𝐷 + I∆, 𝑇𝐶_{𝑏𝑜𝑡𝑡𝑜𝑚}− 𝐼𝐷 − I∆< 𝐹𝑟𝑎𝑚𝑒 ℎ𝑒𝑖𝑔ℎ𝑡

𝐹𝑟𝑎𝑚𝑒 ℎ𝑒𝑖𝑔ℎ𝑡, 𝑇𝐶_{𝑏𝑜𝑡𝑡𝑜𝑚}− 𝐼𝐷 − I∆≥ 𝐹𝑟𝑎𝑚𝑒 ℎ𝑒𝑖𝑔ℎ𝑡 (5) Where SC is search box coordinates, TC is last target coordinates, ID is INCREASE_PX_DEFAULT constant and IΔ is INCREASE_PX_DELTA constant * (recent frame index – last target frame index – 1).

2.2.6 Next Target Selection

After the search box is determined, the object detection process is carried out at the box coordinates. Each detected object will be resized according to PERSON_SIZE, then extracted with VGG16. The extract results will be compared with the last array of data as much as the COMPARING_TOTAL constant value and with a data step of the COMPARING_DELTA value using the cosine distance. The comparison result between the data is averaged so that the average value is the value of the candidate object.

After that, the smallest value of each object will be taken (closer to 0 indicates the minor difference between the objects being compared) and then compared with the COSINE_THRESHOLD constant value. If the value of

(5)

the smallest object is less than COSINE_THRESHOLD, the system will assume that the object is the target and vice versa. The equation is as Equation (6):

𝑀𝑒𝑎𝑛 𝐶𝑜𝑠𝑖𝑛𝑒 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒(𝑋) = {

∑^𝐶𝑇_𝑖=1𝐶𝑜𝑠𝑖𝑛𝑒 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒(𝐴𝑇𝑛𝑡−𝑐∆∗(𝑖−1),𝑋)

𝐶𝑇 , 𝑛𝑡 > 𝐶∆ ∗ 𝐶𝑇 − 𝐶∆

∑^⌈ 𝐶𝑜𝑠𝑖𝑛𝑒 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒(𝐴𝑇𝑛𝑡−𝑐∆∗(𝑖−1),𝑋) 𝑛𝑡

𝐶∆⌉ 𝑖=1

𝐶𝑇 , 𝑛𝑡 ≤ 𝐶∆ ∗ 𝐶𝑇 − 𝐶∆

(6)

Where 𝐴𝐶 is array containing candidate objects, 𝑛𝑐 is number of 𝐴𝐶, 𝐴𝑇 is array containing target data, 𝑛𝑡 is number of 𝐴𝑇, 𝐶𝑇 is COMPARING_TOTAL constant, 𝐶∆ is COMPARING_DELTA constant and determining the object is the target is if min([Mean Cosine Distance(𝐴𝐶₁),..,Mean Cosine Distance(𝐴𝐶_𝑛𝑐)]) lower then COSINE_THRESHOLD constant.

2.3 Flowchart

Figure 7 is a flowchart of this method, the same as the flow of usage and processing on the system.

Figure 7. Flowchart of the proposed method

3. RESULTS AND DISCUSSION

3.1 Visualization Result

With the above algorithm, this method can solve the problem when another object blocks the target or disappears temporarily, as shown in Figure 8. This method determines that the only detected objects in the search box are the target only if the cosine distance value is less than the initiated COSINE_THRESHOLD constant. Determination of the target is the object will be compared with several target data whose number is by the initiated COMPARIING_TOTAL constant to increase the accuracy.

Figure 8. The results of the implementation of the method on CCTV

The determination of target data compared to detected objects can be stepped according to the initiated COMPARIING_DELTA constant to avoid overfitting. Therefore, comparing objects with the target is better even though the target is not detected for several frames. Suppose the comparison is only made to the target data sequentially. In that case, the object tracking failure rate is very high in the next frame.

(6)

3.2 Comparison with Another Method

This method succeeded in minimizing false positives in the built system compared with another method, SiamRPN. Figure 9 shows that this method does not detect the target when the target is not present, while SiamRPN detects it. This method can handle the false positive problem because the system does not always have to detect the target in every frame. After all, the target may not be in that frame. Suppose it is forced to detect the target in every frame constantly, and it turns out that the system incorrectly determines the target at a specific frame. In that case, the process will result in damaged target data and tracking errors in the next frame.

Figure 9. Visual comparison with another method

True positives occur when the model correctly predicts the positive class, such as when the system correctly determines that the detected object is the intended target. A false positive occurs when the model incorrectly predicts a positive class, as in this case, when the system incorrectly determines that the detected object is the target. True negative occurs when the model correctly predicts the negative class, such as when the system does not detect a target when there is no target. False negative occurs when the model incorrectly predicts a negative class, such as when the system does not detect the target when the target is present [25][26]. The Equation (7) below allows us to calculate accuracy by knowing true positive, true negative, false positive, and false negative.

𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 = ^{𝑇𝑃+𝑇𝑁}

𝑇𝑃+𝑇𝑁+𝐹𝑃+𝐹𝑁× 100% (7)

Where TP is true positive, TN is true negative, FP is false positive and FN is false negative. The authors used 20 videos from public datasets to evaluate this method. Each of these videos has a different scenario. The scenarios in the videos taken have the same case as the problems raised in this study: the target being blocked, the target leaving the frame and re-entering it, and so on. Table 2-21 contains a cofusion matrix of 20 videos from the tested public dataset.

Table 2. Confusion matrix of video “Basketball” with scenario target in and out of frame for a while [27]

Method TP TN FP FN

SiamRPN 459 0 266 0

This Method 458 191 0 76

Table 3. Confusion matrix of video “Blur Body” with scenario target always in the frame [27]

SiamRPN 334 0 0 0

Table 4. Confusion matrix of video “Freeman 4” with scenario target partially blocked for a while [27]

SiamRPN 201 0 96 0

Table 5. Confusion matrix of video “Human 4” with scenario target is blocked and small in size [27]

SiamRPN 529 0 138 0

Table 6. Confusion matrix of video “Skating 1” with scenario target in and out of the frame and changing

lighting [27]

SiamRPN 219 0 181 0

Table 7. Confusion matrix of video “Baseball 002”

with scenario target shooting position changing drastically for a while [28]

SiamRPN 368 0 345 0

Table 8. Confusion matrix of video “BruceLi_03”

with scenario target in and out of frames for a while [28]

SiamRPN 470 0 332 0

with scenario target in and out of frames for a while [28]

(7)

SiamRPN 973 0 69 0 This Method 943 47 0 52

with scenario target in and out of frames [28]

SiamRPN 278 0 167 0

Table 11. Confusion matrix of video “Chest 5” with scenario the shooting angle changes fast [28]

SiamRPN 301 0 140 0

Table 12. Confusion matrix of video “Chest 6” with scenario target going in and out of the frame for a while and disappearing at the end of the video [28]

SiamRPN 306 0 494 0

This Method 347 244 20 189

Table 13. Confusion matrix of video “Chess 7” with scenario target always in the frame but rapid shooting

angle changes [28]

SiamRPN 355 0 275 0

Table 14. Confusion matrix of video “monitor black car” with scenario target is temporarily blocked

completely and the target is very small [28]

SiamRPN 272 0 120 0

Table 15. Confusion matrix of video “bluegirl monitor” with scenario target is blocked for a long

time and reappearing in the frame [28]

SiamRPN 471 0 671 0

Table 16. Confusion matrix of video “monitor boy”

with scenario target often being completely blocked for a while [28]

SiamRPN 1282 0 1997 0

This Method 2170 327 0 782

Table 17. Confusion matrix of video

“NBA2k_Kawayi_01” with scenario target going in and out of frames for a while [28]

SiamRPN 129 0 68 0

Table 18. Confussion matrix of video

“NBA2k_Kawayi_02” with scenario target going in and out of frames before finally disappearing [28]

SiamRPN 44 0 74 0

“NBA2k_Kawayi_03” with scenario target going in and out of frames for a while [28]

SiamRPN 158 0 102 0

Table 20. Confussion matrix of video

“NBA2k_Kawayi_04” with scenario target always in frame [28]

SiamRPN 248 0 82 0

“NBA2k_Kawayi_05” with scenario target always in frame [28]

SiamRPN 176 0 52 0

3.3 The Accuracy of The Proposed Method

In the confusion matrix data in Table 2-21, we can see that this method can also minimise false positives. With the confusion matrix data, researchers can calculate the accuracy of each video. For the accuracy of the average tests that have been carried out, it can be seen in Table 22. In the table, it can be concluded that this method has better accuracy than other methods.

(8)

Table 22. Average accuracy of the tested videos Method Accuracy

SiamRPN 62.98%

This Method 80.41%

Suppose the target is not detected for some time in the future. In that case, it could be that the actual target has already left the search box so that the target will not be detected again at any time because the search box will only be in the place where the target was last detected. To be able to handle this, the author makes the search box will enlarge by the SEARCH_BOX_INCREASE_DELTA constant that has been set before every frame that is not detected. So no matter how far the target moves while undetected, the rate the system will be able to track it back is very high.

3.3 Implementation

Many systems can implement this method, one of which is a single object detection system with multiple connected cameras. With clean target data and minimal damage with incorrect data, if applied to a multi-camera system, the system will easily find targets on different cameras because the system can rely on target recognition by utilizing stored target data. If you use a method with a high false positive rate, then target recognition on different cameras will be very difficult because the target data that becomes the dataset contains incorrect data, so the error rate in the system in recognizing targets on other cameras will be high.

4. CONCLUSION

This method is still slower than other methods Because this approach always processes YOLOv4. Also, based on YOLOv4, object detector sometimes fails to detect objects. Suppose the object detection system fails to identify the object during processing. In this case, the system will also fail to detect the target. This case causes the false negative results in this method to be high. However, this is not a problem because the target data remains intact and minimally contaminated with erroneous data. The proposed method is also highly dependent on the given threshold. This threshold determines whether the object is a target or not. The difficulty is setting a threshold for each video the system wants to process. Due to brightness, target location, and video resolution, the target distance value between frames may differ. Therefore, it is necessary to tune the threshold in this method before starting the tracking process. The results obtained by this method have fairly good accuracy. Compared to the other method (SiamRPN), which only has an accuracy of 62.98%, this method is successful in tracking the target that enters the frame and is completely blocked by something for a while with minimal false positives so that it can have an accuracy of 80.41% on the public dataset tested. This method can overcome the problem above so that the target data becomes cleaner and can be the basis for other research, one of which is about single object tracking with cameras connected in one city.

GRATITUDE

Thank Telkom University for providing a place for authors to create this journal research. Hopefully, this research will significantly impact the development of technology.

REFERENCES

[1] Døble, Eirik & Haugseter, Sindre & Mikkelsen, Christian & Sneisen, Jørgen & Skeie, Nils-Olav & Brastein, Ole. (2022).

Level Measurements with Computer vision - Comparison of traditional and modern Computer vision Methods. 140-147.

10.3384/ecp21185140.

[2] Faraz Lotfi, & Hamid D. Taghirad. (2021). Single Object Tracking through a Fast and Effective Single-Multiple Model Convolutional Neural Network.

[3] Cheewaprakobkit, Pimpa & Shih, Timothy & Lin, Chih-Yang & Liao, Hung-Chun. (2022). A Novel Relational Deep Network for Single Object Tracking. 102-107. 10.1109/KST53302.2022.9729070.

[4] Zhongdao Wang, Liang Zheng, Yixuan Liu, Yali Li, & Shengjin Wang. (2020). Towards Real-Time Multi-Object Tracking.

[5] Shuai, Bing & Berneshawi, Andrew & Li, Xinyu & Modolo, Davide & Tighe, Joseph. (2021). SiamMOT: Siamese Multi-Object Tracking. 12367-12377. 10.1109/CVPR46437.2021.01219.

[6] Ilkin, Sumeyya & Gulagiz, Fidan & Akcakaya, Merve & Sahin, Suhap. (2022). Embedded Visual Object Tracking System Based on CSRT Tracker. 1-4. 10.1109/ICEIC54506.2022.9748840.

[7] Teng, Fei & Liu, Qing & Zhu, Lin & Gao, Xueying. (2014). Robust multi-scale ship tracking via extended MIL tracker.

177-182. 10.2495/ICEEE20140211.

[8] Xiaofeng, qiu & xiangrui, sun & yongchang, chen & Xinyan, Wang. (2021). Pedestrian detection and counting method based on YOLOv4+DeepSORT. 2. 10.1117/12.2618209.

[9] Lev V. Utkin, Maxim S. Kovalev, & Ernest M. Kasimov. (2019). An explanation method for Siamese neural networks.

[10] Paliwal, Pinak. (2021). Siamese Networks for Image Comparison and Discrepancy Localization: Siamese Networks in

(9)

Exam Proctoring. 323-329. 10.1145/3488933.3488987.

[11] Miao, Yilin & Liu, Zhewei & Wu, Xiangning & Gao, Jie. (2021). Cost-Sensitive Siamese Network for PCB Defect Classification. Computational Intelligence and Neuroscience. 2021. 1-13. 10.1155/2021/7550670.

[12] Li, Da & Kang, Yabing & Xiang, Xing & Tao, WenSheng & Hu, Jiwei. (2022). Siamese Network with Feature Fusion for Visual Tracking. 1048-1052. 10.1109/ITOEC53115.2022.9734615.

[13] Huang, Hanlin & Liu, Guixi & Zhang, Yi & Xiong, Ruke & Zhang, Shaoxuan. (2022). Ensemble siamese networks for object tracking. Neural Computing and Applications. 34. 1-19. 10.1007/s00521-022-06911-4.

[14] Li, B., Yan, J., Wu, W., Zhu, Z., & Hu, X. (2018). High Performance Visual Tracking with Siamese Region Proposal Network. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8971-8980).

[15] Zhou, Wei & Liu, Yuxiang & Xu, Haixia & Hu, Zhihai. (2022). A Modified SiamRPN for Visual Tracking. 10.1007/978- 981-16-6963-7_70.

[16] Pinky Sitikhu, Kritish Pahi, Pujan Thapa, & Subarna Shakya. (2019). A Comparison of Semantic Similarity Methods for Maximum Human Interpretability.

[17] Rachevsky, Leonid & Kanevsky, Dimitri & Sarikaya, Ruhi & Ramabhadran, Bhuvana. (2011). Clustering with Modified Cosine Distance Learned from Constraints.. 1313-1316. 10.21437/Interspeech.2011-437.

[18] Karen Simonyan, & Andrew Zisserman. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition.

[19] Sheriff, S & Kumar J, Venkat & Vigneshwaran, S & Jones, Aida & Anand, Jose. (2021). Lung Cancer Detection using VGG NET 16 Architecture. Journal of Physics: Conference Series. 2040. 012001. 10.1088/1742-6596/2040/1/012001.

[20] Cao, Ying & Gu, Runlong & Huang, Chenghua. (2022). Research on image recognition method based on LMAL and VGG-16. 98. 10.1117/12.2637770.

[21] Taluja, Anuradha & Singhal, Jay & Gulati, Aastha & Gupta, Harshit. (2022). Gatekeeper Security Check System Using VGG. 10.1007/978-981-16-4016-2_66.

[22] Yang, Zhen & Xu, Xuefei & Wang, Keke & Li, Xin & Ma, Chi. (2021). Multitarget Detection of Transmission Lines Based on DANet and YOLOv4. Scientific Programming. 2021. 1-12. 10.1155/2021/6235452.

[23] Sathyamurthy, Kavi & Rajmohan, A.R. & Tejaswar, A. & Velayutham, Kavitha & Manimala, G.. (2021). Realtime Face Mask Detection Using TINY-YOLO V4. 169-174. 10.1109/ICCCT53315.2021.9711838.

[24] Wani, L. & Momin, Md & Bhosale, Sharwari & Yadav, Abhishek & Nili, Manas. (2022). Vehicle Crash Detection using YOLO Algorithm. International Journal of Computer Science and Mobile Computing. 11. 75-82.

10.47760/ijcsmc.2022.v11i05.007.

[25] Lisangan, Erick & Gormantara, Alfredo & Carolus, Ridnaldy. (2022). Implementasi Naive Bayes pada Analisis Sentimen Opini Masyarakat di Twitter Terhadap Kondisi New Normal di Indonesia. KONSTELASI: Konvergensi Teknologi dan Sistem Informasi. 2. 10.24002/konstelasi.v2i1.5609.

[26] Romli, Ikhsan & Kharida, Fairuz & Naya, Chandra. (2020). Determination of Customer Satisfaction of Tax Service Office Services Using C4.5 and PSO. Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi). 4. 296-302.

10.29207/resti.v4i2.1718.

[27] Wu, Yi & Lim, Jongwoo & Yang, Ming-Hsuan. (2015). Object Tracking Benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence. 37. 1-1. 10.1109/TPAMI.2014.2388226.

[28] Wang, Xiao & Shu, Xiujun & Zhang, Zhipeng & Jiang, Bo & Wang, Yaowei & Tian, Yonghong & Wu, Feng. (2021).

Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark.