Automation in Construction 159 (2024) 105254
Available online 30 December 2023
0926-5805/© 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
UAV-deployed deep learning network for real-time multi-class damage detection using model quantization techniques
Xiaofei Yang *, Enrique del Rey Castillo, Yang Zou, Liam Wotherspoon
Department of Civil and Environmental Engineering, University of Auckland, Auckland 1023, New Zealand
A R T I C L E  I N F O

Keywords:
Energy-efficient deep learning network
Real-time damage detection
Concrete bridge
Quantization-aware training
Computer vision
A B S T R A C T
Real-time damage detection algorithms deployed on Unmanned Aerial Vehicles (UAVs) can support flight control in real time, enabling the capture of higher quality inspection data. However, three challenges have hindered their wider application: 1) existing anchor-based damage detectors cannot generalize well to real-world scenarios and degrade the detection speed; 2) prior studies exhibit a low detection accuracy; 3) no previous study considers the energy consumption of the damage detector, limiting the UAVs' flight time. To meet these challenges, this paper presents the YOLOv6s-GRE-quantized method, an energy-efficient, anchor-free and real-time damage detection method built on top of the YOLOv6s algorithm. Firstly, the YOLOv6s-GRE method was developed, where a generalized feature pyramid network (GFPN), a reparameterization efficient layer aggregation network (RepELAN) and an efficient detection head were introduced into the YOLOv6s. Comparison experiments showed that the YOLOv6s-GRE method, in contrast to YOLOv6s, advanced 2.3 percentage points in mAP50 while maintaining comparable detection speed and without requiring an increase in model size. The YOLOv6s-GRE model was then reconstructed by the RepOptimizer (RepOpt) to equivalently transform it into a quantization-friendly model, addressing the quantization difficulty of the reparameterization model. Finally, the YOLOv6s-GRE model with RepOpt was quantized by the partial quantization-aware training technique, expediting the detection speed by 83.5% and saving energy by 79.7% while maintaining a comparable level of detection accuracy. Implementing this proposed method can significantly boost bridge inspection productivity.
1. Introduction
Bridges form the backbone of the modern transportation network but are susceptible to deterioration and corrosion due to adverse environmental conditions and increasing traffic loads [6]. The excessive deterioration of bridges may lead to traffic restrictions and temporary closures, significantly affecting a country's economy [40]. Proper bridge condition monitoring is critical to the safe use of in-service bridges.
Among the various condition monitoring strategies, visual inspection is the most frequent and cost-effective, aiming to collect “relevant data and describe defects in terms of their type, location, extent, severity and, if possible, cause” [3]. Nevertheless, conventional bridge visual inspection has been recognized as time-consuming, error-prone and sometimes dangerous [48]. Thus, new techniques with a more efficient and objective inspection decision process are in high demand, given that the bridge stock is ageing and the infrastructure maintenance budget is decreasing.
Unmanned aerial vehicles (UAVs) have recently seen increased use for bridge visual inspection due to their excellent maneuverability and wide coverage of the inspection field of view [32,44]. According to a study from the Florida Department of Transportation, over half of bridges can be inspected by UAVs [34], saving up to 60% of inspection costs [46]. Existing UAV-assisted bridge inspection processes rely on fixed-view-point navigation or manual operation multiple meters away from the bridge to capture images (e.g. 5.5 m of working distance with a Ground Sampling Distance (GSD) of 1.5 mm/pixel [43]) to provide a wide coverage of viewpoints and avoid collision. However, this does not meet the bridge health monitoring standards that require sub-millimeter damage measurement accuracy. It is also impractical to implement a closer-proximity inspection for a long time, because this would decrease the UAV coverage of viewpoints, significantly lowering the inspection efficiency and increasing the flight time while compromising flight safety. A possible solution is to deploy real-time damage detection algorithms on the UAV, empowering it with damage awareness,
* Corresponding author.
E-mail address: [email protected] (X. Yang).
https://doi.org/10.1016/j.autcon.2023.105254
Received 10 June 2023; Received in revised form 20 December 2023; Accepted 21 December 2023
precision position control and close-proximity inspection. The idea is
that the UAV approaches the vicinity of the damaged area with the assistance of immediate flight control algorithms only when surface defects are detected, given that healthy concrete areas comprise over 80–90% of the total bridge appearance area [1]. This leads to a damage-aware coverage of viewpoints to capture high resolution damage data with sub-millimeter finer details, while ensuring flight safety. It is worth noting that previous work leveraging a ground server to process the UAV-captured videos is not appropriate to support real-time flight control because it may encounter delays caused by video encoding and streaming, as well as inevitable interruptions in the video stream caused by connection issues, obstacles, and weather conditions, increasing the response time [4,45]. The key to UAV-assisted real-time damage detection is to develop a light-weight, energy-efficient and real-time damage detection algorithm that is suitable for deployment on UAVs to support higher quality inspection data acquisition. However, most existing damage detection algorithms are not appropriate for deployment on UAVs, as existing real-time damage detection models require expensive computational resources, high memory, a large storage footprint and high energy consumption [21].
Existing real-time damage detection methods mainly leveraged light-weight networks or tiny versions of deep learning models with small model depths and widths to reduce the model parameters and computational burden. An early example proposed by Jiang et al. [22] introduced a light-weight backbone called MobileNetV3 [25] into the original You Only Look Once version 3 (YOLOv3) algorithm [14] to improve damage detection accuracy and inference speed. The recently developed YOLOv5s-HSC network employed a combination of the YOLOv5s algorithm and Swin Transformer modules as well as attention modules to conduct real-time damage detection tasks [51]. Some limitations remain, despite the extensive work and significant improvements of the last few years, as listed below:
1) Previous studies rely on anchor-based methods that spend a large amount of time on anchor-related computation, limiting inference speed and the ability to generalize to real-world scenarios.
2) The detection accuracy of existing real-time damage detection methods is still low due to the reliance on light-weight networks or tiny versions of deep learning models.
3) None of the existing studies considered the energy consumption of deep learning networks, which is a critical factor for their deployment on UAVs that are often constrained by their battery capacity.
This study presents the development of an energy-efficient, anchor-free and real-time damage detection method using a model quantization technique, the YOLOv6s-GRE-quantized method, appropriate for deployment on UAVs. The YOLOv6s algorithm [27] is currently the most advanced anchor-free method and was selected as the baseline model because of its high detection accuracy and fast inference speed, as well as its small number of parameters and low floating-point operations (FLOPs) [27]. A generalized feature pyramid network (GFPN) [23,47] was firstly introduced into the YOLOv6s algorithm as the neck network to boost information exchange across distinct spatial scales and different levels of potential semantics concurrently. Along with the GFPN, a reparameterization-based efficient layer aggregation network (RepELAN) was proposed to replace the original Reparameterization Block (RepBlock) to reduce the model parameters and computing resources while improving the feature fusion ability and the inference speed. In addition, an efficient detection head was incorporated to decrease the model parameters. Finally, the model quantization technique was leveraged on the YOLOv6s-GRE method to reduce computational complexity, memory, storage footprint, and the energy usage of the onboard computer, as well as to increase the computation speed.
The main contributions of this study are summarized as follows:
1) An improved real-time anchor-free damage detection method called YOLOv6s-GRE was developed on top of the YOLOv6s algorithm to better generalize to real-world scenarios with higher detection accuracy.
2) The YOLOv6s-GRE model was reconstructed with RepOptimizer (RepOpt) [12] to equivalently transform the YOLOv6s-GRE model to a quantization-friendly model for addressing the quantization difficulty of the reparameterization model. The YOLOv6s-GRE model with RepOpt was quantized using a partial quantization-aware training technique to obtain a memory and storage footprint saving and energy-efficient network that is appropriate for deployment on UAVs.
The structure of the paper is organized as follows. State-of-the-art research is discussed in Section 2. The methodology of the paper is introduced in Section 3. Section 4 elaborates on the experiment implementation, evaluation and analysis. Section 5 concludes the paper and discusses potential future research. Code is available at https://github.com/Xiaofei-Kevin-Yang/YOLOv6-GRE-Quantized.
2. Literature review
The combination of UAVs and automated damage detection algorithms has the potential to significantly improve the efficiency of the bridge visual inspection process, from data collection and analysis to auxiliary decision-making. Currently, automated damage detection methods can be divided into two groups: post-flight and real-time damage detection [49].
2.1. Post-flight damage detection
Post-flight damage detection, also called offline detection, refers to the detection of damage after the data collection task. An extensive number of studies have focused on the field of post-flight damage detection. An early example leveraged Faster Region-based Convolutional Neural Networks (Faster R-CNNs) to automatically detect five damage types [8]. The experiment results demonstrated that the deep learning technique was superior compared to traditional image processing techniques and machine learning methods. Faster R-CNNs are a typical two-stage damage detector that can achieve a high detection accuracy but have a relatively low detection speed. To accelerate the detection speed, YOLOv3 has been investigated [22,48], indicating a three to five times faster detection speed than Faster R-CNNs while slightly compromising the overall detection accuracy. To improve the detection accuracy of the YOLO series, YOLOv4 [7] was presented on top of the YOLOv3 algorithm using more tuning methods such as mosaic data augmentation, and the Path Aggregation Network (PAN) was introduced into the network for better multi-level feature aggregation. An example of the use of YOLOv4 to perform automated damage detection was presented by Zou et al. [54], where depth-wise separable convolution blocks were added to improve the YOLOv4 network, leading to less computational cost. YOLOv5 was developed to further improve the detection accuracy. A recent example presented an improved YOLOv5 algorithm to conduct bridge surface defect detection, integrating a convolutional block attention module, a decoupled prediction head, and a focal loss function into the original network [37].
As the aforementioned YOLO algorithms are anchor-based methods, their detection performance is dependent on the selection of preset anchor box sizes. As the calculation of preset anchor box sizes is dataset-specific, anchor-based methods cannot generalize well to real-world scenarios. To address this issue, He et al. presented a novel anchor-free method called CenWholeNet [17], which is an improved version of CenterNet [52]. It predicted the center point, the diagonal length and angle of the defect bounding box, considering both central information and whole information. In addition, a parallel attention module was also introduced into the network to further increase the
detection accuracy.
Most existing studies in the realm of automated damage detection focused on post-flight damage detection. Deep learning networks for post-flight detection are normally based on a large number of parameters, requiring heavy computational resources and expensive energy cost as well as a long detection time. Thus, it is challenging to deploy these post-flight algorithms directly on UAVs to perform real-time damage detection.
2.2. Real-time damage detection
Real-time damage detection points to the detection of damage areas synchronously during the UAV data collection process. The real-time detection results can be used for guiding UAVs to fly closer to the damaged area to capture higher resolution damage with finer details. An early example leveraged the UAV to capture video that was wirelessly streamed to a ground server for detecting defects. However, this process can result in significant delays caused by video encoding and streaming, as well as inevitable interruptions and cuts in the video stream caused by connection issues, obstacles, and weather conditions. In addition, video streaming transmission has high-bandwidth requirements, further increasing the response time. A modified Faster R-CNN was then presented to automatically detect multiple defects from video frames [5]. The experiment results demonstrated that the presented method was a quasi-real-time detection method and showed superior performance on small and blurry defects. While this method achieved near real-time detection speed, it could not be deployed on UAVs due to its large number of model parameters. To meet this challenge, Kumar et al. [26] presented a real-time multi-UAV damage detection system, where the YOLOv3 algorithm was deployed on the Jetson-TX2 [38] onboard computer of a Pixhawk hardware standards-based hexacopter. Although this method achieved real-time detection and deployment on UAVs, its high computational complexity limited the onboard application. Efficient neural network architecture design presents high potential to reduce parameters and computational complexity, and introducing efficient neural networks into existing methods could be a good solution to improve computational efficiency. A recent example proposed by Jiang et al. [21] used a lightweight backbone called MobileNetv2 [36] to replace the original backbone of the YOLOv3 algorithm. The proposed method with efficient backbone design significantly reduced the computational burden. Nevertheless, the lightweight MobileNetv2, which leveraged depth-wise convolutions to reduce the model size, increased the memory access cost, thus decreasing the detection speed.
In addition, a pruning algorithm [34] was also developed to decrease the number of unimportant deep network parameters. Recent work proposed a novel deep learning network called YOLOv4-FPM [32] on top of the original YOLOv4 [21] algorithm to achieve real-time concrete bridge crack detection. This method firstly leveraged the focal loss function [33] to alleviate the imbalance problem between positive and negative samples, thus improving its detection accuracy for images with complex background information. A pruning algorithm [30] was then employed to remove unimportant nodes and parameters in the deep learning model, reducing the model size and complexity as well as expediting the detection speed. The limitation of this method is that the YOLOv4-FPM algorithm is an anchor-based method that cannot generalize well in real-world scenarios.

Fig. 1. Overall architecture of the original YOLOv6s model. Numbers indicate the input size of each layer within the model.
Starting from YOLOv5, researchers designed multiple versions with different model sizes by controlling the model depths and widths for application on different devices. They normally employed the small version models to perform the real-time detection task. For example, Zhao et al. [51] proposed a YOLOv5s-HSC method on top of the small version of the original YOLOv5 model, where Swin transformer blocks [31] and coordinate attention modules [19] were added to further improve the damage detection accuracy. An additional detection head was introduced into the network to alleviate the issue of defect scale variation; however, this significantly increased the number of model parameters and the computational burden. A light-weight and improved YOLOv5s called YOLOv5s-GTB was developed by Xiao et al. [35], where the light-weight GhostNet [15] and a Bi-directional Feature Pyramid Network (BiFPN) [39] were leveraged as the backbone and neck network respectively. A transformer multi-headed self-attention mechanism was also introduced into the proposed network. The experiment results showed that the proposed method not only reduced the number of parameters by 42% and had a faster detection speed but also achieved a better detection accuracy compared to the original YOLOv5s algorithm.
To summarize, while existing real-time damage detection algorithms achieved considerable success, they still face several challenges. Firstly, existing studies were mainly developed on top of anchor-based approaches that cannot generalize well to real-world scenarios. Secondly, real-time damage detection algorithms exhibit a low detection accuracy because they are based on light-weight and small versions of deep learning models. In addition, none of the existing studies considered the energy consumption of deep learning networks, which is critical to energy-constrained UAVs.
3. Methodology
This section firstly explains the overall architecture of the YOLOv6s algorithm, from the backbone module and neck module to the head module, to help the audience understand the main components of the YOLOv6s algorithm. Secondly, three improvements, i.e., the GFPN neck, the RepELAN block and an efficient detection head, are illustrated in detail to achieve a better trade-off between damage detection accuracy and speed. Finally, the principle of the model quantization technique used in this study is described to improve the energy efficiency of the proposed YOLOv6s-GRE method.
3.1. Overview of YOLOv6s architecture
The original YOLOv6s [27] network is a small version model designed for mobile platforms. The Reparameterization Visual Geometry Group (RepVGG) [13] network was used as the backbone module to extract damage features from images. The RepVGG decoupled the training and inference architectures with a structural reparameterization technique, significantly improving its feature representation ability. Subsequently, the Reparameterization Path Aggregation Network (Rep-PAN) [27] was leveraged as the neck module to perform multi-scale feature aggregation, which is an enhanced PANet [29] established by RepBlocks. An efficient decoupled head was designed for damage classification and localization. The overall architecture of the original YOLOv6s model is presented in Fig. 1. The details of the main blocks in each module are presented in the following sections.
3.1.1. Backbone module
The backbone module of the YOLOv6s model in Fig. 1 leveraged a structural re-parameterization technique to develop an efficient feature representation network denoted as EfficientRep. The backbone module contains five stages, and each stage starts with a down-sample layer via a stride-2 convolutional operation. The main components of the backbone module are RepVGG blocks, RepBlocks and a SimSPPF block.
The RepVGG [13] block is designed using a reparameterization structure, decoupling the training-phase multi-branch and inference-phase plain architectures, resulting in a better trade-off between accuracy and computational efficiency. The architecture of the RepVGG block is illustrated in Fig. 2. The core of the structural re-parameterization technique is the equivalent conversion of a certain network architecture to another network by transforming its parameters. To be specific, during the training phase, the RepVGG block is constructed using 3 × 3 convolution (3 × 3 Conv2d), 1 × 1 convolution (1 × 1 Conv2d) and identity branches if the stride is set to 1. The input is firstly processed by convolutional layers with kernels of 3 × 3, 1 × 1 and identity respectively, followed by the batch normalization (BN) layer. After the normalization, the outputs are combined together via an element-wise add operation and passed through the Rectified Linear Units (ReLU) activation function. The RepVGG block has two branches with convolutional kernels of 3 × 3 and 1 × 1 respectively if the stride is set to 2. It is worth noting that the stride is set to 2 when the RepVGG block is used as a standalone block, while the stride is adjusted to 1 when the RepVGG block is integrated within the RepBlock framework.

Fig. 2. Architecture of the RepVGG block during (a) training phase and (b) inference phase.
The transformation of the parameters of the RepVGG block is performed after training, where the well-trained parameters of the convolutional layers with kernels of 3 × 3, 1 × 1 and identity, as well as the batch normalization layers, are leveraged to construct a single convolution layer with a 3 × 3 kernel. During the inference phase, the RepVGG block is built using this 3 × 3 convolution layer and a ReLU activation. The RepBlock is constructed by a stack of RepVGG blocks. The architectures of the RepBlock in the training and inference phases are shown in Fig. 3.
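To make the merging step concrete, the following is a minimal PyTorch sketch of the branch fusion described above. It assumes a hypothetical training-phase block exposing its branches under the illustrative names conv3x3, bn3x3, conv1x1, bn1x1 and bn_identity; it is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_conv_bn(weight, bn):
    # Fold a BatchNorm layer into the preceding convolution:
    # w' = w * gamma / sqrt(var + eps), b' = beta - mean * gamma / sqrt(var + eps)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    return weight * scale.reshape(-1, 1, 1, 1), bn.bias - bn.running_mean * scale

def reparameterize(block):
    """Merge the 3x3, 1x1 and identity branches into one 3x3 convolution."""
    w3, b3 = fuse_conv_bn(block.conv3x3.weight, block.bn3x3)
    w1, b1 = fuse_conv_bn(block.conv1x1.weight, block.bn1x1)
    w1 = F.pad(w1, [1, 1, 1, 1])  # zero-pad the 1x1 kernel to 3x3
    # The identity branch acts as a 3x3 kernel that is 1 at the centre of
    # its own channel (valid when input and output channels match).
    wid = torch.zeros_like(w3)
    for i in range(wid.shape[0]):
        wid[i, i, 1, 1] = 1.0
    wid, bid = fuse_conv_bn(wid, block.bn_identity)
    fused = nn.Conv2d(w3.shape[1], w3.shape[0], 3, padding=1)
    fused.weight.data = w3 + w1 + wid
    fused.bias.data = b3 + b1 + bid
    return fused
```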
The Simplified Spatial Pyramid Pooling Fast (SimSPPF) block aims to efficiently combine features at different scales to extend the receptive field. The architecture of the SimSPPF block is presented in Fig. 4. The input is firstly processed by the SimConv block that contains a Sigmoid Linear Unit (SiLU), followed by three max-pooling operations (MaxPool) to extract features at different scales. Subsequently, the features of different scales are concatenated together (Concatenate) and then processed by another SimConv block. SimSPPF reduces the computational cost and memory usage of the feature fusion process by using max-pooling instead of convolutional operations, expediting the detection speed of the YOLOv6 model.
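A minimal sketch of the SimSPPF structure just described is given below; the 5 × 5 pooling kernel and the halved hidden channel width follow common SPPF implementations and are assumptions rather than values stated in the paper.

```python
import torch
import torch.nn as nn

class SimConv(nn.Module):
    # Convolution + BN + activation, standing in for the SimConv block.
    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SimSPPF(nn.Module):
    # Three chained max-pool calls reuse one small kernel, so features at
    # three effective receptive fields are obtained cheaply.
    def __init__(self, c_in, c_out):
        super().__init__()
        c_mid = c_in // 2  # assumed channel reduction
        self.cv1 = SimConv(c_in, c_mid, 1)
        self.pool = nn.MaxPool2d(5, stride=1, padding=2)
        self.cv2 = SimConv(c_mid * 4, c_out, 1)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        return self.cv2(torch.cat([x, y1, y2, self.pool(y2)], dim=1))
```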
3.1.2. Neck module
The neck module in the YOLOv6s model aims to refine and consolidate the feature maps extracted from the backbone module. The architecture of the neck module is shown in Fig. 1. The neck module leverages RepBlocks to reconstruct the Path Aggregation Network (PANet) [29] used in YOLOv4 [7] and YOLOv5 [41], denoted as Reparameterization PANet (Rep-PAN). The Rep-PAN combines different feature maps with distinct resolutions through lateral connections, top-down connections and bottom-up connections. Lateral connections are leveraged to connect feature maps of different scales to enable information to be shared at different levels. The top-down pathway enhances the semantic feature representation by up-sampling (UpSample) high-level semantic feature maps via a ConvTranspose2d operation and merging them with lower-level feature maps using lateral connections. The bottom-up pathway further improves the localization feature representation by iteratively down-sampling the feature maps and fusing them with the low-level localization features through lateral connections and skip connections, which also shortens the information path.
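The two pathways can be summarised with a short sketch. The ConvTranspose2d up-sampling follows the text; the channel sizes, the stride-2 down-sampling convolution and the concatenation-based merge are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TopDownMerge(nn.Module):
    # One top-down step: up-sample the high-level semantic map and merge
    # it with the lower-level map through a lateral connection.
    def __init__(self, c_high, c_low, c_out):
        super().__init__()
        self.up = nn.ConvTranspose2d(c_high, c_low, kernel_size=2, stride=2)
        self.fuse = nn.Conv2d(c_low * 2, c_out, kernel_size=1)

    def forward(self, high, low):
        return self.fuse(torch.cat([self.up(high), low], dim=1))

class BottomUpMerge(nn.Module):
    # One bottom-up step: down-sample with a stride-2 convolution and fuse
    # with the higher-level map to sharpen localization features.
    def __init__(self, c_low, c_high, c_out):
        super().__init__()
        self.down = nn.Conv2d(c_low, c_high, 3, stride=2, padding=1)
        self.fuse = nn.Conv2d(c_high * 2, c_out, kernel_size=1)

    def forward(self, low, high):
        return self.fuse(torch.cat([self.down(low), high], dim=1))
```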
3.1.3. Head module
The YOLOv6s model consists of three detection heads to predict small, medium and large-scale defects. The details of the head module and the relevant blocks are shown in Fig. 1. These detection heads adopt a decoupled head design to perform damage classification and box regression tasks separately. The aggregated feature maps generated from the neck module are firstly processed by a 1 × 1 Conv block, followed by two parallel branches with a single 3 × 3 Conv block and a 1 × 1 convolutional layer (Conv2d) respectively. The outputs of these two branches are then concatenated together to predict the coordinate values of bounding boxes and the corresponding class probabilities.
3.2. Overview of YOLOv6s-GRE architecture
The overall architecture of the YOLOv6s-GRE method is illustrated in Fig. 5. Three improvements were introduced into the original YOLOv6s model. First, the GFPN Neck module (as presented in Fig. 5) was introduced to deepen the neck module and achieve sufficient multi-scale feature fusion, inspired by the idea that a heavy neck paradigm is more suitable for damage detection tasks [23]. Secondly, a Reparameterization Efficient Layer Aggregation Network (RepELAN) block replaced the original RepBlock in the GFPN Neck module. This acted as the feature fusion block, enhancing the feature learning ability and lowering the computational resource requirements. Finally, an efficient detection head module (Efficient Head in Fig. 5) was included to reduce the model size with a negligible reduction in the detection accuracy.
Details of each improvement are elaborated in the following subsections.
3.2.1. Generalized feature pyramid network (GFPN)
This study adopts a deeper and larger neck module called GFPN as previous work demonstrated that a heavy neck design can perform better than a heavy backbone design in damage detection tasks [23].
The architecture of the GFPN neck module is presented in Fig. 5. The GFPN first aggregates different levels of features extracted from the backbone module with skip-layer and cross-scale connections, followed by a top-down pathway to enhance the high-level semantics with multi-scale low-level spatial information, enabling more effective information transmission from the early layers to the later layers. Subsequently, the down-sampled feature maps are fused with the previous high-level semantics following a bottom-up pathway. The GFPN is thereby able to achieve sufficient information exchange between high-level semantic information and low-level spatial information.
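As a rough illustration of how one GFPN fusion point differs from a plain PAN node, the sketch below merges a same-level skip-layer input together with cross-scale inputs in a single node; the concatenation-based fusion and channel bookkeeping are assumptions, since the exact topology follows [23,47].

```python
import torch
import torch.nn as nn

class GFPNNode(nn.Module):
    # A GFPN-style fusion point receives more than the usual one or two
    # neighbours: a skip-layer input from an earlier layer at the same
    # level plus cross-scale inputs from adjacent levels.
    def __init__(self, channels, n_inputs):
        super().__init__()
        self.fuse = nn.Conv2d(channels * n_inputs, channels, kernel_size=1)

    def forward(self, inputs):
        # Inputs are assumed to be resized to a common resolution by the
        # caller (up-sampled or down-sampled as required).
        return self.fuse(torch.cat(inputs, dim=1))
```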
3.2.2. Reparameterization efficient layer aggregation network (RepELAN)
The efficient layer aggregation network (ELAN) [42] follows a gradient path design strategy, which can simultaneously improve the network learning ability and inference speed. The architecture of the ELAN block is presented in Fig. 6. The ELAN block allows flexibility when setting the number of convolutional block stacks to strike a trade-off between accuracy and computational efficiency. In addition, previous research [13] has shown that the RepVGG block performs better than plain convolutional blocks.
Fig. 3. Architecture of the RepBlock during (a) training phase and (b) inference phase.
Fig. 4. Architecture of the SimSPPF block.
Fig. 5. Overall architecture of the YOLOv6s-GRE method on top of the YOLOv6s model. Numbers represent the input size of each layer within the YOLOv6s-GRE method.
Fig. 6. Architecture of the ELAN block.
Fig. 7. Architecture of the proposed RepELAN block.

Inspired by both advantages, we presented a RepELAN block to reconstruct the ELAN block with RepVGG blocks to further improve the network detection performance. The architecture of the proposed RepELAN block is illustrated in Fig. 7. Specifically, we firstly constructed a BottleRep block using the RepVGG block with a residual connection, as shown in Fig. 8. Subsequently, BottleRep blocks were used to replace the original convolutional block stacks, and the number of BottleRep block stacks was selected as 3 considering the trade-off between accuracy and computational efficiency. In addition, 1 × 1 SimConv blocks were also used to substitute the original 1 × 1 Conv blocks to accelerate the detection speed. As can be seen from Fig. 7, the RepELAN block has two branches: the first branch is processed by a 1 × 1 SimConv block to change the number of channels, while the second branch contains a 1 × 1 SimConv block to change the channel numbers and a stack of 3 BottleRep blocks to extract features. The output of the first branch and the output of each BottleRep block are finally concatenated together, followed by a 1 × 1 SimConv operation.
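A minimal sketch of the RepELAN block as described above follows. A plain 3 × 3 convolution stands in for the RepVGG block (which would be multi-branch during training), and the channel bookkeeping is an assumption.

```python
import torch
import torch.nn as nn

def sim_conv(c_in, c_out):
    # 1x1 SimConv stand-in: convolution + BN + activation
    return nn.Sequential(nn.Conv2d(c_in, c_out, 1, bias=False),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class BottleRep(nn.Module):
    # RepVGG block wrapped with a residual connection (Fig. 8); a plain
    # 3x3 convolution is used here as a placeholder for the RepVGG block.
    def __init__(self, c):
        super().__init__()
        self.rep = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, x):
        return x + self.rep(x)

class RepELAN(nn.Module):
    def __init__(self, c_in, c_out, c_mid, n=3):
        super().__init__()
        self.branch1 = sim_conv(c_in, c_mid)
        self.branch2 = sim_conv(c_in, c_mid)
        self.bottles = nn.ModuleList(BottleRep(c_mid) for _ in range(n))
        self.fuse = sim_conv(c_mid * (n + 1), c_out)

    def forward(self, x):
        outs = [self.branch1(x)]        # first-branch output is kept
        y = self.branch2(x)
        for bottle in self.bottles:     # the output of each BottleRep is kept
            y = bottle(y)
            outs.append(y)
        return self.fuse(torch.cat(outs, dim=1))
```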
3.2.3. Efficient detection head
To better balance detection accuracy and speed, as well as reduce the model size at the algorithm level, we presented an efficient decoupled detection head that removes the original middle 3 × 3 convolutional layers, leaving a 1 × 1 Conv block and two task projection layers (i.e., one linear layer for classification and one for regression). The architecture of the proposed detection head is presented in Fig. 5.
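A minimal sketch of this head is shown below; the intermediate channel width and the activation in the 1 × 1 Conv block are illustrative assumptions.

```python
import torch.nn as nn

class EfficientHead(nn.Module):
    # Decoupled head without the middle 3x3 convolutions: a shared 1x1
    # Conv block followed by two 1x1 task projection layers.
    def __init__(self, c_in, c_mid, n_classes, n_box=4):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(c_in, c_mid, 1, bias=False),
                                  nn.BatchNorm2d(c_mid),
                                  nn.ReLU(inplace=True))
        self.cls_pred = nn.Conv2d(c_mid, n_classes, 1)  # classification
        self.reg_pred = nn.Conv2d(c_mid, n_box, 1)      # box regression

    def forward(self, x):
        x = self.stem(x)
        return self.cls_pred(x), self.reg_pred(x)
```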
3.3. Model quantization technique
Real-time damage detection algorithms have seen increased use on mobile platforms; however, the high energy consumption, computational expense and memory usage present bottlenecks when deploying these algorithms on UAVs [53]. Model quantization techniques aim to store and compute model weight parameters with a lower bit-width representation, such as using 8-bit integer (INT8) precision to replace the typical 32-bit floating point (FP32) precision [16]. INT8 quantization allows for a 4 times reduction in the model size and a 4 times reduction in memory bandwidth requirements compared to typical FP32 arithmetic [20]. Hardware supporting INT8 precision is typically 2 to 4 times faster than FP32 precision [2]. In addition, INT8 multiplication arithmetic consumes 18.5 times less energy than FP32 multiplication arithmetic [18]. Therefore, the model quantization technique can reduce not only the memory footprint but also the energy consumption, maintaining comparable levels of accuracy with respect to full precision (FP32) and making the quantized model affordable for UAVs. Nevertheless, prior model quantization techniques are not effective for the YOLOv6s-GRE method due to its extensive use of reparameterization blocks, which amplify the standard deviation of the weight parameter distribution. This study firstly leveraged the RepOptimizer (RepOpt) [12] method to equivalently transform the YOLOv6s-GRE model to a quantization-friendly model for addressing the quantization challenge of reparameterization blocks. Subsequently, a sensitivity analysis was performed to identify quantization-sensitive layers, and these layers were then converted into full precision arithmetic as a compromise. Lastly, partial quantization-aware training (QAT) was conducted on the YOLOv6s-GRE model with RepOpt. Details are explained in the following subsections.
3.3.1. Re-parameterizing optimizer
The YOLOv6s-GRE method heavily employs RepVGG blocks (as shown in Fig. 2) [13] in its network architecture due to their better trade-off between detection accuracy and inference speed. The RepVGG block incorporates model-specific prior knowledge using multiple branches during the training phase, while merging the multi-branch architecture into a plain architecture with a single 3 × 3 convolutional layer during the inference phase. However, these reparameterization-based blocks face quantization difficulty because of the increased dynamic numerical range caused by the RepVGG blocks' intrinsic multi-branch design. For example, performance degradation of greater than 20% on ImageNet [11] has been observed after a standard post-training quantization [12]. To meet this challenge, RepOpt-VGG [12] was proposed to develop a two-stage optimization pipeline [10]. Within the RepOpt-VGG, a RealVGG block was leveraged (as shown in Fig. 9) to replace the original RepVGG block during the training phase and introduce the model-specific prior knowledge into the model optimizer, which is achieved by adjusting the gradients based on model-specific hyper-parameters. This technique is known as Gradient Re-parameterization, and the resulting optimizers are called RepOptimizers.
Inspired by the RepOpt-VGG network, we firstly replaced the RepVGG blocks with RealVGG blocks in the YOLOv6s-GRE model and then trained the YOLOv6s-GRE method with the RepOpt to obtain quantization-friendly weights.
Fig. 8. Architecture of the BottleRep block.
Fig. 9. Architecture of the RealVGG block.
3.3.2. Analysis of quantization sensitivity
Quantization sensitivity analysis aims to quantify the neural network's degree of sensitivity to changes in the precision of its weights and activations. The mean average precision (mAP) was calculated for each layer contained in the YOLOv6s-GRE method trained by RepOpt, with and without quantization, to obtain the sensitivity distribution. The mAP differences with and without quantization were leveraged as the evaluation metric to measure the quantization errors.
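The procedure can be sketched as a simple loop; evaluate_map and quantize_one are hypothetical hooks (one runs validation, the other returns a copy of the model with only the named layer quantized to INT8), not functions from the paper or any specific library.

```python
def rank_quantization_sensitivity(model, layer_names, evaluate_map, quantize_one):
    # Quantize one layer at a time and record the mAP drop against the
    # full-precision baseline.
    baseline = evaluate_map(model)
    drops = {name: baseline - evaluate_map(quantize_one(model, name))
             for name in layer_names}
    # Largest drop first; the most sensitive layers are later kept in FP32.
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)
```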
3.3.3. Partial quantization-aware training
The core of INT8 quantization is to map an FP32 floating point value $x_f \in (\alpha, \beta)$ to an INT8 quantized value $x_q \in (\alpha_q, \beta_q)$. This study adopted the partial quantization-aware training method to quantize the YOLOv6s-GRE model with RepOpt, leading to a negligible compromise in accuracy. Specifically, the most sensitive layers were firstly assigned full precision according to the sensitivity distribution obtained from the quantization sensitivity analysis. Secondly, the YOLOv6s-GRE with RepOpt was trained using FP32 full precision. The FakeQuantize module was then inserted before the convolutional operations and after the ReLU operations within the non-sensitive layers. The insertion of the FakeQuantize module into the network is illustrated in Fig. 10.

The model with the FakeQuantize module is denoted as the QAT model. The FakeQuantize module is comprised of a quantization operation (Quantize) and a de-quantization operation (De-Quantize) in a sequential manner, as shown in Fig. 11.

The quantization operation is performed according to the quantization function below:

$$x_q = f_q(x_f, s, z) = \mathrm{clip}\!\left(\mathrm{round}\!\left(\frac{x_f}{s} + z\right), \alpha_q, \beta_q\right) \tag{1}$$

where $x_q$ represents the INT8 precision quantized tensor and $x_f$ denotes the FP32 precision floating point tensor. $\alpha$ and $\beta$ stand for the minimum and maximum values of the FP32 floating point tensor, while $\alpha_q$ and $\beta_q$ are the minimum and maximum values of the INT8 quantized tensor. $s = \frac{\beta - \alpha}{\beta_q - \alpha_q}$ denotes the scale factor, $z = \mathrm{round}\!\left(\frac{\beta \alpha_q - \alpha \beta_q}{\beta - \alpha}\right)$ represents the zero point, and the clip function is computed as below:

$$\mathrm{clip}(x_f, \alpha_q, \beta_q) = \begin{cases} \alpha_q & x_f < \alpha_q \\ x_f & \alpha_q \le x_f \le \beta_q \\ \beta_q & x_f > \beta_q \end{cases} \tag{2}$$

The de-quantization function is defined as follows:

$$x_d = f_d(x_q, s, z) = s\,(x_q - z) \tag{3}$$

Note that the FakeQuantize operation results in information loss, because the floating-point values after quantization and de-quantization are not completely recoverable due to the clip and round operations within the quantization function. The information loss $\Delta$ is computed as below:

$$\Delta = x_f - f_d\!\left(f_q(x_f, s, z), s, z\right) \tag{4}$$

This information loss can further degrade the network accuracy after the quantization operation. Therefore, the information loss caused by the FakeQuantize operation was added into the overall loss function, and the pre-trained FP32 full precision model weights were leveraged to initialize the QAT model. In addition, a self-distillation approach and graph optimization [27] were also leveraged to further improve the quantization accuracy and speed. Finally, the QAT model was finetuned and the quantization parameters were saved accordingly.

Fig. 10. Illustration of the FakeQuantize insertion process.
Fig. 11. Architecture of the FakeQuantize module.
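A minimal sketch of Eqs. (1)–(4) is given below, assuming the signed INT8 range [-128, 127]; the function names are illustrative and do not correspond to the TensorRT or PyTorch module interfaces.

```python
import torch

def quantize(x_f, s, z, a_q=-128, b_q=127):
    # Eq. (1): x_q = clip(round(x_f / s + z), alpha_q, beta_q)
    return torch.clamp(torch.round(x_f / s + z), a_q, b_q)

def dequantize(x_q, s, z):
    # Eq. (3): x_d = s * (x_q - z)
    return s * (x_q - z)

def fake_quantize(x_f, a, b, a_q=-128, b_q=127):
    # Quantize followed by de-quantize, as in the FakeQuantize module.
    s = (b - a) / (b_q - a_q)                 # scale factor
    z = round((b * a_q - a * b_q) / (b - a))  # zero point
    return dequantize(quantize(x_f, s, z, a_q, b_q), s, z)

x = torch.randn(8)
delta = x - fake_quantize(x, float(x.min()), float(x.max()))  # Eq. (4) loss
```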
4. Experiment and results

4.1. Data preparation
This study employed our previously built concrete bridge damage dataset, which was labelled by a group of researchers with a consistent standard using the image annotation tool LabelImg [28]. The dataset comprises 1969 images with 4385 annotations, covering three common defect types: spalling (1359 annotations), exposed rebar (1950 annotations) and efflorescence (1076 annotations); these types were selected considering the availability of damage data and the extensive prior studies on other defects such as cracks. The dataset was then partitioned into a training dataset, a validation dataset and a testing dataset in a 7:1:2 ratio, containing 1377, 197 and 395 images respectively. Mosaic [7,41] and Mixup [50] were employed as strong data augmentation methods to enhance the data diversity. Some examples of defect images contained in the dataset are presented in Fig. 12.
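The 7:1:2 partition can be reproduced with a simple shuffled split; the fixed seed and the list-of-paths interface below are assumptions for illustration.

```python
import random

def split_dataset(image_paths, ratios=(0.7, 0.1, 0.2), seed=0):
    # Shuffle once, then cut the list at the 70% and 80% marks.
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = int(ratios[0] * len(paths))
    n_val = int(ratios[1] * len(paths))
    return (paths[:n_train],                    # training set
            paths[n_train:n_train + n_val],     # validation set
            paths[n_train + n_val:])            # testing set
```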
4.2. Implementation details
The Linux operating system, along with an Intel(R) Xeon(R) E5-2680 v4 CPU and a single NVIDIA RTX3090 GPU with 24 GB memory, was used to train and test the proposed real-time damage detection method. PyTorch version 1.9.0 [1] was selected as the deep learning library, and Python version 3.8.5 was used as the programming language to develop the proposed method. During the training phase, the image input size was set to 640 pixels × 640 pixels. The batch size and the number of training epochs were set to 32 and 100 respectively. In addition, stochastic gradient descent (SGD) [24] was chosen as the optimizer with a momentum of 0.937. The initial learning rate was equal to 0.01, and the learning rate decay followed a cosine schedule. The first 3 epochs were used for warm-up. During testing, the mean average precision (mAP) and the frames per second (FPS) were used as the evaluation metrics for detection accuracy and speed respectively. The number of model parameters and the floating-point operations (FLOPs) were also computed to measure the model size and the computational complexity. TensorRT version 8.2.3.0 was selected for the implementation of the quantization operation. Energy consumption was evaluated in joules (J), computed as the product of power consumption and time. Note that the evaluation experiment on the energy consumption of the damage detection algorithms was conducted on the NVIDIA RTX3090 GPU with 24 GB memory to demonstrate the viability of employing the proposed method for energy conservation.
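The reported optimizer settings can be wired up as follows; the placeholder model, the linear warm-up shape and the decay floor of zero are assumptions, with only the momentum, initial learning rate, epoch count and warm-up length taken from the text.

```python
import math
import torch

model = torch.nn.Linear(1, 1)  # placeholder for the detection network

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937)
epochs, warmup = 100, 3

def lr_factor(epoch):
    # Linear warm-up for the first 3 epochs, then cosine decay.
    if epoch < warmup:
        return (epoch + 1) / warmup
    t = (epoch - warmup) / (epochs - warmup)
    return 0.5 * (1.0 + math.cos(math.pi * t))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)
```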
4.3. Evaluation
The effectiveness of the YOLOv6s-GRE method was firstly evaluated through comparison experiments. The effect of each improvement in the YOLOv6s-GRE method was also analyzed via the ablation study [33].
Finally, the evaluation of quantization results was carefully performed to validate the advantages of the quantized model of the YOLOv6s-GRE method. Details are illustrated in the following subsections.
Fig. 12. Examples of different types of annotated defect images in the concrete bridge damage dataset.
Fig. 13. Comparison examples of prediction results generated by (a) the original YOLOv6s and (b) the YOLOv6s-GRE method.
4.3.1. Comparison experiments with YOLOv6s
Detailed qualitative and quantitative comparison experiments were conducted between the original YOLOv6s method and the YOLOv6s-GRE method. Prediction results of the YOLOv6s-GRE method were firstly qualitatively compared with the original YOLOv6s method. Some examples of the qualitative comparison between these two methods are presented in Fig. 13.
As can be seen from Fig. 13, the prediction results generated by the original YOLOv6s can miss some defects. A possible reason for the missed damage detection is that the surface of the exposed rebar is covered by some concrete dust, resulting in a similar feature representation between the exposed rebar and the concrete surface. In contrast, the YOLOv6s-GRE method exhibits stronger detection ability, resulting in higher confidence scores, and can generate more accurate prediction results compared with the original YOLOv6s. In addition to the qualitative evaluation, we also quantitatively compared the YOLOv6s-GRE method with the original YOLOv6s method in terms of overall detection accuracy and the accuracy for each defect type. Comparison results between the YOLOv6s and the YOLOv6s-GRE method are summarized in Table 1. The YOLOv6s-GRE method exceeded the YOLOv6s model by 2.3 percentage points of mAP50 with a comparable detection speed, while the model size increased only slightly and the computational complexity decreased compared to the YOLOv6s method.
Results of the method comparison for each defect type are presented in Fig. 14, showing that the YOLOv6s-GRE method achieved the best detection accuracy across all defect types. The metrics of mAP50 for exposed rebar, efflorescence and spalling improved by 2.9, 2.1 and 1.9 percentage points respectively.
4.3.2. Ablation study
The ablation study analyses the effect of each improvement within the YOLOv6s-GRE method and the results are presented in Table 2.
Introducing the GFPN neck module into the original YOLOv6s model improved the detection accuracy, with mAP50 increasing by 2.8 percentage points. Nevertheless, this operation also resulted in a 6.9% decrease in FPS and a considerable increase in both the model size and computational complexity, by 8.08 Mega (M) parameters and 12.51 Giga (G) FLOPs respectively. The implementation of the proposed RepELAN block maintained the same level of detection accuracy and speed as the preceding method while reducing the model size and computational complexity by 23.1% and 18.4% respectively. We finally evaluated the effect of the proposed efficient detection head, which slightly reduced mAP50 by 0.5 percentage points. Conversely, the proposed efficient detection head increased the detection speed from 272 to 282 FPS while reducing the model size by 1.55 M and the computational resource requirement by 2.83 G. By implementing the above improvements, the YOLOv6s-GRE method was more accurate than the YOLOv6s method and had a comparable level of detection speed, while reducing the model size and computational complexity, making it more appropriate for deployment on a UAV onboard computer.
4.3.3. Quantization evaluation
This section aims to evaluate the performance of the quantized YOLOv6s-GRE model. We firstly validated the effectiveness of the YOLOv6s-GRE method with RepOpt to address the quantization challenge of reparameterization blocks. The overall comparison results between the YOLOv6s-GRE method and the YOLOv6s-GRE method with RepOpt are listed in Table 3. In general, the YOLOv6s-GRE method with RepOpt achieved a similar performance in terms of detection accuracy,

Table 1
Comparison of the results between the YOLOv6s and the YOLOv6s-GRE.
Method mAP50 FPS Params FLOPs
YOLOv6s 64.8% 290 17.19 M 44.07 G
YOLOv6s-GRE 67.1% 282 17.89 M 43.33 G
Fig. 14. mAP50 results for each defect type based on the YOLOv6s and the YOLOv6s-GRE methods.
Table 2
Effect of each improvement in the YOLOv6s-GRE method.
No. Method mAP50 FPS Params FLOPs
A YOLOv6s 64.8% 290 17.19 M 44.07 G
B A + GFPN 67.6% 270 25.27 M 56.58 G
C B + RepELAN 67.6% 272 19.44 M 46.16 G
D C + Efficient head 67.1% 282 17.89 M 43.33 G
Table 3
Overall comparison results between the YOLOv6s-GRE method and the YOLOv6s-GRE method with RepOpt.
Method mAP50 FPS Params FLOPs
YOLOv6s-GRE 67.1% 282 17.89 M 43.33 G
YOLOv6s-GRE with RepOpt 66.9% 285 17.89 M 43.48 G
and it comes with a similar detection speed, model size and computational complexity compared to the YOLOv6s-GRE method. In addition to the overall comparison, we also performed a comparison across each defect type, as shown in Fig. 15. The YOLOv6s-GRE method with RepOpt had a similar detection performance for each defect type compared to the YOLOv6s-GRE method. Thus, the YOLOv6s-GRE method with RepOpt can be regarded as equivalent to the YOLOv6s-GRE method, according to these comparisons of overall performance and per-defect-type performance.
The quantization sensitivity analysis was performed on all layers in the YOLOv6s-GRE method with RepOpt. The differences in mAP50 with and without quantization for each layer are shown in Fig. 16. As can be seen from Fig. 16, the performance of the detect.cls_preds.1 layer dropped the most among all layers, with a 1.58 percentage point reduction in mAP50 after the quantization operation. According to the sensitivity distribution, the top 10 most sensitive layers, i.e., the layers with differences larger than 1 percentage point, were set to full precision during partial quantization-aware training, considering the trade-off among detection accuracy, speed and energy efficiency.
Finally, the YOLOv6s-GRE method with RepOpt was quantized, and the performance and energy consumption of the quantized model were analyzed.
Fig. 15. mAP50 results for each defect type between the YOLOv6s-GRE method and the YOLOv6s-GRE method with RepOpt.
Fig. 16. Difference in mAP50 for each layer with and without quantization.
The overall comparison results, including the metrics of mAP50, FPS and energy consumption, across different quantization methods and distinct precision arithmetic are presented in Table 4. Method No. 1 (the original YOLOv6s) consumed 136 J of energy during inference on the testing dataset. Method No. 2 (YOLOv6s-GRE) increased the energy consumption by 9.6% despite the improvement in detection accuracy, compared to Method No. 1. As can be seen from Method No. 3 and Method No. 4, the damage detection algorithm trained with a smaller floating-point operation precision significantly enhanced the detection speed and energy efficiency, by 33.0% and 28.4% respectively, while maintaining a similar detection accuracy. Method No. 5, Method No. 6 and Method No. 7 compared different quantization methods, namely PTQ, QAT and partial QAT, implemented on Method No. 3. Firstly, Method No. 5 achieved 65.1% mAP50, degrading the accuracy by 1.8 percentage points while considerably expediting the detection speed by 96.8% and saving energy by 85.8%, in contrast to Method No. 3. Next, Method No. 6 achieved a better trade-off than Method No. 5, increasing the detection accuracy but slightly lowering the detection speed and energy efficiency. Finally, Method No. 7, quantized by the partial QAT technique, was evaluated and achieved the best detection accuracy among the three quantization methods. Given the above, Method No. 7 was selected as the final quantization method to reduce the loss of detection accuracy as much as possible after quantization. Overall, Method No. 7 reached a 66.7% detection accuracy, a detection speed of 523 FPS and an energy consumption of 30 J, which maintained a comparable level of detection accuracy while increasing the detection speed by 83.5% and saving energy by 79.7%, in contrast to Method No. 3.

Table 4
Overall comparison across different quantization methods and precision arithmetic in terms of mAP50, FPS and energy consumption.
No. Method Precision mAP50 FPS Energy consumption
1 YOLOv6s FP32 64.8% 290 136 J
2 YOLOv6s-GRE FP32 67.1% 282 149 J
3 YOLOv6s-GRE with RepOpt FP32 66.9% 285 148 J
4 YOLOv6s-GRE with RepOpt FP16 66.8% 379 106 J
5 3 + post-training quantization (PTQ) INT8 65.1% 561 21 J
6 3 + quantization-aware training (QAT) INT8 66.2% 535 23 J
7 3 + partial QAT INT8 66.7% 523 30 J
5. Conclusions
Automatically detecting concrete bridge damage in real time using UAVs and onboard computers remains a worldwide challenge. This paper firstly presented an improved real-time anchor-free damage detection method called YOLOv6s-GRE, built on top of the YOLOv6s, to better generalize to real-world scenarios and trade off between damage detection accuracy and speed. Specifically, three improvements were made: 1) the GFPN neck module was introduced to improve the damage detection accuracy; 2) the RepELAN block was designed and added to reduce the model size and computational complexity; and 3) the efficient detection head was presented to expedite the detection speed and further decrease the model size and complexity. In general, the YOLOv6s-GRE method, in contrast to YOLOv6s, achieved an improvement of 2.3 percentage points in mAP50 while maintaining a comparable detection speed and without requiring an obvious increase in model size or computational complexity. Subsequently, the YOLOv6s-GRE method was reconstructed with RepOptimizer to equivalently transform it into a quantization-friendly model. The YOLOv6s-GRE method with RepOpt was then quantized using the partial quantization-aware training technique. The quantized model significantly expedited the detection speed by 83.5% in FPS and reduced the energy consumption by 79.7% while keeping a comparable level of detection accuracy.
Some limitations remain despite the great success of this research.
Firstly, existing damage detection methods are still manually designed with a fixed network architecture, which cannot adapt well to specific hardware with different constraints such as UAVs. Secondly, model compression techniques are underused in the damage detection field and need to be further investigated in future studies. Thirdly, no previous study integrates real-time damage detection methods with the UAVs' dynamic path planning algorithms to perform flight control in real time, which would enable higher quality inspection data acquisition with finer damage details.
To address these limitations, future research might focus on 1) developing a hardware-specific damage detection method using the Neural Architecture Search [9] technique to automatically design the network architecture according to the hardware capacity; 2) leveraging multiple model compression techniques in combination to further reduce model size and computational complexity as well as energy consumption; and 3) incorporating real-time damage detection methods into automated path planning algorithms to capture higher quality inspection data and boost inspection efficiency.
CRediT authorship contribution statement
Xiaofei Yang: Writing – original draft, Software, Resources, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Enrique del Rey Castillo: Writing – review & editing, Supervision, Project administration, Conceptualization. Yang Zou: Writing – review & editing, Supervision, Funding acquisition, Data curation, Conceptualization. Liam Wotherspoon: Writing – review & editing, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
Data will be made available on request.
Acknowledgements
The authors would like to acknowledge the support of the University of Auckland FRDF Grant (Project No. 3716476).
References
[1] PyTorch 1.9.0 Version. https://pytorch.org/docs/1.9.0/.
[2] Quantization – PyTorch 1.13 Documentation. https://pytorch.org/docs/stable/quantization.html.
[3] Highways Agency, Inspection Manual for Highway Structures: Vol. 1: Reference Manual, The Stationery Office, 2007. ISBN: 9780115527975.
[4] M.S. Alam, B. Natesha, T. Ashwin, R.M.R. Guddeti, UAV based cost-effective real-time abnormal event detection using edge computing, Multimed. Tools Appl. 78 (2019) 35119–35134, https://doi.org/10.1007/s11042-019-08067-1.
[5] R. Ali, D. Kang, G. Suh, Y.-J. Cha, Real-time multiple damage mapping using autonomous UAV and deep faster region-based neural networks for GPS-denied structures, Autom. Constr. 130 (2021) 103831, https://doi.org/10.1016/j.autcon.2021.103831.
[6] ASCE, 2021 Report Card for America's Infrastructure, American Society of Civil Engineers. https://infrastructurereportcard.org/, 2021.
[7] A. Bochkovskiy, C.-Y. Wang, H.-Y.M. Liao, YOLOv4: optimal speed and accuracy of object detection, arXiv preprint (2020), https://doi.org/10.48550/arXiv.2004.10934.
[8] Y.J. Cha, W. Choi, O. Büyüköztürk, Deep learning-based crack damage detection using convolutional neural networks, Comp. Aid. Civ. Infrastr. Eng. 32 (5) (2017) 361–378, https://doi.org/10.1111/mice.12263.
[9] Y. Chen, T. Yang, X. Zhang, G. Meng, X. Xiao, J. Sun, DetNAS: backbone search for object detection, Adv. Neural Inf. Proces. Syst. 32 (2019), https://doi.org/10.48550/arXiv.1903.10979.
[10] X. Chu, L. Li, B. Zhang, Make RepVGG greater again: a quantization-aware approach, arXiv preprint (2022), https://doi.org/10.48550/arXiv.2212.01593.
[11] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: a large-scale hierarchical image database, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, pp. 248–255, https://doi.org/10.1109/CVPR.2009.5206848.
[12] X. Ding, H. Chen, X. Zhang, K. Huang, J. Han, G. Ding, Re-parameterizing your optimizers rather than architectures, arXiv preprint (2022), https://doi.org/10.48550/arXiv.2205.15242.
[13] X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, J. Sun, RepVGG: making VGG-style ConvNets great again, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 13733–13742, https://doi.org/10.48550/arXiv.2101.03697.
[14] A. Farhadi, J. Redmon, YOLOv3: an incremental improvement, arXiv preprint (2018), https://doi.org/10.48550/arXiv.1804.02767.
[15] K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu, C. Xu, GhostNet: more features from cheap operations, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1580–1589, https://doi.org/10.48550/arXiv.1911.11907.
[16] S. Han, H. Mao, W.J. Dally, Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding, arXiv preprint (2015), https://doi.org/10.48550/arXiv.1510.00149.
[17] Z. He, S. Jiang, J. Zhang, G. Wu, Automatic damage detection using anchor-free method and unmanned surface vessel, Autom. Constr. 133 (2022) 104017, https://doi.org/10.1016/j.autcon.2021.104017.
[18] M. Horowitz, 1.1 Computing's energy problem (and what we can do about it), in: 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), IEEE, 2014, pp. 10–14, https://doi.org/10.1109/ISSCC.2014.6757323.
[19] Q. Hou, D. Zhou, J. Feng, Coordinate attention for efficient mobile network design, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 13713–13722, https://doi.org/10.48550/arXiv.2103.02907.
[20] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, D. Kalenichenko, Quantization and training of neural networks for efficient integer-arithmetic-only inference, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2704–2713, https://doi.org/10.48550/arXiv.1712.05877.
[21] S. Jiang, Y. Cheng, J. Zhang, Vision-guided unmanned aerial system for rapid multiple-type damage detection and localization, Struct. Health Monit. (2022), https://doi.org/10.1177/14759217221084878.
[22] Y. Jiang, D. Pang, C. Li, A deep learning approach for fast detection and classification of concrete damage, Autom. Constr. 128 (2021) 103785, https://doi.org/10.1016/j.autcon.2021.103785.
[23] Y. Jiang, Z. Tan, J. Wang, X. Sun, M. Lin, H. Li, GiraffeDet: a heavy-neck paradigm for object detection, arXiv preprint (2022), https://doi.org/10.48550/arXiv.2202.04256.
[24] N. Ketkar, Stochastic gradient descent, in: Deep Learning with Python: A Hands-on Introduction, 2017, pp. 113–132, https://doi.org/10.1007/978-1-4842-2766-4_8.
[25] B. Koonce, MobileNetV3, in: Convolutional Neural Networks with Swift for TensorFlow, Springer, 2021, pp. 125–144, https://doi.org/10.1007/978-1-4842-6168-2.
[26] P. Kumar, S. Batchu, S.R. Kota, Real-time concrete damage detection using deep learning for high rise structures, IEEE Access 9 (2021) 112312–112331, https://doi.org/10.1109/ACCESS.2021.3102647.
[27] C. Li, L. Li, H. Jiang, K. Weng, Y. Geng, L. Li, Z. Ke, Q. Li, M. Cheng, W. Nie, YOLOv6: a single-stage object detection framework for industrial applications, arXiv preprint (2022), https://doi.org/10.48550/arXiv.2209.02976.
[28] T. Lin, LabelImg, Online, https://github.com/tzutalin/labelImg, 2015.
[29] S. Liu, L. Qi, H. Qin, J. Shi, J. Jia, Path aggregation network for instance segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8759–8768, https://doi.org/10.48550/arXiv.1803.01534.
[30] Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, C. Zhang, Learning efficient convolutional networks through network slimming, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2736–2744, https://doi.org/10.48550/arXiv.1708.06519.
[31] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin Transformer: hierarchical vision transformer using shifted windows, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022, https://doi.org/10.48550/arXiv.2103.14030.
[32] M. Maboudi, M. Homaei, S. Song, S. Malihi, M. Saadatseresht, M. Gerke, et al., arXiv preprint (2022), https://doi.org/10.48550/arXiv.2205.03716.
[33] R. Meyes, M. Lu, C.W. de Puiseau, T. Meisen, Ablation studies in artificial neural networks, arXiv preprint (2019), https://doi.org/10.48550/arXiv.1901.08644.
[34] L.D. Otero, N. Gagliardo, D. Dalli, W.-H. Huang, P. Cosentino, Proof of Concept for Using Unmanned Aerial Vehicles for High Mast Pole and Bridge Inspections, Florida Institute of Technology, Department of Engineering Systems, Melbourne, FL, United States, 2015. https://rosap.ntl.bts.gov/view/dot/29176.
[35] X. Ruiqiang, YOLOv5s-GTB: light-weighted and improved YOLOv5s for bridge crack detection, arXiv preprint (2022), https://doi.org/10.48550/arXiv.2206.01498.
[36] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, MobileNetV2: inverted residuals and linear bottlenecks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520, https://doi.org/10.48550/arXiv.1801.04381.
[37] S. Sun, W. Liu, R. Cui, YOLO based bridge surface defect detection using decoupled prediction, in: 2022 7th Asia-Pacific Conference on Intelligent Robot Systems (ACIRS), IEEE, 2022, pp. 117–122, https://doi.org/10.1109/ACIRS55390.2022.9845546.
[38] A.A. Süzen, B. Duman, B. Şen, Benchmark analysis of Jetson TX2, Jetson Nano and Raspberry Pi using deep-CNN, in: 2020 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), IEEE, 2020, pp. 1–5, https://doi.org/10.1109/HORA49412.2020.9152915.
[39] M. Tan, R. Pang, Q.V. Le, EfficientDet: scalable and efficient object detection, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10781–10790, https://doi.org/10.48550/arXiv.1911.09070.
[40] N.G. Thompson, M. Yunovich, D. Dunmire, Cost of corrosion and corrosion maintenance strategies, Corros. Rev. 25 (3–4) (2007) 247–262, https://doi.org/10.1515/CORRREV.2007.25.3-4.247.
[41] Ultralytics, YOLOv5. https://github.com/ultralytics/yolov5/tree/v6.1.
[42] C.-Y. Wang, H.-Y.M. Liao, I.-H. Yeh, Designing network design strategies through gradient path analysis, arXiv preprint (2022), https://doi.org/10.48550/arXiv.2211.04800.
[43] F. Wang, Y. Zou, E. del Rey Castillo, J. Lim, Optimal UAV image overlap for photogrammetric 3D reconstruction of bridges, Vol. 1101, IOP Publishing, 2022, p. 022052, https://doi.org/10.1088/1755-1315/1101/2/022052.
[44] F. Wang, Y. Zou, C. Zhang, J. Buzzatto, M. Liarokapis, E. del Rey Castillo, J.B. Lim, UAV navigation in large-scale GPS-denied bridge environments using fiducial marker-corrected stereo visual-inertial localisation, Autom. Constr. 156 (2023) 105139, https://doi.org/10.1016/j.autcon.2023.105139.
[45] J. Wang, Z. Feng, Z. Chen, S. George, M. Bala, P. Pillai, S.-W. Yang, M. Satyanarayanan, Bandwidth-efficient live video analytics for drones via edge computing, in: 2018 IEEE/ACM Symposium on Edge Computing (SEC), IEEE, 2018, pp. 159–173, https://doi.org/10.1109/SEC.2018.00019.
[46] J. Wells, B. Lovelace, Unmanned Aircraft System Bridge Inspection Demonstration Project Phase II Final Report, Minnesota Dept. of Transportation, Research Services & Library, 2017. https://rosap.ntl.bts.gov/view/dot/32636.
[47] X. Xu, Y. Jiang, W. Chen, Y. Huang, Y. Zhang, X. Sun, DAMO-YOLO: a report on real-time object detection design, arXiv preprint (2022), https://doi.org/10.48550/arXiv.2211.15444.
[48] C. Zhang, C.C. Chang, M. Jamshidi, Concrete bridge surface damage detection using a single-stage detector, Comp. Aid. Civ. Infrastr. Eng. 35 (4) (2020) 389–409, https://doi.org/10.1111/mice.12500.
[49] C. Zhang, Y. Zou, F. Wang, E. del Rey Castillo, J. Dimyadi, L. Chen, Towards fully automated unmanned aerial vehicle-enabled bridge inspection: where are we at? Constr. Build. Mater. 347 (2022) 128543, https://doi.org/10.1016/j.conbuildmat.2022.128543.
[50] H. Zhang, M. Cisse, Y.N. Dauphin, D. Lopez-Paz, mixup: beyond empirical risk minimization, arXiv preprint (2017), https://doi.org/10.48550/arXiv.1710.09412.
[51] S. Zhao, F. Kang, J. Li, Concrete dam damage detection and localisation based on YOLOv5s-HSC and photogrammetric 3D reconstruction, Autom. Constr. 143 (2022) 104555, https://doi.org/10.1016/j.autcon.2022.104555.
[52] X. Zhou, D. Wang, P. Krähenbühl, Objects as points, arXiv preprint (2019), https://doi.org/10.48550/arXiv.1904.07850.
[53] Y. Zhou, S.-M. Moosavi-Dezfooli, N.-M. Cheung, P. Frossard, Adaptive quantization for deep neural network, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32, 2018, https://doi.org/10.48550/arXiv.1712.01048.
[54] D. Zou, M. Zhang, Z. Bai, T. Liu, A. Zhou, X. Wang, W. Cui, S. Zhang, Multicategory damage detection and safety assessment of post-earthquake reinforced concrete structures using deep learning, Comp. Aid. Civ. Infrastr. Eng. (2022), https://doi.org/10.1111/mice.12815.