YOLO-PowerLite: A Lightweight YOLO Model for Transmission Line Abnormal Target Detection
ABSTRACT The secure and stable operation of power transmission lines is essential for electrical systems. Given that abnormal targets such as bird’s nests and defective insulators may lead to transmission failures, timely detection of these targets is imperative. This paper introduces the YOLO-PowerLite model, an advanced lightweight object detection model based on YOLOv8n, designed for efficient, real-time detection on resource-constrained unmanned aerial vehicles (UAVs) equipped with edge computing platforms. In the feature fusion module, YOLO-PowerLite incorporates the innovative C2f_AK module, significantly reducing the number of parameters and enhancing the adaptability and fusion capability of features at different scales. Meanwhile, the adoption of the Bidirectional Feature Pyramid Network (BiFPN) further optimizes the efficiency and effectiveness of feature processing. In addition, the newly designed lightweight detection head significantly reduces the number of parameters and computational requirements. The integration of the Coordinate Attention mechanism in the backbone network enhances the model’s ability to focus on and recognize abnormal targets in complex backgrounds. Experimental results show that YOLO-PowerLite achieves a mAP@0.5 of 94.2%, maintaining the accuracy of the original YOLOv8n while significantly reducing parameters, FLOPs, and model size by 42.3%, 30.9%, and 40.4%, respectively. Comparative analysis shows that YOLO-PowerLite surpasses other mainstream lightweight models in detection accuracy and computational efficiency. Deployment on the NVIDIA Jetson Xavier NX platform demonstrates an average processing time of 31.2 milliseconds per frame, highlighting its potential for real-time applications in monitoring transmission lines.
2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 12, 2024
C. Liu et al.: YOLO-PowerLite: A Lightweight YOLO Model
monitoring offers a degree of automation, the installation and maintenance costs are substantial, and its stability and accuracy are often compromised in complex environments. With technological advancements, especially in drone and computer vision technologies, more efficient and accurate methods for detecting abnormalities in transmission lines have emerged [3]. Utilizing high-resolution cameras mounted on drones, in conjunction with sophisticated computer vision algorithms, this approach enables automated detection and rapid identification of abnormal targets along transmission lines.
Although current image analysis techniques can enhance detection accuracy, they often fall short in terms of real-time processing, struggling to meet the demands for rapid reaction in urgent situations [4]. Real-time detection of abnormal targets is critical for preventing accidents and reducing losses; it necessitates a detection system that can instantly recognize and address abnormalities, enabling timely action. Furthermore, with the widespread adoption of drones and edge computing devices in such applications, there are heightened demands for computational efficiency and reduced resource consumption in detection models. Model lightweighting becomes essential for achieving efficient operation in resource-constrained environments, particularly for applications that are deployed on mobile devices such as drones.
In light of this, this paper proposes the "YOLO-PowerLite" model, which is designed to address the dual challenges of real-time processing and lightweight design for the detection of abnormal targets on transmission lines. By strategically optimizing the latest YOLOv8 model, including improvements to the feature fusion module, the lightweight design of the detection head, and the integration of a lightweight attention mechanism, we aim to reduce the model’s computational complexity and storage demands while ensuring its accuracy and real-time detection capabilities. The development of this lightweight model not only enables efficient real-time detection of abnormalities on transmission lines by drones and other mobile devices under resource-limited conditions but also offers robust technical support for the safe operation of power systems, significantly enhancing the reliability and economic efficiency of the electrical infrastructure.
II. RELATED WORK
As the field of computer vision evolves, object detection has emerged as a pivotal area of research, aiming to identify and locate specific objects within images [5]. Early methods for detecting abnormalities in power transmission lines primarily relied on manually designed features and traditional machine learning algorithms. For instance, SIFT (Scale Invariant Feature Transform) [6] and HOG (Histogram of Oriented Gradient) [7], two classic feature descriptors, have been extensively utilized for object recognition and localization. They are first utilized to describe the object by extracting key points and edge information from the image, and then machine learning algorithms such as SVM (Support Vector Machine) [8] are utilized for classification. Although these traditional methods perform well in simpler scenarios, they often falter in complex environments and diverse detection tasks. This is because manually designed features struggle to encompass all variations in objects, and these methods lack profound understanding of the deeper features of images [9].
With the rise of deep learning technologies, methods of object detection based on deep neural networks have started to garner widespread attention among researchers. Compared to traditional approaches, deep learning can autonomously learn complex feature representations from vast datasets, thereby significantly enhancing the accuracy and robustness of object detection [10]. Within deep learning techniques, object detection algorithms are generally categorized into two primary types: two-stage and one-stage algorithms. Two-stage algorithms, such as R-CNN [11] and its variants Fast R-CNN [12] and Faster R-CNN [13], initially generate a series of candidate regions and subsequently perform feature extraction and classification for each region. While these methods excel in precision, their significant computational cost limits their application in real-time scenarios.
In contrast to two-stage algorithms, one-stage algorithms such as YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector) [14] directly predict the class and location of objects in the image, eliminating the need for generating candidate regions, thus significantly accelerating the detection speed. As the quintessential representative of one-stage algorithms, the YOLO algorithm has undergone numerous iterations and enhancements since its inception, each version striving to achieve a superior equilibrium among detection speed, accuracy, and model complexity. YOLO partitions the input image into a grid of cells, with each cell responsible for predicting objects whose center points fall within it [15]. For each grid cell, YOLO predicts multiple bounding boxes along with their associated confidence scores and class probabilities. The confidence score reflects the likelihood that a bounding box contains an object and the accuracy of the prediction, while the class probabilities indicate the likelihood of the object belonging to various categories. Through this mechanism, YOLO provides a global view while achieving rapid and precise object detection.
Since the initial release of YOLO, subsequent versions including YOLOv2 (also known as YOLO9000) [16], YOLOv3 [17], YOLOv4 [18], YOLOv5, YOLOv6 [19], YOLOv7 [20], and the recent YOLOv8, have significantly improved and optimized the original framework. These iterations not only include adjustments and enhancements to the network architecture but also introduce new mechanisms, such as multi-scale prediction, enhanced feature extraction networks, and more precise methods for bounding box prediction, all aimed at enhancing the model’s performance and applicability.
However, regardless of being two-stage or one-stage deep learning models, they typically require substantial computational resources. This requirement renders them a bottleneck
for applications in resource-constrained environments such as drones equipped with edge computing platforms [21]. In the field of abnormality detection for power transmission lines, the development of lightweight models is crucial. Traditional detection methods, such as manual inspection and ground-based sensor monitoring, suffer from inefficiencies and high costs. Lightweight deep learning models can be deployed on mobile devices such as drones for efficient and cost-effective automated inspections.
Li et al. [22] achieved a substantial reduction in network parameters by using the more lightweight MobileNetV2 network architecture in place of the original Darknet-53 structure used in YOLOv3, and by substituting standard 3×3 convolutions with depthwise separable convolutions in the detection head. Huang et al. [23] proposed an improved lightweight version of a power transmission line insulator defect detection algorithm based on YOLOv5. The primary strategy employed involves removing redundant convolution layers and reducing the number of channels; additionally, adaptive attention modules are integrated between adjacent residual blocks to assign greater weight to key features, thereby enhancing the model’s learning capabilities. Chen et al. [24], in an effort to alleviate the computational demands of the network, integrated the lightweight GSConv module into both the backbone and neck networks of YOLOv8. Additionally, they introduced the Content-Aware ReAssembly of Features (CARAFE) architecture into the neck structure, aimed at enhancing the efficiency of information utilization during the feature upsampling process and improving the model’s capability for feature integration. Meanwhile, Zhang et al. [25] increased the accuracy of abnormality detection in power transmission lines by incorporating the Multi-Scale Large Kernel (MSLK) attention mechanism and utilizing the enhanced SIoU loss function. In addition, their GSC_C2f module, designed based on the lightweight GSConv module, successfully simplified the computational process of the model and significantly reduced the memory usage.
Despite progress in the field of abnormality detection for power transmission lines, current models still face challenges in balancing detection accuracy with model lightweighting. Since transmission line images often contain complex backgrounds such as trees, grass, and other power facilities, these factors increase the difficulty of accurate detection based on aerial images. Moreover, efficiently implementing abnormal target detection becomes challenging under the constraints of limited resources on edge computing device platforms. Existing research lacks specificity and is difficult to apply directly to the detection of abnormalities in power transmission lines. Consequently, further research is necessary to develop methodologies that enable effective identification and localization of potential hazards in these computationally constrained environments.
III. METHODS
In order to ensure the accuracy of abnormal target detection in transmission lines and simultaneously achieve a lightweight model to reduce deployment costs, this study implemented a series of targeted improvements based on the YOLOv8n model. We refer to the final improved network model as YOLO-PowerLite, and the network structure of the improved model is shown in Fig. 1.
FIGURE 1. YOLO-PowerLite network structure.
First, in the feature fusion module, we implemented significant enhancements to the C2f module in YOLOv8. By replacing the standard Bottleneck structure within the C2f module with the AK_Bottleneck based on Alterable Kernel Convolution (AKConv), we created the new C2f_AK module. This enhancement significantly reduced the number of parameters and computational complexity, not only providing the model with advantages in terms of lightness and efficiency but also facilitating the deployment of the model on unmanned aerial vehicle (UAV) edge computing devices for real-time detection of abnormal targets on transmission lines. In addition, we reconstructed the feature fusion part based on the Bidirectional Feature Pyramid Network (BiFPN) to improve the efficiency and effectiveness of feature processing for transmission line abnormal targets.
Second, for the design of the detection head, we optimized the decoupled head structure used in the original YOLOv8 model. Although the decoupled head structure enhances model performance by providing dedicated network pathways for category prediction and bounding box regression, the resulting increase in parameter count and computational burden may become a bottleneck in resource-constrained environments. Therefore, we proposed to further reduce the total parameter count and computational demands of the model by sharing parameters in the front part of the category
prediction and bounding box regression tasks, while maintaining efficient feature representation by preserving task-specific terminal layers.
Finally, transmission line images captured by UAVs often feature complex backgrounds, such as trees, grass, and other power facilities. These backgrounds can greatly affect the accurate detection of abnormal targets on transmission lines, so we introduced the Coordinate Attention (CA) mechanism into the backbone network of the model. The CA mechanism effectively improved the model’s attention to key features and reduced sensitivity to background interference by emphasizing the channel and spatial information of the image.
FIGURE 2. Structure diagram of C2f_AK in YOLO-PowerLite. (a) C2f_AK module. (b) AK_Bottleneck module.
The AK_Bottleneck structure employs an innovative lightweight network technology known as the Alterable Kernel Convolution (AKConv) module [26]. This module aims to enhance network performance and efficiency through dynamic adjustments in the shape and size of the convolutional kernels. The core principle of the AKConv module lies in its unique ability to allow the convolutional kernels to dynamically change their shape and parameter configurations based on the demands of the input features. This flexibility enables AKConv to not only adapt to different data characteristics but also to optimize the utilization of computational resources and achieve a linear reduction in the number of convolutional parameters, tailored to the specific requirements of the task. The AKConv structure is shown in Fig. 3. AKConv utilizes a new coordinate generation algorithm to define initial positions for convolution kernels of arbitrary size. Moreover, to accommodate target changes, AKConv introduces offsets that adjust the shape of samples at each position. The introduction of this approach enables the C2f_AK module to significantly enhance its performance in lightweighting while maintaining the strong detection capability of the YOLOv8 model.
2) BIFPN
In the feature fusion module of YOLOv8, the Path Aggregation Network (PANet) [27] is utilized, exhibiting notable advantages over the traditional Feature Pyramid Network (FPN) [28]. As shown in Fig. 4b, PANet effectively enhances the fusion process between features at different scales through its innovative bidirectional information flow mechanism. This mechanism not only facilitates the transfer of low-level detail information to higher levels but also ensures that the semantic information at higher levels augments the representation of low-level features. This leads to superior performance of the model in processing images with rich details and complex backgrounds. Although PANet has made significant progress in multi-scale feature fusion, there is still room for improvement [29]. First, PANet does not fully manifest its potential when processing large-scale feature maps, possibly overlooking some crucial information, thereby impeding the overall performance of object detection. Additionally, the feature map loses some of its original information during the process of upsampling and downsampling,
and this loss of information reduces the reuse efficiency of the feature map.
FIGURE 4. Feature network design: (a) FPN. (b) PANet. (c) BiFPN.
optimization flexibility, as the layers for category prediction and bounding box regression cannot share information.
In light of these considerations, our proposed improvement sought to address these challenges by introducing a shared parameter structure, while retaining key advantages of the decoupled head structure. The improved detection head is shown in Fig. 5b. By sharing parameters in the front part of the category prediction and bounding box regression branches, our design reduced both the model’s overall parameter count and its computational burden. Moreover, by preserving task-specific end layers, we ensured that the model is able to learn effective feature representations for each task. This approach improved the model’s computational efficiency and continued to provide sufficient task specificity to achieve high performance.
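To make the saving from a shared front part concrete, the following minimal sketch counts convolution parameters for a fully decoupled head and for a head whose front layers are shared. The layer layout (two 3×3 convolutions followed by a 1×1 prediction layer), the channel width of 64, the 4 classes, and the 64 box-regression outputs are all illustrative assumptions, not values taken from the paper, and normalization layers are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights plus one bias per output channel of a k x k convolution."""
    return c_in * c_out * k * k + c_out


def decoupled_head_params(c: int, n_cls: int, n_box: int) -> int:
    """Each task owns two 3x3 convolutions and a 1x1 prediction layer."""
    front = 2 * conv_params(c, c, 3)
    cls_branch = front + conv_params(c, n_cls, 1)
    box_branch = front + conv_params(c, n_box, 1)
    return cls_branch + box_branch


def shared_head_params(c: int, n_cls: int, n_box: int) -> int:
    """Both tasks reuse the same front layers; only the 1x1 end layers differ."""
    front = 2 * conv_params(c, c, 3)
    return front + conv_params(c, n_cls, 1) + conv_params(c, n_box, 1)


# Hypothetical sizes: 64 feature channels, 4 classes, 64 box-regression outputs.
decoupled = decoupled_head_params(64, 4, 64)
shared = shared_head_params(64, 4, 64)
print(decoupled, shared, decoupled - shared)  # 152132 78276 73856
```

Under these assumptions the saving equals exactly one copy of the front layers, while the task-specific 1×1 layers are unchanged.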
and coordinate attention generation. During the coordinate information embedding phase, although global pooling is commonly utilized in channel attention to encode spatial information globally, it compresses this information into a single channel descriptor, thus making it challenging to preserve the integrity of positional information. However, preserving positional information is crucial for capturing spatial structure in visual tasks. For a given input X, the CA mechanism utilizes two scales of pooling kernels, (H, 1) and (1, W), to encode each channel along the horizontal and vertical directions, respectively, thereby enhancing the capture of spatial information. Moving into the coordinate attention generation phase, the feature maps obtained from the coordinate information embedding phase are first merged, followed by a convolutional transformation to reduce their dimensionality, thereby simplifying the model’s complexity. Subsequently, after batch normalization and nonlinear activation, an intermediate feature map rich in information is produced. This feature map is then split along the spatial dimension into two separate tensors. Finally, each tensor is processed through a 1 × 1 convolution and a Sigmoid activation function, transforming them into tensors with the same number of channels as input X, thereby completing the construction of the entire CA mechanism.
In this study, we integrated the CA mechanism within the backbone network. One significant advantage of this approach is that the CA mechanism, being a lightweight attention mechanism, can significantly enhance the model’s ability to focus on critical information about abnormal targets while considering the limited computational and storage resources of UAV hardware platforms. To maximize the utilization of this attention mechanism, we opted to employ the CA mechanism at the end of the model’s backbone network, aiming to achieve superior performance.
To systematically evaluate the performance of the YOLO-PowerLite model, we divided the entire dataset into training, validation, and test sets at a ratio of 7:2:1. This division ensured that the model is trained on a sufficiently diverse set of data, while providing separate validation and test sets for evaluating the model’s generalization ability and performance in real-world applications.
Given the common issue of overfitting encountered during the training of deep learning models, particularly with limited training samples, we implemented a series of data augmentation techniques to enhance the model’s robustness and generalization ability. Data augmentation operations include horizontal flipping, vertical flipping, random rotation, brightness adjustment, Gaussian noise, and blurring. These techniques expand the diversity of the training samples and simulate various complex environmental conditions likely encountered in real-world applications, thereby ensuring the model’s effectiveness and robustness. The effects of data augmentation are illustrated in Fig. 7, where image (a) is the original and images (b) through (h) are augmented versions of it.
FIGURE 7. Data augmentation process and results. (a) Original image. (b) Horizontal flip image. (c) Vertical flip image. (d) Random rotation image. (e) High brightness image. (f) Low brightness image. (g) Gaussian noise image. (h) Blur image.
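As an illustration of the 7:2:1 dataset division and the flip augmentations, the sketch below uses only the Python standard library. The file names, the fixed random seed, the list-of-rows image representation, and the normalized (x_center, y_center, width, height) box format are assumptions made for the example; the paper does not specify how the split or the augmentations were implemented.

```python
import random


def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle items and split them into train/val/test by the given ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # any rounding remainder goes to the test set
    return train, val, test


def horizontal_flip(image):
    """Mirror an image stored as a list of pixel rows left to right."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Mirror an image stored as a list of pixel rows top to bottom."""
    return [list(row) for row in image[::-1]]


def flip_box_horizontally(box):
    """Mirror a normalized (x_center, y_center, width, height) box."""
    x_c, y_c, w, h = box
    return (1.0 - x_c, y_c, w, h)


# Hypothetical file names, used only to show the resulting set sizes.
train, val, test = split_dataset([f"img_{i:03d}.jpg" for i in range(100)])
print(len(train), len(val), len(test))  # 70 20 10
```

For object detection, geometric augmentations must be applied to the annotations as well as to the pixels, which is why the box is mirrored together with the image.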
FIGURE 8. Dataset information.
were not utilized during the training of ablation and comparative experiments. The important hyperparameter settings for the models during the training phase are presented in Table 1.
TABLE 1. Model hyperparameter settings.
YOLOv8x. Each version exhibits a progressive increase in parameter count and resource consumption, meeting diverse performance needs for detection. Detailed specifications of depths, widths, and maximum channel counts for these models are presented in Table 2.
TABLE 2. Parameters corresponding to different sizes of YOLOv8.
3) EVALUATION METRICS
To comprehensively evaluate the performance of abnormality detection models for transmission lines, this study focused not only on the accuracy of the models but also considered their lightweight requirements for deployment on edge computing devices. Therefore, precision, recall, average precision (AP), mean average precision (mAP), floating-point operations (FLOPs), Params, and model size were selected as key evaluation metrics.
Precision is the proportion of targets predicted by the model that are correct, and the calculation formula is shown in (1), where true positive (TP) is the number of transmission line abnormal targets correctly identified by the model, false positive (FP) is the number of detections that do not correspond to an actual abnormal target, and false negative (FN) is the number of actual abnormal targets missed by the model.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$
Recall is the proportion of all actual targets that are correctly recognized by the model and is calculated as shown in (2).
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$
AP is equal to the area under the precision-recall curve; the closer its value is to 1, the better the model performance. The formula is shown in (3).
$$\mathrm{AP} = \int_{0}^{1} \mathrm{Precision}(\mathrm{Recall}) \, d\mathrm{Recall} \tag{3}$$
FLOPs reflect the computational complexity of the model and are calculated as shown in (5).
$$\mathrm{FLOPs} = 2 \times H \times W \times \left( C_{in} K^{2} + 1 \right) \times C_{out} \tag{5}$$
Model lightweighting is evaluated by the number of parameters (Params), which is calculated as shown in (6).
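The metrics in (1), (2), (3), and (5) can be sketched in a few lines of Python. This is a minimal illustration rather than the evaluation code used in the paper: the AP function approximates the integral in (3) with the trapezoidal rule over a handful of sampled points, whereas detection toolkits typically use interpolated variants. The FLOPs function assumes that H and W denote the height and width of the output feature map, K the kernel size, and C_in and C_out the input and output channel counts of a single convolutional layer.

```python
def precision(tp: int, fp: int) -> float:
    """Eq. (1): fraction of predicted targets that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Eq. (2): fraction of actual targets that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_precision(recalls, precisions) -> float:
    """Eq. (3): trapezoidal approximation of the area under the PR curve."""
    points = sorted(zip(recalls, precisions))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


def conv_flops(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """Eq. (5): FLOPs of one convolutional layer."""
    return 2 * h * w * (c_in * k * k + 1) * c_out


# Illustrative numbers only; they are not results from the paper.
print(precision(tp=90, fp=10))                            # 0.9
print(recall(tp=90, fn=30))                               # 0.75
print(average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.5]))  # 0.875
print(conv_flops(h=80, w=80, c_in=16, c_out=32, k=3))     # 59392000
```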
B. EXPERIMENTAL RESULTS
1) ABLATION EXPERIMENTS
In order to validate the effectiveness of each improvement strategy proposed in this paper, we conducted ablation experiments on the baseline model, and the experimental results are displayed in Table 3.
TABLE 3. Detection results after the introduction of different improvement strategies.
From Table 3, it is evident that by replacing the C2f module in YOLOv8n with the C2f_AK module, we observed a reduction in the number of parameters (Params), floating-point operations (FLOPs), and model size by 9.1%, 6.2%, and 8.7%, respectively. This enhancement significantly diminishes the model’s size and computational cost, thereby improving the efficiency of feature extraction. However, the mAP@0.5 slightly decreased to 93.0%. Specifically, the C2f_AK module employs the AK_Bottleneck structure,
model YOLOv8n. This effectively lowers the model’s complexity and optimizes its efficiency, making it more suitable for low-performance devices and easier to integrate. From Table 3, we can observe that the decrease in FPS is primarily due to the reconstruction of YOLOv8’s feature fusion component based on the BiFPN and the introduction of the CA mechanism. The inclusion of BiFPN adds feature processing steps, while the CA mechanism increases computational complexity, both contributing to the FPS decrease. However, although the FPS drops to 101, lower than the baseline model’s 149.3, it still meets the real-time detection requirements for anomalies in transmission lines [35]. These ablation experiment results clearly demonstrate the advantages and efficacy of our model in terms of accuracy and achieving a lightweight design, validating the feasibility of our approach in the domain of transmission line anomaly detection.
2) COMPARISON EXPERIMENTS
To further validate the performance of YOLO-PowerLite, we compared its performance with that of several mainstream lightweight target detection models. We focused on evaluating key metrics such as mAP@0.5, Params, FLOPs, and Model Size for each model. The experimental results are presented in Table 4.
TABLE 4. Results of each indicator for different models.
3) VISUALIZATION ANALYSIS
Grad-CAM [36] is a widely utilized visualization method for
highlighting the regions of an image that most significantly
contribute to the model’s predicted results, thereby enhancing
the transparency of the model’s decision-making process.
We chose images of power transmission lines with complex
backgrounds as our test cases. In these cases, the image
backgrounds are complex and variable, containing trees,
grass, and other power facilities, all of which could poten-
tially impact the performance of the target detection model.
By applying Grad-CAM to the output of the YOLOv8n and
YOLO-PowerLite models, we produced heatmaps reflecting
the focus of the model’s attention, as shown in Fig. 9.
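The core Grad-CAM computation can be sketched as follows, assuming the activations of one convolutional layer and the gradients of a chosen target score with respect to those activations are already available as nested lists indexed [channel][row][column]. Which layer and which score are used for a detector is a design choice not specified here, and the toy numbers are purely illustrative. Each channel is weighted by the spatial mean of its gradient, and the weighted sum is passed through a ReLU.

```python
def grad_cam(activations, gradients):
    """Return a [row][col] heatmap from [channel][row][col] inputs."""
    n_rows = len(activations[0])
    n_cols = len(activations[0][0])
    # Channel weight: spatial average of the gradient for that channel.
    weights = [
        sum(sum(row) for row in grad) / (n_rows * n_cols) for grad in gradients
    ]
    heatmap = []
    for i in range(n_rows):
        heat_row = []
        for j in range(n_cols):
            value = sum(w * act[i][j] for w, act in zip(weights, activations))
            heat_row.append(max(value, 0.0))  # ReLU keeps positive evidence only
        heatmap.append(heat_row)
    return heatmap


# Toy example with two channels on a 2 x 2 feature map.
acts = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-2.0, -2.0], [-2.0, -2.0]]]
print(grad_cam(acts, grads))  # [[0.0, 0.0], [1.0, 2.0]]
```

In practice the resulting map is upsampled to the input resolution and overlaid on the image to produce the heatmap.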
FIGURE 9. Grad-CAM visualization results.
FIGURE 10. The detection results of four lightweight algorithms in abnormal target detection of transmission lines.
10 to 15 watts, thus demonstrating an exceptional energy efficiency ratio.
To ensure the model runs efficiently on the Jetson Xavier NX platform, we first converted the YOLO-PowerLite model from its original framework format (such as PyTorch) to the ONNX (Open Neural Network Exchange) format. Subsequently, we utilized NVIDIA TensorRT tools to optimize the ONNX model and convert it into the TensorRT engine format (.engine), thereby fully leveraging the hardware acceleration capabilities of the Jetson Xavier NX. In this study, we used the test set from the dataset as input for model inference to simulate real-world scenarios where drones equipped with edge computing devices detect abnormal targets on transmission lines. The deployment of the YOLO-PowerLite model on the Jetson Xavier NX platform exhibited an average processing time of only 31.2 ms per frame, highlighting its significant advantage in processing speed and demonstrating its immense potential for real-time applications.
Additionally, we provided detailed configuration information for the Jetson Xavier NX, including the JetPack version and TensorRT version, as shown in Table 5.
TABLE 5. Detailed configuration information for NVIDIA Jetson Xavier NX.
V. CONCLUSION AND FUTURE WORK
To address the challenge of conducting efficient, real-time abnormality detection of power transmission lines on resource-constrained UAV-mounted edge computing platforms, we introduce the YOLO-PowerLite model. This model, by optimizing and enhancing the YOLOv8n framework, achieves significant success in model lightweighting while also ensuring high accuracy and real-time object detection. In our model improvement efforts, we implement several innovative optimizations to the YOLOv8 model, including the introduction of the C2f_AK module and BiFPN-based feature fusion strategy within the feature fusion module, as well as the design of the lightweight detection head and the incorporation of the CA mechanism.
Given the absence of publicly available datasets for multiple categories of transmission line abnormalities, we synthesized and augmented the existing datasets to construct a more comprehensive and challenging consolidated dataset. Ablation experiments conducted on this dataset show that the YOLO-PowerLite model achieved a mAP@0.5 of 94.2%, comparable to that of YOLOv8n. In terms of model efficiency, YOLO-PowerLite demonstrated significant reductions in parameters, floating-point operations (FLOPs), and model size of 42.3%, 30.9%, and 40.4%, respectively, with parameters reduced to 1.73 M, FLOPs to 5.6 G, and model size to 3.55 MB. Moreover, when compared with other mainstream lightweight object detection models, YOLO-PowerLite not only leads in detection accuracy but also exhibits significant advancements in model lightness. Notably, the deployment of YOLO-PowerLite on the NVIDIA Jetson Xavier NX edge computing platform demonstrated its exceptional real-world application performance, with an average inference time of just 31.2 milliseconds per frame. This remarkable processing speed underscores YOLO-PowerLite’s significant advantage in real-time operations and highlights its immense potential and value in real-time abnormality detection applications for power transmission lines.
However, our current work continues to face several limitations that require attention. Firstly, due to the lack of publicly available datasets for multiple categories of transmission line abnormalities in the electrical field, we have integrated and supplemented existing datasets; however, challenges related to insufficient data samples and diversity persist. These issues could impact the model’s generalizability and adaptability to various abnormalities. Therefore, it is imperative that we expand the dataset, collect a broader array of abnormal object images, and validate the model’s generalizability and practicality across a wider range of application scenarios.
Additionally, the successful deployment of YOLO-PowerLite on the NVIDIA Jetson Xavier NX edge computing platform showcases the model’s excellent real-time processing performance. We plan to deploy the model on an authentic UAV platform and conduct extensive field testing to validate its effectiveness and stability in practical applications.
Finally, although we have achieved significant progress in model lightweighting, our exploration of multitasking capabilities remains somewhat limited. Looking ahead, we might enhance the model’s ability to concurrently learn multiple tasks, such as simultaneous abnormality detection and health assessment of power transmission lines, to offer a more comprehensive solution for monitoring electrical infrastructure.
REFERENCES
[1] J. Yuan, X. Zheng, L. Peng, K. Qu, H. Luo, L. Wei, J. Jin, and F. Tan, "Identification method of typical defects in transmission lines based on YOLOv5 object detection algorithm," Energy Rep., vol. 9, pp. 323–332, Sep. 2023, doi: 10.1016/j.egyr.2023.04.078.
[2] M. D. F. Ahmed, J. C. Mohanta, A. Sanyal, and P. S. Yadav, "Path planning of unmanned aerial systems for visual inspection of power transmission lines and towers," IETE J. Res., vol. 70, no. 3, pp. 3259–3279, Mar. 2024, doi: 10.1080/03772063.2023.2175053.
[3] C. Chen, Z. Zheng, T. Xu, S. Guo, S. Feng, W. Yao, and Y. Lan, "YOLO-based UAV technology: A review of the research and its applications," Drones, vol. 7, no. 3, p. 190, Mar. 2023, doi: 10.3390/drones7030190.
[4] H. Li, Y. Dong, Y. Liu, and J. Ai, "Design and implementation of UAVs for bird’s nest inspection on transmission lines based on deep learning," Drones, vol. 6, no. 9, p. 252, Sep. 2022, doi: 10.3390/drones6090252.
[5] V. K. Sharma and R. N. Mir, "A comprehensive and systematic look up into deep learning based object detection techniques: A review," Comput. Sci. Rev., vol. 38, Nov. 2020, Art. no. 100301, doi: 10.1016/j.cosrev.2020.100301.
[6] D. G. Lowe, "Object recognition from local scale-invariant features," in Proc. 7th IEEE Int. Conf. Comput. Vis., Sep. 1999, pp. 1150–1157, doi: 10.1109/ICCV.1999.790410.
[7] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2005, pp. 886–893, doi: 10.1109/CVPR.2005.177.
[8] A. Mammone, M. Turchi, and N. Cristianini, "Support vector machines," WIREs Comput. Statist., vol. 1, no. 3, pp. 283–289, Nov. 2009, doi: 10.1002/wics.49.
[9] X. Wu, D. Sahoo, and S. C. H. Hoi, "Recent advances in deep learning for object detection," Neurocomputing, vol. 396, pp. 39–64, Jul. 2020, doi: 10.1016/j.neucom.2020.01.085.
[24] Y. Chen, H. Liu, J. Chen, J. Hu, and E. Zheng, "Insu-YOLO: An insulator defect detection algorithm based on multiscale feature fusion," Electronics, vol. 12, no. 15, p. 3210, Jul. 2023, doi: 10.3390/electronics12153210.
[25] L. Zhang, B. Li, Y. Cui, Y. Lai, and J. Gao, "Research on improved YOLOv8 algorithm for insulator defect detection," J. Real-Time Image Process., vol. 21, no. 1, p. 22, Jan. 2024, doi: 10.1007/s11554-023-01401-9.
[26] X. Zhang, Y. Song, T. Song, D. Yang, Y. Ye, J. Zhou, and L. Zhang, "LDConv: Linear deformable convolution for improving convolutional neural networks," 2023, arXiv:2311.11587.
[27] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path aggregation network for instance segmentation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8759–8768, doi: 10.1109/CVPR.2018.00913.
[28] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 936–944, doi: 10.1109/CVPR.2017.106.
[29] X. Wang, H. Gao, Z. Jia, and Z. Li, "BL-YOLOv8: An improved road
defect detection model based on YOLOv8,’’ Sensors, vol. 23, no. 20,
[10] Z. Li, Y. Wang, N. Zhang, Y. Zhang, Z. Zhao, D. Xu, G. Ben, and Y. Gao,
p. 8361, Oct. 2023, doi: 10.3390/s23208361.
‘‘Deep learning-based object detection techniques for remote sensing
images: A survey,’’ Remote Sens., vol. 14, no. 10, p. 2385, May 2022, doi: [30] M. Tan, R. Pang, and Q. V. Le, ‘‘EfficientDet: Scalable and
10.3390/rs14102385. efficient object detection,’’ in Proc. IEEE/CVF Conf. Comput.
Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 10778–10787, doi:
[11] R. Girshick, J. Donahue, T. Darrell, and J. Malik, ‘‘Rich feature hierar-
10.1109/CVPR42600.2020.01079.
chies for accurate object detection and semantic segmentation,’’ in Proc.
[31] Q. Hou, D. Zhou, and J. Feng, ‘‘Coordinate attention for effi-
IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 580–587, doi:
cient mobile network design,’’ in Proc. IEEE/CVF Conf. Comput.
10.1109/CVPR.2014.81.
Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 13708–13717, doi:
[12] R. B. Girshick. (Apr. 2015). Fast R-CNN. Accessed: Jan. 21, 2024.
10.1109/CVPR46437.2021.01350.
[Online]. Available: https://www.semanticscholar.org/paper/Fast-R-CNN-
[32] J. Li, D. Yan, K. Luan, Z. Li, and H. Liang, ‘‘Deep learning-based
Girshick/7ffdbc358b63378f07311e883dddacc9faeeaf4b
bird’s nest detection on transmission lines using UAV imagery,’’
[13] S. Ren, K. He, R. Girshick, and J. Sun, ‘‘Faster R-CNN: Towards Appl. Sci., vol. 10, no. 18, p. 6147, Sep. 2020, doi: 10.3390/
real-time object detection with region proposal networks,’’ IEEE Trans. app10186147.
Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017, doi:
[33] X. Tao, D. Zhang, Z. Wang, X. Liu, H. Zhang, and D. Xu, ‘‘Detection of
10.1109/TPAMI.2016.2577031.
power line insulator defects using aerial images analyzed with convolu-
[14] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, tional neural networks,’’ IEEE Trans. Syst., Man, Cybern., Syst., vol. 50,
and A. C. Berg, ‘‘SSD: Single shot MultiBox detector,’’ in Computer no. 4, pp. 1486–1498, Apr. 2020, doi: 10.1109/TSMC.2018.2871750.
Vision—ECCV 2016 (Lecture Notes in Computer Science), vol. 9905, [34] H. Jiang, F. Hu, X. Fu, C. Chen, C. Wang, L. Tian, and Y. Shi,
B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds., Cham, Switzerland: ‘‘YOLOv8-peas: A lightweight drought tolerance method for peas based
Springer, 2016, pp. 21–37, doi: 10.1007/978-3-319-46448-0_2. on seed germination vigor,’’ Frontiers Plant Sci., vol. 14, Sep. 2023,
[15] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, ‘‘You only Art. no. 1257947, doi: 10.3389/fpls.2023.1257947.
look once: Unified, real-time object detection,’’ in Proc. IEEE Conf. [35] J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi,
Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779–788, doi: I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, and K. Murphy,
10.1109/CVPR.2016.91. ‘‘Speed/accuracy trade-offs for modern convolutional object detectors,’’ in
[16] J. Redmon and A. Farhadi, ‘‘YOLO9000: Better, faster, stronger,’’ Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017,
in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 3296–3297. Accessed: Jun. 12, 2024. [Online]. Available:
Jul. 2017, pp. 6517–6525. Accessed: Apr. 28, 2024. https://openaccess. https://openaccess.thecvf.com/content_cvpr_2017/html/Huang_Speed
thecvf.com/content_cvpr_2017/html/Redmon_YOLO9000_Better_Fas Accuracy_Trade-Offs_for_CVPR_2017_paper.html
ter_CVPR_2017_paper.html [36] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and
[17] J. Redmon and A. Farhadi, ‘‘YOLOv3: An incremental improvement,’’ D. Batra, ‘‘Grad-CAM: Visual explanations from deep networks via
2018, arXiv:1804.02767. gradient-based localization,’’ in Proc. IEEE Int. Conf. Comput. Vis.
[18] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, ‘‘YOLOv4: Optimal (ICCV), Oct. 2017, pp. 618–626. Accessed: Mar. 6, 2024. [Online]. Avail-
speed and accuracy of object detection,’’ 2020, arXiv:2004.10934. able: https://openaccess.thecvf.com/content_iccv_2017/html/Selvaraju_
[19] C. Li, L. Li, H. Jiang, K. Weng, Y. Geng, L. Li, Z. Ke, Q. Li, M. Cheng, Grad-CAM_Visual_Explanations_ICCV_2017_paper.html
W. Nie, Y. Li, B. Zhang, Y. Liang, L. Zhou, X. Xu, X. Chu, X. Wei,
and X. Wei, ‘‘YOLOv6: A single-stage object detection framework for
industrial applications,’’ 2022, arXiv:2209.02976.
[20] C.-Y. Wang, A. Bochkovskiy, and H.-Y.-M. Liao, ‘‘YOLOv7: Trainable
bag-of-freebies sets new state-of-the-art for real-time object detectors,’’ in
Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2023,
pp. 7464–7475, doi: 10.1109/cvpr52729.2023.00721.
[21] J. Cao, W. Bao, H. Shang, M. Yuan, and Q. Cheng, ‘‘GCL-YOLO:
A GhostConv-based lightweight Yolo network for UAV small object
detection,’’ Remote Sens., vol. 15, no. 20, p. 4932, Oct. 2023, doi: CHUANYAO LIU is currently pursuing the M.E.
10.3390/rs15204932. degree with the School of Geomatics and Urban
[22] H. Li, L. Liu, J. Du, F. Jiang, F. Guo, Q. Hu, and L. Fan, ‘‘An improved Spatial Informatics, Beijing University of Civil
YOLOv3 for foreign objects detection of transmission lines,’’ IEEE Engineering and Architecture, Beijing, China. His
Access, vol. 10, pp. 45620–45628, 2022, doi: 10.1109/ACCESS.2022. research interests include deep learning and point
3170696. cloud processing.
[23] S. Huang, X. Dong, Y. Wang, and L. Yang, ‘‘Detection of insula-
tor burst position of lightweight YOLOv5,’’ in Proc. 8th Int. Conf.
Comput. Artif. Intell., Mar. 2022, pp. 573–578, doi: 10.1145/3532213.
3532300.
SHUANGFENG WEI was born in Hubei, China, in 1979. He received the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2007. Since 2007, he has been with Beijing University of Civil Engineering and Architecture, Beijing, China, where he is currently an Associate Professor. He has authored or co-authored about 60 papers. His research interests include point cloud processing and SLAM.

SHAOBO ZHONG was born in Hubei, China, in 1978. He received the Ph.D. degree in cartography and GIS from the Institute of Remote Sensing Applications, Chinese Academy of Sciences, Beijing, China, in 2006. From 2006 to 2018, he was an Educator and a Researcher with Tsinghua University, Beijing. He is currently a Researcher with Beijing Academy of Science and Technology, Beijing. He has authored or co-authored more than 140 papers. His research interests include geoinformation for disasters, resilience and risk assessment, and spatial big data and machine learning for spatial and temporal problem solving, particularly in the safety operations of smart cities.

FAN YU was born in Xiaogan, Hubei, China, in 1982. He received the B.S. and M.S. degrees in photogrammetry and remote sensing from Wuhan University, in 2004 and 2007, respectively, and the Ph.D. degree in remote sensing from the Chinese Academy of Sciences, in 2010.
From July 2010 to 2018, he was an Associate Researcher and an Academic Secretary with the Key Laboratory of Geospatial Information Engineering, State Bureau of Surveying and Mapping, China Academy of Surveying and Mapping Sciences. Since July 2018, he has been an Associate Professor with the School of Surveying and Mapping and Urban Spatial Information, Beijing University of Civil Engineering and Architecture. He has published over 30 SCI/EI journal articles and led two National Natural Science Foundation projects. He has extensive experience in the preprocessing of remote sensing images and in research on segmentation and classification algorithms based on machine learning and data mining for remote sensing information extraction.