Abstract

Recently, the world has witnessed a great increase in drone applications and missions. Drones must be detected quickly, effectively, and precisely when they are operated illegally. Vision-based anti-drone systems perform efficiently compared to radar- and acoustic-based systems. The effectiveness of drone detection is affected by several issues, including the drone's small size, confusion with other objects, and noisy backgrounds. This paper enlarges the effective receptive field (ERF) of the feature maps generated by the YOLOv6 backbone. First, RepLKNet, which deploys large kernels with depth-wise convolution, is used as the backbone of YOLOv6. Then, to overcome RepLKNet's large inference time, a novel LERFNet is implemented. LERFNet combines dilated convolution with large kernels to enlarge the ERF, with each technique compensating for the other's weaknesses. A linear spatial-channel attention module (LAM) directs attention to the most informative pixels and feature channels. LERFNet produces output feature maps with a large ERF and high shape bias to enhance the detection of drones of various sizes in complex scenes. The RepLKNet and LERFNet backbones for Tiny-YOLOv6 are compared with Tiny-YOLOv6, YOLOv5s, and Tiny-YOLOv7. Compared to these techniques, the proposed model achieves a better balance between accuracy and speed: LERFNet increases mAP by 2.8% while significantly reducing GFLOPs and parameter count relative to the original YOLOv6 backbone.
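The receptive-field arithmetic behind combining large and dilated kernels can be sketched as follows. This is a minimal illustration of the standard theoretical receptive-field recurrence; the layer configurations are hypothetical examples, not the actual LERFNet architecture.

```python
# Sketch: why dilated and large kernels enlarge the receptive field cheaply.
# Layer configs below are illustrative, not the actual LERFNet design.

def receptive_field(layers):
    """layers: list of (kernel, stride, dilation) tuples.
    Returns the theoretical receptive field of the stacked convolutions."""
    rf, jump = 1, 1
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1      # effective kernel size after dilation
        rf += (k_eff - 1) * jump     # growth contributed by this layer
        jump *= s                    # cumulative stride ("jump") so far
    return rf

# Three stacked 3x3 convs vs. one 13x13 large kernel vs. one 5x5 conv
# dilated by 3: the last two reach the same receptive field, but the
# dilated version uses far fewer weights (25 vs. 169 per channel).
print(receptive_field([(3, 1, 1)] * 3))   # -> 7
print(receptive_field([(13, 1, 1)]))      # -> 13
print(receptive_field([(5, 1, 3)]))       # -> 13
```

This illustrates the trade-off the abstract describes: a large kernel widens the receptive field but is expensive, while dilation reaches a similar span at a fraction of the parameter and FLOP cost.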