To keep power distribution network (PDN) equipment operating safely, security risks must be identified accurately and promptly. However, conventional drone-based object detection methods are hampered by noise and visually similar features among risk targets, as well as by the limited computing resources of unmanned aerial vehicles (UAVs). To address these challenges, an efficient embedding-based multi-path fusion architecture is proposed. The architecture uses a re-parameterized depthwise block to embed local context information at different scales, enhancing the extraction of tiny features while preserving inference speed. In addition, a coordinated self-attention module is proposed to reduce computational complexity while maintaining performance in modeling global information. By fusing fine and coarse feature representations at low computational cost, this module learns efficiently from both the local and global features of images. The goal is an efficient multi-path vision transformer (EMPViT) architecture that balances accuracy and efficiency. The proposed EMPViT has been evaluated on two drone image datasets and performs better than other architectures. Specifically, on the Drone-PDN dataset, EMPViT-S improves detection mAP by 1.2% and speeds up inference by a factor of 1.24 on average. It achieves comparable gains on the VisDrone-DET2019 dataset, improving detection performance by 1.3% with a 1.2-times speedup on average.