Aerial image target detection is essential for urban planning, traffic monitoring, and disaster assessment. However, existing detection algorithms struggle with small target recognition and accuracy in complex environments. To address this issue, this paper proposes an improved model based on YOLOv8, named MPE-YOLO. Initially, a multilevel feature integrator (MFI) module is employed to enhance the representation of small target features, which meticulously moderates information loss during the feature fusion process. For the backbone network of the model, a perception enhancement convolution (PEC) module is introduced to replace traditional convolutional layers, thereby expanding the network’s fine-grained feature processing capability. Furthermore, an enhanced scope-C2f (ES-C2f) module is designed, utilizing channel expansion and stacking of multiscale convolutional kernels to enhance the network’s ability to capture small target details. After a series of experiments on the VisDrone, RSOD, and AI-TOD datasets, the model has not only demonstrated superior performance in aerial image detection tasks compared to existing advanced algorithms but also achieved a lightweight model structure. The experimental results demonstrate the potential of MPE-YOLO in enhancing the accuracy and operational efficiency of aerial target detection. Code will be available online (https://github.com/zhanderen/MPE-YOLO).
Read full abstract