Abstract

Feature pyramid networks and attention mechanisms are the mainstream methods for improving the detection performance of many current models. However, when the two are learned jointly, information across multi-level features is not well associated. This paper therefore proposes a feature pyramid network with multi-level local attention, dubbed MLA-Net (Feature Pyramid Network with Multi-Level Local Attention for Object Detection), which aims to establish a correlation mechanism for multi-level local information. First, the original multi-level features are deformed and rectified by the local pixel-rectification module, and global semantic enhancement is achieved through the multi-level spatial-attention module. The original features are then fused back in through a residual connection, which combines contextual features and strengthens the feature representation. Extensive ablation experiments on the MS COCO (Microsoft Common Objects in Context) dataset demonstrate the effectiveness of the proposed method, with a 0.5% improvement. On the PASCAL VOC (Pattern Analysis, Statistical Modelling and Computational Learning, Visual Object Classes) dataset, the method obtains an improvement of 1.2%, reaching 81.8%, which indicates that it is robust and can compete with other advanced detection models.
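
The attention-plus-residual step described above can be illustrated with a toy sketch. This is not the MLA-Net implementation: the function names (`resize_nearest`, `multi_level_attention`), the single-channel feature maps, and the choice of a sigmoid over the cross-level mean as the attention mask are all assumptions made for illustration, and the local pixel-rectification (deformation) step is omitted entirely.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a 2D map stored as a list of rows."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def multi_level_attention(levels):
    """Reweight each pyramid level with a spatial mask pooled from all
    levels, then add the original features back (residual connection).

    Hypothetical stand-in for a learned attention module: the mask is
    simply the sigmoid of the mean activation across levels.
    """
    outputs = []
    for fmap in levels:
        h, w = len(fmap), len(fmap[0])
        # Bring every level to the resolution of the current level.
        aligned = [resize_nearest(other, h, w) for other in levels]
        out = []
        for r in range(h):
            row = []
            for c in range(w):
                mean = sum(a[r][c] for a in aligned) / len(aligned)
                mask = sigmoid(mean)  # spatial attention weight in (0, 1)
                # Residual fusion: original feature plus attended feature.
                row.append(fmap[r][c] + mask * fmap[r][c])
            out.append(row)
        outputs.append(out)
    return outputs
```

Calling `multi_level_attention([p3, p4])` with a 4x4 map and a 2x2 map returns two maps of the same shapes. Because the mask lies strictly between 0 and 1, each positive input value is scaled to somewhere between one and two times its original magnitude, so the residual path keeps the original signal intact while the attention term adds cross-level context.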
