Abstract

Feature pyramid networks and attention mechanisms are two mainstream ways to improve the detection performance of current models. However, when the two are learned jointly, information is poorly associated across multi-level features. This paper therefore proposes a feature pyramid with multi-level local attention, dubbed MLA-Net (Feature Pyramid Network with Multi-Level Local Attention for Object Detection), which aims to establish a correlation mechanism for multi-level local information. First, the original multi-level features are deformed and rectified by the local pixel-rectification module, and global semantic enhancement is achieved through the multi-level spatial-attention module. The original features are then fused back in through a residual connection, combining contextual features to strengthen the feature representation. Extensive ablation experiments on the MS COCO (Microsoft Common Objects in Context) dataset demonstrate the effectiveness of the proposed method, with a 0.5% improvement. On the PASCAL VOC (Pattern Analysis, Statistical Modelling and Computational Learning, Visual Object Classes) dataset, the method obtains an improvement of 1.2%, reaching 81.8%, which indicates that it is robust and can compete with other advanced detection models.
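The general idea of attending across pyramid levels and then fusing the result back through a residual connection can be illustrated with a small sketch. This is not the MLA-Net implementation: the abstract does not specify the modules' internals, so the code below assumes single-channel feature maps already resized to a common resolution, uses a plain per-pixel softmax over levels as a stand-in for the multi-level spatial-attention module, and omits the local pixel-rectification step entirely. The names `softmax` and `multi_level_attention_fuse` are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def multi_level_attention_fuse(levels):
    """Fuse same-sized pyramid levels with per-pixel attention plus a residual.

    `levels` is a list of H x W maps (lists of lists of floats). Assumption:
    all levels were already resized to the same resolution; a real feature
    pyramid would need up/down-sampling first.
    """
    height, width = len(levels[0]), len(levels[0][0])
    fused = [[[0.0] * width for _ in range(height)] for _ in levels]
    for y in range(height):
        for x in range(width):
            # Responses of every pyramid level at this spatial location.
            responses = [level[y][x] for level in levels]
            # Attention weights over levels (illustrative stand-in only).
            weights = softmax(responses)
            # Cross-level context: attention-weighted sum of the responses.
            context = sum(w * r for w, r in zip(weights, responses))
            # Residual connection: keep the original feature, add the context.
            for i, level in enumerate(levels):
                fused[i][y][x] = level[y][x] + context
    return fused


if __name__ == "__main__":
    # Two toy 1 x 2 "levels"; values are made up for illustration.
    toy_levels = [[[0.0, 1.0]], [[0.0, 3.0]]]
    print(multi_level_attention_fuse(toy_levels))
```

Because the context term is added to each level rather than replacing it, the original features are preserved; in this toy version the difference between any two levels at a given pixel is unchanged by the fusion.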
