Abstract

Attention mechanisms and Non-Maximum Suppression (NMS) have proven to be effective components in object detection. However, fusing features across scales and layers with a single attention mechanism does not always yield satisfactory performance, and may introduce redundant information that degrades the results. NMS methods, on the other hand, generally face the dilemma of a single constant threshold: a lower threshold misses highly overlapped object instances, while a higher one admits more false positives. How to optimize the different dimensions of correlation in feature mapping and how to set the NMS threshold adaptively therefore remain obstacles to effective object detection. Because addressing either problem in isolation leads to suboptimal detection, this paper proposes feeding the informative feature representation from a joint-attention feature fusion network into adaptive NMS for a comprehensive performance improvement. Specifically, we embed two types of attention modules in a three-level Feature Pyramid Network (FPN): a channel-attention module enhances the feature representation by re-evaluating the relationships between channels from a global perspective, and a position-attention module exploits the correlation between features to capture rich contextual information. Furthermore, we develop dual-adaptive NMS, which dynamically adjusts the suppression threshold according to the density of object instances: the threshold rises where instances cluster and decays where they are sparse. The proposed method is evaluated on the COCO dataset, and extensive experimental results demonstrate its superior performance compared with existing methods.
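
The density-dependent threshold described above can be illustrated with a minimal sketch. This is not the paper's dual-adaptive NMS, whose exact formulation the abstract does not give. It is a generic greedy NMS under two assumptions: each box comes with a density estimate in [0, 1] supplied by the caller (the hypothetical `densities` argument), and the suppression threshold for a kept box is the larger of a base threshold and that box's density. The names `iou`, `density_adaptive_nms`, and `base_thresh` are illustrative only.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def density_adaptive_nms(boxes, scores, densities, base_thresh=0.5):
    """Greedy NMS with a per-box suppression threshold.

    Assumption (not from the paper): densities[i] is an externally supplied
    crowdedness estimate for box i, and the threshold used when box i is kept
    is max(base_thresh, densities[i]). In crowded regions the threshold rises,
    so overlapping neighbours survive; in sparse regions it falls back to
    base_thresh.
    """
    # Indices of candidate boxes, highest score first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        thresh = max(base_thresh, densities[best])
        # Drop remaining candidates that overlap the kept box too strongly.
        order = [j for j in order if iou(boxes[best], boxes[j]) <= thresh]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    # Sparse scene: the second box overlaps the first (IoU of about 0.67) and is suppressed.
    print(density_adaptive_nms(boxes, scores, [0.0, 0.0, 0.0]))
    # Crowded scene: a density of 0.7 raises the threshold, so both boxes are kept.
    print(density_adaptive_nms(boxes, scores, [0.7, 0.7, 0.0]))
```

With a single constant threshold of 0.5, the second box would be removed in both cases; letting the threshold follow the density is what allows two heavily overlapping instances to coexist in the crowded case.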

