Benefiting from advances in pyramidal feature learning, the current state-of-the-art multi-scale detection paradigm has become proficient at detecting objects of varying scales. However, the feature pyramid network (FPN), despite constructing multi-scale features with strong semantics, still suffers from limited performance caused by insufficient detail exploitation, information loss, limited receptive fields, and hard proposal assignment; these limitations can be categorized into the semantic level and the instance level. To address them, this paper analyzes the structural components that inhibit multi-scale feature representation and presents a multi-stage progressive FPN (ProFPN) along with a novel RoI feature representation method called soft proposal assignment. At the semantic level, a bottom-up interaction module is first proposed to address the insufficient exploitation of high-resolution features: global context attention blocks let adjacent-level features interact, propagating detail information in a bottom-up progressive manner. A top-down transfer module is then designed to mitigate the semantic information loss of high-level features: multi-branch asymmetric dilated blocks, applied in a top-down progressive manner, expand receptive fields to capture more object poses. At the instance level, to overcome the hard assignment of object proposals, a nonparametric strategy named soft proposal assignment leverages the scale of each object proposal to generate dynamic weights for RoI features from adjacent pyramid levels. Comprehensive experiments on the MS COCO dataset demonstrate the superiority of ProFPN. With negligible extra FLOPs, the proposed ProFPN outperforms most pyramid-based methods. Moreover, owing to the inherited feature utilization in ProFPN, transformer-based detectors achieve a substantial improvement in detecting small objects while simultaneously reducing FLOPs significantly. The source code of the proposed method is available at https://github.com/GingerCohle/ProFPN.
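The abstract does not spell out the soft-assignment formula, but the idea, weighting RoI features from the two pyramid levels adjacent to a proposal's scale, admits a compact sketch. The snippet below is a minimal PyTorch illustration under stated assumptions, not the authors' implementation (see the repository for that): it assumes the standard FPN level heuristic of Lin et al. (2017) with a canonical scale of 224 and P2..P5 strides of 4..32, and the names `soft_roi_features`, `fpn_feats`, and `rois` are hypothetical.

```python
import torch
from torchvision.ops import roi_align

def soft_roi_features(fpn_feats, rois, out_size=7, canonical=224, k0=4):
    """Sketch of scale-aware soft assignment over adjacent FPN levels.

    fpn_feats: [P2, P3, P4, P5], each (N, C, H, W) with strides 4, 8, 16, 32.
    rois: (M, 5) tensor of (batch_idx, x1, y1, x2, y2) in image coordinates.
    Assumption: the continuous level follows the standard FPN heuristic.
    """
    w = rois[:, 3] - rois[:, 1]
    h = rois[:, 4] - rois[:, 2]
    # Continuous pyramid level; eps guards against log2(0) for tiny boxes.
    lvl = k0 + torch.log2(torch.sqrt(w * h) / canonical + 1e-6)
    lvl = lvl.clamp(2.0, 5.0)              # restrict to P2..P5
    lo = lvl.floor().long().clamp(2, 4)    # lower adjacent level
    hi = lo + 1                            # upper adjacent level
    w_hi = (lvl - lo.float()).clamp(0, 1)  # dynamic weight from proposal scale
    w_lo = 1.0 - w_hi                      # weights sum to one per proposal

    feats = torch.zeros(rois.size(0), fpn_feats[0].size(1), out_size, out_size,
                        device=rois.device)
    for level in range(2, 6):
        scale = 1.0 / (2 ** level)         # stride of P{level}
        for lv, wt in ((lo, w_lo), (hi, w_hi)):
            mask = lv == level
            if mask.any():
                # RoI-align on this level, then accumulate the weighted feature.
                f = roi_align(fpn_feats[level - 2], rois[mask], out_size,
                              spatial_scale=scale, aligned=True)
                feats[mask] += wt[mask][:, None, None, None] * f
    return feats
```

In this sketch each proposal draws features from exactly two adjacent levels with weights that sum to one, and the blending introduces no learnable parameters, consistent with the abstract's description of a nonparametric strategy.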