Abstract
Decoupled heads for classification and localization have proven effective in most one-stage and two-stage detectors. However, most object detection algorithms still share a single Feature Pyramid Network between the two tasks. We perform a thorough analysis of the effectiveness of Feature Pyramid Networks for these two tasks and find that a decoupled feature pyramid network performs better than a shared one. Going a step further, we find that the two tasks have different preferences for feature pyramid networks. For higher accuracy, we propose a Scene Parsing Pyramid Network for classification and a Feature Pyramid Transformer Network for localization. The Scene Parsing Pyramid Network exploits global context information through region-based context aggregation, using a pyramid pooling module and a pyramid attention feature extraction module. The Feature Pyramid Transformer Network captures suitable contexts for objects residing at different scales. We evaluate our double feature pyramid network on the object detection task by integrating it into the FCOS algorithm. The modified algorithm outperforms previous state-of-the-art feature-pyramid-based methods by a clear margin on both the MS-COCO 2017 validation and test sets.
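To make the idea of region-based context aggregation concrete, the following is a minimal sketch of pyramid pooling on a single-channel feature map, written in plain Python. It is an illustration under stated assumptions, not the implementation evaluated here: the bin sizes (1, 2, 3, 6), the nearest-neighbour upsampling, and the function names `adaptive_avg_pool`, `upsample_nearest`, and `pyramid_pool` are all hypothetical choices made for the example, and the learned convolutions and attention that a full module would include are omitted.

```python
from typing import List, Sequence

Grid = List[List[float]]


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def adaptive_avg_pool(feature: Grid, bins: int) -> Grid:
    """Average the map over a bins x bins grid of (possibly overlapping) regions."""
    h, w = len(feature), len(feature[0])
    pooled: Grid = []
    for i in range(bins):
        r0, r1 = (i * h) // bins, _ceil_div((i + 1) * h, bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, _ceil_div((j + 1) * w, bins)
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled: Grid, h: int, w: int) -> Grid:
    """Resize a bins x bins map back to h x w (nearest neighbour, an assumption)."""
    bins = len(pooled)
    return [[pooled[(r * bins) // h][(c * bins) // w] for c in range(w)]
            for r in range(h)]


def pyramid_pool(feature: Grid, bin_sizes: Sequence[int] = (1, 2, 3, 6)) -> List[Grid]:
    """Return the input map plus one context map per pyramid level.

    In a full module these maps would be concatenated along the channel
    axis and fused by learned layers; here they are simply returned as a list.
    """
    h, w = len(feature), len(feature[0])
    branches = [feature]
    for b in bin_sizes:
        branches.append(upsample_nearest(adaptive_avg_pool(feature, b), h, w))
    return branches


if __name__ == "__main__":
    feat = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
    for level in pyramid_pool(feat):
        print(level[0])
```

The coarsest level (one bin) gives every position the global average, while finer levels keep progressively more local context, which is the multi-region aggregation the classification branch relies on.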