Abstract
Context modeling is a promising approach for detecting confusing objects. Although FPN provides features for multi-scale objects, these features capture limited spatial context and little semantic context. In this work, we propose an end-to-end Dilated and Deformable Feature Pyramid Network (DDFPN) to jointly extract spatial and semantic context. For the spatial context, we present Dilated and Deformable Convolution (DDC), which generates a more flexible receptive field than the conventional convolution of FPN, and design a Multi-scale DDC module to learn features for various deformable objects. For the semantic context, we observe that it can be extracted from both features and predictions, and design two modules to estimate the corresponding context relationships: the Cross Feature Correlation (CFC) module estimates contextual attention from other features, and the Co-occurrence Inference (CI) module learns co-occurrence features from other predictions. Our network can be applied to various baselines of the FPN family while keeping similar FLOPs, parameter counts, and inference speed to those baselines. On the MSCOCO minival and test-dev datasets, experiments show that DDFPN consistently outperforms various baselines, including RetinaNet, Faster R-CNN, Mask R-CNN, and Cascade R-CNN. Ablation studies show that the two contexts are complementary for detecting various confusing objects.
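To make the DDC idea concrete, the sketch below shows the sampling step behind a dilated and deformable convolution: each kernel tap reads the feature map at a dilated grid position plus a 2-D offset, with bilinear interpolation handling fractional positions. This is a minimal illustrative sketch, not the authors' implementation; in the paper's setting the offsets would be predicted by an extra convolutional branch, and all names here (`ddc_response`, `bilinear_sample`) are hypothetical.

```python
import math

def bilinear_sample(fmap, y, x):
    """Bilinearly interpolate a 2-D feature map at fractional (y, x),
    treating out-of-bounds positions as zero padding."""
    h, w = len(fmap), len(fmap[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0

    def at(i, j):
        return fmap[i][j] if 0 <= i < h and 0 <= j < w else 0.0

    return ((1 - dy) * (1 - dx) * at(y0, x0)
            + (1 - dy) * dx * at(y0, x0 + 1)
            + dy * (1 - dx) * at(y0 + 1, x0)
            + dy * dx * at(y0 + 1, x0 + 1))

def ddc_response(fmap, weights, offsets, cy, cx, dilation=2):
    """One output value of a 3x3 DDC at centre (cy, cx).

    weights: 3x3 kernel; offsets: nine per-tap (dy, dx) pairs, which a
    real network would predict with a parallel conv branch. Each tap
    samples at the dilated grid location shifted by its learned offset.
    """
    out = 0.0
    taps = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    for k, (ky, kx) in enumerate(taps):
        dy, dx = offsets[k]
        sy = cy + ky * dilation + dy  # dilated grid + learned offset
        sx = cx + kx * dilation + dx
        out += weights[ky + 1][kx + 1] * bilinear_sample(fmap, sy, sx)
    return out
```

With all offsets at zero and `dilation=1`, this reduces to an ordinary 3x3 convolution; non-zero dilation widens the receptive field, and the offsets let it deform around an object's actual shape, which is the flexibility DDC adds over FPN's plain convolutions.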