Abstract
It is well recognized that contextual information from surrounding objects is beneficial for object detection, and such contextual information can often be obtained from long-range dependencies. This paper proposes a sparse attention block that captures long-range dependencies efficiently. Unlike the conventional non-local block, which generates a dense attention map to characterize the dependency between every pair of positions in the input feature map, our sparse attention block samples only the most representative positions for contextual information aggregation. After searching for local peaks in a heat map computed from the input feature map, it adaptively selects a sparse set of positions to represent the relationship between query and key elements. With these sparse positions, the sparse attention block models long-range dependencies well and greatly improves object detection performance, at less than 2% of the GPU memory and computation cost of the conventional non-local block. The block can be easily plugged into various object detection frameworks, such as Faster R-CNN, RetinaNet, and Mask R-CNN. Experiments on the COCO benchmark confirm that our sparse attention block boosts detection accuracy by significant margins, ranging from 1.4% to 1.9%, with negligible computation and memory overhead.
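To make the mechanism concrete, the following is a minimal PyTorch sketch of such a peak-sampled sparse attention block. It is an illustration under stated assumptions, not the paper's implementation: the heat-map predictor (a 1x1 convolution), the 3x3 max-pooling peak test, and the names SparseAttentionBlock, num_keys, and reduction are all hypothetical choices made here. The key point it demonstrates is that every query position attends to only k sampled peak positions, shrinking the attention map from (HW x HW) to (HW x k), which is the source of the memory and computation savings described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAttentionBlock(nn.Module):
    """Illustrative sketch of a peak-sampled sparse non-local block.

    All module and parameter names here are assumptions for
    illustration, not the paper's exact implementation.
    """

    def __init__(self, channels: int, num_keys: int = 64, reduction: int = 2):
        super().__init__()
        inter = channels // reduction
        self.query = nn.Conv2d(channels, inter, kernel_size=1)
        self.key = nn.Conv2d(channels, inter, kernel_size=1)
        self.value = nn.Conv2d(channels, inter, kernel_size=1)
        self.out = nn.Conv2d(inter, channels, kernel_size=1)
        self.heat = nn.Conv2d(channels, 1, kernel_size=1)  # heat-map predictor
        self.num_keys = num_keys

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        heat = self.heat(x)  # (B, 1, H, W)
        # A position is a local peak if it equals the max of its 3x3 window.
        peaks = heat == F.max_pool2d(heat, kernel_size=3, stride=1, padding=1)
        scores = heat.masked_fill(~peaks, float("-inf")).flatten(2)  # (B, 1, HW)
        k = min(self.num_keys, h * w)
        idx = scores.topk(k, dim=-1).indices  # (B, 1, k): top-k peak positions

        q = self.query(x).flatten(2)           # (B, C', HW): every position queries
        key = self.key(x).flatten(2)           # (B, C', HW)
        val = self.value(x).flatten(2)         # (B, C', HW)
        idx = idx.expand(-1, key.size(1), -1)  # broadcast indices over channels
        key = key.gather(2, idx)               # (B, C', k): sampled keys only
        val = val.gather(2, idx)               # (B, C', k): sampled values only

        # Each of the HW queries attends to only the k sampled positions,
        # so the attention map is (HW x k) rather than the dense (HW x HW).
        attn = torch.softmax(q.transpose(1, 2) @ key, dim=-1)  # (B, HW, k)
        ctx = (attn @ val.transpose(1, 2)).transpose(1, 2)     # (B, C', HW)
        return x + self.out(ctx.reshape(b, -1, h, w))          # residual add

# Usage: input and output shapes match, so the block can be dropped
# into an existing detector backbone or neck without other changes.
block = SparseAttentionBlock(channels=256)
out = block(torch.randn(2, 256, 64, 64))  # -> (2, 256, 64, 64)
```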