Abstract

Small object detection is a fundamental and challenging topic in the computer vision community. To detect small objects in images, several methods rely on feature pyramid networks (FPNs), which alleviate the conflict between resolution and semantic information. However, FPN-based methods also have limitations. First, existing methods focus only on regions with close spatial distance, hindering the effectiveness of long-range interactions. Second, element-wise addition ignores the different receptive fields of the two feature layers, causing higher-level features to introduce noise into the lower-level features. To address these problems, we propose a cross-layer attention (CLA) block as a generic block for capturing long-range dependencies and reducing noise from high-level features. Specifically, the CLA block performs feature fusion along both the channel and spatial dimensions, which provides a reliable way of fusing features from different layers. Because CLA is a lightweight and general block, it can be plugged into most feature fusion frameworks. On the COCO 2017 dataset, we validated the CLA block by plugging it into several state-of-the-art FPN-based detectors. Experiments show that our approach achieves consistent improvements in both object detection and instance segmentation, demonstrating the effectiveness of our approach.
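The abstract does not specify the internal design of the CLA block, so the following is only a minimal PyTorch sketch of one plausible channel-and-spatial attention fusion between two pyramid levels: the upsampled high-level feature is gated by a channel attention derived from global context and by a spatial attention map before being added to the low-level feature, in place of plain element-wise addition. The class name CrossLayerAttention, the squeeze-and-excitation-style channel gate, the 7x7 spatial gate, and all hyperparameters are illustrative assumptions, not the authors' actual architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerAttention(nn.Module):
    """Hypothetical sketch of a cross-layer attention (CLA) fusion block.

    Gates the upsampled high-level feature with channel and spatial
    attention before fusing it with the low-level feature, instead of
    a plain element-wise addition.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze-and-excitation-style bottleneck MLP.
        self.channel_mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        # Spatial attention: 7x7 conv over pooled channel descriptors.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Upsample the high-level feature to the low-level resolution.
        high = F.interpolate(high, size=low.shape[-2:], mode="nearest")
        # Channel gate from the global context of both layers.
        ctx = F.adaptive_avg_pool2d(low + high, 1)
        ca = torch.sigmoid(self.channel_mlp(ctx))
        high = high * ca
        # Spatial gate from channel-wise average and max descriptors.
        desc = torch.cat([high.mean(1, keepdim=True),
                          high.amax(1, keepdim=True)], dim=1)
        sa = torch.sigmoid(self.spatial_conv(desc))
        # Fuse: attention-gated high-level feature added to low-level.
        return low + high * sa

# Example: fuse two adjacent FPN levels (256 channels each).
cla = CrossLayerAttention(channels=256)
p3 = torch.randn(1, 256, 64, 64)   # lower level, higher resolution
p4 = torch.randn(1, 256, 32, 32)   # higher level, lower resolution
fused = cla(p3, p4)                # -> shape (1, 256, 64, 64)

Gating the high-level feature before addition is one way to suppress the noise that plain element-wise addition would propagate down the pyramid, and the global average pool gives the channel gate image-wide context, a rough stand-in for the long-range interactions the abstract describes.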
