Abstract

Most mainstream pedestrian detectors directly reuse convolutional neural networks (CNNs) that were designed for image classification. These CNNs require a large downsampling factor to produce high-level semantic features, yet they cannot adaptively focus on the useful channels and regions of the feature maps, which limits the accuracy of pedestrian detection. To obtain higher accuracy, we propose a single-stage pedestrian detector with channel and spatial attention (CSANet), which locates useful channels and regions automatically while extracting features. Unlike the backbones of mainstream pedestrian detectors, the backbone of CSANet effectively highlights pedestrian-likely regions and suppresses the background. Specifically, we model contextual dependencies along the channel and spatial dimensions of the feature maps, respectively. The channel attention module selectively steers the CNN toward key channels by integrating associated features. Meanwhile, the spatial attention module highlights semantic pixels by aggregating similar features across all channels. Finally, the two modules are connected in series to further enhance the representation of the feature maps. Experimental results show that CSANet achieves state-of-the-art performance with an $MR^{-2}$ of 3.55% on the Caltech dataset and obtains competitive performance on the CityPersons dataset while maintaining high computational efficiency.
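To make the series connection of the two modules concrete, the following is a minimal sketch and not the authors' formulation, which is not given in this summary. It assumes a common affinity-plus-softmax form of attention: channel attention reweights each channel by its similarity to the other channels, and spatial attention reweights each pixel by its similarity to the other pixels. The function names, the `gamma` residual scale, and the flattened `C x N` list layout are all illustrative assumptions.

```python
import math


def softmax(row):
    # Numerically stable softmax over one row of affinities.
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    return [list(r) for r in zip(*a)]


def matmul(a, b):
    # a is n x k, b is k x m; returns n x m.
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add_residual(mixed, x, gamma):
    # Residual connection: gamma * attended features + original features.
    return [[gamma * m + v for m, v in zip(mr, xr)] for mr, xr in zip(mixed, x)]


def channel_attention(x, gamma=1.0):
    # x is C x N (C channels, N = H * W flattened pixels). Assumed form.
    affinity = matmul(x, transpose(x))           # C x C channel similarity
    weights = [softmax(r) for r in affinity]     # each row sums to 1
    return add_residual(matmul(weights, x), x, gamma)


def spatial_attention(x, gamma=1.0):
    # Pixel-to-pixel affinity computed over all channels. Assumed form.
    xt = transpose(x)                            # N x C
    affinity = matmul(xt, x)                     # N x N pixel similarity
    weights = [softmax(r) for r in affinity]
    mixed = transpose(matmul(weights, xt))       # back to C x N
    return add_residual(mixed, x, gamma)


def csa_block(x, gamma_c=1.0, gamma_s=1.0):
    # Series connection: channel attention first, then spatial attention.
    return spatial_attention(channel_attention(x, gamma_c), gamma_s)


features = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]]   # toy map: C = 2, N = 3
enhanced = csa_block(features)
```

The output keeps the input shape, so under these assumptions such a block could be placed between backbone stages without changing downstream layers. Setting both `gamma` values to zero reduces the block to the identity, which is one reason a learnable residual scale is a common design choice for attention modules.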

Highlights

  • Pedestrian detection plays a critical role in computer vision tasks such as autonomous driving, robotics, and surveillance

  • Zhang et al. [8] discussed the effectiveness of the Faster R-CNN framework in pedestrian detection

  • Proposed method: we first show the overall framework of CSANet and then introduce the mathematical modeling of the channel attention module (CAM) and the spatial attention module (SAM) separately

Summary

Introduction

Pedestrian detection plays a critical role in computer vision tasks such as autonomous driving, robotics, and surveillance. Pedestrian detectors have made considerable progress with the revival of deep learning [1], [2]. Nevertheless, current state-of-the-art pedestrian detectors still fall far short of human-level speed and accuracy [3]. Pedestrian detection can be traced back to traditional methods based on low-level features, e.g., HOG [4]. The emergence of R-CNN [5] made the two-stage "Region Proposal + CNN" architecture an established method in object detection [6], [7]. Zhang et al. [8] discussed the effectiveness of the Faster R-CNN framework in pedestrian detection.
