Scale-Sensitive Feature Reassembly Network for Pedestrian Detection

Xiaoting Yang, Qiong Liu

doi:10.3390/s21124189

Abstract

Serious scale variation is a key challenge in pedestrian detection. Most works typically employ a feature pyramid network to detect objects at diverse scales. Such a method suffers from information loss during channel unification. Inadequate sampling of the backbone network also affects the power of pyramidal features. Moreover, an arbitrary RoI (region of interest) allocation scheme of these detectors incurs coarse RoI representation, which becomes worse under the dilemma of small pedestrian relative scale (PRS). In this paper, we propose a novel scale-sensitive feature reassembly network (SSNet) for pedestrian detection in road scenes. Specifically, a multi-parallel branch sampling module is devised with flexible receptive fields and an adjustable anchor stride to improve the sensitivity to pedestrians imaged at multiple scales. Meanwhile, a context enhancement fusion module is also proposed to alleviate information loss by injecting various spatial context information into the original features. For more accurate prediction, an adaptive reassembly strategy is designed to obtain recognizable RoI features in the proposal refinement stage. Extensive experiments are conducted on CityPersons and Caltech datasets to demonstrate the effectiveness of our method. The detection results show that our SSNet surpasses the baseline method significantly by integrating lightweight modules and achieves competitive performance with other methods without bells and whistles.

Highlights

We find that as the value of λ changes, the performance of context enhancement fusion (CEF) stabilizes at a 14.1% log-average miss rate (MR⁻²) and shows no further improvement.
We investigate the effect of channel-aware fusion (CAF), which is inspired by SENet [44] but has a different goal: improving RoI features according to channel importance.
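
The highlights report results as a miss rate with a −2 superscript. This summary does not define the metric, so the sketch below assumes the usual Caltech-benchmark convention: the log-average miss rate, that is, the geometric mean of the miss rate sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space over [10⁻², 10⁰]. The function name `log_average_miss_rate` and its parameters are hypothetical and are not taken from the authors' code.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Log-average miss rate over FPPI in [1e-2, 1e0].

    Assumption: this follows the common Caltech-style convention, not
    necessarily the exact evaluation script used in the paper.

    fppi      -- false positives per image, sorted in increasing order
    miss_rate -- miss rate at each corresponding FPPI value
    """
    if len(fppi) != len(miss_rate):
        raise ValueError("fppi and miss_rate must have the same length")

    # Reference FPPI values, evenly spaced in log10 space from 1e-2 to 1e0.
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]

    samples = []
    for ref in refs:
        # Take the miss rate at the largest FPPI not exceeding the reference.
        # If the curve has no such point, fall back to a miss rate of 1.0.
        mr_at_ref = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                mr_at_ref = m
            else:
                break
        # Clamp to avoid log(0) for a perfect detector.
        samples.append(max(mr_at_ref, 1e-10))

    # Geometric mean of the sampled miss rates.
    return math.exp(sum(math.log(s) for s in samples) / len(samples))
```

Under this convention, a curve that stays flat at a miss rate of 0.141 across the whole FPPI range yields 14.1%. Lower values are better.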

Summary

Introduction

Pedestrian detection aims to predict the position coordinates of all pedestrian instances in images or videos. It is a critical problem in computer vision with many real-world applications, such as autonomous driving, intelligent surveillance, and robotics. In academic research, pedestrian detection is a fundamental component of active topics such as person search [1], object tracking [2], and human pose estimation [3,4].

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Methods

Results

Conclusion