Abstract

To address the multi-scale and occlusion problems of pedestrian detection in natural scenes, we propose FCF R-CNN, an improved Faster R-CNN pedestrian detection algorithm based on feature fusion and context analysis. We design a progressive-cascade feature fusion method on the VGG16 network and add local response normalization (LRN) to speed up convergence. The improved feature extraction network generates high-resolution feature maps that carry both fine detail and rich semantic information. We also adjust the region proposal network (RPN) parameters to improve proposal efficiency. In addition, we add a multi-layer iterative LSTM module to the detection model, which exploits the LSTM's memory to extract global context information for the candidate boxes. This module requires only the image's own feature map as input, highlights useful context, and enables the model to generate more accurate candidate boxes containing potential pedestrians. Our method outperforms existing methods in detecting small-size and occluded pedestrians and remains robust in challenging scenes. On the Caltech pedestrian dataset it achieves competitive accuracy and speed, with a log-average miss rate (LAMR) of 36.75% and a runtime of 0.20 seconds per image, demonstrating the validity of the algorithm.
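To make the feature-extraction change concrete, below is a minimal PyTorch sketch of a progressive-cascade fusion head on VGG16 with local response normalization. The tapped layers (relu3_3 / relu4_3 / relu5_3), the 1x1 lateral convolutions, the bilinear upsampling, and the stride-4 output are assumptions made for illustration; the paper's exact fusion topology may differ.

```python
# Sketch: progressive-cascade feature fusion on VGG16 with LRN.
# Layer taps, channel widths and upsampling mode are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


class FusedVGG16Backbone(nn.Module):
    def __init__(self, out_channels=256):
        super().__init__()
        features = torchvision.models.vgg16().features
        # Split VGG16 so the intermediate feature maps can be tapped.
        self.stage3 = features[:16]    # up to relu3_3, stride 4,  256 ch
        self.stage4 = features[16:23]  # up to relu4_3, stride 8,  512 ch
        self.stage5 = features[23:30]  # up to relu5_3, stride 16, 512 ch
        # 1x1 convolutions bring every level to a common channel width.
        self.lat3 = nn.Conv2d(256, out_channels, kernel_size=1)
        self.lat4 = nn.Conv2d(512, out_channels, kernel_size=1)
        self.lat5 = nn.Conv2d(512, out_channels, kernel_size=1)
        # Local response normalization applied after each fusion step.
        self.lrn = nn.LocalResponseNorm(size=5)

    def forward(self, x):
        c3 = self.stage3(x)
        c4 = self.stage4(c3)
        c5 = self.stage5(c4)
        # Progressive cascade: fuse the deepest map into conv4, then the
        # result into conv3, so the final map keeps high resolution while
        # carrying the deeper semantic information.
        p5 = self.lat5(c5)
        p4 = self.lrn(self.lat4(c4) +
                      F.interpolate(p5, size=c4.shape[-2:],
                                    mode="bilinear", align_corners=False))
        p3 = self.lrn(self.lat3(c3) +
                      F.interpolate(p4, size=c3.shape[-2:],
                                    mode="bilinear", align_corners=False))
        return p3  # stride-4 fused feature map fed to the RPN / RoI head


if __name__ == "__main__":
    fmap = FusedVGG16Backbone()(torch.randn(1, 3, 480, 640))
    print(fmap.shape)  # torch.Size([1, 256, 120, 160])
```

Keeping the fused output at the conv3_3 resolution (stride 4 instead of 16) is what lets small pedestrians occupy more than a handful of feature cells.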

Highlights

  • Pedestrian detection has gradually become a research hotspot in computer vision due to its wide application in fields such as intelligent video surveillance, vehicle-assisted driving, and intelligent robots [1]–[6]

  • We propose an improved algorithm, FCF R-CNN, built on Faster R-CNN [8], with better speed and accuracy

  • Faster R-CNN uses only the last convolutional feature map, which often causes small-scale pedestrians to be missed; we address this with a multi-scale feature extraction network (a short illustration of the scale problem follows this list)
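A quick back-of-the-envelope illustration (the numbers are illustrative, not from the paper) of why the last VGG16 feature map is too coarse for small pedestrians:

```python
# Effective stride of VGG16's conv5_3 map vs. a stride-4 fused map.
stride_conv5_3 = 16   # four max-pool layers before conv5_3: 2**4
stride_fused = 4      # fused map kept at conv3_3 resolution: 2**2
ped_h, ped_w = 32, 13  # a distant pedestrian, roughly 32 x 13 pixels (assumed)

print(ped_h // stride_conv5_3, "x", ped_w // stride_conv5_3, "cells on conv5_3")   # 2 x 0
print(ped_h // stride_fused,   "x", ped_w // stride_fused,   "cells on fused map")  # 8 x 3
```

On the last-layer map such a pedestrian collapses to barely two cells, so detail needed for detection is lost; on a higher-resolution fused map it still covers a usable footprint.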


Summary

INTRODUCTION

Pedestrian detection has gradually become a research hotspot in computer vision due to its wide application in fields such as intelligent video surveillance, vehicle-assisted driving, and intelligent robots [1]–[6]. Li et al. proposed AC-CNN [29], which introduces two sub-networks after the detector's pooling layer to integrate global and local context information into the final detection process. Compared with Faster R-CNN, FCF R-CNN adds a multi-scale feature extraction network and a multi-layer LSTM module for global context extraction. Faster R-CNN uses only the last convolutional feature map, which often causes small-scale pedestrians to be missed; to solve this problem, we use feature fusion to improve the backbone network's ability to extract features for multi-scale pedestrians.
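The sketch below shows one way such a multi-layer LSTM context module could look in PyTorch, taking only the image feature map as input. Scanning each image column top-to-bottom with a stacked bidirectional LSTM and merging the result residually are assumptions made for illustration; the paper's iteration scheme and merge step may differ.

```python
# Sketch: multi-layer LSTM context module over an image feature map.
# The column-wise scanning order and residual merge are illustrative assumptions.
import torch
import torch.nn as nn


class LSTMContextModule(nn.Module):
    def __init__(self, channels=256, hidden=256, num_layers=2):
        super().__init__()
        # Stacked (multi-layer) bidirectional LSTM: each image column becomes
        # a sequence of H row-steps with C-dimensional inputs, so the LSTM
        # accumulates global context as it sweeps down (and up) the image.
        self.lstm = nn.LSTM(input_size=channels, hidden_size=hidden,
                            num_layers=num_layers, batch_first=True,
                            bidirectional=True)
        self.project = nn.Conv2d(2 * hidden, channels, kernel_size=1)

    def forward(self, fmap):                                   # (N, C, H, W)
        n, c, h, w = fmap.shape
        # (N, C, H, W) -> (N*W, H, C): one sequence per image column.
        seq = fmap.permute(0, 3, 2, 1).reshape(n * w, h, c)
        ctx, _ = self.lstm(seq)                                # (N*W, H, 2*hidden)
        ctx = ctx.reshape(n, w, h, -1).permute(0, 3, 2, 1)     # (N, 2*hidden, H, W)
        # Residual merge: context-enhanced map keeps the original features.
        return fmap + self.project(ctx)


if __name__ == "__main__":
    out = LSTMContextModule()(torch.randn(1, 256, 120, 160))
    print(out.shape)  # torch.Size([1, 256, 120, 160])
```

The enhanced map can then be handed to the RPN so that candidate boxes are scored with global context rather than purely local evidence.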

CONTEXT INFORMATION EXTRACTION NETWORK BASED ON MULTI-LAYER LSTM
THE REGION PROPOSAL NETWORK MORE SUITABLE FOR PEDESTRIAN DETECTION
EXPERIMENTS
Findings
CONCLUSION AND FUTURE WORK