Abstract

To address the multi-scale and occlusion problems of pedestrian detection in natural scenes, we propose FCF R-CNN, an improved Faster R-CNN pedestrian detection algorithm based on feature fusion and context analysis. We design a progressive-cascade feature fusion method on the VGG16 network and add local response normalization (LRN) to speed up convergence. The improved feature extraction network generates high-resolution feature maps that carry both fine detail and rich semantic information. We also adjust the region proposal network (RPN) parameters to improve proposal efficiency. In addition, we add a multi-layer iterative LSTM module to the detection model, which exploits the LSTM's memory to extract global context information for the candidate boxes. This module requires only the image's own feature map as input, highlights useful context, and enables the model to generate more accurate candidate boxes containing potential pedestrians. Our method outperforms existing methods in detecting small-size and occluded pedestrians and remains robust in challenging scenes. On the Caltech pedestrian dataset it achieves competitive accuracy and speed, with a log-average miss rate (LAMR) of 36.75% and a runtime of 0.20 seconds per image, demonstrating the validity of the algorithm.
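To make the feature-extraction change concrete, below is a minimal PyTorch sketch of a progressive-cascade fusion head on VGG16 with local response normalization. The tapped layers (relu3_3 / relu4_3 / relu5_3), the 1x1 lateral convolutions, the bilinear upsampling, and the stride-4 output are assumptions made for illustration; the paper's exact fusion topology may differ.

```python
# Sketch: progressive-cascade feature fusion on VGG16 with LRN.
# Layer taps, channel widths and upsampling mode are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


class FusedVGG16Backbone(nn.Module):
    def __init__(self, out_channels=256):
        super().__init__()
        features = torchvision.models.vgg16().features
        # Split VGG16 so the intermediate feature maps can be tapped.
        self.stage3 = features[:16]    # up to relu3_3, stride 4,  256 ch
        self.stage4 = features[16:23]  # up to relu4_3, stride 8,  512 ch
        self.stage5 = features[23:30]  # up to relu5_3, stride 16, 512 ch
        # 1x1 convolutions bring every level to a common channel width.
        self.lat3 = nn.Conv2d(256, out_channels, kernel_size=1)
        self.lat4 = nn.Conv2d(512, out_channels, kernel_size=1)
        self.lat5 = nn.Conv2d(512, out_channels, kernel_size=1)
        # Local response normalization applied after each fusion step.
        self.lrn = nn.LocalResponseNorm(size=5)

    def forward(self, x):
        c3 = self.stage3(x)
        c4 = self.stage4(c3)
        c5 = self.stage5(c4)
        # Progressive cascade: fuse the deepest map into conv4, then the
        # result into conv3, so the final map keeps high resolution while
        # carrying the deeper semantic information.
        p5 = self.lat5(c5)
        p4 = self.lrn(self.lat4(c4) +
                      F.interpolate(p5, size=c4.shape[-2:],
                                    mode="bilinear", align_corners=False))
        p3 = self.lrn(self.lat3(c3) +
                      F.interpolate(p4, size=c3.shape[-2:],
                                    mode="bilinear", align_corners=False))
        return p3  # stride-4 fused feature map fed to the RPN / RoI head


if __name__ == "__main__":
    fmap = FusedVGG16Backbone()(torch.randn(1, 3, 480, 640))
    print(fmap.shape)  # torch.Size([1, 256, 120, 160])
```

Keeping the fused output at the conv3_3 resolution (stride 4 instead of 16) is what lets small pedestrians occupy more than a handful of feature cells.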

Highlights

  • Pedestrian detection has gradually become a research hotspot in computer vision due to its wide application in fields such as intelligent video surveillance, vehicle-assisted driving, and intelligent robots [1]–[6]

  • We propose an improved algorithm, FCF R-CNN, built on Faster R-CNN [8], with better speed and accuracy

  • Faster R-CNN uses only the last convolutional feature map, which often causes small-scale pedestrians to be missed; we address this with a multi-scale feature extraction network (a short illustration of the scale problem follows this list)
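A quick back-of-the-envelope illustration (the numbers are illustrative, not from the paper) of why the last VGG16 feature map is too coarse for small pedestrians:

```python
# Effective stride of VGG16's conv5_3 map vs. a stride-4 fused map.
stride_conv5_3 = 16   # four max-pool layers before conv5_3: 2**4
stride_fused = 4      # fused map kept at conv3_3 resolution: 2**2
ped_h, ped_w = 32, 13  # a distant pedestrian, roughly 32 x 13 pixels (assumed)

print(ped_h // stride_conv5_3, "x", ped_w // stride_conv5_3, "cells on conv5_3")   # 2 x 0
print(ped_h // stride_fused,   "x", ped_w // stride_fused,   "cells on fused map")  # 8 x 3
```

On the last-layer map such a pedestrian collapses to barely two cells, so detail needed for detection is lost; on a higher-resolution fused map it still covers a usable footprint.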


Summary

INTRODUCTION

Pedestrian detection has gradually become a research hotspot in computer vision due to its wide application in fields such as intelligent video surveillance, vehicle-assisted driving, and intelligent robots [1]–[6]. Li et al. proposed AC-CNN [29], which introduces two sub-networks after the detector's pooling layer to integrate global and local context information into the final detection process. Compared with Faster R-CNN, FCF R-CNN adds a multi-scale feature extraction network and a multi-layer LSTM module for global context extraction. Faster R-CNN uses only the last convolutional feature map, which often causes small-scale pedestrians to be missed; to solve this problem, we use feature fusion to improve the backbone network's ability to extract features for multi-scale pedestrians.
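The sketch below shows one way such a multi-layer LSTM context module could look in PyTorch, taking only the image feature map as input. Scanning each image column top-to-bottom with a stacked bidirectional LSTM and merging the result residually are assumptions made for illustration; the paper's iteration scheme and merge step may differ.

```python
# Sketch: multi-layer LSTM context module over an image feature map.
# The column-wise scanning order and residual merge are illustrative assumptions.
import torch
import torch.nn as nn


class LSTMContextModule(nn.Module):
    def __init__(self, channels=256, hidden=256, num_layers=2):
        super().__init__()
        # Stacked (multi-layer) bidirectional LSTM: each image column becomes
        # a sequence of H row-steps with C-dimensional inputs, so the LSTM
        # accumulates global context as it sweeps down (and up) the image.
        self.lstm = nn.LSTM(input_size=channels, hidden_size=hidden,
                            num_layers=num_layers, batch_first=True,
                            bidirectional=True)
        self.project = nn.Conv2d(2 * hidden, channels, kernel_size=1)

    def forward(self, fmap):                                   # (N, C, H, W)
        n, c, h, w = fmap.shape
        # (N, C, H, W) -> (N*W, H, C): one sequence per image column.
        seq = fmap.permute(0, 3, 2, 1).reshape(n * w, h, c)
        ctx, _ = self.lstm(seq)                                # (N*W, H, 2*hidden)
        ctx = ctx.reshape(n, w, h, -1).permute(0, 3, 2, 1)     # (N, 2*hidden, H, W)
        # Residual merge: context-enhanced map keeps the original features.
        return fmap + self.project(ctx)


if __name__ == "__main__":
    out = LSTMContextModule()(torch.randn(1, 256, 120, 160))
    print(out.shape)  # torch.Size([1, 256, 120, 160])
```

The enhanced map can then be handed to the RPN so that candidate boxes are scored with global context rather than purely local evidence.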

CONTEXT INFORMATION EXTRACTION NETWORK BASED ON MULTI-LAYER LSTM
THE REGION PROPOSAL NETWORK MORE SUITABLE FOR PEDESTRIAN DETECTION
EXPERIMENTS
Findings
CONCLUSION AND FUTURE WORK