Abstract
Pedestrian detection is an important task in many intelligent systems, particularly driver assistance systems. Recent studies on pedestrian detection in infrared (IR) imagery have employed data-driven approaches. However, two problems in deep learning-based detection are the implicit performance and time-consuming training. In this paper, a novel channel expansion technique based on feature fusion is proposed to enhance the IR imagery and accelerate the training process. Besides, a novel background suppression method is proposed to stimulate the attention principle of human vision and shrink the region of detection. A precise fusion algorithm is designed to combine the information from different visual saliency maps in order to reduce the effect of truncation and miss detection. Four different experiments are performed from various perspectives in order to gauge the efficiency of our approach. The experimental results show that the Mean Average Precisions (mAPs) of four different datasets have been increased by 5.22% on average. The results prove that background suppression and suitable feature expansion will accelerate the training process and enhance the performance of IR image-based deep learning models.
Highlights
Pedestrian detection is a vital research topic in the field of computer vision, one of significant theoretical interest, and with various applications
According to Pascal criteria, the detection results can be classified into four situations: True positives (TP), true negatives (TN), false positives (FP), and false negatives (FN)
An enhancement method for pedestrian detection with channel expansion and background suppression approaches has been studied in an IR imagery environment
Summary
Pedestrian detection is a vital research topic in the field of computer vision, one of significant theoretical interest, and with various applications. Kwak et al designed an image feature based on the prior knowledge that pedestrian form is characterized by higher intensity relative to the background [17] This approach fails to produce accurate classification results when a pedestrian’s motion is large [17]. He et al created ResNet by adding a residual learning block, which made the ResNet more effective [20] All of these CNN-based approaches offered increased performance on classification problems—but to solve detection problems, they required sliding window detection to scan all the images, which incurred a prohibitive computational cost. Experiments were carefully designed, and they proved the assumption that appropriate expert-driven features can help with the extraction of CNN features and accelerate the training process of the model.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have