Abstract

Object detection in infrared images has attracted increasing attention in recent years. However, few studies address multi-scale object detection in infrared street scene images, and the lack of high-quality infrared datasets hinders research into such algorithms. To address these issues, we make a series of modifications to the Faster Region-based Convolutional Neural Network (Faster R-CNN). Firstly, a double-layer region proposal network (RPN) is proposed to predict proposals of different scales on both fine and coarse feature maps. Secondly, a multi-scale pooling module is introduced into the backbone of the network to explore the response of objects at different scales. Furthermore, the inception4 module and the position-sensitive region of interest (ROI) align (PSalign) pooling layer are used to extract richer features of the objects. Thirdly, this paper proposes instance level data augmentation, which takes the imbalance between categories into account while enlarging the dataset. In the training stage, the online hard example mining method is used to further improve the robustness of the algorithm in complex environments. Experimental results show that our detection method outperforms the baseline and achieves state-of-the-art performance.

Highlights

  • With the development of infrared sensor technology, object detection in infrared images has attracted more attention in the fields of face recognition and pedestrian detection

  • In order to prove the superiority of our network structure, we did not use Online Hard Example Mining (OHEM) to pick out 256 proposals generated by a double-layer region proposal network (RPN) pyramid

  • We started from two aspects to solve the problems of multi-scale object detection in infrared street scene images
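
For reference, the selection step that OHEM performs can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a per-proposal loss has already been computed, keeps the highest-loss proposals, and omits the non-maximum suppression that OHEM usually applies before selection. The function name `select_hard_examples` and the toy loss values are hypothetical; only the batch size of 256 comes from the text above.

```python
import heapq


def select_hard_examples(losses, k=256):
    """Return indices of the k proposals with the highest loss, hardest first.

    `losses` is a sequence with one loss value per proposal (assumed to be
    computed in a forward pass beforehand). If fewer than k proposals exist,
    all of them are returned.
    """
    k = min(k, len(losses))
    # nlargest ranks the indices by their loss value, in descending order.
    return heapq.nlargest(k, range(len(losses)), key=losses.__getitem__)


# Toy usage: four proposals, keep the two hardest ones.
toy_losses = [0.1, 2.0, 0.5, 1.5]
hard_indices = select_hard_examples(toy_losses, k=2)
```

Only the selected proposals would then contribute to the backward pass, which concentrates training on examples the detector currently handles poorly.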


Summary

Introduction

With the development of infrared sensor technology, object detection in infrared images has attracted more attention in the fields of face recognition and pedestrian detection. Object detection methods based on deep learning use multi-layer convolutional networks to extract more abstract semantic information from images, and their performance in complex environments is more robust than that of traditional methods [8,9]. Existing data augmentation methods for object detection, however, share a shortcoming: they randomly transform every image without considering the proportions of different objects in the dataset. Our motivation is therefore to modify the structure of Faster R-CNN and to improve the data augmentation method so that it addresses the non-uniform class distribution, with the aim of improving the performance of multi-scale object detection in infrared images.
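
The idea of augmenting by instance rather than by image, while accounting for class proportions, can be sketched as below. This is an assumption-laden illustration and not the paper's procedure: it simply computes how many extra instances each class needs to match the most frequent class, then samples existing instances of that class as candidates for transformation. The names `augmentation_quotas` and `sample_instances_to_augment`, the class labels, and the counts are all hypothetical.

```python
import random
from collections import Counter


def augmentation_quotas(labels, target=None):
    """Return, per class, how many extra instances are needed to reach `target`.

    If `target` is None, the count of the most frequent class is used, so
    that class receives a quota of zero.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(0, target - n) for cls, n in counts.items()}


def sample_instances_to_augment(instances, seed=0):
    """Pick (label, instance_id) pairs to augment so the classes become balanced.

    `instances` is a list of (label, instance_id) pairs. Instances are drawn
    with replacement, since rare classes may need more copies than they have.
    """
    rng = random.Random(seed)
    by_class = {}
    for label, instance_id in instances:
        by_class.setdefault(label, []).append(instance_id)
    quotas = augmentation_quotas([label for label, _ in instances])
    picked = []
    for label, need in quotas.items():
        picked.extend((label, rng.choice(by_class[label])) for _ in range(need))
    return picked


# Toy dataset: 5 person instances, 2 car instances, 1 bicycle instance.
toy = ([("person", i) for i in range(5)]
       + [("car", i) for i in range(2)]
       + [("bicycle", 0)])
to_augment = sample_instances_to_augment(toy)
```

In this toy case the rare classes receive all of the augmentation budget, whereas a uniform per-image transform would preserve the original imbalance.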

Double-Layer RPN Pyramid
Multi-Scale Pooling with Inception4 Module and PSalign
Instance Level Data Augmentation
Implementation
Implement Details of Instance Level Data Augmentation
Online
Comparative Experiment
Detection Method
False Alarms and Misjudgment
Misjudgments occur under
Findings
Conclusions
