Abstract

Object detection plays an important role in autonomous driving, disaster rescue, robot navigation, intelligent video surveillance, and many other fields. Nonetheless, visible images are poor under weak illumination conditions, and thermal infrared images are noisy and have low resolution. Consequently, neither of these two data sources yields satisfactory results when used alone. While some scholars have combined visible and thermal images for object detection, most did not consider the illumination conditions and the different contributions of diverse data sources to the results. In addition, few studies have made use of the temperature characteristics of thermal images. Therefore, in the present study, visible and thermal images are utilized as the dataset, and RetinaNet is used as the baseline to fuse features from different data sources for object detection. Moreover, a dynamic weight fusion method, which is based on channel attention according to different illumination conditions, is used in the fusion component, and the channel attention and a priori temperature mask (CAPTM) module is proposed; the CAPTM can be applied to a deep learning network as a priori knowledge and maximizes the advantage of temperature information from thermal images. The main innovations of the present research include the following: (1) the consideration of different illumination conditions and the use of different fusion parameters for different conditions in the feature fusion of visible and thermal images; (2) the dynamic fusion of different data sources in the feature fusion of visible and thermal images; (3) the use of temperature information as a priori knowledge (CAPTM) in feature extraction. To a certain extent, the proposed methods improve the accuracy of object detection at night or under other weak illumination conditions and with a single data source. Compared with the state-of-the-art (SOTA) method, the proposed method is found to achieve superior detection accuracy with an overall mean average precision (mAP) improvement of 0.69%, including an AP improvement of 2.55% for the detection of the Person category. The results demonstrate the effectiveness of the research methods for object detection, especially temperature information-rich object detection.
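The illumination-conditioned, channel-attention-based fusion described above could be sketched roughly as follows. This is a minimal, hypothetical sketch, not the paper's released implementation: the module name, the squeeze-and-excitation-style channel attention, the illumination gate computed from the visible stream, and all layer sizes are assumptions made for illustration.

```python
# Hypothetical sketch of illumination-aware, channel-attention-based fusion of
# visible and thermal feature maps (names and layer sizes are assumptions).
import torch
import torch.nn as nn


class IlluminationAwareFusion(nn.Module):
    """Fuse visible and thermal feature maps with dynamic per-channel weights.

    A small illumination gate (assumed here: global pooling + MLP over the
    visible features) predicts a scalar in [0, 1] that shifts the balance
    between the two modalities; a squeeze-and-excitation-style channel
    attention then reweights each channel of the fused map.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Illumination gate: estimates lighting quality from the visible stream.
        self.illum_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 1),
            nn.Sigmoid(),
        )
        # Channel attention over the fused features.
        self.channel_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, vis_feat: torch.Tensor, thermal_feat: torch.Tensor) -> torch.Tensor:
        # w close to 1 -> good illumination, trust the visible features more.
        w = self.illum_gate(vis_feat).view(-1, 1, 1, 1)
        fused = w * vis_feat + (1.0 - w) * thermal_feat
        # Per-channel reweighting of the dynamically fused map.
        return fused * self.channel_attn(fused)


if __name__ == "__main__":
    fusion = IlluminationAwareFusion(channels=256)
    vis = torch.randn(2, 256, 64, 64)      # visible-branch feature map
    thermal = torch.randn(2, 256, 64, 64)  # thermal-branch feature map
    print(fusion(vis, thermal).shape)      # torch.Size([2, 256, 64, 64])
```

In this sketch the scalar weight acts as the dynamic fusion parameter: under good illumination it leans on the visible features, and under weak illumination it shifts weight toward the thermal features, which matches the abstract's description of condition-dependent fusion parameters.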

Highlights

  • Object detection is a popular research direction in computer vision and digital image processing, and is widely used in many fields such as autonomous driving, disaster rescue, robot navigation, intelligent video surveillance, etc.

  • As revealed by the table, the mean average precision was the highest when the channel attention and a priori temperature mask (CAPTM) was applied to the first three layers of ResNet; the accuracy decreased after the CAPTM was applied to all layers

  • The temperature mask is more effective only for primary object detection and is prone to false recognition if applied to deeper layers of ResNet (see the sketch after this list)
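
As a rough illustration of confining a temperature prior to the shallow stages of a ResNet backbone, the sketch below reweights the feature maps of the first three residual stages with a soft mask derived from the thermal image and leaves the deepest stage untouched. The module name `TemperatureMask`, the sigmoid threshold, and the wiring are assumptions for illustration, not the paper's CAPTM code; "first three layers" is interpreted here as the first three residual stages.

```python
# Hypothetical sketch: apply a temperature-based prior mask only to the
# shallow stages of a ResNet-50 backbone (names and thresholds are assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50


class TemperatureMask(nn.Module):
    """Reweight a feature map with a soft mask derived from the thermal image."""

    def __init__(self, temp_threshold: float = 0.5):
        super().__init__()
        self.temp_threshold = temp_threshold

    def forward(self, feat: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        # Resize the normalized thermal image to the feature-map resolution.
        mask = F.interpolate(thermal, size=feat.shape[-2:],
                             mode="bilinear", align_corners=False)
        # Soft prior: emphasize warm regions, keep a residual path for cold ones.
        mask = torch.sigmoid((mask - self.temp_threshold) * 10.0)
        return feat * (1.0 + mask)


backbone = resnet50(weights=None)
mask_module = TemperatureMask()


def extract_features(rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
    """Apply the temperature prior to the first three residual stages only."""
    x = backbone.maxpool(backbone.relu(backbone.bn1(backbone.conv1(rgb))))
    x = mask_module(backbone.layer1(x), thermal)  # shallow stage: mask applied
    x = mask_module(backbone.layer2(x), thermal)  # shallow stage: mask applied
    x = mask_module(backbone.layer3(x), thermal)  # third stage: mask applied
    x = backbone.layer4(x)                        # deepest stage: no mask
    return x


if __name__ == "__main__":
    rgb = torch.randn(1, 3, 224, 224)            # visible image
    thermal = torch.rand(1, 1, 224, 224)         # temperature map scaled to [0, 1]
    print(extract_features(rgb, thermal).shape)  # torch.Size([1, 2048, 7, 7])
```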



Introduction

Object detection is a popular research direction in computer vision and digital image processing, and is widely used in many fields such as autonomous driving, disaster rescue, robot navigation, intelligent video surveillance, etc. From the comprehensive perspective of international research, most object detection studies are based on visible images. For images taken at nighttime or under insufficient illumination conditions, it is difficult for visible light-based object detection methods to achieve the expected results. Although thermal infrared imaging can compensate for the shortcomings of visible imaging, thermal images have some defects such as high noise, low contrast, non-uniformity, and poor spatial resolution [1], so the results are not ideal when applied to target detection alone. Because individual objects are represented differently in thermal infrared images, the temperature information contained in thermal infrared images can be applied as a priori knowledge for target detection.
