Abstract

Object imaging and recognition under difficult visual conditions is extremely challenging because the captured images are of low quality, and traditional optical-only recognition methods often fail at this task. In this paper, we propose to exploit visual-microwave image pairs, captured jointly by visual cameras and microwave sensors, for imaging and recognition. To cope with the heavy noise in the low-quality optical images, we retrieve physically quantitative images from the associated scattered-field data and enhance the visual features using both the optical and the retrieved images. We develop a cross-modal Enhanced Attentive Visual-Microwave Fusion (EAVMF) object recognition model that jointly learns a cross-modal generator and a multimodal recognizer. In addition, an attention module in the visual subnetwork highlights regions of interest. Two multimodal datasets with synthetic visual-microwave image pairs are built to simulate difficult visual conditions. Numerical results on these datasets demonstrate that: 1) the multimodal fusion, the cross-modal enhancement, and the visual attention module each improve performance; and 2) compared with existing methods, the proposed EAVMF not only achieves higher accuracy but also offers good scalability and one-shot learning ability.
