With the increasing volume of remote sensing ship images, it is vitally important to retrieve the ship objects of interest to users from large-scale remote sensing image data. Existing works focus on single-modality remote sensing ship image processing; no method retrieves ship images across different remote sensing modalities. Moreover, the models in existing works output only a predicted result without offering a reasonable explanation. In this work, we propose an interpretable fusion siamese network (IFSN) to address multi-modality remote sensing ship image retrieval (MRSSIR). 1) An interpretable attention feature representation module is proposed to generate multiple attention maps and aggregate the filters of the last convolutional layer, so that the network focuses on a ship’s discriminative parts and each divided convolutional filter group expresses specific visual information. 2) A multi-modality correlation learning module is proposed to overcome intra-modality and inter-modality variations through a set of designed constraints. 3) A discriminative region mining module is proposed to exhaustively explore all of a ship’s discriminative parts available for the proposed network’s decision-making. We construct a multi-modality remote sensing ship image dataset (MRSSID) to evaluate the performance of the proposed IFSN. The experimental results show that our IFSN outperforms existing methods in retrieval accuracy and provides reasonable and intuitive interpretations for the retrieval results.
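To make the attention feature representation idea more concrete, the sketch below illustrates one plausible reading of it: K attention maps are predicted from the backbone’s last convolutional features, the C filters are split into K groups, and each group is pooled under its own attention map so that each filter group specializes on one ship part. This is a minimal sketch under our own assumptions; the module name, the number of parts, and the 1x1-convolution attention head are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of attention-based filter grouping (not the authors' code).
import torch
import torch.nn as nn

class AttentionFeatureModule(nn.Module):
    def __init__(self, in_channels: int = 512, num_parts: int = 4):
        super().__init__()
        assert in_channels % num_parts == 0
        self.num_parts = num_parts
        # 1x1 conv predicting one spatial attention map per part (assumed design)
        self.attn = nn.Conv2d(in_channels, num_parts, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W), the backbone's last conv-layer output
        b, c, h, w = feats.shape
        maps = torch.sigmoid(self.attn(feats))              # (B, K, H, W)
        # split the C filters into K groups, one group per attention map
        groups = feats.view(b, self.num_parts, c // self.num_parts, h, w)
        # weight each filter group by its attention map, then pool spatially
        attended = groups * maps.unsqueeze(2)               # (B, K, C/K, H, W)
        parts = attended.mean(dim=(3, 4))                   # (B, K, C/K)
        return parts.flatten(1)                             # (B, C) part-aware descriptor
```

In such a design, the per-group descriptors give a natural handle for interpretation: each group’s attention map can be visualized to show which ship region drove its contribution to the retrieval score.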