Abstract

With the rapid development of artificial intelligence technology, traditional unimodal media retrieval can hardly meet users' needs for retrieving heterogeneous data; cross-modal information retrieval is therefore the key to a breakthrough. The central difficulty of cross-modal retrieval is how to effectively and accurately match heterogeneous data semantically and compare media objects of different modalities that lie in different feature spaces. To address this difficulty, this paper proposes a cross-modal multimedia retrieval method based on semi-supervised learning and category-information alignment. The method aligns information from different modalities by introducing category information to minimize both the discriminative loss in the common representation space and the loss within individual modalities, and then constructs a unified high-level semantic space on top of the underlying feature spaces of the different modal objects. A semi-supervised learning method is then applied to cross-media retrieval to achieve semantic matching of multimedia objects of different modalities. Finally, experiments comparing the proposed method with traditional cross-media retrieval methods show that it substantially improves retrieval accuracy.
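The category-information alignment objective described above can be sketched as follows. This is a minimal illustration, not the paper's actual formulation: it assumes a shared softmax classifier for the discriminative term and a squared-distance pairing term for the inter-modal loss, and all dimensions, projections, and feature matrices are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired image/text features (hypothetical dimensions).
n, d_img, d_txt, d_common, n_classes = 8, 32, 20, 10, 4
X_img = rng.normal(size=(n, d_img))
X_txt = rng.normal(size=(n, d_txt))
labels = rng.integers(0, n_classes, size=n)

# Linear projections into the common representation space.
W_img = rng.normal(scale=0.1, size=(d_img, d_common))
W_txt = rng.normal(scale=0.1, size=(d_txt, d_common))
# Shared classifier that injects category information into both modalities.
W_cls = rng.normal(scale=0.1, size=(d_common, n_classes))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, y):
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()

Z_img = X_img @ W_img  # images mapped into the common space
Z_txt = X_txt @ W_txt  # texts mapped into the common space

# Discriminative loss: both modalities must predict the shared category,
# which is how category information constrains the common space.
disc_loss = (cross_entropy(softmax(Z_img @ W_cls), labels)
             + cross_entropy(softmax(Z_txt @ W_cls), labels))

# Inter-modal alignment loss: paired samples should coincide in the
# common space regardless of their original modality.
align_loss = np.mean(np.sum((Z_img - Z_txt) ** 2, axis=1))

total_loss = disc_loss + align_loss
print(float(total_loss))
```

In an actual system the two projections and the classifier would be trained jointly (e.g. by gradient descent) to minimize `total_loss`, with the semi-supervised component supplying labels for the unlabeled portion of the data.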
