Abstract

Cross-modal retrieval has become a popular research topic, since multimodal data are heterogeneous and the similarities between different forms of information deserve attention. Traditional single-modal methods reconstruct the original information but fail to consider the semantic similarity between different kinds of data. In this work, a cross-modal semantic autoencoder with embedding consensus (CSAEC) is proposed, which maps the original data into a low-dimensional shared space that retains semantic information. To exploit the similarity between modalities, an autoencoder is used to associate the feature projection with the semantic code vector. In addition, regularization and sparse constraints are applied to the low-dimensional matrices to balance the reconstruction error. High-dimensional data are thus transformed into semantic code vectors, and the different models are constrained by shared parameters to achieve denoising. Experiments on four multi-modal data sets show that query results are improved and effective cross-modal retrieval is achieved. Furthermore, CSAEC can also be applied to related fields such as deep learning and subspace learning. The model overcomes the obstacles of traditional methods by using deep learning to convert multi-modal data into abstract representations, yielding better accuracy and better recognition results.
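
To make the idea concrete, the following is a minimal sketch of a per-modality semantic autoencoder of the kind described above: features are projected to a low-dimensional semantic code, reconstructed with tied weights, and trained with sparsity and regularization penalties. The loss weightings, dimensions, and variable names are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch (assumed formulation): linear semantic autoencoder for one modality.
# Features X are encoded to a semantic code S = X @ W and decoded with the
# transposed projection; an L1 term stands in for the sparse constraint and an
# L2 term for the regularization mentioned in the abstract.
import torch

def semantic_autoencoder_loss(X, W, S_target, lam_rec=1.0, lam_sem=1.0,
                              lam_sparse=0.01, lam_reg=0.01):
    S = X @ W                                   # encode: features -> semantic code
    X_rec = S @ W.T                             # decode with tied weights
    rec_err = ((X - X_rec) ** 2).mean()         # reconstruction error
    sem_err = ((S - S_target) ** 2).mean()      # align code with semantic vectors (e.g. labels)
    sparse = S.abs().mean()                     # sparsity on the code
    reg = (W ** 2).sum()                        # weight regularization
    return lam_rec * rec_err + lam_sem * sem_err + lam_sparse * sparse + lam_reg * reg

# Toy usage: 4096-d image features mapped to a 10-d semantic space (sizes assumed).
X = torch.randn(128, 4096)
S_target = torch.randn(128, 10)
W = torch.randn(4096, 10, requires_grad=True)
opt = torch.optim.Adam([W], lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = semantic_autoencoder_loss(X, W, S_target)
    loss.backward()
    opt.step()
```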

Highlights

  • Cross-modal retrieval has become a popular research topic, since multimodal data are heterogeneous and the similarities between different forms of information deserve attention

  • Cross-modal information retrieval has become a popular research topic, and methods aimed at effectively retrieving across different information modalities have emerged and developed rapidly, such as retrieving paired images and texts [2,3]

  • To perform cross-modal retrieval, the key issue is to consider the semantic similarity between different forms of data



Introduction

Cross-modal retrieval has become a popular research topic, since multimodal data are heterogeneous and the similarities between different forms of information deserve attention. Traditional single-modal methods reconstruct the original information but fail to consider the semantic similarity between different kinds of data. To obtain good retrieval results, embedding methods are used to retain both semantic and original feature information [14]. To solve these problems and achieve efficient information retrieval, we propose a learning method called the cross-modal semantic autoencoder with embedding consensus (CSAEC), which maps the original data into a low-dimensional shared space that retains semantic information. The paired image and text data are embedded and mapped into a unified space, called the mapping consensus, while the original feature information and the semantic information are retained.
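
As a rough illustration of the mapping-consensus idea, the sketch below projects paired image and text features into one shared space so that matched pairs land close together, after which cross-modal retrieval is nearest-neighbour ranking across modalities. The projection sizes, the simple pairwise alignment loss, and the cosine-similarity ranking are assumptions for illustration, not the exact CSAEC objective.

```python
# Hedged sketch (assumed objective): embed paired image/text features into a
# unified space and retrieve across modalities by cosine similarity.
import torch
import torch.nn.functional as F

W_img = torch.randn(4096, 64, requires_grad=True)   # image-feature projection (sizes assumed)
W_txt = torch.randn(300, 64, requires_grad=True)     # text-feature projection
opt = torch.optim.Adam([W_img, W_txt], lr=1e-3)

X_img = torch.randn(256, 4096)   # image features; row i is paired with text row i
X_txt = torch.randn(256, 300)    # text features

for _ in range(200):
    opt.zero_grad()
    Z_img = F.normalize(X_img @ W_img, dim=1)
    Z_txt = F.normalize(X_txt @ W_txt, dim=1)
    loss = ((Z_img - Z_txt) ** 2).sum(dim=1).mean()  # pull paired embeddings together
    loss.backward()
    opt.step()

# Cross-modal query: rank all texts for each image by cosine similarity.
with torch.no_grad():
    sims = F.normalize(X_img @ W_img, dim=1) @ F.normalize(X_txt @ W_txt, dim=1).T
    top1 = sims.argmax(dim=1)    # index of best-matching text per image query
```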

