Abstract

As a general characteristic observed in the real-world datasets, multimodal data are usually partially associated, which comprise the commonly shared information across modalities (i.e., modality-shared information) and the specific information only exists in a single modality (i.e., modality-specific information). Cross-modal retrieval methods typically use these information in multimodal data as a whole and project them into a common representation space to calculate the similarity measure. In fact, only modality-shared information can be well aligned in the learning of common representations, whereas modality-specific information usually brings about interference term and decreases the performance of cross-modal retrieval. The explicit distinction and utilization of these two kinds of multimodal information are important to cross-modal retrieval, but rarely studied in previous research. In this article, we explicitly distinguish and utilize modality-shared and modality-specific features for learning better common representations, and propose an orthogonal subspace decomposition method for cross-modal retrieval, named orthogonal subspace decomposition method. Specifically, we introduce a structure preservation loss to ensure modality-shared information to be well preserved, and optimize the intramodal discrimination loss and intermodal invariance loss to learn the semantic discriminative features for cross-modal retrieval. We conduct comprehensive experiments on four widely used benchmark datasets, and the experimental results demonstrate the effectiveness of our proposed method.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.