Abstract

Cross-media retrieval refers to submitting a query of one media type and retrieving semantically similar results of other media types. Existing cross-media retrieval algorithms typically learn a pair of linear (or non-linear) mappings that project image and textual features from their original heterogeneous feature spaces into a common homogeneous space; however, these traditional methods only exploit the correlation between the original features and the high-level semantic space. Moreover, owing to the "semantic gap" introduced by visual feature extraction, textual features are often more discriminative than visual features in terms of their distribution in the semantic space. Motivated by these problems, we construct a medium semantic enhancement space (MMSES) to enhance the discriminative power of textual features. First, the original features are projected into the medium semantic enhancement space using linear discriminant analysis; next, these features are projected into a high-level semantic space via a fixed-distance projection; finally, different mapping matrices are learned for different cross-media retrieval tasks during retrieval. In this paper, we employ the traditional Euclidean distance to measure the similarity between different modalities in the common space. The effectiveness of MMSES was validated through extensive experiments on three datasets: Wikipedia, Pascal Sentence, and INRIA-Websearch.
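To make the first projection step and the retrieval criterion concrete, the following minimal sketch (not the authors' implementation) applies per-modality linear discriminant analysis to map image and text features into a shared discriminative space and then ranks candidates by Euclidean distance. The feature matrices, label vector, and dimensionalities are hypothetical placeholders; the fixed-distance projection and task-specific mapping matrices described above are not modeled here.

```python
# Minimal sketch, assuming label-supervised LDA as the first-stage projection
# and Euclidean distance for cross-modal ranking. All data below is synthetic.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n, d_img, d_txt, n_classes = 200, 128, 300, 10
img_feats = rng.normal(size=(n, d_img))       # visual features (hypothetical)
txt_feats = rng.normal(size=(n, d_txt))       # textual features (hypothetical)
labels = rng.integers(0, n_classes, size=n)   # shared semantic category labels

# Learn one LDA projection per modality into a common (n_classes - 1)-dim space.
lda_img = LinearDiscriminantAnalysis(n_components=n_classes - 1).fit(img_feats, labels)
lda_txt = LinearDiscriminantAnalysis(n_components=n_classes - 1).fit(txt_feats, labels)
img_common = lda_img.transform(img_feats)
txt_common = lda_txt.transform(txt_feats)

# Image-query-text retrieval: rank texts by Euclidean distance to the query image.
query = img_common[0]
dists = np.linalg.norm(txt_common - query, axis=1)
ranking = np.argsort(dists)                   # text indices, nearest first
```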
