Cross-modal Similarity Research Articles

One key issue in managing a large-scale 3D shape dataset is finding an effective way to retrieve a shape of interest. The sketch-based query, which offers flexibility in expressing the user's intention, has received growing interest in recent years as touchscreen technology has become widespread. Essentially, a sketch depicts an abstraction of a shape from a certain view, while the shape contains the full 3D information. Matching between them is a cross-modality retrieval problem, and the state-of-the-art solution is to project the sketch and the 3D shape into a common space in which cross-modality similarity is computed as the similarity or distance between features. However, for a given query, only some of the viewpoints of the 3D shape are representative. Thus, blindly projecting a 3D shape into a single feature vector without considering the query will inevitably introduce query-unrepresentative information. To handle this issue, in this work we propose a Deep Point-to-Subspace Metric Learning (DPSML) framework that projects a sketch into a feature vector and a 3D shape into a subspace spanned by a few selected basis feature vectors. The similarity between them is defined as the distance between the query feature vector and its closest point in the subspace, found by solving an optimization problem on the fly. Note that the closest point is query-adaptive and can reflect the viewpoint information that is representative of the given query. To learn such a deep model efficiently, we formulate it as a classification problem with a special classifier design. To reduce the redundancy of 3D shapes, we also introduce a Representative-View Selection (RVS) module to select the most representative views of a 3D shape. Through extensive experiments on various datasets, we show that the proposed method achieves superior performance over competitive baseline methods and attains state-of-the-art performance.
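The point-to-subspace distance described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the "optimization problem" is a plain unconstrained least-squares projection onto the linear span of the view features, and the names `point_to_subspace`, `views`, and `sketch` are hypothetical. The learned deep features are replaced by toy 3-dimensional vectors.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Modified Gram-Schmidt: orthonormal basis for span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # drop vectors that are numerically dependent
            basis.append([wi / norm for wi in w])
    return basis

def point_to_subspace(query, view_features):
    """Return (distance, closest_point) from query to span(view_features).

    Assumption: the closest point is the orthogonal projection of the
    query onto the span, i.e. an unconstrained least-squares solution.
    """
    basis = orthonormal_basis(view_features)
    closest = [0.0] * len(query)
    for b in basis:
        coeff = dot(query, b)
        closest = [c + coeff * bi for c, bi in zip(closest, b)]
    residual = [q - c for q, c in zip(query, closest)]
    return math.sqrt(dot(residual, residual)), closest

# Toy data: two view features spanning the x-y plane, one sketch feature.
views = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
sketch = [2.0, 3.0, 4.0]
distance, closest = point_to_subspace(sketch, views)
print(distance, closest)
```

Because the closest point is recomputed for each query, two different sketches matched against the same shape generally land on different points of the subspace, which is the query-adaptive behaviour the abstract refers to.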

Multimodal hashing approaches have achieved great success in large-scale cross-modal similarity search applications, owing to their appealing computation and storage efficiency. However, it remains challenging to design binary codes that represent the original features well in an unsupervised manner. We argue that unsupervised multimodal hashing has several limitations that need further consideration: 1) most existing methods drop the discrete constraints to simplify the optimization, which causes large quantization error; 2) many methods are sensitive to outliers and noise because they use the $\ell_{2}$-norm in their objective functions, which can amplify errors; and 3) the weight of each modality, which greatly influences retrieval performance, is determined manually or empirically and may not fit the specific training set. These limitations may significantly degrade the retrieval accuracy of unsupervised multimodal hashing methods. To address these problems, this paper proposes a novel hashing model, referred to as Robust and Flexible Discrete Hashing (RFDH), that efficiently learns robust discrete binary codes. In the proposed RFDH model, binary codes are learned directly through discrete matrix decomposition, so the large quantization error caused by relaxation is avoided. Moreover, the $\ell_{2,1}$-norm is used in the objective function to improve robustness, so that the learned model is not sensitive to outliers and noise in the data. In addition, the weight of each modality is adjusted adaptively according to the training data, so important modalities receive larger weights during hash learning. Owing to these merits, RFDH can generate more effective hash codes. We also introduce two kinds of hash function learning methods to project unseen instances into hash codes. Extensive experiments on several well-known large databases demonstrate the superior performance of the proposed hash model over most state-of-the-art unsupervised multimodal hashing methods.
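The robustness argument for the $\ell_{2,1}$-norm can be checked numerically. The sketch below is illustrative only and is not taken from RFDH: it assumes each row of a residual matrix holds one sample's reconstruction error, uses the row-wise convention for the $\ell_{2,1}$-norm, and the names `l21_norm`, `squared_frobenius`, and `residuals` are hypothetical.

```python
import math

def l21_norm(rows):
    """Sum of the Euclidean norms of the rows (row-wise l2,1 convention)."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in rows)

def squared_frobenius(rows):
    """Sum of squared entries, i.e. the usual squared l2 loss."""
    return sum(x * x for row in rows for x in row)

# Ten well-fitted samples with residual norm 1, plus one outlier with norm 10.
residuals = [[1.0, 0.0] for _ in range(10)] + [[6.0, 8.0]]
outlier_norm = 10.0

l21_total = l21_norm(residuals)          # 10 * 1 + 10
sq_total = squared_frobenius(residuals)  # 10 * 1 + 100

# Share of each objective contributed by the single outlier.
l21_share = outlier_norm / l21_total
sq_share = outlier_norm ** 2 / sq_total
print(l21_share, sq_share)
```

In this toy case the outlier accounts for half of the $\ell_{2,1}$ objective but roughly 91% of the squared objective, because squaring amplifies large residuals while the $\ell_{2,1}$-norm lets each sample contribute only linearly in its residual norm.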

Articles published on Cross-modal Similarity

Cluster-wise unsupervised hashing for cross-modal similarity search

IMG-Net: inner-cross-modal attentional multigranular network for description-based person re-identification

Semantic-rebased cross-modal hashing for scalable unsupervised text-visual retrieval

Deep feature learning with attributes for cross-modality person re-identification

Joint and individual matrix factorization hashing for large-scale cross-modal retrieval

Abstraction and Association: Cross-Modal Retrieval Based on Consistency between Semantic Structures

SMAN: Stacked Multimodal Attention Network for Cross-Modal Image–Text Retrieval

Deep Cross-Modal Image–Voice Retrieval in Remote Sensing

RGB-IR Person Re-identification by Cross-Modality Similarity Preservation

Multi-Level Correlation Adversarial Hashing for Cross-Modal Retrieval

Hetero-Center loss for cross-modality person Re-identification

Deep point-to-subspace metric learning for sketch-based 3D shape retrieval

Discriminative Supervised Hashing for Cross-Modal Similarity Search

A model of synesthetic metaphor interpretation based on cross-modality similarity

Discrete Latent Factor Model for Cross-Modal Hashing

Deep Semantic-Preserving Ordinal Hashing for Cross-Modal Similarity Search

Robust and Flexible Discrete Hashing for Cross-Modal Similarity Search

Label Consistent Matrix Factorization Hashing for Large-Scale Cross-Modal Similarity Search

Modality-specific Cross-modal Similarity Measurement with Recurrent Attention Network

Shared Predictive Cross-Modal Deep Quantization
