Abstract
Multimodal metric learning aims to transform heterogeneous data into a common subspace in which cross-modal similarity can be computed directly, and it has received much attention in recent years. Existing methods are typically designed for data with nonhierarchical labels. Such methods fail to exploit the intercategory correlations in the label hierarchy and therefore cannot achieve optimal performance on hierarchically labeled data. To address this problem, we propose a novel metric learning method for hierarchically labeled multimodal data, named deep hierarchical multimodal metric learning (DHMML). It learns multilayer representations for each modality by establishing a layer-specific network for each layer of the label hierarchy. In particular, a multilayer classification mechanism is introduced so that the layerwise representations not only preserve the semantic similarities within each layer but also retain the intercategory correlations across different layers. In addition, an adversarial learning mechanism is proposed to bridge the cross-modality gap by producing indistinguishable features for different modalities. By integrating the multilayer classification and adversarial learning mechanisms, DHMML obtains hierarchical, discriminative, modality-invariant representations for multimodal data. Experiments on two benchmark datasets demonstrate the superiority of the proposed DHMML method over several state-of-the-art methods.
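The abstract names two training signals: per-layer classification over the label hierarchy and an adversarial modality discriminator. The Python sketch below illustrates, under stated assumptions, how such terms could be combined into a single objective for the feature encoders. The two-layer hierarchy, the class names, the plain softmax classifiers, the binary image-vs-text discriminator, and the weight `lam` are all hypothetical choices made for illustration; this is not the DHMML authors' formulation, architecture, or loss weighting.

```python
# Illustrative sketch only; NOT the DHMML implementation.
import math
from typing import Dict, List, Sequence

# Hypothetical two-layer label hierarchy (coarse layer, fine layer).
# Each fine class maps to exactly one coarse parent.
PARENT: Dict[str, str] = {
    "cat": "animal", "dog": "animal",
    "car": "vehicle", "bus": "vehicle",
}
COARSE: List[str] = ["animal", "vehicle"]
FINE: List[str] = ["cat", "dog", "car", "bus"]


def layer_labels(fine_label: str) -> List[str]:
    """Expand a leaf label into one label per hierarchy layer, coarse first."""
    return [PARENT[fine_label], fine_label]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def multilayer_classification_loss(layer_logits: Sequence[Sequence[float]],
                                   fine_label: str) -> float:
    """Sum of cross-entropies, one softmax classifier per hierarchy layer.

    layer_logits[0] scores the COARSE classes and layer_logits[1] the FINE
    classes. Supervising both layers with labels derived from the same leaf
    keeps the per-layer predictions consistent with the hierarchy.
    """
    total = 0.0
    for logits, label, names in zip(layer_logits, layer_labels(fine_label),
                                    (COARSE, FINE)):
        total -= log_softmax(logits)[names.index(label)]
    return total


def modality_discriminator_loss(p_image: float, is_image: bool) -> float:
    """Binary cross-entropy for a hypothetical discriminator outputting P(image)."""
    p_true = p_image if is_image else 1.0 - p_image
    return -math.log(p_true)


def encoder_objective(cls_loss: float, disc_loss: float, lam: float = 0.1) -> float:
    """Encoders minimize the classification loss while maximizing the
    discriminator's loss (a min-max game), pushing image and text features
    toward being indistinguishable. lam is a hypothetical trade-off weight."""
    return cls_loss - lam * disc_loss


if __name__ == "__main__":
    # Toy logits for one image sample whose fine label is "dog".
    coarse_logits = [2.0, 0.0]            # favors "animal"
    fine_logits = [0.5, 1.5, -1.0, -1.0]  # favors "dog"
    cls = multilayer_classification_loss([coarse_logits, fine_logits], "dog")
    disc = modality_discriminator_loss(p_image=0.5, is_image=True)
    print(f"classification={cls:.4f} discriminator={disc:.4f} "
          f"encoder objective={encoder_objective(cls, disc):.4f}")
```

The sketch only evaluates the loss terms for fixed toy values. In a trained model the logits and the discriminator probability would come from the layer-specific networks, and the min-max objective would be optimized by gradient descent, for example by alternating discriminator and encoder updates or by using a gradient reversal layer.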
IEEE Transactions on Neural Networks and Learning Systems