Abstract

In this paper, we investigate the cross-modal material retrieval problem, in which a user submits a multimodal query comprising tactile and auditory signals and retrieves results in the visual modality, i.e., images. Because the modalities involved differ significantly from one another, this task poses greater challenges than existing cross-modal retrieval tasks. Our focus is to learn cross-modal representations when the modalities are significantly different and only minimal supervision is available. A key novelty is a framework that combines weakly paired multimodal fusion for the heterogeneous tactile and auditory modalities with weakly paired cross-modal transfer to the visual modality. A structured dictionary learning method with a low-rank constraint and a common classifier is developed to obtain the modality-invariant representation. Finally, cross-modal retrieval experiments on publicly available data sets demonstrate the advantages of the proposed method.

Note to Practitioners: Cross-modal retrieval is an important task for industrial intelligence. In this paper, we establish a framework that effectively solves the cross-modal material retrieval problem. In the developed framework, the user may submit a multimodal query containing acceleration and sound signals of an object, and the system returns the most relevant retrieved images. Such a framework may find extensive applications in many fields, because it flexibly handles multimodal queries and requires only category-label supervision, without strong sample-pairing information between modalities. Compared with previous material analysis systems, this work goes beyond earlier surface material classification approaches in that it returns an ordered list of perceptually similar surface materials for a query.
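As a rough illustration of how the retrieval step described above might look in practice, the sketch below ranks database images by cosine similarity to a fused tactile/auditory query in a shared representation space. The additive fusion scheme, projection matrices, feature dimensions, and function names are illustrative assumptions only; they do not reproduce the paper's structured dictionary learning objective.

    # Minimal sketch of cross-modal retrieval in a learned shared space.
    # The encoders below are stand-ins for representations obtained by the
    # paper's method; all names and shapes are hypothetical.
    import numpy as np

    def encode_query(accel_feat, sound_feat, P_tactile, P_audio):
        """Fuse a weakly paired tactile/auditory query into the shared space."""
        z = P_tactile @ accel_feat + P_audio @ sound_feat  # simple additive fusion (assumption)
        return z / (np.linalg.norm(z) + 1e-12)

    def retrieve(query_code, image_codes, top_k=5):
        """Rank database images by cosine similarity to the query code."""
        norms = np.linalg.norm(image_codes, axis=1, keepdims=True) + 1e-12
        scores = (image_codes / norms) @ query_code
        order = np.argsort(-scores)[:top_k]
        return order, scores[order]

    # Toy usage with random projections standing in for learned encoders.
    rng = np.random.default_rng(0)
    P_t, P_a = rng.normal(size=(64, 128)), rng.normal(size=(64, 32))
    image_db = rng.normal(size=(1000, 64))   # pre-computed visual codes
    q = encode_query(rng.normal(size=128), rng.normal(size=32), P_t, P_a)
    top_idx, top_scores = retrieve(q, image_db)
    print(top_idx, top_scores)

In an actual system, the ordered index list would map back to surface material images, giving the ranked list of perceptually similar materials mentioned in the Note to Practitioners.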
