Abstract

To perform fine manipulation tasks in the real world more actively, intelligent robots should be able to understand and communicate the physical attributes of a material while interacting with an object. Touch and vision are two important sensing modalities in a robotic perception system. In this article, we propose a cross-modal material perception framework for recognizing novel objects. Concretely, it first adopts an object-agnostic method to associate information from the tactile and visual modalities. It then recognizes a novel object by using its tactile signal to retrieve perceptually similar surface material images through the learned cross-modal correlation. This problem is challenging because data from the visual and tactile modalities are highly heterogeneous and only weakly paired. Moreover, the framework should not only capture cross-modal pairwise relevance but also remain discriminative and generalize to unseen objects. To this end, we propose a weakly paired cross-modal adversarial learning (WCMAL) model for visual–tactile cross-modal retrieval, which combines the advantages of deep learning and adversarial learning. In particular, the model fully accounts for the weak pairing between the two modalities. Finally, we conduct verification experiments on a publicly available dataset. The results demonstrate the effectiveness of the proposed method.

Note to Practitioners

Because cross-modal perception can improve the active operation of automation systems, it is invaluable for industrial intelligence, particularly when a single sensing modality cannot be used or is unsuitable for a given application. In this article, we provide a framework for cross-modal material perception in object recognition based on the idea of cross-modal retrieval. Concretely, we use the tactile data of an unknown object to retrieve perceptually similar surface images, which are then used to evaluate its material properties. Unlike previous works that use tactile information as a complement or alternative to visual information to recognize specific objects, our proposed framework can estimate and infer the material properties of both seen and unseen objects, which enhances the intelligence of manipulation systems and improves the quality of interaction. In future work, more modalities will be incorporated to further enhance cross-modal material perception.
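The abstract does not give implementation details, but the core idea of WCMAL, projecting weakly paired tactile and visual features into a shared embedding space with an adversarial modality discriminator and then retrieving images by similarity to a tactile query, can be illustrated with a minimal sketch. The PyTorch-style code below is an assumption-laden illustration rather than the authors' implementation: the network sizes, the material-level (weak) pairing loss, the scaling factor, and the loss weight lam are placeholders chosen for clarity.

    # Minimal sketch of adversarial visual-tactile embedding for cross-modal
    # retrieval, in the spirit of the WCMAL idea described above. All sizes,
    # losses, and weights are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    EMB = 128  # assumed shared embedding dimension

    class Encoder(nn.Module):
        """Maps a modality-specific feature vector into the shared space."""
        def __init__(self, in_dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, 512), nn.ReLU(),
                nn.Linear(512, EMB),
            )
        def forward(self, x):
            return F.normalize(self.net(x), dim=-1)

    class ModalityDiscriminator(nn.Module):
        """Tries to tell visual embeddings from tactile ones; the encoders
        are trained to fool it so the shared space becomes modality-invariant."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(EMB, 64), nn.ReLU(), nn.Linear(64, 1))
        def forward(self, z):
            return self.net(z)

    def adversarial_step(vis_feat, tac_feat, labels, enc_v, enc_t, disc,
                         opt_enc, opt_disc, lam=0.1):
        """One training step: material-level (weakly paired) alignment plus an
        adversarial modality-confusion term. labels holds material-category ids."""
        z_v, z_t = enc_v(vis_feat), enc_t(tac_feat)

        # 1) Update the discriminator to separate the two modalities.
        d_in = torch.cat([z_v.detach(), z_t.detach()])
        d_target = torch.cat([torch.ones(len(z_v), 1), torch.zeros(len(z_t), 1)])
        d_loss = F.binary_cross_entropy_with_logits(disc(d_in), d_target)
        opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

        # 2) Update the encoders: pull same-material pairs together (weak pairing
        #    at the category level, not instance level) and fool the discriminator.
        sim = z_t @ z_v.t()                       # tactile-query / visual-gallery scores
        same = (labels[:, None] == labels[None, :]).float()
        align_loss = F.binary_cross_entropy_with_logits(sim * 5.0, same)
        fool = F.binary_cross_entropy_with_logits(
            disc(torch.cat([z_v, z_t])),
            torch.cat([torch.zeros(len(z_v), 1), torch.ones(len(z_t), 1)]))
        g_loss = align_loss + lam * fool
        opt_enc.zero_grad(); g_loss.backward(); opt_enc.step()
        return d_loss.item(), g_loss.item()

    def retrieve(tac_query, vis_gallery, enc_v, enc_t, k=5):
        """Embed a tactile query and rank gallery images by cosine similarity."""
        with torch.no_grad():
            scores = enc_t(tac_query) @ enc_v(vis_gallery).t()
        return scores.topk(k, dim=-1).indices

In this sketch, weak pairing is handled by aligning embeddings at the material-category level rather than requiring instance-level tactile-image correspondences, while the discriminator pushes both modalities toward a common space; opt_enc is assumed to cover the parameters of both encoders.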
