Abstract

Due to variations in light transmission and wear on the contact head, existing visual-tactile dataset building methods typically require a large amount of real-world data, making the dataset building process time-consuming and labor-intensive. Sim-to-Real learning has been proposed to realize Multi-Source Visual-Tactile Information Understanding (MSVTIU) in simulated and real environments, which can efficiently accelerate visual-tactile dataset building through simulation for emerging robotic applications. However, existing Sim-to-Real learning still requires more than 10,000 real samples, and the corresponding data must be re-collected whenever the sensor version changes. To address this challenge, we propose a powerful Sim-to-Real transfer approach for MSVTIU that requires only a single real-world tactile sample. To effectively extract features from this single real tactile sample, a multi-scale vision-transformer-based Generative Adversarial Network (GAN) is proposed to address the MSVTIU task under extremely limited data. We introduce a novel scale-dependent self-attention mechanism that allows attention layers to adapt their behavior at different stages of the generation process. In addition, we introduce a residual block that captures contextual information between adjacent scales and uses shortcut connections to fully preserve texture and structure information. We further enhance the model's understanding of visual-tactile information using an elastic transform and an adaptive adversarial training strategy, both designed specifically for MSVTIU. Experiments on two public datasets with diverse objects indicate that our Sim-to-Real transfer approach, using only a single real-world visual-tactile sample, outperforms state-of-the-art methods that require tens of thousands of samples.
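
The abstract does not give implementation details, but the following minimal PyTorch sketch illustrates one plausible reading of the two architectural ideas it names: a self-attention layer whose behavior depends on the generator scale, and a residual block that fuses adjacent scales through a shortcut connection. All module names, hyperparameters, and the use of a learnable per-scale temperature are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: scale-dependent self-attention and a cross-scale
# residual block, as one might interpret the abstract. Not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaleDependentSelfAttention(nn.Module):
    """Self-attention whose softmax temperature depends on the generator scale.

    The idea (assumed here): coarse scales use a softer attention distribution
    to capture global structure, while fine scales sharpen attention to refine
    texture. A shortcut connection preserves the input feature map.
    """

    def __init__(self, channels: int, num_scales: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # One learnable temperature per scale of the multi-scale generator.
        self.log_temperature = nn.Parameter(torch.zeros(num_scales))
        self.gamma = nn.Parameter(torch.zeros(1))  # residual gate

    def forward(self, x: torch.Tensor, scale_idx: int) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C/8)
        k = self.key(x).flatten(2)                     # (B, C/8, HW)
        v = self.value(x).flatten(2)                   # (B, C, HW)
        temperature = self.log_temperature[scale_idx].exp()
        attn = F.softmax(torch.bmm(q, k) / temperature, dim=-1)  # (B, HW, HW)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        # Shortcut connection keeps texture/structure from the input map.
        return x + self.gamma * out


class CrossScaleResidualBlock(nn.Module):
    """Residual block fusing an upsampled coarser-scale feature map with the
    current scale via a shortcut connection (a guess at the abstract's
    'residual block for contextual information between adjacent scales')."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor, coarser: torch.Tensor) -> torch.Tensor:
        coarser_up = F.interpolate(coarser, size=x.shape[-2:],
                                   mode="bilinear", align_corners=False)
        return x + self.body(torch.cat([x, coarser_up], dim=1))


if __name__ == "__main__":
    attn = ScaleDependentSelfAttention(channels=64, num_scales=5)
    fuse = CrossScaleResidualBlock(channels=64)
    fine = torch.randn(1, 64, 64, 64)     # current-scale features
    coarse = torch.randn(1, 64, 32, 32)   # adjacent coarser-scale features
    out = fuse(attn(fine, scale_idx=3), coarse)
    print(out.shape)  # torch.Size([1, 64, 64, 64])
```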
