Shared perception between robotic systems significantly enhances their ability to understand and interact with their environment, leading to improved performance and efficiency in various applications. In this work, we present a novel, full-fledged framework that lets robotic systems interactively share their visuo-tactile perception for robust pose estimation of novel objects in dense clutter. We demonstrate this with a two-robot team that shares a visuo-tactile scene representation, declutters the scene through interactive perception, and precisely estimates the 6 Degrees-of-Freedom (DoF) pose and 3 DoF scale of an unknown target object. This is achieved with the Stochastic Translation-Invariant Quaternion Filter (S-TIQF), a novel Bayesian filtering method with robust stochastic optimization for estimating the globally optimal pose of a target object. S-TIQF is also deployed to perform in situ visuo-tactile hand-eye calibration, since shared perception requires accurate extrinsic calibration between the tactile and visual sensing modalities. Finally, we develop a novel active shared visuo-tactile representation and object reconstruction method that employs a joint information gain criterion to improve the sample efficiency of the robot actions. To validate the effectiveness of our approach, we perform extensive experiments on standard pose estimation datasets as well as real-robot experiments with opaque, transparent, and specular objects in randomised clutter settings, together with comprehensive comparisons against state-of-the-art approaches. Our experiments indicate that our approach outperforms state-of-the-art methods in pose estimation accuracy for both dense visual and sparse tactile point clouds.
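To give a concrete sense of the translation-invariant idea behind quaternion-based pose estimation, the sketch below shows a minimal, assumption-laden stand-in: rotation is estimated from centered point correspondences (so the estimate does not depend on translation), and translation is recovered afterwards from the centroids. This is not the paper's S-TIQF; it uses Horn's closed-form quaternion alignment, assumes known, outlier-free correspondences, and omits the Bayesian filtering and stochastic optimization described in the abstract.

```python
# Hedged sketch: translation-invariant, quaternion-based point-set alignment.
# NOT the paper's S-TIQF; Horn's closed-form method is used as an illustrative
# stand-in, assuming known correspondences and no outliers.
import numpy as np


def quat_to_rot(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])


def horn_alignment(src, dst):
    """Estimate R, t such that dst ~= R @ src + t for (N, 3) correspondences.

    Rotation is computed from *centered* point sets, making it invariant to
    translation; translation is then recovered from the centroids.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    P, Q = src - mu_s, dst - mu_d                 # centering removes translation
    S = P.T @ Q                                   # 3x3 cross-covariance
    Sxx, Sxy, Sxz = S[0]
    Syx, Syy, Syz = S[1]
    Szx, Szy, Szz = S[2]
    # Symmetric 4x4 matrix whose dominant eigenvector is the optimal quaternion.
    N = np.array([
        [Sxx + Syy + Szz, Syz - Szy,        Szx - Sxz,        Sxy - Syx],
        [Syz - Szy,       Sxx - Syy - Szz,  Sxy + Syx,        Szx + Sxz],
        [Szx - Sxz,       Sxy + Syx,       -Sxx + Syy - Szz,  Syz + Szy],
        [Sxy - Syx,       Szx + Sxz,        Syz + Szy,       -Sxx - Syy + Szz],
    ])
    eigvals, eigvecs = np.linalg.eigh(N)
    q = eigvecs[:, np.argmax(eigvals)]            # quaternion (w, x, y, z)
    R = quat_to_rot(q)
    t = mu_d - R @ mu_s                           # translation from centroids
    return R, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(50, 3))
    # Ground-truth transform: 30-degree rotation about z plus a translation.
    a = np.deg2rad(30.0)
    R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                       [np.sin(a),  np.cos(a), 0.0],
                       [0.0,        0.0,       1.0]])
    t_true = np.array([0.1, -0.2, 0.3])
    dst = src @ R_true.T + t_true
    R_est, t_est = horn_alignment(src, dst)
    print("rotation error:", np.linalg.norm(R_est - R_true))
    print("translation error:", np.linalg.norm(t_est - t_true))
```

In practice, a filtering approach such as the one summarised in the abstract would treat incoming visual and tactile points as sequential measurements and maintain a distribution over the pose, rather than solving a single closed-form alignment as in this sketch.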