Abstract

Effective three-dimensional (3D) scene representation for grasping is of significant importance in smart manufacturing and industrial applications. Serving as a foundational element of robot manipulation, the desired 3D scene representation should encapsulate critical high-level properties. Nonetheless, little attention has been paid to the impact of object-aware sampling strategies on generating valid grasps. Although pleasing to the eye, natural point cloud mapping is not always appropriate for real-world robotic applications. We introduce a point cloud framework called saliency-driven hierarchical multi-scale resampling: a 3D stereoscopic scene representation that recognizes and tracks novel objects while predicting their saliency in robot interactions. In particular, the new scene representation captures object-level features, which have long been argued to be crucial to human scene understanding. Our experiments show that the method yields favorable results in terms of feature persistence and robustness to unevenly distributed input scene point clouds, at comparable or only slightly increased computational cost. Real-world experiments further validate that the proposed method achieves more accurate grasp estimates in generic-object and industrial-part grasping tasks.
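The abstract only outlines the idea at a high level. The sketch below is a minimal, hypothetical illustration of saliency-weighted hierarchical resampling of a point cloud, assuming a per-point saliency score is already available; the function name, signature, and scale sizes are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def saliency_weighted_resample(points, saliency, scales=(4096, 1024, 256), rng=None):
    """Hierarchically resample a point cloud, biasing each scale toward salient points.

    points:   (N, 3) array of xyz coordinates
    saliency: (N,) non-negative per-point saliency scores (assumed precomputed)
    scales:   target sizes for successively coarser levels
    Returns a list of (M_i, 3) arrays, one per scale.
    """
    rng = np.random.default_rng() if rng is None else rng
    levels = []
    idx = np.arange(len(points))
    for m in scales:
        m = min(m, len(idx))
        # Sampling probability proportional to saliency; uniform fallback if all scores are zero.
        w = saliency[idx]
        p = w / w.sum() if w.sum() > 0 else None
        chosen = rng.choice(idx, size=m, replace=False, p=p)
        levels.append(points[chosen])
        idx = chosen  # the next, coarser level is drawn from the current one
    return levels
```

Under these assumptions, object regions with high saliency retain more points at every level of the hierarchy, while uniform sampling would thin them out along with the background.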
