Abstract

Accurately perceiving the surface material of objects is critical for scene understanding and robotic manipulation. The problem is ill-posed, however, because the imaging process entangles material, lighting, and geometry in a complex way. Appearance-based methods cannot disentangle lighting and geometry variations and struggle in textureless regions. We propose a novel multimodal fusion method for surface material perception using a depth camera that projects structured laser dots. The captured active infrared image is decomposed into diffuse and dot modalities, and their connections with different material optical properties (i.e., reflection and scattering) are revealed separately. The geometry modality, which helps disentangle material properties from geometry variations, is derived from the rendering equation and computed from the depth image obtained by the structured-light camera. Together with the texture feature learned from the gray modality, a multimodal learning method is then proposed for material perception. Experiments on synthesized and captured datasets validate the orthogonality of the learned features. The final fusion method achieves a material recognition accuracy of 92.5%, superior to state-of-the-art appearance-based methods (78.4%).
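To make the modality construction concrete, below is a minimal illustrative sketch, not the authors' implementation: the Gaussian-based separation of the active infrared image into diffuse and dot layers, the co-located illumination assumption, and all function names are assumptions introduced here for exposition only.

```python
# Hedged sketch of two of the modalities described in the abstract.
# Assumptions (not from the paper): dots are isolated by subtracting a
# Gaussian-smoothed diffuse layer, and the geometric term is the cosine
# between a depth-derived surface normal and the camera's optical axis.
import numpy as np
from scipy.ndimage import gaussian_filter


def decompose_active_ir(ir_image, sigma=3.0):
    """Split an active-IR frame into a smooth diffuse layer and a
    residual dot layer carrying the structured laser dots."""
    ir = ir_image.astype(np.float32)
    diffuse = gaussian_filter(ir, sigma=sigma)   # low-frequency reflection
    dots = ir - diffuse                          # high-frequency dot pattern
    return diffuse, dots


def geometry_modality(depth, fx, fy):
    """Approximate the geometric factor of the rendering equation from a
    depth map: surface normals from depth gradients, then the cosine with
    an illumination/view direction assumed along the optical axis."""
    z = np.maximum(depth.astype(np.float32), 1e-6)
    # Convert per-pixel depth gradients to metric slopes (rough pinhole model).
    dz_dx = np.gradient(z, axis=1) * fx / z
    dz_dy = np.gradient(z, axis=0) * fy / z
    normals = np.dstack([-dz_dx, -dz_dy, np.ones_like(z)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    # Cosine between the normal and the (0, 0, 1) viewing direction.
    return np.clip(normals[..., 2], 0.0, 1.0)
```

In this reading, the diffuse and dot layers relate to reflection and scattering respectively, while the geometry map factors out shading caused by surface orientation before the modalities are fused by the learning model.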
