Abstract
We introduce RGB road scene material segmentation, i.e., per-pixel segmentation of materials in real-world driving views from RGB images alone, as a new computer vision task, and we support it with a benchmark dataset and a new method. Our dataset, KITTI-Materials, is built on the well-established KITTI dataset and consists of 1000 frames covering 24 different road scenes of urban and suburban landscapes, with every pixel carefully annotated with one of 20 material categories. It is the first dataset for RGB material segmentation in real driving scenes. Through careful analysis of KITTI-Materials, we identify the extraction and fusion of texture and image context as the key to accurately modeling road scene material appearance. To this end, we introduce the Road scene Material Segmentation Network (RMSNet) as a baseline method for this challenging task. RMSNet encodes multi-scale hierarchical features with efficient Transformer layers. Its decoder is built on a novel efficient self-attention model, which we refer to as SAMixer, that adaptively fuses texture and context cues across multiple feature levels. Extensive experiments on KITTI-Materials validate the effectiveness of RMSNet. We believe our work lays a solid foundation for further studies on RGB road scene material segmentation.
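The abstract does not describe SAMixer's internals, so what follows is not the authors' design. It is a generic illustration of the idea the abstract names: efficient self-attention that fuses fine "texture" features with coarser "context" features. This minimal pure-Python sketch assumes a spatial-reduction style of efficient attention, as used in PVT/SegFormer-type encoders. Fine-level query tokens attend to an average-pooled set of context tokens, and a residual sum serves as the fusion step. The names `reduced_attention`, `fuse_texture_context`, and `avg_pool_tokens`, and the reduction factor `r`, are hypothetical. Learned Q/K/V projections, multiple heads, and 2-D window pooling are omitted for brevity.

```python
import math


def matmul(a, b):
    # a: n x k, b: k x m -> n x m
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def softmax_rows(a):
    # Numerically stable row-wise softmax.
    out = []
    for row in a:
        m = max(row)
        e = [math.exp(x - m) for x in row]
        s = sum(e)
        out.append([x / s for x in e])
    return out


def avg_pool_tokens(tokens, r):
    # Hypothetical simplification: average groups of r consecutive tokens
    # (a real spatial reduction would pool 2-D windows of the feature map).
    pooled = []
    for start in range(0, len(tokens), r):
        group = tokens[start:start + r]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled


def reduced_attention(queries, context, r):
    # queries: N x d fine-level tokens; context: tokens pooled by factor r
    # to form keys/values. Identity Q/K/V projections are assumed here.
    kv = avg_pool_tokens(context, r)          # M = ceil(len(context) / r) tokens
    scale = 1.0 / math.sqrt(len(queries[0]))
    scores = matmul(queries, transpose(kv))   # N x M instead of N x len(context)
    weights = softmax_rows([[s * scale for s in row] for row in scores])
    return matmul(weights, kv), weights       # N x d attended context, N x M weights


def fuse_texture_context(texture, context, r):
    # Residual fusion: each texture token plus the context it attends to.
    attended, weights = reduced_attention(texture, context, r)
    fused = [[t + a for t, a in zip(t_row, a_row)]
             for t_row, a_row in zip(texture, attended)]
    return fused, weights


if __name__ == "__main__":
    tex = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    fused, w = fuse_texture_context(tex, tex, r=2)
    print(len(w), "queries x", len(w[0]), "pooled keys")
```

When the context has N tokens, the score matrix in this sketch is N × ⌈N/r⌉ rather than N × N. This reduction is where the savings of this family of efficient attention come from.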