Abstract
Semantic scene completion is a crucial end-to-end 3D perception task, and accurate 3D perception is vital for autonomous driving. This paper presents CASSC, a novel adaptive context-aware method based on Transformer networks for camera-based semantic scene completion. The key idea is to leverage rich contextual information from images to obtain pixel-level label proposals, and to design a multiscale fusion mechanism that merges this information and matches it to the voxel space. A weakly supervised training strategy is proposed to obtain semantic label distribution features from images, and an adaptive multiscale fusion module is introduced to fuse these features and adaptively match them with the voxel space. CASSC achieves state-of-the-art performance on the SemanticKITTI dataset and demonstrates excellent performance on the SSC-Bench dataset. Ablation experiments validate the rationality and effectiveness of our design, and the model and code of CASSC will be open-sourced at https://github.com/dogooooo/CASSC.
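The abstract does not specify the adaptive multiscale fusion in detail; the following is a minimal sketch, assuming a PyTorch setting, of how multiscale 2D image features might be adaptively weighted and then sampled into voxel space. The class name `AdaptiveMultiscaleFusion`, the learned per-scale softmax weights, and the `voxel_uv` projection input are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveMultiscaleFusion(nn.Module):
    """Hypothetical sketch: adaptively fuse multiscale 2D semantic
    features and sample them at projected voxel locations."""

    def __init__(self, channels: int, num_scales: int = 3):
        super().__init__()
        # Per-scale 1x1 convolutions to align feature channels.
        self.align = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=1)
            for _ in range(num_scales)
        )
        # Learned per-scale logits -> softmax weights (the "adaptive" part).
        self.scale_logits = nn.Parameter(torch.zeros(num_scales))

    def forward(self, feats, voxel_uv):
        # feats: list of (B, C, H_s, W_s) maps at decreasing resolutions.
        # voxel_uv: (B, N, 2) normalized image coords in [-1, 1] of the
        #           N voxel centers projected into the camera view.
        target_size = feats[0].shape[-2:]
        weights = torch.softmax(self.scale_logits, dim=0)
        fused = torch.zeros_like(feats[0])
        for w, conv, f in zip(weights, self.align, feats):
            f = conv(f)
            f = F.interpolate(f, size=target_size,
                              mode="bilinear", align_corners=False)
            fused = fused + w * f
        # Match the fused image features to voxel space by sampling at
        # the projected voxel coordinates.
        grid = voxel_uv.unsqueeze(1)                   # (B, 1, N, 2)
        vox = F.grid_sample(fused, grid, align_corners=False)  # (B, C, 1, N)
        return vox.squeeze(2).transpose(1, 2)          # (B, N, C)
```

Under these assumptions, the softmax over `scale_logits` lets the network learn how much each image scale should contribute before the fused map is lifted to voxels via `grid_sample`.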