Semantic Scene Completion Research Articles

Numerous studies have investigated the pivotal role of reliable 3D volume representation in scene perception tasks, such as multi-view stereo (MVS) and semantic scene completion (SSC). They typically construct 3D probability volumes directly from geometric correspondence, attempting to fully address the scene perception task in a single forward pass. However, such a single-step solution makes it hard to learn accurate and convincing volumetric probability, especially in challenging regions such as unexpected occlusions and complicated light reflections. Therefore, this paper proposes to decompose the complicated 3D volume representation learning into a sequence of generative steps to facilitate fine and reliable scene perception. Building on recent advances in generative diffusion models, we introduce a multi-step learning framework, dubbed VPD, dedicated to progressively refining the Volumetric Probability in a Diffusion process. Specifically, we first build a coarse probability volume from input images with off-the-shelf scene perception baselines. This volume then serves as the basic geometry prior that conditions a 3D diffusion UNet, which progressively models an accurate probability distribution. To handle corner cases in challenging areas, a Confidence-Aware Contextual Collaboration (CACC) module is developed to correct uncertain regions for reliable volumetric learning based on multi-scale contextual content. Moreover, an Online Filtering (OF) strategy is designed to maintain representation consistency for stable diffusion sampling. Extensive experiments are conducted on scene perception tasks, including multi-view stereo (MVS) and semantic scene completion (SSC), to validate the efficacy of our method in learning reliable volumetric representations. Notably, for the SSC task, our work stands out as the first to surpass LiDAR-based methods on the SemanticKITTI dataset.

Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations, which is an emerging and critical task for autonomous driving. Recently, many studies have turned to camera-based SSC solutions due to the richer visual cues and cost-effectiveness of cameras. However, existing methods usually rely on sophisticated and heavy 3D models to process the lifted 3D features directly, and these features are not discriminative enough to yield clear segmentation boundaries. In this paper, we adopt the dense-sparse-dense design and propose a one-stage camera-based SSC framework, termed SGN, to propagate semantics from the semantic-aware seed voxels to the whole scene based on spatial geometry cues. First, to exploit depth-aware context and dynamically select sparse seed voxels, we redesign the sparse voxel proposal network to directly process the points generated by depth prediction in a coarse-to-fine paradigm. Furthermore, by designing hybrid guidance (sparse semantic and geometry guidance) and effective voxel aggregation for spatial geometry cues, we enhance the feature separation between different categories and expedite the convergence of semantic propagation. Finally, we devise the multi-scale semantic propagation module for flexible receptive fields while reducing the computation resources. Extensive experimental results on the SemanticKITTI and SSCBench-KITTI-360 datasets demonstrate the superiority of our SGN over existing state-of-the-art methods. Even our lightweight version, SGN-L, achieves notable scores of 14.80% mIoU and 45.45% IoU on SemanticKITTI validation with only 12.5 M parameters and 7.16 G training memory. Code is available at https://github.com/Jieqianyu/SGN.
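
The two scores quoted above are the usual SSC metrics: IoU measures scene completion (occupied versus empty voxels), and mIoU averages per-class IoU over the semantic classes. The following is a minimal sketch of both, assuming flat lists of integer voxel labels with label 0 meaning "empty"; the function name `ssc_metrics` is hypothetical. It also simplifies in two ways that official evaluation code may not: it skips classes absent from both prediction and ground truth, and it does not mask out unlabeled voxels.

```python
def ssc_metrics(pred, target, num_classes, empty_label=0):
    """Return (completion IoU, semantic mIoU) for flat voxel label lists."""
    pairs = list(zip(pred, target))

    # Completion IoU: every non-empty label counts as "occupied".
    inter = sum(1 for p, t in pairs if p != empty_label and t != empty_label)
    union = sum(1 for p, t in pairs if p != empty_label or t != empty_label)
    iou = inter / union if union else 0.0

    # Semantic mIoU: mean of per-class IoU over non-empty classes.
    per_class = []
    for c in range(num_classes):
        if c == empty_label:
            continue
        tp = sum(1 for p, t in pairs if p == c and t == c)
        fp = sum(1 for p, t in pairs if p == c and t != c)
        fn = sum(1 for p, t in pairs if p != c and t == c)
        denom = tp + fp + fn
        if denom:  # skip classes absent from both pred and target
            per_class.append(tp / denom)
    miou = sum(per_class) / len(per_class) if per_class else 0.0
    return iou, miou


# Toy example: six voxels, labels 0 (empty), 1 and 2 (semantic classes).
pred = [0, 1, 1, 2, 0, 2]
target = [0, 1, 2, 2, 1, 0]
iou, miou = ssc_metrics(pred, target, num_classes=3)
```

In this toy example the completion IoU is 3/5 and each semantic class has an IoU of 1/3, which shows how a method can score much higher on IoU than on mIoU, as in the figures quoted above.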

Articles published on Semantic Scene Completion

Semantic Scene Completion in Autonomous Driving: A Two-Stream Multi-Vehicle Collaboration Approach

SSR-2D: Semantic 3D Scene Reconstruction From 2D Images

CASSC: Context‐aware method for depth guided semantic scene completion

AEFF-SSC: an attention-enhanced feature fusion for 3D semantic scene completion

2D Semantic-Guided Semantic Scene Completion

Geometry-semantic aware for monocular 3D Semantic Scene Completion

Learning accurate monocular 3D voxel representation via bilateral voxel transformer

MRFTrans: Multimodal Representation Fusion Transformer for monocular 3D semantic scene completion

One at a Time: Progressive Multi-Step Volumetric Probability Learning for Reliable 3D Scene Perception

H2GFormer: Horizontal-to-Global Voxel Transformer for 3D Semantic Scene Completion

Semantic Complete Scene Forecasting from a 4D Dynamic Point Cloud Sequence

Semantic Scene Completion With 2D and 3D Feature Fusion

Camera-Based 3D Semantic Scene Completion With Sparse Guidance Network

Object-Aware Semantic Scene Completion Through Attention-Based Feature Fusion and Voxel-Points Representation

Multi-modal fusion architecture search for camera-based semantic scene completion

CasFusionNet: A Cascaded Network for Point Cloud Semantic Scene Completion by Dense Feature Fusion

From Front to Rear: 3D Semantic Scene Completion Through Planar Convolution and Attention-Based Network

Semantic Scene Completion Using Local Deep Implicit Functions on LiDAR Data

MotionSC: Data Set and Network for Real-Time Semantic Mapping in Dynamic Environments

FFNet: Frequency Fusion Network for Semantic Scene Completion
