Gumbel-NeRF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields

Abstract

We propose Gumbel-NeRF, a mixture-of-experts (MoE) neural radiance field (NeRF) model with a hindsight expert selection mechanism for synthesizing novel views of unseen objects. Previous studies have shown that the MoE structure provides high-quality representations of a given large-scale scene consisting of many objects. However, we observe that such an MoE NeRF model often produces low-quality representations near the experts' boundaries when applied to novel view synthesis of an unseen object from one- or few-shot input. We find that this degradation is primarily caused by the foresight expert selection mechanism, which can leave an unnatural discontinuity in the object shape near the experts' boundaries. Gumbel-NeRF instead adopts a hindsight expert selection mechanism, which guarantees continuity of the density field even near the experts' boundaries. Experiments on the SRN cars dataset demonstrate the superiority of Gumbel-NeRF over the baselines on various image quality metrics. The code will be made available upon acceptance.
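The abstract contrasts foresight routing, where each 3D point is assigned to an expert before evaluation, with hindsight selection, where every expert evaluates the point and the winner is chosen afterwards. The PyTorch sketch below shows one way such a mechanism could be wired up with a straight-through Gumbel-softmax; the expert architecture, the per-point selection logit, and all layer sizes are our assumptions, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyExpert(nn.Module):
    """One expert MLP mapping a 3D point to (density, rgb, selection logit)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 5),  # 1 density + 3 rgb + 1 selection logit
        )

    def forward(self, x):
        out = self.net(x)
        sigma = F.softplus(out[..., :1])     # non-negative density
        rgb = torch.sigmoid(out[..., 1:4])   # colors in [0, 1]
        logit = out[..., 4:]                 # per-point selection score
        return sigma, rgb, logit

class HindsightMoE(nn.Module):
    """Hindsight selection: every expert evaluates every point, then a
    Gumbel-softmax over per-point logits picks one expert. This is our
    reading of the abstract, not the authors' code."""
    def __init__(self, n_experts=4, tau=1.0):
        super().__init__()
        self.experts = nn.ModuleList(TinyExpert() for _ in range(n_experts))
        self.tau = tau

    def forward(self, x, hard=False):
        sigmas, rgbs, logits = zip(*(e(x) for e in self.experts))
        sigmas = torch.stack(sigmas, dim=-2)   # (..., E, 1)
        rgbs = torch.stack(rgbs, dim=-2)       # (..., E, 3)
        logits = torch.cat(logits, dim=-1)     # (..., E)
        # Straight-through Gumbel-softmax: differentiable expert selection.
        w = F.gumbel_softmax(logits, tau=self.tau, hard=hard).unsqueeze(-1)
        sigma = (w * sigmas).sum(dim=-2)
        rgb = (w * rgbs).sum(dim=-2)
        return sigma, rgb

model = HindsightMoE()
pts = torch.rand(1024, 3)      # random 3D sample points
sigma, rgb = model(pts)
print(sigma.shape, rgb.shape)  # torch.Size([1024, 1]) torch.Size([1024, 3])

With hard=False the experts are smoothly blended, avoiding hard seams in the composed density field; the temperature tau controls how close the selection is to a discrete choice.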

Similar Papers
  • Conference Article
  • Cited by 66
  • 10.1109/cvpr46437.2021.01294
STaR: Self-supervised Tracking and Reconstruction of Rigid Objects in Motion with Neural Rendering
  • Jun 1, 2021
  • Wentao Yuan + 3 more

We present STaR, a novel method that performs Self-supervised Tracking and Reconstruction of dynamic scenes with rigid motion from multi-view RGB videos without any manual annotation. Recent work has shown that neural networks are surprisingly effective at compressing many views of a scene into a learned function that maps a viewing ray to an observed radiance value via volume rendering. Unfortunately, these methods lose all their predictive power once any object in the scene has moved. In this work, we explicitly model the rigid motion of objects in the context of neural representations of radiance fields. We show that, without any additional human-specified supervision, we can reconstruct a dynamic scene with a single rigid object in motion by simultaneously decomposing it into its two constituent parts and encoding each with its own neural representation. We achieve this by jointly optimizing the parameters of two neural radiance fields and a set of rigid poses which align the two fields at each frame. On both synthetic and real-world datasets, we demonstrate that our method can render photorealistic novel views, where novelty is measured on both spatial and temporal axes. Our factored representation furthermore enables animation of unseen object motion.
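As a concrete illustration of "jointly optimizing the parameters of two neural radiance fields and a set of rigid poses", the sketch below composes a static and a dynamic field at shared sample points, with the dynamic field queried through an optimizable per-frame rigid pose. The tiny MLP, the 6D rotation parameterization, and the density-weighted blending are our stand-ins, not STaR's actual architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RadianceField(nn.Module):
    """Minimal density+color MLP standing in for a full NeRF."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 4))

    def forward(self, x):
        out = self.net(x)
        return F.softplus(out[..., :1]), torch.sigmoid(out[..., 1:])

def rot6d_to_matrix(r):
    """Gram-Schmidt on a 6D vector -> rotation matrix; a common optimizable
    rotation parameterization (the paper's choice may differ)."""
    b1 = F.normalize(r[:3], dim=0)
    b2 = F.normalize(r[3:] - (b1 @ r[3:]) * b1, dim=0)
    return torch.stack([b1, b2, torch.cross(b1, b2, dim=0)])

static, dynamic = RadianceField(), RadianceField()
# Per-frame rigid pose of the moving object, optimized jointly with the fields.
rot6d = nn.Parameter(torch.tensor([1., 0., 0., 0., 1., 0.]))
trans = nn.Parameter(torch.zeros(3))

pts = torch.rand(1024, 3)                    # sample points along camera rays
sigma_s, rgb_s = static(pts)                 # query the static field in place
R = rot6d_to_matrix(rot6d)
sigma_d, rgb_d = dynamic((pts - trans) @ R)  # query the dynamic field in its frame
# Density-weighted composition of the two fields at each sample point,
# after which standard volume rendering would proceed as usual.
sigma = sigma_s + sigma_d
rgb = (sigma_s * rgb_s + sigma_d * rgb_d) / (sigma + 1e-8)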

  • Conference Article
  • Cited by 32
  • 10.1109/icra46639.2022.9812272
CLA-NeRF: Category-Level Articulated Neural Radiance Field
  • May 23, 2022
  • Wei-Cheng Tseng + 3 more

We propose CLA-NeRF, a Category-Level Articulated Neural Radiance Field that can perform view synthesis, part segmentation, and articulated pose estimation. CLA-NeRF is trained at the object category level using no CAD models and no depth supervision, only a set of RGB images with ground-truth camera poses and part segmentations. During inference, it takes only a few RGB views (i.e., few-shot) of an unseen 3D object instance within a known category to infer the object's part segmentation and neural radiance field. Given an articulated pose as input, CLA-NeRF can perform articulation-aware volume rendering to generate the corresponding RGB image at any camera pose. Moreover, the articulated pose of an object can be estimated via inverse rendering. In our experiments, we evaluate the framework across five categories on both synthetic and real-world data. In all cases, our method shows realistic deformation results and accurate articulated pose estimation. We believe that both few-shot articulated object rendering and articulated pose estimation open the door for robots to perceive and interact with unseen articulated objects.
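The abstract's "articulation-aware volume rendering" suggests warping sample points by the given articulated pose before querying a canonical field. Below is a deliberately simplified single-revolute-joint sketch; the joint model, the soft part head, and the blending rule are our own toy constructions, not CLA-NeRF's design.

import torch
import torch.nn as nn
import torch.nn.functional as F

def revolute_warp(pts, angle, axis, pivot):
    """Rotate points about a hinge (Rodrigues' formula) by the inverse of the
    given articulation, so a canonical field can be queried. Illustrative only."""
    k = F.normalize(axis, dim=0)
    p = pts - pivot
    cos, sin = torch.cos(-angle), torch.sin(-angle)   # inverse rotation
    p_rot = (p * cos
             + torch.cross(k.expand_as(p), p, dim=-1) * sin
             + k * (p @ k).unsqueeze(-1) * (1 - cos))
    return p_rot + pivot

class ArticulatedField(nn.Module):
    """Canonical field predicting density, color, and a soft part assignment."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 5))

    def forward(self, x):
        out = self.net(x)
        return (F.softplus(out[..., :1]),       # density
                torch.sigmoid(out[..., 1:4]),   # rgb
                torch.sigmoid(out[..., 4:]))    # p(point is on the moving part)

field = ArticulatedField()
axis, pivot = torch.tensor([0., 0., 1.]), torch.zeros(3)
angle = torch.tensor(0.5)        # the articulated pose supplied at render time

pts = torch.rand(1024, 3)
warped = revolute_warp(pts, angle, axis, pivot)
# Blend the static and moving hypotheses by the soft part assignment.
sig_s, rgb_s, w = field(pts)     # point interpreted as static part
sig_m, rgb_m, _ = field(warped)  # point interpreted as moving part
sigma = (1 - w) * sig_s + w * sig_m
rgb = (1 - w) * rgb_s + w * rgb_m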

  • Research Article
  • 10.3390/biomimetics10060410
Bio-Inspired 3D Affordance Understanding from Single Image with Neural Radiance Field for Enhanced Embodied Intelligence
  • Jun 19, 2025
  • Biomimetics (Basel, Switzerland)
  • Zirui Guo + 4 more

Affordance understanding means identifying the operable parts of objects, which is crucial for accurate robotic manipulation. Although homogeneous objects for grasping have various shapes, they always share a similar affordance distribution. Based on this fact, and inspired by human cognitive processes, we propose AFF-NeRF to address the problem of affordance generation for homogeneous objects. Our method employs deep residual networks to extract the shape and appearance features of various objects, enabling it to adapt to diverse homogeneous objects. These features are then integrated into our extended neural radiance field, AFF-NeRF, to generate 3D affordance models for unseen objects from a single image. Our experimental results demonstrate that our approach outperforms baseline methods in affordance generation for unseen views of novel objects without additional training. Additionally, more stable grasps can be obtained by using the 3D affordance models generated by our method in a grasp generation algorithm.
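A schematic of the described pipeline could look as follows: a deep residual encoder turns the single input image into a conditioning code, and an extended NeRF head adds an affordance channel alongside density and color. The encoder choice (resnet18), the global pooled feature, and all layer sizes here are illustrative assumptions, not the paper's configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class AffordanceNeRF(nn.Module):
    """Residual encoder conditioning an extended NeRF with an affordance head."""
    def __init__(self, feat_dim=512, hidden=128):
        super().__init__()
        encoder = resnet18(weights=None)
        encoder.fc = nn.Identity()        # keep the 512-d pooled feature
        self.encoder = encoder
        self.head = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 5))         # density, rgb, affordance

    def forward(self, image, pts):
        z = self.encoder(image)                       # (1, 512) image code
        z = z.expand(pts.shape[0], -1)                # broadcast to all points
        out = self.head(torch.cat([pts, z], dim=-1))
        sigma = F.softplus(out[..., :1])
        rgb = torch.sigmoid(out[..., 1:4])
        affordance = torch.sigmoid(out[..., 4:])      # operable-part score
        return sigma, rgb, affordance

model = AffordanceNeRF()
image = torch.rand(1, 3, 224, 224)   # the single RGB input view
pts = torch.rand(1024, 3)
sigma, rgb, aff = model(image, pts)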

  • Book Chapter
  • Cited by 14
  • 10.1007/978-3-031-26319-4_14
SymmNeRF: Learning to Explore Symmetry Prior for Single-View View Synthesis
  • Jan 1, 2023
  • Xingyi Li + 5 more

We study the problem of novel view synthesis of objects from a single image. Existing methods have demonstrated the potential of single-view view synthesis. However, they still fail to recover fine appearance details, especially in self-occluded areas, because a single view provides only limited information. We observe that man-made objects usually exhibit symmetric appearances, which introduces additional prior knowledge. Motivated by this, we investigate the potential performance gains of explicitly embedding symmetry into the scene representation. In this paper, we propose SymmNeRF, a neural radiance field (NeRF) based framework that combines local and global conditioning with the introduction of symmetry priors. In particular, SymmNeRF takes the pixel-aligned image features and the corresponding symmetric features as extra inputs to the NeRF, whose parameters are generated by a hypernetwork. As the parameters are conditioned on the image-encoded latent codes, SymmNeRF is scene-independent and can generalize to new scenes. Experiments on synthetic and real-world datasets show that SymmNeRF synthesizes novel views with more details regardless of the pose transformation, and demonstrates good generalization when applied to unseen objects. Code is available at: https://github.com/xingyi-li/SymmNeRF

Keywords: Novel view synthesis, NeRF, Symmetry, HyperNetwork
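The core mechanism, pixel-aligned image features plus the features of each point's mirror image, can be sketched briefly. The helper below and the assumption that the symmetry plane is x = 0 in the canonical object frame are ours; the hypernetwork that generates the NeRF weights is omitted.

import torch
import torch.nn.functional as F

def pixel_aligned_features(feat_map, pts, K):
    """Project 3D points with intrinsics K and bilinearly sample the encoder's
    feature map at the projections (pixel-aligned conditioning)."""
    uvw = pts @ K.T                                   # homogeneous image coords
    uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-8)
    H = feat_map.shape[-1]                            # assume a square map
    grid = (uv / (H - 1) * 2 - 1).view(1, -1, 1, 2)   # normalize to [-1, 1]
    sampled = F.grid_sample(feat_map, grid, align_corners=True)
    return sampled.squeeze(0).squeeze(-1).T           # (N, C)

# Each 3D point also borrows features from its mirror image across x = 0.
feat_map = torch.rand(1, 64, 32, 32)                  # encoder output, one view
K = torch.tensor([[32., 0., 16.], [0., 32., 16.], [0., 0., 1.]])  # toy intrinsics
pts = torch.rand(1024, 3) - 0.5 + torch.tensor([0., 0., 2.5])     # in front of camera

mirrored = pts * torch.tensor([-1., 1., 1.])          # reflect across x = 0
f = pixel_aligned_features(feat_map, pts, K)
f_sym = pixel_aligned_features(feat_map, mirrored, K)
cond = torch.cat([f, f_sym], dim=-1)   # extra per-point inputs to the NeRF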
