Abstract

We study a challenging task: creating an instruction-guided 3D object from a single image and a human instruction. Although existing methods utilize pre-trained text-to-image diffusion models to optimize Neural Radiance Fields (NeRF) and achieve remarkable results, they struggle with this task due to two inherent obstacles. First, the written instruction is mismatched or uncorrelated with the target images under the distribution of pre-trained text-to-image diffusion models, which leads to insufficient supervision. Second, editing 3D content starting from a single image is an ill-posed problem, as the image provides only partial information. To tackle these difficulties, we propose an Instruct Pixel-to-3D framework, which exploits the supervisory signals from the given instructions and integrates domain-specific knowledge drawn from 2D and 3D diffusion models. Specifically, to address the first challenge, we propose instruction guidance, which facilitates meticulous image editing using fine-grained instructions and regulates the novel views of 3D objects using positional prompts. To tackle the second difficulty, we propose diffusion guidance for 3D generation, which optimizes the NeRF under dual guidances: a 2D diffusion guidance derived from the score distillation sampling (SDS) loss and a 3D diffusion guidance embodied by the Zero-1-to-3 loss. Moreover, Instruct Pixel-to-3D adopts a coarse-to-fine training strategy to mitigate the complexities inherent in reconstructing scenes from sparse viewpoints. Qualitative and quantitative experiments on the DTU dataset and on in-the-wild images demonstrate that our approach can synthesize high-quality, high-fidelity 3D content from a single image and a human instruction.
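The dual diffusion guidance described above can be sketched as a weighted sum of SDS-style residuals: each guidance contributes the difference between its predicted noise and the noise injected into the NeRF's rendered view. The function below is a hypothetical illustration of that combination; the function name, weights, and NumPy formulation are assumptions for exposition, not the paper's implementation.

```python
import numpy as np

def dual_guidance_grad(eps_2d, eps_3d, noise, w_2d=1.0, w_3d=1.0):
    """Hypothetical sketch of combining 2D and 3D diffusion guidance.

    eps_2d: noise predicted by the 2D text-to-image diffusion model (SDS).
    eps_3d: noise predicted by the 3D-aware model (Zero-1-to-3-style).
    noise:  the noise actually injected into the rendered NeRF view.

    Each guidance contributes an SDS-style residual (predicted minus
    injected noise); the weighted sum is the gradient pushed back
    through the rendering to the NeRF parameters.
    """
    return w_2d * (eps_2d - noise) + w_3d * (eps_3d - noise)

# Toy usage on small arrays standing in for per-pixel noise maps.
eps_2d = np.ones((2, 2))
eps_3d = np.zeros((2, 2))
noise = np.full((2, 2), 0.5)
grad = dual_guidance_grad(eps_2d, eps_3d, noise)
```

In practice the two weights would trade off text-faithful editing (2D term) against multi-view consistency (3D term), and the residual would be backpropagated through the differentiable renderer rather than applied directly.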
