Abstract

3D stylization, which produces stylized multi-view images of a scene, is challenging: it requires not only generating images that match the desired style but also keeping them consistent across different viewpoints. Most previous style transfer methods operate in the 2D image domain and stylize each view independently, so they suffer from multi-view inconsistency. To tackle this problem, we build on neural radiance fields (NeRF) to stylize each 3D scene, since NeRF inherently enforces consistency across viewpoints and separates geometry and appearance into two sub-networks, so that appearance stylization cannot alter the geometry. To enable arbitrary style transfer with more explicit and precise style control, we introduce the CLIP model, which allows styles to be specified by either a text prompt or an arbitrary style image. We train with an ensemble of loss functions: a CLIP loss enforces similarity between the shared latent embeddings and the generated stylized images, and a mask loss constrains the 3D geometry to avoid non-smooth NeRF surfaces. Experimental results demonstrate the effectiveness of our arbitrary 3D stylization and its generalization across diverse datasets. The proposed method outperforms most image-based and text-based 3D stylization models in style transfer quality, producing visually pleasing results.
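The abstract does not give the exact formulation of the loss ensemble, but the idea of combining a CLIP-based style loss with a mask (geometry-preserving) loss on rendered NeRF views can be sketched roughly as below. This is a minimal illustrative sketch, not the authors' implementation; names such as `clip_style_loss`, `mask_loss`, `style_target_embed`, and `lambda_mask` are assumptions introduced here for clarity.

```python
# Illustrative sketch (not the paper's code) of a CLIP style loss plus a mask loss
# on NeRF renderings. Assumes rendered views are already resized/normalized for CLIP.
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP package

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)

# Target style embedding from a text prompt; a style image could be encoded
# with clip_model.encode_image instead.
with torch.no_grad():
    text_embed = clip_model.encode_text(clip.tokenize(["a watercolor painting"]).to(device))
    style_target_embed = text_embed / text_embed.norm(dim=-1, keepdim=True)

def clip_style_loss(rendered, style_target_embed):
    # Cosine distance between the rendered view's CLIP embedding and the target style embedding.
    image_embed = clip_model.encode_image(rendered)
    image_embed = image_embed / image_embed.norm(dim=-1, keepdim=True)
    return 1.0 - (image_embed * style_target_embed).sum(dim=-1).mean()

def mask_loss(pred_alpha, ref_alpha):
    # Penalize changes to the accumulated opacity so stylization leaves the
    # underlying geometry (silhouettes/surfaces) of the original NeRF intact.
    return F.mse_loss(pred_alpha, ref_alpha)

def total_loss(rendered, pred_alpha, ref_alpha, lambda_mask=1.0):
    # Weighted ensemble of the two terms; the weighting is an assumed hyperparameter.
    return clip_style_loss(rendered, style_target_embed) + lambda_mask * mask_loss(pred_alpha, ref_alpha)
```

In this sketch the CLIP loss pulls rendered views toward the prompt or style image in CLIP's shared embedding space, while the mask loss anchors the rendered opacity to that of the original (unstylized) NeRF so the geometry does not degrade.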
