Novel view synthesis aims to generate new perspectives of a scene from a limited number of input views. Neural Radiance Fields (NeRF) are a key method for this task, producing high-fidelity images when trained on a comprehensive set of inputs; their performance, however, drops significantly when only sparse views are available. To mitigate this, depth information can be used to guide training, and coarse depth maps are often readily available in practical settings. We propose ATGANNeRF, an improved sparse-view NeRF model that integrates an enhanced U-Net generator with a dual-discriminator framework, a Convolutional Block Attention Module (CBAM), and Multi-Head Self-Attention. The symmetric design strengthens the model's ability to capture and preserve spatial relationships, yielding more consistent novel-view generation. In addition, a local depth ranking constraint enforces consistency between rendered depth and the coarse maps, and spatial continuity constraints are introduced to synthesize novel views from sparse samples. A structural similarity (SSIM) loss is also added to preserve local structural details such as edges and textures. Evaluation on LLFF, DTU, and our own datasets shows that ATGANNeRF significantly outperforms state-of-the-art methods, both quantitatively and qualitatively.
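As a rough illustration of the local depth ranking idea mentioned above, the sketch below penalizes rendered depths that contradict the pixel ordering implied by a coarse depth map. This is a minimal PyTorch sketch under our own assumptions (random pixel pairing and a hinge margin); the paper's exact formulation, sampling scheme, and hyperparameters may differ.

```python
import torch

def local_depth_ranking_loss(rendered_depth, coarse_depth, margin=1e-4):
    """Hypothetical sketch of a local depth ranking penalty.

    For randomly paired pixels, if the coarse depth map says pixel i is
    closer than pixel j, the rendered depth should respect that ordering;
    violations beyond a small margin are penalized.
    """
    # Flatten both maps to (N,) and sample random pixel pairs.
    d_r = rendered_depth.reshape(-1)
    d_c = coarse_depth.reshape(-1)
    n = d_r.shape[0]
    idx_i = torch.randint(0, n, (n,), device=d_r.device)
    idx_j = torch.randint(0, n, (n,), device=d_r.device)

    # sign = +1 where the coarse map puts i in front of j, -1 otherwise.
    sign = torch.sign(d_c[idx_j] - d_c[idx_i])

    # Hinge penalty: the rendered ordering must agree with the coarse
    # ordering by at least `margin`.
    violation = torch.relu(margin - sign * (d_r[idx_j] - d_r[idx_i]))

    valid = sign != 0  # ignore pairs with equal coarse depth
    if valid.any():
        return violation[valid].mean()
    return violation.sum() * 0.0  # degenerate case: no informative pairs
```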
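The SSIM term can be sketched in the same spirit. The code below is a standard uniform-window SSIM loss in PyTorch, not necessarily the exact variant ATGANNeRF uses; the window size and stability constants are the conventional defaults.

```python
import torch
import torch.nn.functional as F

def ssim_loss(x, y, window_size=11, c1=0.01**2, c2=0.03**2):
    """Uniform-window SSIM loss between images x, y in [0, 1] with
    shape (B, C, H, W). Returns 1 - mean SSIM, so lower is better."""
    pad = window_size // 2

    # Local means, variances, and covariance via average pooling.
    mu_x = F.avg_pool2d(x, window_size, stride=1, padding=pad)
    mu_y = F.avg_pool2d(y, window_size, stride=1, padding=pad)
    sigma_x = F.avg_pool2d(x * x, window_size, 1, pad) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, window_size, 1, pad) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, window_size, 1, pad) - mu_x * mu_y

    # Per-pixel SSIM map, then averaged over the whole batch.
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    )
    return 1.0 - ssim_map.mean()
```

In such a setup, both terms would typically be added to the photometric loss with scalar weights tuned on a validation split.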