Abstract

Neural radiance fields (NeRF) for realistic novel view synthesis require camera poses to be pre-acquired by a structure-from-motion (SfM) approach. This two-stage strategy is inconvenient and degrades performance, because errors in pose extraction propagate to view synthesis. We integrate pose extraction and view synthesis into a single jointly optimized process so that the two tasks benefit from each other. For network training, only images are given, without pre-known camera poses. The camera poses are obtained via a depth-consistency constraint: the same feature observed in different views must have identical world coordinates when transformed from the local camera coordinates according to the estimated poses. This depth-consistency constraint is jointly optimized with the pixel-color constraint. The poses are predicted by a CNN-based deep network that takes the related frames as input. The joint optimization makes NeRF aware of the scene’s structure, resulting in improved generalization performance. Experiments on three datasets demonstrate the effectiveness of both camera pose estimation and novel view synthesis. Code is available at https://github.com/XTU-PR-LAB/SaNerf.
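For concreteness, a minimal PyTorch sketch of such a depth-consistency term might look as follows, where depths would come from the rendered NeRF and poses from the CNN pose network; all function and variable names here are illustrative assumptions, not the authors' actual implementation:

```python
# Hypothetical sketch of a depth-consistency loss: matched features in two
# views, unprojected to world coordinates with each view's depth and pose,
# should land at the same 3D point.
import torch

def unproject(pix: torch.Tensor, depth: torch.Tensor,
              K_inv: torch.Tensor, c2w: torch.Tensor) -> torch.Tensor:
    """Lift 2D pixels (N, 2) with depths (N,) to world coordinates (N, 3)."""
    ones = torch.ones(pix.shape[0], 1, device=pix.device)
    pix_h = torch.cat([pix, ones], dim=-1)          # homogeneous pixels (N, 3)
    cam = depth.unsqueeze(-1) * (pix_h @ K_inv.T)   # camera-frame points (N, 3)
    cam_h = torch.cat([cam, ones], dim=-1)          # homogeneous points (N, 4)
    return (cam_h @ c2w.T)[:, :3]                   # world-frame points (N, 3)

def depth_consistency_loss(pix_a, depth_a, pose_a,
                           pix_b, depth_b, pose_b, K_inv):
    """Penalize world-coordinate disagreement between matched features."""
    world_a = unproject(pix_a, depth_a, K_inv, pose_a)
    world_b = unproject(pix_b, depth_b, K_inv, pose_b)
    return (world_a - world_b).abs().mean()
```

In the joint optimization, a term of this form would simply be added to the usual pixel-color (photometric) loss with a weighting factor, so that pose estimation and radiance-field training share gradients.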
