Depth-image-based rendering (DIBR) techniques have been widely employed to synthesize realistic novel views from a single image in 3D video applications. However, DIBR-oriented approaches rely heavily on the accuracy of depth maps, usually requiring the ground-truth depth as a prior. Even then, the synthesized view may suffer extensive floating-point precision losses and invalid holes caused by warping errors and occlusion. In this paper, we propose an end-to-end as-deformable-as-possible (ADAP) single-image view synthesis solution that requires no depth prior. It addresses the above issues in two stages, alignment and reconstruction: we first transform the input image into a latent feature space and then reconstruct the novel view in the image domain. In the first stage, the input image is deformed to align with the synthesized view at the feature level. To this end, we propose an ADAP alignment mechanism that proceeds from pixel-level warping through error-level quantization to feature-level alignment, progressively improving the deformation capability under the challenging motion conditions of real-world scenes. In the second stage, an occlusion-aware reconstruction module recovers content details from the deformed features at the pixel level. Extensive experiments demonstrate that our alignment-reconstruction approach is robust to depth-map quality: even with a coarsely estimated depth map, our solution outperforms state-of-the-art schemes on popular benchmarks.
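To make the two-stage pipeline concrete, the PyTorch sketch below illustrates the general warp-align-reconstruct flow under loose assumptions. All module names, layer sizes, and the single-convolution fusion step are hypothetical illustrations, not the paper's actual ADAP architecture; in particular, the error-level quantization and the explicit occlusion mask are omitted here for brevity.

```python
# Minimal sketch of a warp-align-reconstruct pipeline (hypothetical, not the
# paper's ADAP architecture): (1) warp the source image toward the target view
# with a depth-derived flow, align source and warped features, then
# (2) decode the fused feature back to an RGB novel view.
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp(image, flow):
    """Pixel-level backward warping with a dense flow field (B, 2, H, W)."""
    b, _, h, w = image.shape
    # Base sampling grid in the normalized [-1, 1] range grid_sample expects.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=image.device),
        torch.linspace(-1, 1, w, device=image.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    # Convert pixel-unit flow into normalized offsets, then sample.
    scale = torch.tensor([(w - 1) / 2, (h - 1) / 2], device=image.device)
    offset = flow.permute(0, 2, 3, 1) / scale
    return F.grid_sample(image, base + offset, align_corners=True)


class AlignReconstruct(nn.Module):
    """Stage 1 fuses features of the source and warped images (a stand-in
    for feature-level alignment); stage 2 decodes the fused feature to RGB."""

    def __init__(self, ch=64):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        # Simplified alignment: fuse the two feature maps with one conv.
        self.align = nn.Conv2d(2 * ch, ch, 3, padding=1)
        # Simplified reconstruction head (no occlusion mask here).
        self.decode = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, image, flow):
        warped = warp(image, flow)  # pixel-level warping
        fused = self.align(
            torch.cat((self.encode(image), self.encode(warped)), dim=1)
        )  # feature-level alignment (simplified)
        return self.decode(fused)  # reconstruction in the image domain


model = AlignReconstruct()
img = torch.rand(1, 3, 64, 64)    # source image
flow = torch.zeros(1, 2, 64, 64)  # flow from a coarse depth estimate
print(model(img, flow).shape)     # torch.Size([1, 3, 64, 64])
```

In an actual system, the flow would be derived from a (possibly coarse) estimated depth map and the target camera pose, and the alignment stage would be trained end-to-end with the reconstruction head.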