Abstract
Depth maps are used in many vision tasks because consumer depth cameras acquire them in real time and at low cost. However, despite extensive research in depth enhancement, they still suffer from low precision and severe sensor noise. We propose a novel multi-level feature fusion convolutional neural network (CNN) for facial depth map refinement, named MFFNet. It is a multi-stage network in which each stage is a local multi-level feature fusion (LMLF) block. To smooth noise while enhancing detailed facial structure, a hierarchical fusion strategy fully fuses multi-level features: an LMLF block fuses multi-level features locally within each stage, while inter-stage skip connections achieve global multi-level feature fusion. The inter-stage skip connections also ease training by shortening information propagation paths. We further introduce an effective data augmentation method that synthesizes noisy facial depth maps under various poses; training with these synthetic data improves the robustness of the proposed method to face pose. The proposed method is evaluated on a synthetic facial depth map dataset, a real Kinect V2 facial depth map dataset, and the Middlebury Stereo Dataset. Experimental results show that our method produces high-quality refined depth maps and outperforms several state-of-the-art methods.
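To make the described architecture concrete, the following is a minimal PyTorch sketch of the two ideas the abstract names: an LMLF block that fuses multi-level features locally within a stage, and inter-stage skip connections that accumulate earlier stage outputs for global fusion and residual refinement. All class names (`LMLFBlock`, `MFFNetSketch`), layer counts, channel widths, and the residual formulation are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class LMLFBlock(nn.Module):
    """Local multi-level feature fusion block (sketch).

    A stack of conv layers whose intermediate (multi-level) outputs are
    concatenated and fused by a 1x1 conv. Depth and width are assumptions.
    """

    def __init__(self, channels: int = 64, num_levels: int = 3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
            )
            for _ in range(num_levels)
        )
        # 1x1 conv fuses the concatenated multi-level features locally.
        self.fuse = nn.Conv2d(channels * num_levels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        levels = []
        h = x
        for conv in self.convs:
            h = conv(h)
            levels.append(h)
        return self.fuse(torch.cat(levels, dim=1)) + x  # local residual


class MFFNetSketch(nn.Module):
    """Multi-stage refinement network (sketch).

    Each stage is an LMLF block; inter-stage skip connections sum all
    earlier stage outputs into the current input, approximating the
    described global multi-level fusion and the shortened information
    propagation paths. A final conv predicts a residual that is added
    back to the noisy input depth map.
    """

    def __init__(self, channels: int = 64, num_stages: int = 4):
        super().__init__()
        self.head = nn.Conv2d(1, channels, 3, padding=1)
        self.stages = nn.ModuleList(
            LMLFBlock(channels) for _ in range(num_stages)
        )
        self.tail = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        feats = [self.head(depth)]
        for stage in self.stages:
            # Inter-stage skips: fuse all previous stage outputs.
            feats.append(stage(sum(feats)))
        return depth + self.tail(feats[-1])  # residual refinement


if __name__ == "__main__":
    net = MFFNetSketch()
    noisy = torch.randn(1, 1, 128, 128)  # toy noisy facial depth map
    refined = net(noisy)
    print(refined.shape)  # torch.Size([1, 1, 128, 128])
```

Predicting a residual on top of the noisy input, as sketched here, is a common choice for restoration networks because the network only has to model the noise and missing detail rather than the full depth signal; whether MFFNet uses exactly this formulation is an assumption.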