Multi-view stereo with recurrent neural networks for spatio-temporal consistent depth maps

Hosung Son, Suk-Ju Kang

doi:10.1109/iceic57457.2023.10049937

Abstract

Depth estimation methods based on deep learning have been studied extensively to improve depth estimation accuracy. However, achieving inter-frame consistency of depth maps in video depth estimation remains a challenge. We therefore propose a method for enhancing spatio-temporal consistency in video depth estimation based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Specifically, a convolutional long short-term memory (ConvLSTM) module is added to the decoder of the depth estimation network so that information from previous frames can be used. Additionally, a one-stage learning process is adopted to ensure ease of training. We experimentally show that the proposed method improves not only accuracy but also consistency between depth map frames.
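
To illustrate how a ConvLSTM module can carry information from previous frames, the following is a minimal sketch of one recurrent update in plain Python. It assumes the standard ConvLSTM formulation without peephole connections, a single-channel feature map, and zero padding. The abstract does not specify the kernel sizes, channel counts, or the exact position of the module in the decoder, so the names `conv2d_same`, `convlstm_step`, and `params` are hypothetical and do not describe the authors' implementation.

```python
import math


def conv2d_same(x, k):
    """Single-channel 2D cross-correlation with zero padding (output size = input size)."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, z = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= z < w:
                        s += x[y][z] * k[a][b]
            out[i][j] = s
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def convlstm_step(x, h_prev, c_prev, params):
    """One ConvLSTM update for the current frame's feature map x.

    params maps each gate name ('i', 'f', 'o', 'g') to a tuple
    (kernel_for_x, kernel_for_h, bias). Assumed layout, for illustration only.
    """
    rows, cols = len(x), len(x[0])
    pre = {}
    for gate, (kx, kh, bias) in params.items():
        ax = conv2d_same(x, kx)        # contribution of the current frame
        ah = conv2d_same(h_prev, kh)   # contribution of the previous hidden state
        pre[gate] = [[ax[r][c] + ah[r][c] + bias for c in range(cols)]
                     for r in range(rows)]
    h_new = [[0.0] * cols for _ in range(rows)]
    c_new = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            i = sigmoid(pre["i"][r][c])    # input gate
            f = sigmoid(pre["f"][r][c])    # forget gate
            o = sigmoid(pre["o"][r][c])    # output gate
            g = math.tanh(pre["g"][r][c])  # candidate cell content
            c_new[r][c] = f * c_prev[r][c] + i * g
            h_new[r][c] = o * math.tanh(c_new[r][c])
    return h_new, c_new


if __name__ == "__main__":
    # Process a short sequence of toy 2x3 "decoder feature maps" frame by frame.
    k = [[0.0, 0.1, 0.0], [0.1, 0.2, 0.1], [0.0, 0.1, 0.0]]
    params = {gate: (k, k, 0.0) for gate in "ifog"}
    h = [[0.0] * 3 for _ in range(2)]
    c = [[0.0] * 3 for _ in range(2)]
    frames = [[[0.5] * 3 for _ in range(2)], [[1.0] * 3 for _ in range(2)]]
    for frame in frames:
        h, c = convlstm_step(frame, h, c, params)  # state persists across frames
    print(h)
```

Because the hidden state `h` and cell state `c` are passed from one frame to the next, the output for each frame depends on earlier frames as well as the current one, which is the mechanism the abstract relies on for temporal consistency.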

Full Text