Abstract

We formulate structure from motion as a learning problem and propose an end-to-end framework that jointly estimates image depth, optical flow, and camera motion. The framework is composed of multiple encoder-decoder networks; its key component is the FlowNet, which improves the accuracy of the estimated camera ego-motion and depth. As in recent studies, we use an end-to-end learning approach with multi-view synthesis as a form of supervision, and propose multi-view consistency losses that constrain both depth and camera ego-motion, requiring only monocular video sequences for training. In contrast to recently popular single-image depth-estimation networks, our network learns to use motion parallax to correct depth. Although training MuDeepNet requires two adjacent frames to obtain motion parallax, it is tested on a single image; MuDeepNet is therefore a monocular system. Experiments on the KITTI dataset show that MuDeepNet outperforms other methods.
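The abstract does not give the exact form of the view-synthesis supervision, but the standard construction in this line of work warps a source frame into the target view using the predicted depth and relative camera pose, and penalizes the photometric difference. The sketch below is an illustrative NumPy version of that idea with nearest-neighbour sampling; the function name, arguments, and sampling scheme are assumptions for illustration, not the paper's implementation (which would use differentiable bilinear sampling inside the networks).

```python
import numpy as np

def view_synthesis_loss(target, source, depth, K, T):
    """Illustrative photometric loss from warping `source` into the target view.

    target, source : (H, W) grayscale images
    depth          : (H, W) predicted per-pixel depth of the target view
    K              : (3, 3) camera intrinsics
    T              : (4, 4) predicted relative pose (target -> source)
    Pixels that project outside the source image are ignored.
    NOTE: this is a hypothetical sketch, not the paper's actual loss.
    """
    H, W = target.shape
    # Homogeneous pixel grid of the target view.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(np.float64)
    # Back-project to 3-D camera coordinates: X = D * K^{-1} * p.
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    # Transform into the source frame and project: p_s ~ K * (T X)_{1:3}.
    src = K @ (T @ cam_h)[:3]
    us, vs = src[0] / src[2], src[1] / src[2]
    # Nearest-neighbour sampling with a validity mask.
    ui, vi = np.round(us).astype(int), np.round(vs).astype(int)
    valid = (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H) & (src[2] > 0)
    synthesized = np.zeros(H * W)
    synthesized[valid] = source[vi[valid], ui[valid]]
    # Mean absolute photometric error over valid pixels.
    return np.abs(synthesized[valid] - target.reshape(-1)[valid]).mean()
```

With the identity pose and the true depth, the warp maps each pixel to itself, so the loss on identical frames is zero; any depth or ego-motion error increases the photometric residual, which is what lets a multi-view consistency loss supervise both quantities from monocular video alone.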
