Abstract

In this work, we propose a self-supervised scene flow framework that jointly learns optical flow, stereo depth, camera pose, and a rigidity map while handling occlusion during training. Specifically, we propose a feature masking method that alleviates the impact of occlusion on the correlation result and reduces outliers in both the optical flow and the depth map. We use the improved optical flow and depth to estimate the camera motion directly with the Perspective-n-Point (PnP) method, which improves the pose estimate accordingly. Furthermore, we recursively update the optical flow in both occluded and non-occluded regions with self-supervised cues learned from the rigid and optical flows. Reducing the error in the occluded regions enhances the rigidity map and, in turn, the final optical flow. Our model achieves state-of-the-art performance on the KITTI 2015 benchmark for optical flow and produces competitive results on the depth, pose, and segmentation tasks.
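
The abstract only names the Perspective-n-Point step, so here is a minimal sketch of the general idea: back-project pixels of frame t to 3D using the predicted depth, take their 2D positions in frame t+1 from the predicted optical flow, and solve for the camera motion with a RANSAC-based PnP solver. This is not the authors' code; the function name, interface, and the rigidity-mask input are illustrative assumptions, and OpenCV's solvePnPRansac stands in for whatever solver the paper uses.

```python
import cv2
import numpy as np

def pose_from_depth_and_flow(depth, flow, K, rigid_mask):
    """Sketch of camera pose estimation from depth + optical flow via PnP.

    depth:      (H, W) depth map of frame t
    flow:       (H, W, 2) optical flow from frame t to frame t+1
    K:          (3, 3) camera intrinsics matrix
    rigid_mask: (H, W) bool mask of static-scene pixels, e.g. derived
                from the rigidity map (hypothetical input)
    """
    ys, xs = np.nonzero(rigid_mask)

    # Back-project the selected pixels of frame t into 3D camera space.
    z = depth[ys, xs]
    x3 = (xs - K[0, 2]) * z / K[0, 0]
    y3 = (ys - K[1, 2]) * z / K[1, 1]
    pts3d = np.stack([x3, y3, z], axis=1).astype(np.float64)

    # Their 2D correspondences in frame t+1 come from the optical flow.
    pts2d = np.stack([xs + flow[ys, xs, 0],
                      ys + flow[ys, xs, 1]], axis=1).astype(np.float64)

    # RANSAC-based PnP is robust to residual flow/depth outliers.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K.astype(np.float64), distCoeffs=None)
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return R, tvec
```

Restricting the correspondences to rigid-background pixels matters because PnP assumes a single rigid motion; independently moving objects would otherwise contaminate the pose estimate.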
