Abstract

Optical flow estimation searches for correspondences between two images. In unsupervised approaches, most networks use a feature correlation volume to track the flow, and training is driven by a photometric loss function. However, complex conditions in natural environments, such as object occlusion, motion blur, an out-of-focus camera, limited perspective, and varying lighting, make it difficult to find correspondences accurately and thus complicate unsupervised optical flow estimation. This study decouples the problem into two sub-tasks: searching for determined correspondences within a pair of frames, and handling regions mismatched due to occlusion, blur, lighting variation, etc., by introducing additional spatial and temporal context information. We propose a multi-frame temporal dynamic model that recursively infers optical flow over causal sequences of arbitrary length. Our approach uses information entropy and forward–backward consistency checks to measure the confidence of matches between image pairs. To compensate for low-confidence regions, the proposed network adaptively identifies regions according to their correspondence confidence and re-predicts their motion under temporal and spatial smoothness assumptions. Paired with carefully designed simulation of dynamic occlusion pseudo-labels and scene variation, our model learns a variety of complex scenes in a multi-frame setting and efficiently refines low-confidence regions. Experimental results show that the proposed model runs at high speed in real-time tasks while maintaining high accuracy, achieving state-of-the-art results on the Sintel Clean and Final benchmarks.
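
The two confidence cues named in the abstract, information entropy and forward–backward consistency, can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, the softmax over correlation scores, the nearest-neighbour lookup, and the thresholds `alpha` and `beta` are all assumptions chosen for illustration, following the form of consistency check commonly used in unsupervised flow work.

```python
import numpy as np

def matching_entropy(corr):
    """Normalized Shannon entropy of the matching distribution per pixel.

    corr: (H, W, K) correlation scores over K candidate displacements
    (hypothetical layout). Returns values in [0, 1]; high entropy means
    an ambiguous, low-confidence match.
    """
    z = corr - corr.max(axis=-1, keepdims=True)   # stabilize the softmax
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    ent = -(p * np.log(p + 1e-12)).sum(axis=-1)
    return ent / np.log(corr.shape[-1])

def fb_consistency_mask(flow_fw, flow_bw, alpha=0.01, beta=0.5):
    """Forward-backward consistency check.

    flow_fw, flow_bw: (H, W, 2) arrays holding (dx, dy) per pixel.
    alpha and beta are assumed thresholds, not values from the paper.
    Returns a boolean mask that is True where the two flows agree.
    """
    h, w, _ = flow_fw.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Look up the backward flow at the forward-displaced position
    # (nearest neighbour for simplicity; bilinear sampling is more usual).
    xt = np.clip(np.rint(xs + flow_fw[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.rint(ys + flow_fw[..., 1]).astype(int), 0, h - 1)
    bw_at_target = flow_bw[yt, xt]
    # For a consistent match, forward and backward flow cancel out.
    diff = np.sum((flow_fw + bw_at_target) ** 2, axis=-1)
    mag = np.sum(flow_fw ** 2, axis=-1) + np.sum(bw_at_target ** 2, axis=-1)
    return diff < alpha * mag + beta
```

Under these assumptions, a pixel could be treated as confident when the mask is True and its entropy falls below a chosen threshold; the remaining pixels would be the low-confidence regions handed to the re-prediction stage.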
