Abstract
Although CNN-based optical flow methods have achieved remarkable accuracy and computational efficiency, edge blurring caused by large displacements remains an open challenge. To address this problem, we propose a local criss-cross attention-based optical flow estimation method that uses multi-scale image features and a feature pyramid. First, we design an image pyramid-based feature extraction sub-network and incorporate it into a feature pyramid network to construct a hybrid feature extraction module, which extracts multi-scale structural and semantic information from the input images. Second, we concatenate a local criss-cross attention module with the hybrid feature extraction module to build a global feature encoder. The global feature encoder captures long-range dependencies within the feature map to improve large-displacement estimation. Finally, we combine the global feature encoder with an iterative optical flow decoder to form a novel network named LCIF-Net. We demonstrate its significant performance benefits on the MPI-Sintel and KITTI datasets. Compared with existing optical flow estimation methods, LCIF-Net remarkably improves the accuracy and robustness of optical flow estimation, especially in regions with large displacements and motion edges.
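To make the attention step concrete, the following is a minimal NumPy sketch of criss-cross attention, in which each pixel aggregates features only from its own row and column. It is an illustration under stated assumptions, not the LCIF-Net implementation: the function name `criss_cross_attention`, the `radius` parameter (one possible reading of "local"), and the use of identity query/key/value projections in place of learned ones are all hypothetical choices not specified in the abstract.

```python
import numpy as np

def criss_cross_attention(feat, radius=None):
    """Illustrative criss-cross attention over a feature map of shape (H, W, C).

    Each pixel attends to the pixels in its own row and column. If `radius`
    is given, attention is restricted to neighbours within that many pixels
    (an assumed interpretation of "local"). Queries, keys, and values are the
    raw features; a real module would use learned projections.
    """
    H, W, C = feat.shape
    out = np.zeros_like(feat, dtype=float)
    for i in range(H):
        for j in range(W):
            # Positions in the same column (includes the pixel itself).
            column = [(r, j) for r in range(H)
                      if radius is None or abs(r - i) <= radius]
            # Positions in the same row (excluding the pixel, already counted).
            row = [(i, c) for c in range(W)
                   if c != j and (radius is None or abs(c - j) <= radius)]
            keys = np.stack([feat[r, c] for r, c in column + row])  # (N, C)
            # Scaled dot-product scores between the query pixel and its keys.
            scores = keys @ feat[i, j] / np.sqrt(C)
            # Numerically stable softmax over the criss-cross neighbourhood.
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            # Weighted sum of the neighbourhood features.
            out[i, j] = weights @ keys
    return out

# Usage: a random 4x5 feature map with 3 channels.
features = np.random.default_rng(0).normal(size=(4, 5, 3))
attended = criss_cross_attention(features, radius=2)
```

With `radius=None`, every pixel sees its full row and column, so information from distant positions along those axes reaches it in a single step, which is the long-range behaviour the abstract attributes to the global feature encoder. The explicit loops favour readability over speed.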