Abstract

Although CNN-based optical flow methods have achieved remarkable accuracy and computational efficiency, edge blurring caused by large displacements remains an open challenge. To address this problem, we propose a local criss-cross attention-based optical flow estimation method that uses multi-scale image features and a feature pyramid. First, we design an image pyramid-based feature extraction sub-network and incorporate it into a feature pyramid network to construct a hybrid feature extraction module, which extracts multi-scale structural and semantic information from the input images. Second, we concatenate a local criss-cross attention module with the hybrid feature extraction module to build a global feature encoder, which further captures long-range dependencies within the feature map to improve large-displacement estimation. Finally, we combine the global feature encoder with an iterative optical flow decoder to form a novel network named LCIF-Net. We demonstrate its significant performance benefits on the MPI-Sintel and KITTI datasets. Compared with existing optical flow estimation methods, LCIF-Net markedly improves the accuracy and robustness of optical flow estimation, especially in regions with large displacements and at motion edges.
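
For readers unfamiliar with the attention pattern named above, the following is a minimal sketch of generic criss-cross attention, in which each position of a feature map attends only to the positions in its own row and column. It is not the LCIF-Net module: the abstract does not specify how the local variant restricts the neighbourhood or which learned projections are used, so the sketch assumes identity query, key, and value projections and a single head, and the function name `criss_cross_attention` is hypothetical.

```python
import math


def criss_cross_attention(feat):
    """Single-head criss-cross attention over feat[h][w] = list of C floats.

    Assumption: queries, keys and values are the raw features (identity
    projections); a trained network would use learned projections instead.
    """
    H, W = len(feat), len(feat[0])
    C = len(feat[0][0])
    scale = 1.0 / math.sqrt(C)

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    out = []
    for i in range(H):
        row_out = []
        for j in range(W):
            q = feat[i][j]
            # Criss-cross neighbourhood: all of row i plus the rest of column j.
            # Position (i, j) itself is counted once, through the row.
            neighbours = [(i, jj) for jj in range(W)]
            neighbours += [(ii, j) for ii in range(H) if ii != i]

            # Scaled dot-product scores, then a numerically stable softmax.
            scores = [dot(q, feat[a][b]) * scale for a, b in neighbours]
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            total = sum(exps)
            weights = [e / total for e in exps]

            # Weighted sum of the neighbours' features.
            agg = [
                sum(w * feat[a][b][c] for w, (a, b) in zip(weights, neighbours))
                for c in range(C)
            ]
            row_out.append(agg)
        out.append(row_out)
    return out
```

Each position compares itself with H + W - 1 others rather than all H × W, which is what makes this pattern cheaper than full self-attention on large feature maps.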
