Deep Optical Flow Learning With Deformable Large-Kernel Cross-Attention

Abstract

Optical flow estimation from image sequences is a fundamental problem in computer vision. In recent years, some methods have used Transformers to model global dependencies for optical flow, achieving impressive performance. However, these methods typically flatten two-dimensional image features into one-dimensional sequences. While positional encoding partially mitigates the loss of spatial information between feature patches, the Transformer still lacks an inherent bias for modeling local visual patterns and tends to overlook channel characteristics in image features. This paper therefore introduces a deformable large-kernel attention module that combines the strengths of convolution and attention mechanisms: it preserves channel adaptability while modeling global dependencies, without sacrificing the two-dimensional structure of the features, and significantly improves optical flow estimation. In addition, the deformable mechanism allows the model to adapt its sampling locations to different data patterns. Experimental results demonstrate that our method achieves competitive results on public benchmarks such as Sintel and KITTI.
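The kind of module the abstract describes, attention computed by a decomposed large-kernel convolution that acts directly on the 2-D feature map, can be sketched as below. The module name, kernel sizes, and the decision to omit the deformable offsets are illustrative assumptions, not the paper's exact design; a deformable variant would replace the depthwise convolutions with deformable convolutions (e.g. torchvision's `DeformConv2d`) so the sampling grid adapts to the data.

```python
import torch
import torch.nn as nn


class LargeKernelAttention(nn.Module):
    """Sketch of large-kernel attention: a large effective kernel is
    decomposed into a 5x5 depthwise conv (local context), a 7x7 depthwise
    conv with dilation 3 (long-range context), and a 1x1 conv (channel
    mixing). The result is used as an attention map over the input,
    preserving its 2-D structure and per-channel information."""

    def __init__(self, dim: int):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9,
                                    groups=dim, dilation=3)
        self.pw = nn.Conv2d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention map has the same shape as x; padding keeps H and W fixed.
        attn = self.pw(self.dw_dilated(self.dw(x)))
        # Element-wise reweighting of the input features.
        return x * attn
```

Because the depthwise convolutions are grouped per channel, the module stays lightweight while its effective receptive field spans roughly a 21x21 neighborhood, which is how convolution can stand in for global attention here.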
