Abstract

Homography estimation plays an important role in many computer vision tasks. Traditional methods depend heavily on the quality of hand-crafted features and therefore degrade sharply in scenes with low texture. Existing deep homography methods can handle low-texture scenes but are not robust to low overlap rates and/or illumination changes. This paper proposes a novel unsupervised homography estimation method that handles low overlap and illumination change simultaneously. Specifically, a powerful module named the global transformer contextual encoder (GTCE) is first designed, together with a correlation encoder, to effectively aggregate global contextual information and reduce matching ambiguity between feature maps. Moreover, a hybrid photo-perceptual loss for unsupervised homography estimation is proposed. This loss considers alignment information at both the pixel level and the perceptual level, helping the network adapt to diverse scenes, including normal cases and those with illumination change. Extensive experiments on synthetic and real-world datasets demonstrate the superiority of the proposed method over current state-of-the-art solutions, especially on challenging scenes with low overlap rates, repetitive patterns, and illumination changes.
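The abstract does not give the exact form of the hybrid photo-perceptual loss, but the general idea (a pixel-level photometric term combined with a feature-level perceptual term) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the weighting `lam` and the use of simple image-gradient features as a stand-in for deep perceptual features are assumptions made purely for demonstration.

```python
import numpy as np

def pixel_loss(img_a, img_b):
    """Pixel-level photometric term: mean absolute intensity difference (L1)."""
    return np.mean(np.abs(img_a - img_b))

def edge_features(img):
    """Stand-in feature extractor (assumption): horizontal/vertical image
    gradients play the role of the deep perceptual features used in practice."""
    gx = np.diff(img, axis=1)  # horizontal gradients
    gy = np.diff(img, axis=0)  # vertical gradients
    return np.concatenate([gx.ravel(), gy.ravel()])

def perceptual_loss(img_a, img_b, feat=edge_features):
    """Perceptual-level term: L1 distance between feature representations."""
    return np.mean(np.abs(feat(img_a) - feat(img_b)))

def hybrid_loss(img_a, img_b, lam=0.5):
    """Hybrid loss: pixel term plus weighted perceptual term (lam is assumed)."""
    return pixel_loss(img_a, img_b) + lam * perceptual_loss(img_a, img_b)
```

Note how this toy version already reflects the motivation in the abstract: a uniform brightness shift between the two images inflates the pixel-level term, while the gradient-based perceptual term is unchanged, so the combined loss is less dominated by illumination differences than a purely photometric loss.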
