Abstract
Phase unwrapping (PU) is the process of recovering the true phase image from its noisy wrapped measurements, and it plays a crucial role in many scientific imaging techniques. PU requires solving a challenging non-linear ill-posed problem, particularly in the presence of noticeable noise. In recent years, deep learning has emerged as a promising approach to PU. Inspired by the success of convolutional neural networks (CNNs) in image restoration, many existing works trained CNNs for PU. However, due to the locality of convolutional kernels, CNNs are inefficient at capturing global spatial dependencies, a critical cue for PU. As an alternative, recent studies employed recurrent neural networks (RNNs) defined on handcrafted pixel paths. Nonetheless, a limited number of pre-defined pixel paths cannot fully exploit the global spatial dependencies present in complex phase structures. In this paper, we introduce a vision transformer (ViT) model that effectively captures both global and local spatial dependencies using a hierarchical structure with a multi-scale process. The proposed ViT model employs a series of global transformer blocks to capture global spatial dependencies at the coarsest scale. The resulting global features then guide a set of local transformer blocks that analyze local spatial dependencies in a coarse-to-fine progressive manner for unwrapping. Extensive experiments show that our proposed ViT model produces higher-quality unwrapped phases than existing CNN/RNN-based methods, while remaining lightweight.
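For reference, PU is commonly formulated through the standard wrapping operator (a textbook model, not specific to this paper): the wrapped phase \(\psi\) relates to the true phase \(\phi\) by

\[
\psi = \mathcal{W}(\phi) = \mathrm{mod}(\phi + \pi,\, 2\pi) - \pi \in [-\pi, \pi),
\qquad
\phi = \psi + 2\pi k,\; k \in \mathbb{Z},
\]

so unwrapping amounts to estimating the integer wrap count \(k\) at every pixel from the noisy \(\psi\), which is what makes the problem ill-posed.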
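To make the global-to-local guidance described above concrete, below is a minimal, hypothetical PyTorch sketch of one plausible arrangement: global self-attention blocks run at the coarsest scale, and their output features guide finer-scale local blocks via cross-attention. All module names (`GlobalBlock`, `LocalBlock`, `HierarchicalPU`) and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class GlobalBlock(nn.Module):
    """Self-attention over all coarse-scale tokens (global spatial dependencies)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (B, N, C) coarse-scale tokens
        h = self.norm(x)
        a, _ = self.attn(h, h, h)
        return x + a                             # residual connection

class LocalBlock(nn.Module):
    """Cross-attention: finer-scale tokens query the global features for guidance."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, x, global_feats):          # x: (B, M, C); global_feats: (B, N, C)
        g = self.norm_kv(global_feats)
        a, _ = self.attn(self.norm_q(x), g, g)
        return x + a

class HierarchicalPU(nn.Module):
    """Coarse-to-fine stack: global blocks first, then guided local blocks per scale."""
    def __init__(self, dim=64, n_global=2, n_scales=3):
        super().__init__()
        self.global_blocks = nn.ModuleList([GlobalBlock(dim) for _ in range(n_global)])
        self.local_blocks = nn.ModuleList([LocalBlock(dim) for _ in range(n_scales)])

    def forward(self, coarse_tokens, fine_tokens_per_scale):
        g = coarse_tokens
        for blk in self.global_blocks:
            g = blk(g)                           # capture global dependencies once
        outs = []
        for blk, x in zip(self.local_blocks, fine_tokens_per_scale):
            outs.append(blk(x, g))               # each scale analyzed under global guidance
        return outs
```

Computing the global features once at the coarsest scale and reusing them across all finer scales keeps the attention cost low, which is consistent with the lightweight design claimed in the abstract.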