This paper presents a novel approach for segmenting complex vessel-like structures in images of retinal vessels, surface cracks, and roadmaps. The task is challenging because of nuisance variations in width, curvature, and branching patterns, and because of cluttered backgrounds caused by adverse imaging conditions. We introduce the Spectral Transformer (SpecFormer), a Transformer built in the frequency domain to segment elongated, linear structures in images. The idea behind SpecFormer is to exploit the ability of low-frequency components in the Fourier domain to represent overall structure, global patterns, and smooth variations. Specifically, we propose a Sparse Spectral Neural Operator (SSNO) that modulates the sparse, frequency-concentrated spectrum via learnt frequency-specific filtering, which represents vessel-like structures well in the Fourier domain. This operator, the core component of the Dual Attention Block (DAB), is designed with two paths, a self-attention path and a scaling-attention path, to simultaneously capture long-range dependencies and contextual information in the features. The complete SpecFormer is built from multiple DABs together with modules for patch manipulation and feature fusion. We evaluated SpecFormer on a wide range of publicly available datasets and achieved consistent improvements over state-of-the-art (SOTA) methods. Code is available at https://github.com/LouisNUST/Spectral_Transformer.
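To make the idea of frequency-specific filtering on a sparse, low-frequency part of the spectrum concrete, the following NumPy sketch may help. It is an illustration only and not the SSNO implementation from the linked repository: the function name `sparse_spectral_filter`, the `modes` parameter, and the layout of `weights` are assumptions made for this example, and the weights that a trained model would learn are replaced here by random values.

```python
import numpy as np


def sparse_spectral_filter(feature, weights, modes):
    """Filter a real 2D map by modulating only its lowest Fourier modes.

    feature: real array of shape (H, W).
    weights: array of shape (2, modes, modes); one multiplier per retained
             frequency (hypothetical stand-in for learned parameters).
    modes:   number of low-frequency modes kept along each axis
             (assumed to satisfy modes <= H // 2 and modes <= W // 2 + 1).
    """
    h, w = feature.shape
    # Real-input FFT: the result has shape (H, W // 2 + 1).
    spectrum = np.fft.rfft2(feature)

    # Start from an all-zero spectrum so every frequency outside the
    # retained low-frequency corners is discarded.
    filtered = np.zeros_like(spectrum)

    # Low non-negative vertical frequencies sit in the first rows.
    filtered[:modes, :modes] = spectrum[:modes, :modes] * weights[0]
    # Low negative vertical frequencies sit in the last rows.
    filtered[-modes:, :modes] = spectrum[-modes:, :modes] * weights[1]

    # Back to the spatial domain, with the original size made explicit.
    return np.fft.irfft2(filtered, s=(h, w))


# Usage: filter a random 64 x 64 map with random complex multipliers.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((64, 64))
random_weights = (rng.standard_normal((2, 8, 8))
                  + 1j * rng.standard_normal((2, 8, 8)))
smoothed = sparse_spectral_filter(feature_map, random_weights, modes=8)
```

Because only `2 * modes * modes` frequencies are touched, the number of multipliers in this sketch is independent of the image size, while each retained frequency still influences every pixel of the output.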