Abstract

Semantic segmentation requires results that are both semantically strong and precisely localized. However, the inherent conflict between these two goals drives most existing methods to make trade-offs, or to overcompensate, between high-level semantics and fine localization during resolution reconstruction, which can limit performance or incur enormous computational costs. To address this, and inspired by the frequency model of natural images, we propose MPLSeg, a new encoder–decoder segmentation architecture built on a novel perspective: semantic-localization decoupled representation via magnitude-aware and phase-sensitive learning. Specifically, we first investigate and reveal the inherent, symmetrically inverse properties of image magnitude and phase with respect to semantics and localization. Building on this finding, we construct a concise adaptive frequency-aware module (AFM) to alleviate the semantic gap and spatial misalignment that arise during multi-level feature fusion. The core of AFM comprises a magnitude perceptron (MP), equipped with a dynamic magnitude weighting mechanism, and a phase amender (PA), designed with a spectral residual mapping; these keep the module sensitive to salient frequency combinations and to off-norm localization features, respectively. Finally, we tailor a phase-sensitive loss (PSL) as auxiliary supervision for semantic-independent proto-localization learning. The PSL ensures multi-level feature diversity and enhances fine-grained resolution reconstruction. Extensive experiments demonstrate the effectiveness and superiority of MPLSeg and its components. Without any additional tricks, MPLSeg achieves state-of-the-art performance on three challenging semantic segmentation benchmarks. The code is available at https://github.com/qyan0131/MPLSeg.
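The abstract rests on the premise that, in the Fourier domain, magnitude relates more to what a feature map contains and phase more to where it is. As a minimal illustration of the textbook DFT property behind that intuition (this is not MPLSeg's AFM, MP, PA, or PSL, none of which are sketched here), the following pure-Python snippet splits a tiny 2-D map into magnitude and phase with a naive DFT, recombines them, and checks that a circular translation changes only the phase. The helper names (`split_magnitude_phase`, `merge_magnitude_phase`, `circular_shift`) and the toy 4x4 map are hypothetical, chosen only for illustration.

```python
import cmath
import math
from typing import List, Tuple

Grid = List[List[float]]
CGrid = List[List[complex]]


def dft2(x: Grid) -> CGrid:
    """Naive 2-D DFT; O(M^2 N^2), so only suitable for tiny illustrative maps."""
    M, N = len(x), len(x[0])
    return [[sum(x[m][n] * cmath.exp(-2j * math.pi * (u * m / M + v * n / N))
                 for m in range(M) for n in range(N))
             for v in range(N)] for u in range(M)]


def idft2_real(X: CGrid) -> Grid:
    """Inverse 2-D DFT, keeping the real part (valid when the source map is real)."""
    M, N = len(X), len(X[0])
    return [[(sum(X[u][v] * cmath.exp(2j * math.pi * (u * m / M + v * n / N))
                  for u in range(M) for v in range(N)) / (M * N)).real
             for n in range(N)] for m in range(M)]


def split_magnitude_phase(x: Grid) -> Tuple[Grid, Grid]:
    """Hypothetical helper: per-frequency magnitude |X| and phase arg(X)."""
    X = dft2(x)
    return ([[abs(c) for c in row] for row in X],
            [[cmath.phase(c) for c in row] for row in X])


def merge_magnitude_phase(mag: Grid, pha: Grid) -> Grid:
    """Hypothetical helper: rebuild a spatial map from X = |X| * exp(i * arg X)."""
    X = [[cmath.rect(a, p) for a, p in zip(ra, rp)] for ra, rp in zip(mag, pha)]
    return idft2_real(X)


def circular_shift(x: Grid, dy: int, dx: int) -> Grid:
    """Translate a map by (dy, dx) pixels with wrap-around."""
    M, N = len(x), len(x[0])
    return [[x[(m - dy) % M][(n - dx) % N] for n in range(N)] for m in range(M)]


def max_abs_diff(a: Grid, b: Grid) -> float:
    """Largest element-wise absolute difference between two equally sized maps."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def max_phase_diff(a: Grid, b: Grid) -> float:
    """Largest distance between unit phasors, which avoids 2*pi wrap-around issues."""
    return max(abs(cmath.exp(1j * p) - cmath.exp(1j * q))
               for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# A toy 4x4 "feature map" containing one small object near the top-left corner.
img = [[0.0] * 4 for _ in range(4)]
img[0][0], img[0][1], img[1][0] = 1.0, 2.0, 3.0
moved = circular_shift(img, 2, 1)  # the same object at a different location

mag_a, pha_a = split_magnitude_phase(img)
mag_b, pha_b = split_magnitude_phase(moved)

print("round-trip error:", max_abs_diff(merge_magnitude_phase(mag_a, pha_a), img))
print("magnitude change under shift:", max_abs_diff(mag_a, mag_b))
print("phase change under shift:", max_phase_diff(pha_a, pha_b))

# Magnitude of `img` combined with phase of `moved`: the object lands at the moved location.
hybrid = merge_magnitude_phase(mag_a, pha_b)
print("hybrid vs moved:", max_abs_diff(hybrid, moved))
```

Because a translation leaves the magnitude spectrum unchanged, pairing the original map's magnitude with the translated map's phase reproduces the translated map; this is the narrow sense in which phase encodes location in this toy setting. A practical model would more likely operate on batched tensors with an FFT routine (for example `torch.fft.fft2` together with `torch.abs`, `torch.angle`, and `torch.polar`) rather than this quadratic-cost loop.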
