In multi-modal magnetic resonance imaging (MRI), the tasks of imputing or reconstructing the target modality share a common obstacle: the accurate modeling of fine-grained inter-modal differences, which has rarely been addressed in the current literature. These differences stem from two sources: 1) spatial misalignment remaining after coarse registration and 2) structural distinctions arising from modality-specific signal manifestations. This paper integrates the previously separate research trajectories of cross-modality synthesis (CMS) and multi-contrast super-resolution (MCSR) to address this pervasive challenge within a unified framework. By connecting the two tasks through generalized down-sampling ratios, this unification not only emphasizes their common goal of reducing structural differences but also identifies the key task distinguishing MCSR from CMS: modeling the structural distinctions using the limited information from the misaligned target input. Specifically, we propose a composite network architecture with several key components: a label correction module to align the coordinates of multi-modal training pairs, a CMS module serving as the base model, an SR branch to handle target inputs, and a difference projection discriminator for structural distinction-centered adversarial training. When the SR branch is trained as the generator, the adversarial learning is enhanced with distinction-aware incremental modulation to ensure better-controlled generation. Moreover, the SR branch integrates deformable convolutions to address cross-modal spatial misalignment at the feature level. Experiments conducted on three public datasets demonstrate that our approach effectively balances structural accuracy and realism and, in comprehensive evaluations on both tasks, outperforms current state-of-the-art approaches overall. The code is available at https://github.com/papshare/FGDL.