Existing medical image registration algorithms suffer from low registration accuracy on images with large deformations. To improve registration performance and exploit the global context extraction ability of Transformers without incurring high computational complexity, we construct a UNet-like model that combines a CNN and a Transformer for 3D medical image registration. We build the Transformer encoder on the Efficient Global Local Attention (EGLA) mechanism to further address the difficulty that existing medical image registration networks have in modeling long-distance dependencies. The model leverages the local modeling capability of the CNN and the long-distance information capture capability of the Transformer to achieve high-precision registration. We validate the algorithm in detailed experiments on two public datasets, and the qualitative and quantitative registration results support the effectiveness of the proposed model.