Multi-modal medical image super-resolution (SR) enhances the resolution of medical images, providing more detailed visuals that aid accurate clinical diagnosis. Recently, Transformer-based SR methods have substantially advanced performance in this field owing to their capacity to capture global dependencies. These methods typically treat all non-overlapping patches as tokens and compute attention densely over them without any screening. However, this strategy ignores the spatial sparsity of medical images, leading to redundant or even detrimental computation on less informative regions. Hence, this paper proposes a novel sparsity-guided medical image SR network, namely SG-SRNet, which exploits the spatial sparsity characteristics of medical images. SG-SRNet mainly consists of two components: a sparsity mask (SM) generator for image sparsity estimation, and a sparsity-guided Transformer (SGTrans) for high-resolution image reconstruction. Specifically, the SM generator produces a sparsity mask by minimizing our cross-sparsity loss, which highlights informative positions. SGTrans first selects informative patches according to the sparsity mask, and then applies the designed cluster-based attention to compute attention only among information-related tokens. Comprehensive experiments on three datasets show that SG-SRNet achieves significant performance gains at low computational complexity.
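To make the two guiding ideas concrete, below is a minimal sketch, not the authors' implementation, of (a) screening patch tokens with a sparsity mask and (b) restricting attention to tokens that fall in the same cluster. The function names (`screen_tokens`, `cluster_attention`), the `keep_ratio` parameter, and the random centroid initialization are illustrative assumptions; the paper's actual SM generator, cross-sparsity loss, and cluster-based attention design are not reproduced here.

```python
# Illustrative sketch (assumed, not the authors' code): sparsity-guided patch
# screening followed by a simple cluster-restricted attention. Assumes a
# per-patch informativeness score has already been produced by a mask generator.
import torch
import torch.nn.functional as F


def screen_tokens(tokens, sparsity_mask, keep_ratio=0.5):
    """Keep only the most informative patch tokens.

    tokens:        (B, N, C) patch embeddings
    sparsity_mask: (B, N) informativeness scores in [0, 1]
    Returns kept tokens (B, K, C) and their indices (B, K).
    """
    n_keep = max(1, int(tokens.size(1) * keep_ratio))
    idx = sparsity_mask.topk(n_keep, dim=1).indices                  # (B, K)
    kept = tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(2)))
    return kept, idx


def cluster_attention(q, k, v, num_clusters=4):
    """Attention restricted to tokens within the same cluster.

    Tokens are grouped by nearest centroid (a single k-means-style
    assignment step with randomly chosen centroids); the attention map
    is masked so each token attends only to tokens in its own cluster.
    """
    B, N, C = q.shape
    centroids = k[:, torch.randperm(N)[:num_clusters], :]            # (B, M, C)
    assign = torch.cdist(k, centroids).argmin(dim=-1)                # (B, N)
    same = assign.unsqueeze(2) == assign.unsqueeze(1)                # (B, N, N)
    scores = (q @ k.transpose(1, 2)) / C ** 0.5
    scores = scores.masked_fill(~same, float("-inf"))
    return F.softmax(scores, dim=-1) @ v


# Toy usage: 64 patch tokens, half screened out, then clustered attention.
tokens = torch.randn(2, 64, 32)
mask = torch.rand(2, 64)
kept, idx = screen_tokens(tokens, mask, keep_ratio=0.5)
out = cluster_attention(kept, kept, kept, num_clusters=4)
print(out.shape)  # torch.Size([2, 32, 32])
```

Because the uninformative tokens are discarded before attention and the remaining attention map is block-sparse by cluster, the quadratic cost of dense attention is reduced, which is consistent with the low computational complexity the abstract claims.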