The background noise contained in seismic records contaminates the effective reflection waves and degrades subsequent processing steps, such as inversion and migration. The properties of seismic noise, such as non-Gaussianity and non-linearity, become even more complex in challenging exploration environments. Deep-learning techniques are effective in suppressing complex seismic noise and outperform conventional denoising algorithms. Nonetheless, most deep-learning networks are designed to extract features of the input data at a single scale only, which leads to inadequate performance on complicated seismic data. To improve the capability of deep learning to suppress seismic noise, a novel mutual-guided scale-aggregation denoising network (MSD-Net) is designed that exploits the multi-scale features of the input data. Specifically, MSD-Net performs multi-scale feature extraction, fusion, and guidance through information interaction between different scales. Spatial aggregation attention is used in MSD-Net to enhance relevant features, which further improves the separation of effective reflection waves from noise. Additionally, a model-based training data generation strategy is devised to ensure efficient learning and the denoising capability of MSD-Net. Compared with conventional denoising algorithms and typical deep-learning networks, MSD-Net shows strong performance in suppressing complex seismic noise and in generalization.
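To make the idea of multi-scale extraction followed by attention-weighted fusion concrete, the following is a minimal toy sketch on a single 1-D trace. It is not the MSD-Net architecture: the fixed average pooling, the nearest-neighbour upsampling, the amplitude-based softmax weights, and all function names (`downsample`, `upsample`, `multi_scale_features`, `attention_fuse`) are assumptions chosen for illustration, whereas a real network would learn its filters and attention weights from training data.

```python
import math


def downsample(x, factor):
    """Average-pool a 1-D trace by an integer factor.

    Assumes len(x) is divisible by factor (illustrative simplification).
    """
    return [sum(x[i:i + factor]) / factor for i in range(0, len(x), factor)]


def upsample(x, factor):
    """Nearest-neighbour upsampling back to the fine sampling grid."""
    return [v for v in x for _ in range(factor)]


def multi_scale_features(trace, factors=(1, 2, 4)):
    """Return one smoothed copy of the trace per scale, all at full length."""
    return [upsample(downsample(trace, f), f) for f in factors]


def attention_fuse(features):
    """Fuse the scales sample by sample.

    The weights are a softmax over the absolute amplitude at each sample,
    a hand-crafted stand-in for a learned spatial attention map.
    """
    fused = []
    for values in zip(*features):
        scores = [abs(v) for v in values]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        fused.append(sum(e / total * v for e, v in zip(exps, values)))
    return fused


# Hypothetical input: a single spike on an otherwise quiet 8-sample trace.
trace = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
features = multi_scale_features(trace)
fused = attention_fuse(features)
```

Because the softmax weights are positive and sum to one, each fused sample is a convex combination of the per-scale values at that position. In this toy version the fusion therefore never exceeds the range spanned by the individual scales.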