The North–South Seismic Belt is one of the major regions in China where strong earthquakes occur frequently, so predicting its monthly maximum magnitude is important for proactive seismic hazard defense. This paper uses earthquake catalog data for the North–South Seismic Belt from 1970 onward to calculate multiple seismic parameters. The monthly maximum magnitude series is decomposed using Variational Mode Decomposition (VMD), applied with sample segmentation to avoid information leakage. The decomposed modes and the seismic parameters together form a new dataset. Based on these datasets, four deep learning models and four time windows are used to predict the monthly maximum magnitude, with Prediction Accuracy (PA), False Alarm Rate (FAR), and Missed Alarm Rate (MR) as evaluation metrics. The results show that a time window of 12 generally yields better predictions, with PA reaching 77.27% for Ms 5.0–6.0 earthquakes and 12.5% for earthquakes above Ms 6.0. Compared with data not decomposed using VMD, traditional error metrics improve only slightly, but the model better predicts short-term trends in magnitude changes.
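The idea of sample segmentation can be illustrated with a minimal sketch: each training sample is cut from the series first, and the decomposition is then applied to that segment alone, so the value being predicted never influences its own input features. This is an illustration under stated assumptions, not the paper's implementation. The names `make_samples` and `trend_residual_split` are hypothetical, the toy series is invented, and the moving-average split is only a stand-in for VMD so that the example runs without external libraries.

```python
from typing import Callable, List, Sequence, Tuple

# A decomposer maps one segment to a list of component series ("modes").
Decomposer = Callable[[Sequence[float]], List[List[float]]]


def trend_residual_split(segment: Sequence[float]) -> List[List[float]]:
    """Stand-in for VMD (this is NOT VMD): split a segment into a
    3-point moving-average trend and the residual. A real VMD routine
    would be swapped in here."""
    n = len(segment)
    trend = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n, i + 2)  # window clipped at the edges
        trend.append(sum(segment[lo:hi]) / (hi - lo))
    residual = [x - t for x, t in zip(segment, trend)]
    return [trend, residual]


def make_samples(
    series: Sequence[float], window: int, decompose: Decomposer
) -> List[Tuple[List[List[float]], float]]:
    """Build (modes, target) pairs. The decomposition is applied to each
    input segment separately, so it never sees the target value or any
    later value in the series."""
    samples = []
    for start in range(len(series) - window):
        segment = list(series[start:start + window])  # inputs: past values only
        target = series[start + window]               # value to be predicted
        samples.append((decompose(segment), target))
    return samples


# Invented toy series of monthly maximum magnitudes (illustrative only).
toy_series = [4.1, 4.5, 5.2, 4.8, 4.3, 5.6, 4.9, 4.4,
              6.1, 4.7, 4.2, 5.0, 5.3, 4.6, 4.9]
toy_samples = make_samples(toy_series, window=12, decompose=trend_residual_split)
print(len(toy_samples), "samples; first target:", toy_samples[0][1])
```

By contrast, decomposing the full series once and slicing windows afterwards would let values from the prediction period shape the modes used as inputs, which is the leakage that segmentation is meant to avoid.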