Changes in urban residents’ consumption habits and lifestyles have made accurate forecasting of natural gas consumption increasingly important. This paper proposes a forecasting model that combines ensemble learning (EL), variational mode decomposition (VMD), a Transformer, and a long short-term memory (LSTM) network. First, XGBoost, CatBoost, and LightGBM serve as base learners in the ensemble learning framework, and the ensemble's predictions are integrated into the original dataset. Next, VMD decomposes the natural gas load sequence into several intrinsic mode functions (IMFs), extracting the inherent features of the sequence. Finally, the resulting data are fed into the Transformer-ResLSTM network for prediction. This network replaces the original Transformer decoder with a new decoder built from an LSTM network and fully connected layers, and it introduces residual connections in both the Transformer encoder and the new decoder. Experimental results show that, compared with baseline models such as ARIMA, Transformer, GRU, and LSTM, the proposed hybrid model reduces MSE by 92–98% and MAE by 74–83%. These results indicate that the method has practical value for improving the accuracy of natural gas load forecasting.
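The first stage, in which the ensemble's predictions are integrated into the original dataset, can be illustrated with a minimal sketch. It rests on assumptions the text does not confirm: the base learners' outputs are combined by a simple average, and the combined prediction is appended to each sample as one extra feature. The function `augment_with_ensemble` and the three stand-in learners are hypothetical placeholders for fitted XGBoost, CatBoost, and LightGBM regressors, not part of any library.

```python
from statistics import fmean
from typing import Callable, List, Sequence

# A "learner" here is any callable mapping a feature row to one prediction.
Learner = Callable[[Sequence[float]], float]


def augment_with_ensemble(
    rows: Sequence[Sequence[float]], learners: Sequence[Learner]
) -> List[List[float]]:
    """Append the mean base-learner prediction to every feature row.

    Assumption: averaging is used to combine the base learners; the
    combination rule could equally be a weighted sum or a meta-learner.
    """
    augmented = []
    for row in rows:
        predictions = [learner(row) for learner in learners]
        augmented.append(list(row) + [fmean(predictions)])
    return augmented


# Hypothetical stand-ins for three fitted gradient-boosting regressors.
def learner_a(row: Sequence[float]) -> float:
    return sum(row) / len(row)


def learner_b(row: Sequence[float]) -> float:
    return max(row)


def learner_c(row: Sequence[float]) -> float:
    return min(row)


rows = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
augmented = augment_with_ensemble(rows, [learner_a, learner_b, learner_c])
print(augmented)
```

Under these assumptions, each row gains exactly one column, so the downstream decomposition and Transformer-ResLSTM stages would see the original features alongside the ensemble's estimate.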