Traffic forecasting is a cornerstone of urban planning, yet existing methods largely fall short in capturing long-term spatio-temporal patterns. Specifically, current works design various elaborate modules whose computational cost grows heavily with the length of the historical data. Furthermore, replacing or recombining modules can degrade final performance, and the whole model must be trained from scratch for each dataset. To address these shortcomings, and inspired by pre-training methodologies, we introduce the Pre-trained Spatio-Temporal Network (PreSTNet), a two-phase framework comprising pre-training and fine-tuning. In the pre-training phase, PreSTNet performs a data masking and recovery task, supported by a dedicated long-sequence embedding module and an encoder-decoder structure with temporal attention layers and graph convolution operators. The task masks a portion of the input data, forcing the model to capture correlations among the unmasked data in order to recover the masked part, which facilitates the extraction of long-term spatio-temporal features from extensive data. In the fine-tuning phase, the learned parameters are frozen and the decoder is replaced with a forecasting header. This header, composed of a meta-learning fusion module and spatio-temporal convolution layers, integrates long-term and short-term traffic data and is trained with supervised learning to suit the target task. Extensive experiments on real-world datasets demonstrate the superiority of PreSTNet. Further case studies reveal that the pre-trained encoder, coupled with a simple linear regression, achieves performance comparable to advanced methods. These findings affirm the capability of PreSTNet in addressing the complexities of traffic forecasting.
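
To make the masking and recovery objective concrete, the following is a minimal NumPy sketch of how such a pretext task could be set up. It is not the PreSTNet implementation: the patch-level masking, the patch length of 12, the mask ratio of 0.75, the mean-absolute-error loss, and all function names are assumptions chosen for illustration, and the "decoder" is a trivial placeholder that predicts the mean of the visible data.

```python
import numpy as np

def split_into_patches(series, patch_len):
    """Reshape (num_nodes, seq_len) into (num_nodes, num_patches, patch_len).

    Trailing steps that do not fill a whole patch are dropped.
    """
    num_nodes, seq_len = series.shape
    num_patches = seq_len // patch_len
    trimmed = series[:, : num_patches * patch_len]
    return trimmed.reshape(num_nodes, num_patches, patch_len)

def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a boolean vector; True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask

def masked_recovery_loss(prediction, target, mask):
    """Mean absolute error computed over the masked patches only (assumed loss)."""
    return float(np.abs(prediction[:, mask] - target[:, mask]).mean())

rng = np.random.default_rng(0)
series = rng.normal(size=(4, 288))                   # synthetic data: 4 sensors, 288 steps
patches = split_into_patches(series, patch_len=12)   # shape (4, 24, 12)
mask = random_patch_mask(patches.shape[1], mask_ratio=0.75, rng=rng)

visible = patches[:, ~mask]                          # the only data the encoder would see

# Placeholder for encoder + decoder: predict each sensor's mean over visible patches.
prediction = np.broadcast_to(visible.mean(axis=(1, 2), keepdims=True), patches.shape)
loss = masked_recovery_loss(prediction, patches, mask)
```

In a real model the placeholder prediction would be replaced by the encoder-decoder output, and the loss would be minimized by gradient descent. Restricting the loss to masked patches is what forces the model to infer hidden values from the correlations among visible ones.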