Anomaly detection on multivariate time series plays a crucial role in system security. Conventional deep learning detectors rely mainly on temporal dependencies and use reconstruction-based or prediction-based methods. However, as the feature variables grow more intricate, such methods risk neglecting essential spatio-temporal structural information, which can leave the model insufficiently trained in unsupervised settings. We therefore propose an end-to-end anomaly detection model with multiple pre-training tasks, designed across the spatial and temporal dimensions, to strengthen the training constraints. Specifically, in the temporal dimension, we employ an autoregressive task that learns associations between timestamps from the latent autocorrelation and periodicity of the data. In the spatial dimension, we learn a heterogeneous graph that captures diverse relations among features. We then design three graph contrastive learning tasks to exploit the information arising from the inherent heterogeneity and hierarchy of the spatial structure. Through joint spatio-temporal modeling, the model captures inter- and intra-feature associations from both the series and the graph structure, which improves its robustness to complex chain reactions between features. Finally, we evaluate the model on three real-world datasets, SWaT and the 2017 and 2019 versions of WADI, where its F1 scores improve by 6.17%, 18.3%, and 5.35%, respectively, over the best-performing baseline. The model applies to both temporal and graph-structured data, and its self-supervised design suits sparse data and complex scenarios that require capturing spatial and temporal characteristics simultaneously, such as traffic flow detection and anomaly detection in intelligent systems. Visualization experiments and case studies further aid interpretation of the model.