Accurate PM2.5 concentration prediction is essential for environmental management and control. Numerous air quality monitoring stations have therefore been established, and together they generate multiple time series with spatio-temporal correlations. However, the statistical distribution of the data varies widely from one monitoring station to another, so the feature extraction stage must be more flexible. Moreover, the spatio-temporal correlations and abrupt changes (mutation characteristics) of these time series are difficult to capture. To address these challenges, this paper proposes an adaptive spatio-temporal prediction network (ASTP-NET). Its encoder adaptively extracts features from the input data and then captures the spatio-temporal dependencies and dynamic changes of the time series. Its decoder maps the encoded features to a representation of the predicted future time series. In addition, an objective function is proposed to measure the overall fluctuation of the model's multi-step predictions. ASTP-NET is evaluated on the Xi'an air quality dataset, and the results show that it outperforms the baseline methods.
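The abstract does not give the form of the proposed objective function. As a purely illustrative sketch, and not the ASTP-NET objective, the snippet below shows one way a multi-step loss can account for fluctuations across the forecast horizon. It adds to the pointwise mean squared error a penalty on mismatched step-to-step changes. The function name `fluctuation_aware_loss` and the weight `lam` are hypothetical.

```python
import numpy as np


def fluctuation_aware_loss(y_true, y_pred, lam=0.5):
    """Hypothetical multi-step loss (not the paper's objective).

    Combines the pointwise MSE with a penalty on the mismatch between
    step-to-step changes of the true and predicted sequences.

    y_true, y_pred: arrays of shape (batch, horizon).
    lam: hypothetical weight of the fluctuation term.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Pointwise error averaged over all samples and forecast steps
    mse = np.mean((y_true - y_pred) ** 2)
    # First differences along the horizon describe rises and drops
    d_true = np.diff(y_true, axis=-1)
    d_pred = np.diff(y_pred, axis=-1)
    fluct = np.mean((d_true - d_pred) ** 2)
    return float(mse + lam * fluct)


y_true = [[10.0, 20.0, 30.0]]
print(fluctuation_aware_loss(y_true, [[15.0, 25.0, 35.0]]))  # constant offset
print(fluctuation_aware_loss(y_true, [[20.0, 20.0, 20.0]]))  # flat forecast
```

Under this assumed form, a forecast that is off by a constant amount incurs only the MSE term (25.0). A flat forecast that misses the rising trend is also penalized by the fluctuation term (about 66.67 + 0.5 × 100 ≈ 116.67).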