With the development of society and growing environmental awareness, air pollution is receiving increased public attention. Accurate air quality prediction can provide useful information for government decision-making and for residents planning their activities. However, predicting future air quality accurately remains challenging because of its complex spatial-temporal dependencies, which previous studies have not modelled explicitly. In this paper, we propose a self-adaptive spatial-temporal network (SA-STNet) to capture these dependencies efficiently and effectively. To aggregate spatial information, we employ a self-adaptive graph convolution module that automatically learns the latent spatial correlations of air quality. In the temporal dimension, we utilise three independent components to capture the recent, daily-periodic, and weekly-periodic temporal dependencies, respectively. In addition, our model exploits rich external complementary information by means of a feature extraction component. A parametric-matrix-based fusion architecture combines the outputs of the different components into a joint representation, from which the final prediction is generated. Extensive experiments on real-world datasets demonstrate the superior performance of our model over baselines and state-of-the-art methods.
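
To make two of the ideas named above more concrete, the following is a minimal NumPy sketch of a self-adaptive adjacency matrix and a parametric-matrix-based fusion of three temporal branches. It is an illustration under stated assumptions, not the SA-STNet implementation: the adjacency form softmax(ReLU(E1 E2^T)) is one common choice for learned graphs, the fusion is assumed to be an element-wise weighted sum, and all names, shapes, and the random values standing in for learned parameters are hypothetical.

```python
import numpy as np


def softmax_rows(x):
    """Numerically stable softmax applied to each row."""
    shifted = x - x.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def adaptive_adjacency(e1, e2):
    """Assumed form: softmax(ReLU(E1 @ E2^T)) over node embeddings.

    e1, e2: (num_nodes, emb_dim) arrays; in a trained model these would be
    learnable parameters, so no predefined station graph is needed.
    """
    return softmax_rows(np.maximum(e1 @ e2.T, 0.0))


def graph_conv(adj, x, w):
    """One graph convolution step: aggregate neighbours, then project."""
    return adj @ x @ w


def parametric_fusion(outputs, weights):
    """Assumed fusion: element-wise weighted sum of the branch outputs."""
    return sum(w * o for w, o in zip(weights, outputs))


rng = np.random.default_rng(0)
n_stations, emb_dim, n_feat, n_hidden = 5, 3, 4, 2

# Random stand-ins for learned node embeddings (hypothetical sizes).
e1 = rng.normal(size=(n_stations, emb_dim))
e2 = rng.normal(size=(n_stations, emb_dim))
adj = adaptive_adjacency(e1, e2)

# Three branches: recent, daily-periodic, weekly-periodic inputs.
branch_inputs = [rng.normal(size=(n_stations, n_feat)) for _ in range(3)]
w_gc = rng.normal(size=(n_feat, n_hidden))
branch_outputs = [graph_conv(adj, x, w_gc) for x in branch_inputs]

# One weight matrix per branch, same shape as that branch's output.
fusion_weights = [rng.normal(size=(n_stations, n_hidden)) for _ in range(3)]
fused = parametric_fusion(branch_outputs, fusion_weights)
```

In this sketch each row of `adj` sums to one, so every station's representation is a convex combination of the others, and the per-branch weight matrices let each station weigh recent, daily, and weekly patterns differently. For brevity the three branches share one projection matrix; a full model would typically give each branch its own parameters.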