Passenger flow prediction is essential to the effective operation of urban rail transit. However, few studies combine multiple influencing factors for short-term passenger flow prediction, and accurately predicting passenger flow at all stations on a line at the same time remains a challenge. To overcome these limitations, we propose ST-RANet, a deep learning-based method that consists of three spatio-temporal modules and one external module. The model predicts inbound and outbound passenger flow for all stations within the network simultaneously. We model the spatio-temporal data in terms of three temporal characteristics: closeness, period, and trend. For each characteristic, we construct a spatio-temporal module that inserts attention mechanisms into the middle of residual units and convolutional neural networks (CNNs) to extract and learn spatio-temporal features. The outputs of the three modules are then fused using a parameter matrix method, so that the aggregation adapts to the data. The fused result is further combined with external factors, such as holidays and meteorological information, to obtain the predicted passenger flow for each station. The proposed model is validated on real data from the Beijing Subway, and optimized parameters are applied to passenger flow prediction at 30-min granularity. In comparisons against five baseline models, and in verification with data from multiple lines, ST-RANet achieves the best results, demonstrating high prediction accuracy and good applicability.
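As an illustration of the fusion step described above, the following is a minimal sketch and not the paper's implementation. It assumes that the parameter matrix method weights each module output element-wise, and that the external component is simply added to the fused result; both are assumptions. The function name `fuse_modules`, the fixed weights, and all numbers are hypothetical. In a trained model the weight matrices would be learned from data.

```python
from typing import List

# Rows: flow direction (0 = inbound, 1 = outbound); columns: stations.
Matrix = List[List[float]]


def fuse_modules(x_c: Matrix, x_p: Matrix, x_t: Matrix,
                 w_c: Matrix, w_p: Matrix, w_t: Matrix,
                 x_ext: Matrix) -> Matrix:
    """Element-wise weighted sum of the closeness, period, and trend outputs,
    followed by an additive external term (assumed form, see lead-in)."""
    fused: Matrix = []
    for r in range(len(x_c)):
        row = []
        for s in range(len(x_c[r])):
            spatio_temporal = (w_c[r][s] * x_c[r][s]
                               + w_p[r][s] * x_p[r][s]
                               + w_t[r][s] * x_t[r][s])
            row.append(spatio_temporal + x_ext[r][s])
        fused.append(row)
    return fused


# Hypothetical toy inputs for two stations (illustrative values, not real data).
x_c = [[100.0, 200.0], [110.0, 210.0]]   # closeness module output
x_p = [[120.0, 180.0], [100.0, 220.0]]   # period module output
x_t = [[90.0, 210.0], [120.0, 200.0]]    # trend module output

# Fixed weights stand in for learned parameter matrices.
w_c = [[0.5, 0.5], [0.5, 0.5]]
w_p = [[0.3, 0.3], [0.3, 0.3]]
w_t = [[0.2, 0.2], [0.2, 0.2]]

# External adjustment, e.g. derived from holiday or weather features.
x_ext = [[-4.0, 0.0], [0.0, 5.0]]

result = fuse_modules(x_c, x_p, x_t, w_c, w_p, w_t, x_ext)
```

Because the weights are indexed per direction and per station, each station can rely on the three temporal characteristics to a different degree, which is one way to read "dynamic aggregation based on data".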