Timely and accurate traffic flow prediction is crucial for stabilizing road conditions, reducing environmental pollution, and mitigating economic losses. Although current graph convolution methods have achieved some success, they do not fully exploit the strengths of graph convolution, and there is still room for improvement in jointly handling multi-graph convolution, graph optimization, and the simulation of real road conditions. To address these issues, this paper proposes MSA-GCN (Multistage Spatio-Temporal Aggregation Graph Convolutional Networks) for traffic flow prediction. MSA-GCN divides the prediction process into four stages and achieves promising prediction results. In the first stage, we construct a latent similarity adjacency matrix and suppress random interference in the similarity features through two optimizations: the proposed ConvGRU Attention Layer (CGAL module) and the Causal Similarity Capture Module (CSC module), which incorporates Granger causality tests. In the second stage, the Correlation Completion Module (CC module) mines latent correlations between roads to build a global correlation adjacency matrix that serves as a complement capturing these potential correlations. In the third stage, the proposed Auto-LRU autoencoder is pre-trained on various weather features, and the resulting encodings are fed into the prediction process to help the model reflect real-world conditions and to improve its interpretability. Finally, in the fourth stage, these features are fused, a Bidirectional Gated Recurrent Unit (BiGRU) models the temporal dependencies, and a linear layer outputs the prediction results. Compared with advanced baseline methods, our model improves performance by 29.33%, 27.03%, and 23.07% on three real-world datasets (PEMSD8, LOSLOOP, and SZAREA), and various ablation experiments validate the effectiveness of each stage and module.
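The abstract does not say how the CSC module applies Granger causality tests to the similarity graph. The following is a minimal sketch, not the authors' implementation, of one way a Granger test could prune a similarity adjacency matrix. It assumes Pearson correlation as the similarity measure, a lag-1 Granger F-statistic, and hypothetical thresholds `sim_thresh` and `f_thresh`; all function names are illustrative only.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length series (0.0 if one is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def ols_rss(rows, targets):
    """Residual sum of squares of an OLS fit, solved through the normal equations."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting; aug ends as [I | beta].
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        if abs(p) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    beta = [aug[i][k] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f(x, y):
    """Lag-1 Granger F-statistic for the hypothesis 'past x helps predict y'."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    dof = len(target) - 3  # observations minus unrestricted parameters
    if rss_u <= 1e-12:
        return math.inf
    return (rss_r - rss_u) / (rss_u / dof)  # one restriction, so numerator df = 1


def similarity_adjacency(series, sim_thresh=0.5, f_thresh=4.0):
    """Hypothetical rule: A[i][j] = similarity if strong and road j Granger-causes road i."""
    n = len(series)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loops
        for j in range(n):
            if i != j:
                s = pearson(series[i], series[j])
                # Keep the edge only if similarity is high AND j's past helps forecast i.
                if s >= sim_thresh and granger_f(series[j], series[i]) >= f_thresh:
                    adj[i][j] = s
    return adj


# Synthetic example: road y follows road x with a one-step delay; road z is unrelated.
rng = random.Random(0)
base = [math.sin(0.3 * t) + rng.gauss(0, 0.1) for t in range(201)]
x = base[1:]                                             # upstream road
y = [base[t] + rng.gauss(0, 0.01) for t in range(200)]   # y[t] is roughly x[t-1]
z = [rng.gauss(0, 1) for _ in range(200)]                # unrelated road
adj = similarity_adjacency([x, y, z])
```

Requiring both conditions means a spurious edge from a merely coincidental correlation is dropped unless the source road's history also adds predictive power, which is one plausible reading of "causal similarity"; the paper may combine the two signals differently.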