Accurate traffic prediction plays a crucial role in improving traffic conditions and optimizing road utilization. Effectively capturing multi-scale temporal dependencies and dynamic spatial dependencies is essential for accurate prediction, since these dependencies reflect complex dynamic spatial–temporal processes that most existing work has not comprehensively addressed. Motivated by this gap, the primary contribution of this paper is a novel Spatial–Temporal Fusion Graph Neural Network (STFGCN) for traffic prediction, which extracts multi-scale temporal dependencies from multiple semantic environments and constructs a dynamic adaptive graph to model spatial dependencies based on temporal characteristics. Specifically, to capture multi-scale dynamic temporal dependencies, a Multi-Scale Fusion Convolution (MSFC) module is designed, in which temporal dependencies are extracted from multiple semantic environments via multi-scale convolution. To model dynamic spatial dependencies, a Spatial Adaptive Fusion Convolution (SAFC) module is designed, which combines recent coherence and periodicity to infer dynamic graphs that are then fused. Extensive experiments on five real-world datasets demonstrate the superior performance of STFGCN: compared with state-of-the-art baselines, it reduces RMSE by 1.2% to 16.4%.
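The multi-scale convolution idea behind MSFC can be illustrated with a minimal sketch (assuming NumPy; the function name and kernel choices are hypothetical, not the paper's implementation): a temporal series is convolved with kernels of several widths, and the per-scale features are fused by stacking.

```python
import numpy as np

def multi_scale_temporal_conv(x, kernel_sizes=(2, 3, 5)):
    """Illustrative sketch only: extract temporal features at several
    scales by convolving a univariate series with averaging filters of
    different widths, then fuse them by stacking (hypothetical names)."""
    features = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k                  # simple mean filter at scale k
        conv = np.convolve(x, kernel, mode="valid")
        pad = len(x) - len(conv)                 # restore original length
        conv = np.pad(conv, (pad, 0), mode="edge")
        features.append(conv)
    return np.stack(features)                    # shape: (num_scales, T)

series = np.arange(12, dtype=float)              # toy traffic-speed series
fused = multi_scale_temporal_conv(series)
print(fused.shape)                               # (3, 12)
```

In the full model, learned convolution kernels would replace the fixed averaging filters, and the fusion step would combine scales with learned weights rather than simple stacking.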