Short-term origin–destination (OD) passenger flow forecasting is crucial for urban rail transit enterprises seeking to optimise transportation products and increase operating income. The task is difficult for three reasons: an urban rail transit system contains a very large number of OD pairs, OD passenger flow cannot be obtained in real time (temporal hysteresis), and its distribution characteristics are complex. Previous studies have focused mainly on passenger flow prediction at individual metro stations, and few methods address OD passenger flow prediction for an entire urban rail transit system. To fill this gap, we propose a novel deep learning method that fuses high-dimensional features (HDF-DL) drawn from multi-source data. HDF-DL comprises three modules. The temporal module incorporates the time-varying, trend, and cyclic characteristics of OD passenger flow; reflecting the temporal hysteresis of OD data, the most recent OD passenger flow time sequence (within 1 h) is excluded from the time-varying characteristics. In the spatial module, the K-means and K-shape algorithms cluster OD pairs from multiple perspectives to capture spatial features, which reduces the difficulty of predicting OD passenger flow at large scale and with complex characteristics. The external feature module incorporates weather factors. HDF-DL is evaluated on a large-scale metro system in China against eight baseline models. The results show that HDF-DL achieves high prediction accuracy across multiple time granularities, with a mean absolute percentage error of about 10%. Predictions remain accurate and stable for every departure time interval, indicating that the method captures temporal characteristics effectively. The modular design of HDF-DL, which fuses high-dimensional features and applies a suitable neural network to each data type, significantly reduces prediction errors and outperforms the baseline models.
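The abstract does not specify how K-shape compares OD pairs. As a minimal sketch under stated assumptions, the code below implements the shape-based distance (SBD) that the standard K-shape algorithm uses: each series is z-normalised, and the distance is one minus the maximum normalised cross-correlation over all relative shifts. Whether HDF-DL z-normalises its OD series this way, which time resolution it uses, and how it handles flat profiles are assumptions; the helper names `z_normalise` and `shape_based_distance` and the synthetic profiles `od_a` and `od_b` are hypothetical and are not data from the study.

```python
import numpy as np


def z_normalise(x):
    """Rescale a series to zero mean and unit variance so only its shape matters."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    # A flat series has no shape; centre it instead of dividing by zero.
    return (x - x.mean()) / std if std > 0 else x - x.mean()


def shape_based_distance(x, y):
    """Shape-based distance (SBD) as used by K-shape.

    SBD = 1 - max over all shifts of the normalised cross-correlation,
    so it lies in [0, 2] and is 0 for series with identical shape.
    """
    x, y = z_normalise(x), z_normalise(y)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        return 1.0  # assumption: treat a flat profile as uncorrelated with any other
    # mode="full" evaluates the cross-correlation at every relative shift.
    cc = np.correlate(x, y, mode="full")
    return float(1.0 - cc.max() / denom)


# Hypothetical hourly profiles for two OD pairs (synthetic, not from the study).
hours = np.arange(24)
od_a = 50 + 40 * np.exp(-((hours - 8) ** 2) / 2.0)  # morning-peak shape
od_b = 10 * od_a                                     # same shape, ten times the volume

print(shape_based_distance(od_a, od_b))  # close to 0: identical shape
print(np.linalg.norm(od_a - od_b))       # large: Euclidean distance reflects volume
```

Because SBD discards volume after z-normalisation, the two OD pairs above fall into the same shape cluster, whereas a Euclidean K-means grouping on raw flows would separate them by magnitude. Pairing the two algorithms in this way is one plausible reading of clustering OD pairs "from multiple perspectives"; the abstract does not state which features each algorithm receives.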