Large-scale urban water distribution network simulation plays a critical role in the construction, monitoring, and maintenance of urban water supply systems. However, the matrix inversion required at each simulation step generates large volumes of intermediate data and consumes substantial computing time, which limits practical application. To address this issue, this paper proposes a parallel gradient calculation algorithm built on GPU hardware and the CUDA Toolkit and compares it with the EPANET model and with a CPU-based model implemented using the Armadillo library. The results show that the GPU-based model reaches a precision very close to that of EPANET, with 99% agreement, while significantly outperforming the CPU-based model. During simulation, the GPU architecture handles large-scale data efficiently and converges faster, substantially reducing overall simulation time; for larger water distribution networks, it improves computational efficiency by up to 13 times. Further analysis reveals that different GPU models differ markedly in computational efficiency, with memory capacity being a key factor: GPU devices with larger memory achieve higher efficiency when processing large-scale networks. This study demonstrates the advantages of GPU acceleration for simulating large-scale urban water distribution networks and provides theoretical and technical support for practical applications in this field. With careful selection and configuration of GPU devices, the computational efficiency of large-scale water distribution network simulation can be significantly improved, offering more efficient solutions for future urban water resource management and planning.
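
Each iteration of the gradient method reduces to solving a symmetric positive-definite linear system for the nodal heads, which is the step the GPU can parallelize. The sketch below is a minimal, illustrative example of offloading one such solve to the GPU with the CUDA Toolkit's cuSOLVER dense Cholesky routines; the 3x3 system, variable names (J, r, dh), and the choice of cusolverDnDpotrf/cusolverDnDpotrs are assumptions for demonstration only, not the implementation used in this paper.

#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cusolverDn.h>

// Hypothetical example: solve one gradient-method step J * dh = r for nodal heads,
// where J is a symmetric positive-definite coefficient matrix assembled on the host.
// A small 3x3 system stands in for a real network-sized matrix.
int main() {
    const int n = 3;
    // Column-major SPD matrix (illustrative values only).
    std::vector<double> hJ = {4, 1, 0,
                              1, 3, 1,
                              0, 1, 2};
    std::vector<double> hr = {1, 2, 3};   // right-hand side (residuals)

    double *dJ = nullptr, *dr = nullptr, *dWork = nullptr;
    int *dInfo = nullptr, lwork = 0;

    cudaMalloc(&dJ, n * n * sizeof(double));
    cudaMalloc(&dr, n * sizeof(double));
    cudaMalloc(&dInfo, sizeof(int));
    cudaMemcpy(dJ, hJ.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dr, hr.data(), n * sizeof(double), cudaMemcpyHostToDevice);

    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);

    // Cholesky factorization J = L * L^T on the GPU.
    cusolverDnDpotrf_bufferSize(handle, CUBLAS_FILL_MODE_LOWER, n, dJ, n, &lwork);
    cudaMalloc(&dWork, lwork * sizeof(double));
    cusolverDnDpotrf(handle, CUBLAS_FILL_MODE_LOWER, n, dJ, n, dWork, lwork, dInfo);

    // Triangular solves: overwrite dr with the head correction dh.
    cusolverDnDpotrs(handle, CUBLAS_FILL_MODE_LOWER, n, 1, dJ, n, dr, n, dInfo);

    cudaMemcpy(hr.data(), dr, n * sizeof(double), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("dh[%d] = %f\n", i, hr[i]);

    cusolverDnDestroy(handle);
    cudaFree(dJ); cudaFree(dr); cudaFree(dWork); cudaFree(dInfo);
    return 0;
}

For a network with tens of thousands of nodes, the same pattern applies with sparse factorizations; keeping the matrix and right-hand side resident in GPU memory across iterations is what makes memory capacity a decisive factor in the observed speedups.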