Traffic simulation is a critical tool for congestion analysis, travel time estimation, and route optimization in urban planning, benefiting navigation apps, transportation network companies, and state agencies. Traditional traffic micro-simulation frameworks are based on road segments and can support only a limited number of main roads. Efficient traffic simulation on a regional scale remains a significant challenge due to the complexity of urban mobility and the large scale of spatiotemporal data. This paper introduces a Large Scale Multi-GPU Parallel Computing based Regional Scale Traffic Simulation Framework (LPSim), which leverages graphics processing unit (GPU) parallel computing to address these challenges. LPSim uses a multi-GPU architecture to simulate extensive and dynamic traffic networks with high fidelity and reduced computation time. Using the parallel processing capabilities of GPUs, LPSim can perform tens of millions of individual vehicle dynamics simulations simultaneously, significantly outperforming traditional CPU-based approaches. The framework is designed to be scalable and can accommodate the increasing complexity of traffic simulations. We present the theory behind GPU-based traffic simulation, the architecture of single- and multi-GPU simulations, and the graph partitioning strategies that improve load balancing across computational resources. Our experimental results demonstrate the effectiveness of LPSim in simulating large-scale traffic scenarios. LPSim completes a simulation of 2.82 million trips in 6.28 min on a single-GPU machine with a Tesla V100-SXM2 (5120 CUDA cores). Furthermore, on a Google Cloud instance with two NVIDIA V100 GPUs, which together offer 10240 CUDA cores, LPSim simulates 9.01 million trips in 21.16 min.
We further tested LPSim with the same demand on dual NVIDIA A100-PCIE-40GB GPUs, where the simulation finished in 0.0398 h, approximately 113 times faster than the same scenario running on an Intel(R) Xeon(R) Gold 6326 CPU @ 2.90 GHz, which takes 4.49 h to complete. This performance not only demonstrates LPSim’s speed and scalability advantages over traditional simulation techniques but also highlights its unique position as the first traffic simulation framework that scales across both single- and multiple-GPU configurations. Consequently, LPSim provides a valuable tool for individual researchers and large research teams alike, allowing them to obtain large-scale traffic simulation results in a time-efficient manner. The LPSim code is available at https://github.com/Xuan-1998/LPSim