Intelligent transportation systems are pivotal to modern urban development, aiming to improve the efficiency, safety, and sustainability of traffic management. However, existing 3D visual scene understanding methods often struggle with robustness and high computational complexity in complex traffic environments. This paper proposes a multi-sensor signal fusion method based on PV-RCNN and LapDepth (PV-LaP) to improve 3D visual scene understanding. By integrating camera and LiDAR data, PV-LaP enhances the accuracy of environmental perception. Evaluated on the KITTI and WHU-TLS datasets, the PV-LaP framework demonstrated superior performance. On the KITTI dataset, our method achieved an Absolute Relative Error (Abs Rel) of 0.079 and a Root Mean Squared Error (RMSE) of 3.014, outperforming state-of-the-art methods. On the WHU-TLS dataset, it improved 3D reconstruction precision, reaching a PSNR of 19.15 dB and an LPIPS of 0.299. Despite its high computational demands, PV-LaP offers significant gains in accuracy and robustness, providing valuable insights for the future development of intelligent transportation systems.
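For readers unfamiliar with the reported metrics, the sketch below shows how Abs Rel, RMSE, and PSNR are conventionally computed. This is a minimal NumPy illustration, not the paper's evaluation code; the function names, the 80 m depth cap (a common KITTI convention), and the `max_val` image range are assumptions.

```python
import numpy as np

def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Abs Rel and RMSE over valid ground-truth pixels (standard definitions)."""
    # Mask out invalid depths; the 80 m cap is a common KITTI convention, assumed here.
    mask = (gt > min_depth) & (gt < max_depth)
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)   # Absolute Relative Error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))   # Root Mean Squared Error
    return abs_rel, rmse

def psnr(img_a, img_b, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB; max_val depends on the image value range."""
    mse = np.mean((img_a - img_b) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```

LPIPS, by contrast, is a learned perceptual metric computed with a pretrained network (e.g., via the `lpips` Python package) and is therefore omitted from this sketch.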