Autonomous-rail Rapid Transit (ART) offers many advantages for intelligent transportation, but it cannot reliably recognize road conditions when only single-modal cameras are deployed. In this paper, a lightweight fusion-based object detection neural network using multi-modal sensors was designed for ART. Firstly, the Light Detection and Ranging (LiDAR) point cloud was encoded and preprocessed with additional steps. Secondly, the backbone and detection head of the network were designed using re-parameterization and pruning techniques. Furthermore, a fusion module with a selective soft attention mechanism was designed to fuse the extracted features. The proposed model was evaluated on an open autonomous driving dataset, where it achieved a 7.38% improvement in mean average precision (mAP) over the original You Only Look Once (YOLO) model and outperformed other state-of-the-art (SOTA) models. Finally, practical experiments were conducted at the ART maintenance center to simulate operational scenarios and validate the feasibility of the proposed method. By fully exploiting the information in different modalities and addressing the limitations of single-modal recognition, the proposed approach improves the robustness of road object detection for ART under varying road conditions. Consequently, our method provides an effective solution that benefits intelligent transportation through advanced algorithms and strategies.
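The abstract names a selective soft attention mechanism for fusing camera and LiDAR features but does not show its implementation. The following is a minimal PyTorch sketch of one common form of such a module (SKNet-style channel-wise soft selection between two branches); the class name, channel sizes, and gating layout are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a selective soft-attention fusion module (assumed design:
# SK-style channel gating over two modality branches; not the paper's code).
import torch
import torch.nn as nn


class SelectiveSoftAttentionFusion(nn.Module):
    """Fuse camera and LiDAR feature maps of equal shape (B, C, H, W)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 8)
        # Squeeze: global pooling followed by a shared bottleneck 1x1 conv.
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        # Branch-specific excitation layers producing per-channel logits.
        self.excite_cam = nn.Conv2d(hidden, channels, kernel_size=1)
        self.excite_lidar = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, feat_cam: torch.Tensor, feat_lidar: torch.Tensor) -> torch.Tensor:
        # Element-wise sum summarizes both modalities before gating.
        summary = self.squeeze(feat_cam + feat_lidar)           # (B, hidden, 1, 1)
        logits = torch.stack(
            [self.excite_cam(summary), self.excite_lidar(summary)], dim=0
        )                                                        # (2, B, C, 1, 1)
        # Softmax across the two branches -> soft per-channel selection weights.
        weights = torch.softmax(logits, dim=0)
        return weights[0] * feat_cam + weights[1] * feat_lidar


if __name__ == "__main__":
    fuse = SelectiveSoftAttentionFusion(channels=64)
    cam = torch.randn(2, 64, 32, 32)     # camera branch features
    lidar = torch.randn(2, 64, 32, 32)   # LiDAR branch features (same shape assumed)
    print(fuse(cam, lidar).shape)        # torch.Size([2, 64, 32, 32])
```

The soft selection lets the network weight each modality per channel, so that, for example, LiDAR features can dominate under poor lighting while camera features dominate for texture-rich objects; the actual fusion design in the paper may differ.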