Vehicle detection algorithms are essential for intelligent traffic management and autonomous driving systems. Current vehicle detection algorithms rely largely on deep learning, using convolutional neural networks (CNNs) to extract vehicle image features automatically. In real traffic scenarios, however, a single feature extraction unit struggles to capture the full range of vehicle information in the scene, which degrades detection performance. To address this issue, we propose a lightweight vehicle detection algorithm based on Mamba_ViT. First, we introduce a new feature extraction architecture (Mamba_ViT) that separates shallow and deep features and processes them independently to obtain a more complete contextual representation, making feature extraction more comprehensive and accurate. Second, we employ a multi-scale feature fusion mechanism to strengthen the integration of shallow and deep features. The resulting vehicle detection algorithm is named Mamba_ViT_YOLO. Experimental results on the UA-DETRAC dataset show that the proposed algorithm improves mAP@50 by 3.2% compared to the latest YOLOv8 algorithm, while using only 60% of the model parameters.