Proactive hybrid learning framework for real-time multi-vehicle detection in unregulated traffic environments

M Ilamathi, Sabitha Ramakrishnan, Rakhul Kumar Babusankar

doi:10.1016/j.imavis.2024.105081

Abstract

Reliable multi-vehicle detection in unregulated traffic environments is a crucial computer vision task in the development of Intelligent Transportation Systems (ITS). Although Deep Learning (DL) methods show promise for vehicle detection, uncertainties such as varying vehicle shapes and sizes, intricate background clutter, and unpredictable vehicle flow make detection in chaotic, unregulated traffic challenging. For real-time vehicle detection in such environments, this paper proposes the Hybrid Learning Multi-Vehicle Detection (HL-MVD) framework, based on the Convolutional Multi-head Attention Transformer Detector (CMATDet). The primary objective of this study is to generate a dataset of highly informative video frames using a Pool-based Active Learning Strategy (PALS). Transfer learning is then used to train CMATDet for improved accuracy and reduced detection latency. The proposed approach restructures the baseline YOLOv5x, incorporating a multi-head attention transformer encoder to extract global features and a Scale Specific Bidirectional Feature Pyramid Network (SS-BiFPN) to facilitate multi-scale feature representation. The Simplified Optimal Transport Algorithm with a top-q approximation technique (Sim-OTA) is used for label assignment. Heatmap analysis demonstrated the suitability of the newly generated dataset “AU-INV-P-PALS” for detecting specific Indian native vehicles. The performance of the proposed framework is evaluated on our custom-developed AU-INV-P-PALS vehicle dataset and the IITM-HeTra Dataset 1. Compared with contemporary detection models, the proposed HL-MVD framework achieved higher mAP scores (91.1% mAP@0.5 and 78.3% mAP@0.5:0.95) on AU-INV-P-PALS. The proposed model also demonstrated lower inference latency (8.1 ms), higher precision (82.7% at IoU = 0.5), and higher recall (90.8% at IoU = 0.5) than recent deep learning-based detection models in the literature. The top-q approximation technique in the detection head reduces the false-positive rate compared to conventional models. Finally, the proposed framework is tested on CCTV traffic footage captured on city roads in Chennai, India.
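
The precision and recall figures above are reported at an IoU threshold of 0.5. As a reading aid, the following minimal sketch shows how such scores are commonly computed for a single image. It is not the authors' evaluation code: the function names `iou` and `precision_recall`, the (x1, y1, x2, y2) box format, and the greedy one-to-one matching are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # assumed format: (x1, y1, x2, y2)


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Box], truths: List[Box],
                     thr: float = 0.5) -> Tuple[float, float]:
    """Greedy matching; preds are assumed sorted by descending confidence."""
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for p in preds:
        best_j, best_iou = -1, thr
        for j, t in enumerate(truths):
            if j in matched:
                continue
            v = iou(p, t)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:  # prediction overlaps an unclaimed truth box enough
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching ground truth
    fn = len(truths) - tp  # ground-truth boxes that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall
```

In this sketch, a prediction counts as a true positive only if it overlaps a not-yet-matched ground-truth box with IoU of at least 0.5. mAP@0.5:0.95 averages the resulting average precision over IoU thresholds from 0.5 to 0.95, so it rewards tighter localization than mAP@0.5 alone.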

Full Text