DESPP-DETR: A Dense Connection Efficient Spatial Pooling DEtection TRansformer for Vehicle Detection

Krishnendhu S P, Prabu Mohandas

doi:10.1145/3628426

Abstract

Real-time vehicle detection is a challenging and vital task in intelligent transportation systems. The key requirements for a vehicle detection model are speed and accuracy. However, existing real-time vehicle detection models often sacrifice one of these qualities in favor of the other. This trade-off makes them unsuitable for real-time deployment, where speed and accuracy are equally important. Additionally, occlusion, the obstruction or partial covering of vehicles, further complicates detection and reduces the system’s accuracy. In this study, we propose DESPP-DETR, a one-stage detection network for real-time vehicle detection. It is based on bipartite matching and a transformer encoder-decoder architecture, extended with a dense connection block and enhanced spatial pyramid pooling. The dense connection block strengthens feature extraction. The enhanced spatial pyramid pooling eliminates the fixed-size input constraint and increases the network’s learning capacity. Compared with existing models, DESPP-DETR achieves higher accuracy in real-time vehicle detection. On the MS COCO 2017 dataset, the proposed model achieves an improved mean average precision (mAP) of 75.53%, making it a promising solution for intelligent transportation systems.
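To illustrate how spatial pyramid pooling can remove a fixed-size input constraint, the following is a minimal sketch of the standard technique, not the enhanced variant proposed in this paper. The function name `spatial_pyramid_pool`, the pyramid levels (1, 2, 4), and the toy feature maps are assumptions chosen for illustration. Each level n splits a single-channel feature map into an n x n grid and max-pools every bin, so the output length depends only on the levels and not on the input size.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map (list of rows) into a fixed-length vector.

    Hypothetical helper for illustration: level n splits the map into
    n x n bins, so the output has sum(n * n for n in levels) values.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # floor/ceil boundaries keep every bin non-empty and cover the map
            r0, r1 = (i * height) // n, math.ceil((i + 1) * height / n)
            for j in range(n):
                c0, c1 = (j * width) // n, math.ceil((j + 1) * width / n)
                pooled.append(
                    max(
                        feature_map[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1)
                    )
                )
    return pooled


# Two toy feature maps of different spatial sizes (assumed values).
small = [[r * 5 + c for c in range(5)] for r in range(4)]    # 4 x 5
large = [[r * 13 + c for c in range(13)] for r in range(9)]  # 9 x 13

vec_small = spatial_pyramid_pool(small)
vec_large = spatial_pyramid_pool(large)
print(len(vec_small), len(vec_large))
```

Both vectors have the same length (1 + 4 + 16 values per channel), which is what allows a fixed-size layer to follow feature maps of varying resolution.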

Full Text