Object detection in traffic videos: an optimized approach using super-resolution and maximal clique algorithm

Iván García-Aguilar,Jorge García-González,Rafael Marcos Luque-Baena,Ezequiel López-Rubio

doi:10.1007/s00521-023-08741-4

Abstract

Detection of small objects is one of the main challenges to be improved in deep learning, mainly due to the small number of pixels and scene’s context, leading to a loss in performance. In this paper, we present an optimized approach based on deep object detection models that allow the detection of a higher number of elements and improve the score obtained for their class inference. The main advantage of the presented methodology is that it is not necessary to modify the internal structure of the selected convolutional neural network model or re-training for a specific scene. Our proposal is based on detecting initial regions to generate several sub-images using super-resolution (SR) techniques, increasing the number of pixels of the elements, and re-infer over these areas using the same pre-trained model. A reduced set of windows is calculated in the super-resolved image by analyzing a computed graph that describes the distances among the preliminary object detections. This analysis is done by finding maximal cliques on it. This way, the number of windows to be examined is diminished, significantly speeding up the detection process. This framework has been successfully tested on real traffic sequences obtained from the U.S. Department of Transportation. An increase of up to 44.6% is achieved, going from an average detection rate for the EfficientDet D4 model of 14.5% compared to 59.1% using the methodology presented for the first sequence. Qualitative experiments have also been performed over the Cityscapes and VisDrone datasets.

Full Text