Abstract

There has been significant interest in using Convolutional Neural Networks (CNN) based methods for Automated Vehicular Surveillance (AVS) systems. Although these methods provide high accuracy, they are computationally expensive. On the other hand, Background Subtraction (BS)-based approaches are lightweight but provide insufficient information for tasks such as monitoring driving behavior and detecting traffic rules violations. In this paper, we propose a framework to reduce the complexity of CNN-based AVS methods, where a BS-based module is introduced as a preprocessing step to optimize the number of convolution operations executed by the CNN module. The BS-based module generates image-candidates containing only moving objects. A CNN-based detector with the appropriate number of convolutions is then applied to each image-candidate to handle the overlapping problem and improve detection performance. Four state-of-the-art CNN-based detection architectures were benchmarked as base models of the detection cores to evaluate the proposed framework. The experiments were conducted using a large-scale dataset. The computational complexity reduction of the proposed framework increases with the complexity of the considered CNN model’s architecture (e.g., 30.6% for YOLOv5s with 7.3M parameters; 52.2% for YOLOv5x with 87.7M parameters), without undermining accuracy.

Highlights

  • One of the fundamental pillars of road safety strategies is deploying Automated Vehicular Surveillance (AVS) systems to monitor driving behaviors and detect dangerous driving patterns

  • This method may produce poor detection results for several reasons, such as: (1) object overlapping: depending on the camera's field of view, the projected image from the real domain may contain overlapped vehicles, so multiple vehicles may be misclassified as one heavy vehicle; (2) parameter tuning: the performance of Background Subtraction (BS) methods greatly depends on the threshold values set for their parameters, and inadequate tuning leads to false detections or undetected vehicles; (3) dynamic background conditions: even when the camera is static, the background may change due to sudden illumination changes, where the intensities of pixels belonging to the background can vary significantly over very short periods, leading to the misclassification of background pixels as foreground pixels [1]

  • Pixel assignment uses a difference threshold: each pixel is compared to the background model; pixels whose difference exceeds the threshold parameter are classified as foreground pixels, and the rest are classified as background pixels
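The thresholding step described above can be sketched in a few lines of NumPy. This is a minimal illustration of difference-threshold background subtraction, not the paper's implementation; the function name and the threshold value are illustrative assumptions.

```python
import numpy as np

def background_subtract(frame, background, threshold=30):
    """Classify each pixel as foreground (1) or background (0) by
    thresholding its absolute difference from the background model."""
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)

# Toy 4x4 grayscale example: a "moving object" brightens two pixels.
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1, 1] = 200
frame[1, 2] = 180
mask = background_subtract(frame, background, threshold=30)
# mask is 1 exactly at the two changed pixels
```

In practice the background model is maintained adaptively (e.g., a running average or a mixture model) rather than being a single fixed frame, which is precisely why the threshold tuning issue noted in the highlight matters.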


Summary

Introduction

One of the fundamental pillars of road safety strategies is deploying Automated Vehicular Surveillance (AVS) systems to monitor driving behaviors and detect dangerous driving patterns. Sophisticated video analytics methods based on Deep Learning (DL) are generally computationally demanding, which may hinder their real-time implementation; this motivates the development of techniques to reduce their complexity. BS is a motion-based object detection method that assigns each pixel in a video frame to either the background class (i.e., static objects) or the foreground class (i.e., moving objects): a given frame is compared to the background model to extract regions representing moving objects, i.e., vehicles in the scene, and the output is a binary mask marking the locations of moving-object pixels in the original frame. This paper proposes a framework combining BS and CNN-based detectors to minimize the number of convolutions needed to process a given frame, reducing the computational complexity without undermining the detection performance.
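The step from the binary mask to the image-candidates fed to the CNN detector can be illustrated with a simple bounding-box crop. This is a hedged sketch of the general idea, assuming a single moving region per frame; the function name is hypothetical, and the paper's actual candidate-generation logic (e.g., handling multiple regions) may differ.

```python
import numpy as np

def extract_candidate(mask, frame):
    """Crop the tightest bounding box around foreground pixels,
    yielding an image-candidate for the CNN-based detector."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # no moving objects: skip the CNN entirely
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    return frame[y0:y1, x0:x1]

# Toy 6x6 frame with a 2x4 foreground region in the mask.
frame = np.arange(36, dtype=np.uint8).reshape(6, 6)
mask = np.zeros((6, 6), dtype=np.uint8)
mask[2:4, 1:5] = 1
candidate = extract_candidate(mask, frame)
# candidate has shape (2, 4): only the moving region reaches the CNN
```

Cropping before detection is what yields the complexity reduction reported in the abstract: the CNN's convolutions run only over the (usually much smaller) candidate regions instead of the full frame, and frames with no motion can bypass the CNN altogether.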

