Abstract

Vehicle detection with category inference on video sequence data is an important but challenging task for urban traffic surveillance. The task is difficult because it requires accurate localization of relatively small vehicles in complex scenes and must run in real time. In this paper, we present a vehicle detection framework that improves on the conventional Single Shot MultiBox Detector (SSD) and detects different types of vehicles in real time. Our approach, denoted DP-SSD, uses different feature extractors for the localization and classification tasks within a single network and enhances both extractors through deconvolution (D) and pooling (P) between layers in the feature pyramid. In addition, we extend the range of the default boxes by adjusting their scale so that smaller default boxes can be used to guide DP-SSD training. Experimental results on the UA-DETRAC and KITTI datasets demonstrate that DP-SSD achieves efficient, real-time vehicle detection on real-world traffic surveillance data. Trained on the UA-DETRAC trainval set and evaluated on the UA-DETRAC test set, DP-SSD with a 300 × 300 input achieves 75.43% mAP (mean average precision) at 50.47 FPS (frames per second), and with a 512 × 512 input it reaches 77.94% mAP at 25.12 FPS on an NVIDIA GeForce GTX 1080Ti GPU. DP-SSD achieves accuracy that is better than that of the compared state-of-the-art models, except for YOLOv3.
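The effect of lowering the default box scale can be illustrated with the linear scale rule from the original SSD formulation, s_k = s_min + (s_max − s_min)(k − 1)/(m − 1) for m feature maps. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the values s_min = 0.2 and s_max = 0.9 are the defaults of the original SSD, and the reduced s_min = 0.1 is an arbitrary example rather than the setting used for DP-SSD, which this abstract does not state.

```python
import math


def default_box_scales(num_layers, s_min=0.2, s_max=0.9):
    """Linearly spaced default box scales, one per feature map (SSD rule).

    s_min and s_max are the original SSD defaults, not DP-SSD's values.
    """
    if num_layers == 1:
        return [s_min]
    step = (s_max - s_min) / (num_layers - 1)
    return [s_min + step * k for k in range(num_layers)]


def default_box_sizes(scales, input_size, aspect_ratios=(1.0, 2.0, 0.5)):
    """Return, for each scale, a list of (width, height) pairs in pixels.

    A box with scale s and aspect ratio a has width s * sqrt(a) and
    height s / sqrt(a), both relative to the input size.
    """
    sizes = []
    for s in scales:
        per_scale = []
        for a in aspect_ratios:
            w = s * math.sqrt(a) * input_size
            h = s / math.sqrt(a) * input_size
            per_scale.append((w, h))
        sizes.append(per_scale)
    return sizes


if __name__ == "__main__":
    # Compare the smallest square box for the original and a reduced s_min.
    for s_min in (0.2, 0.1):  # 0.1 is a hypothetical reduced value
        scales = default_box_scales(6, s_min=s_min)
        smallest = default_box_sizes(scales, 300, aspect_ratios=(1.0,))[0][0]
        print(f"s_min={s_min}: smallest box = {smallest[0]:.1f} x {smallest[1]:.1f} px")
```

Lowering s_min shrinks the boxes attached to the shallowest feature map, which is where small, distant vehicles are most likely to be matched during training.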

Highlights

  • Automatic analysis of vehicle activities in urban traffic surveillance is an important and urgent issue due to a large number of vehicle traffic rule violations and their adverse effects on daily traffic management

  • Faster R-CNN replaces the Selective Search algorithm with a Region Proposal Network (RPN) and merges the RPN with Fast R-CNN into a single network by sharing convolutional layers using “attention” mechanisms, which moves towards real-time detection performance while preserving accuracy

  • To ensure good performance on small vehicles with complex backgrounds in aerial vehicle detection, Reference [42] presents an improved algorithm based on Faster R-CNN that uses a hyper region proposal network (HRPN); applied to the Munich vehicle dataset, it achieves large improvements in accuracy over existing methods

Summary

Introduction

Automatic analysis of vehicle activities in urban traffic surveillance is an important and urgent issue due to the large number of vehicle traffic rule violations and their adverse effects on daily traffic management. Compared with traditional machine learning methods, deep learning-based methods have made great breakthroughs in traffic surveillance techniques and have achieved good performance in practical applications such as vehicle detection, feature extraction and vehicle track identification [1,2,3,4,5]. In these areas of research, accurate, real-time vehicle detection and preliminary classification are the most fundamental and important tasks. Among region-based CNNs, Regions with CNN features (R-CNN) [6], Spatial Pyramid Pooling Network (SPP-Net) [7], Fast R-CNN [8] and Faster R-CNN [9] are recent advances often utilized in vehicle detection. These approaches, which achieve state-of-the-art accuracy through the improvement from Selective Search [10] to the Region Proposal Network (RPN) [9], are computationally intensive and therefore too slow for real-time or near real-time detection.

Related Work
Deep CNNs for Object Detection
Deep CNNs for Vehicle Detection
The Proposed Vehicle Detection Method
Feature Concatenation
Feature Concatenation through Deconvolution for Localization
Feature Concatenation through Pooling for Categorization
Default
Deep Nets Training
Experiments and Results
Training Dataset and Data Augmentation
Implementation Details
Experiments on UA-DETRAC Dataset
The Importance of Feature Concatenation
Method
Comparisons with State-of-the-art Detection Methods
Experiments on KITTI Dataset
Conclusions