Abstract

In this paper, we present an efficient and effective framework for vehicle detection and classification from traffic surveillance cameras. First, we cluster the vehicle scales and aspect ratios in the vehicle datasets. Then, we use a convolutional neural network (CNN) to detect vehicles. We use feature fusion to concatenate high-level and low-level features and detect vehicles of different sizes on different feature maps. To improve speed, we adopt a fully convolutional architecture instead of fully connected (FC) layers. Furthermore, we adopt recent complementary advances such as batch normalization, hard example mining, and inception modules. Extensive experiments on the JiangSuHighway Dataset (JSHD) demonstrate the competitive performance of our method. Our framework improves on Faster R-CNN by 6.5% mean average precision (mAP). Using 1.5 GB of GPU memory at test time, the network runs at 15 FPS, three times faster than Faster R-CNN.
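The clustering step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which algorithm or distance measure is used, so the plain Euclidean k-means below, the function names `kmeans_boxes` and `scale_and_ratio`, and the sample box sizes are all assumptions made for illustration only. The sketch groups ground-truth box sizes and then reports a scale and an aspect ratio for each cluster centre.

```python
import math
import random


def kmeans_boxes(boxes, k, iters=50, seed=0):
    """Cluster (width, height) pairs with plain Euclidean k-means.

    Assumption: the paper's actual clustering method is not stated here;
    k-means on raw box sizes is only one plausible choice.
    """
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # pick k distinct boxes as initial centres
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for w, h in boxes:
            # Assign each box to the nearest centre (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda i: (w - centroids[i][0]) ** 2 + (h - centroids[i][1]) ** 2,
            )
            groups[nearest].append((w, h))
        new_centroids = []
        for i, group in enumerate(groups):
            if group:
                mean_w = sum(w for w, _ in group) / len(group)
                mean_h = sum(h for _, h in group) / len(group)
                new_centroids.append((mean_w, mean_h))
            else:
                # Keep the old centre if no box was assigned to it.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids


def scale_and_ratio(w, h):
    """Return (scale, aspect ratio), with scale = sqrt(w*h) and ratio = w/h."""
    return math.sqrt(w * h), w / h


# Hypothetical box sizes in pixels: three small vehicles and three large ones.
boxes = [(40, 30), (42, 32), (38, 28), (200, 100), (210, 105), (190, 95)]
for w, h in sorted(kmeans_boxes(boxes, k=2)):
    print(scale_and_ratio(w, h))
```

The resulting scales and ratios could then serve as the default box shapes of a detector, which is the usual motivation for clustering box statistics before training.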

Highlights

  • Vehicle detection is a very important component in traffic surveillance and automatic driving [1]

  • In the two-stage approach, a sparse set of candidate object boxes is first generated by selective search or a region proposal network, and these candidates are then classified and regressed

  • We evaluate our framework on the JiangSuHighway Dataset (JSHD) (Fig. 1) and improve on the state-of-the-art Faster R-CNN by 6.5% mean average precision (mAP)


Summary

Introduction

Vehicle detection is a very important component in traffic surveillance and automatic driving [1], and it remains a challenging problem in computer vision. Current top deep-network-based object detection frameworks can be divided into two categories: the two-stage approach, including [4,5,6,7,8], and the one-stage approach, including [9,10,11]. In the two-stage approach, a sparse set of candidate object boxes is first generated by selective search or a region proposal network, and these candidates are then classified and regressed. In the one-stage approach, the network directly generates dense samples over locations, scales, and aspect ratios, and these samples are classified and regressed at the same time. The main advantage of the one-stage approach is real-time speed; its detection accuracy, however, usually trails that of the two-stage approach, and one of the main reasons is the class imbalance problem [12].
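To make the dense sampling of the one-stage approach concrete, the following is a minimal sketch of how candidate boxes can be laid out over locations, scales, and aspect ratios. The function name `generate_anchors`, the centre-based box format, and the stride, scale, and ratio values are illustrative assumptions, not details taken from the paper.

```python
import math


def generate_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return one (cx, cy, w, h) box per feature-map cell, scale, and ratio.

    Assumptions: boxes are centred on each cell, `scales` are side lengths
    in pixels of the square box with the same area, and each ratio is
    width / height.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of the cell, mapped back to input-image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Keep the area at s*s while setting w/h to r.
                    w = s * math.sqrt(r)
                    h = s / math.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return anchors


# Hypothetical 2x3 feature map with stride 16, two scales, three ratios.
anchors = generate_anchors(2, 3, 16, scales=[32, 64], ratios=[0.5, 1.0, 2.0])
print(len(anchors))
```

The number of boxes grows as the product of locations, scales, and ratios, and most of them cover background. This is one way to see why the class imbalance noted above affects one-stage detectors in particular.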

