Abstract

This paper presents an advanced urban traffic density estimation solution that uses the latest deep learning techniques to intelligently process ultrahigh-resolution traffic videos taken from an unmanned aerial vehicle (UAV). We first capture nearly an hour of ultrahigh-resolution traffic video at five busy road intersections of a modern megacity by flying a UAV during rush hours. We then randomly sample over 17 K 512×512 pixel image patches from the video frames and manually annotate over 64 K vehicles to form a dataset for this paper, which will also be made available to the research community for research purposes. Our innovative urban traffic analysis solution consists of advanced deep neural network (DNN) based vehicle detection and localization, vehicle type (car, bus, and truck) recognition, tracking, and vehicle counting over time. We present extensive experimental results to demonstrate the effectiveness of our solution. We show that our enhanced single shot multibox detector (Enhanced-SSD) outperforms other DNN-based techniques and that deep learning techniques are more effective than traditional computer vision techniques for traffic video analysis. We also show that ultrahigh-resolution video provides more information and thus enables more accurate vehicle detection and recognition than lower-resolution content. This paper not only demonstrates the advantages of using the latest technological advancements (ultrahigh-resolution video and UAVs), but also provides an advanced DNN-based solution for exploiting these advancements for urban traffic density estimation.
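As a rough illustration of the detect-track-count pipeline described above (and not the authors' Enhanced-SSD implementation, which is not reproduced here), the following Python sketch shows how per-frame detections of cars, buses, and trucks could be associated across frames with a simple IoU-based tracker and accumulated into per-type counts over time. The names `detect_vehicles` and `SimpleTracker`, the detector stub, and the IoU threshold are hypothetical placeholders.

```python
"""Illustrative sketch of a detect-track-count pipeline for UAV traffic video.

This is NOT the paper's Enhanced-SSD implementation; the detector below is a
stand-in stub, and the tracker is a deliberately simple greedy IoU matcher.
"""
from collections import Counter

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def detect_vehicles(frame):
    """Stand-in for a DNN detector (e.g. an SSD-style model).

    A real detector would return (box, label, score) triples for a frame;
    in this sketch a "frame" is already such a list of detections."""
    return frame

class SimpleTracker:
    """Greedy IoU matching: a detection inherits the ID of its best overlap."""
    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}   # track_id -> last seen box
        self.labels = {}   # track_id -> vehicle type
        self.next_id = 0

    def update(self, detections):
        new_ids = []
        for box, label, _score in detections:
            best_id, best_iou = None, self.iou_threshold
            for tid, prev_box in self.tracks.items():
                overlap = iou(box, prev_box)
                if overlap > best_iou:
                    best_id, best_iou = tid, overlap
            if best_id is None:            # unmatched detection -> new track
                best_id = self.next_id
                self.next_id += 1
                new_ids.append(best_id)
            self.tracks[best_id] = box
            self.labels[best_id] = label
        return new_ids

def count_vehicles(frames):
    """Count each vehicle type once, the first time its track appears."""
    tracker = SimpleTracker()
    counts = Counter()
    for frame in frames:
        for tid in tracker.update(detect_vehicles(frame)):
            counts[tracker.labels[tid]] += 1
    return counts

if __name__ == "__main__":
    # Two synthetic frames: the car keeps its track ID, the bus starts a new one.
    frames = [
        [((10, 10, 60, 40), "car", 0.9)],
        [((14, 12, 64, 42), "car", 0.9), ((100, 50, 180, 110), "bus", 0.8)],
    ]
    print(count_vehicles(frames))   # Counter({'car': 1, 'bus': 1})
```

In this sketch a vehicle is counted once, when its track first appears; the paper's actual framework may instead count line crossings or aggregate per-frame density, so the snippet should be read only as a structural outline of detection, type recognition, tracking, and counting.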

Highlights

  • To deeply understand city road traffic density and to overcome the challenges posed by real-world Unmanned Aerial Vehicle (UAV) video data, we develop a robust Deep Vehicle Counting Framework (DVCF) that is capable of counting different types of vehicles in high-resolution videos.

  • This is because the dimension of the VGGNet feature is 1024, which is significantly lower than that of the other feature types; a lower dimension means a lower computational cost for the classifier.

  • The classification accuracy (CA) score using the ResNet feature is higher than that of the VGGNet feature, so we still use ResNet as the backbone network of the Enhanced-SSD in consideration of performance (see the sketch after these highlights).
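To make the dimensionality argument in the highlights concrete, here is a minimal Python sketch showing that the cost of a linear classification head scales with the feature dimension. The 2048-d comparison value and the three-class (car/bus/truck) head are illustrative assumptions, not figures from the paper; only the 1024-d VGGNet feature dimension is taken from the highlight above.

```python
# Back-of-the-envelope sketch: the multiply-add cost (and parameter count,
# excluding biases) of a linear classifier head grows linearly with the
# feature dimension, so a 1024-d feature needs roughly half the work of a
# 2048-d one.  The 2048 value below is an assumed comparison point.

def linear_classifier_cost(feature_dim: int, num_classes: int = 3) -> int:
    """Multiply-adds of one fully connected classification layer
    applied to a feature vector of the given dimension."""
    return feature_dim * num_classes

for name, dim in [("low-dimensional feature", 1024),
                  ("higher-dimensional feature", 2048)]:
    print(f"{name:>26s}: {dim:4d}-d -> {linear_classifier_cost(dim)} multiply-adds")
```

This only captures the cost side of the trade-off; as the last highlight notes, the higher classification accuracy of the ResNet feature is what motivates keeping ResNet as the backbone despite the larger dimension.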



Introduction

Conventional traffic monitoring systems rely on thousands of detectors (e.g. cameras, induction loops, radar sensors) deployed at fixed locations with small detection ranges to capture various road conditions throughout the network [1]–[4]. Such systems have exhibited many limitations in terms of range and effectiveness. When information is required beyond the scope of these fixed detectors (i.e. in blind regions), human observers are frequently deployed to assess the road conditions there [5]. It is therefore essential to develop a more effective approach for acquiring visual information.
