Abstract

Joint vehicle localization and categorization in high-resolution aerial images can provide useful information for applications such as traffic flow structure analysis. To retain sufficient features for recognizing small-scale vehicles, a detection structure similar to regions with convolutional neural network features (R-CNN) is employed. In this setting, cascaded localization error can be averted by treating the negatives and the differently typed positives equally, as a single multi-class classification task, but the problem of class imbalance remains. To address this issue, a cost-effective network extension scheme is proposed. In it, the correlated convolution and connection costs incurred during extension are reduced by feature map selection and bipartite main-side network construction, which are realized with the help of a novel feature map class-importance measurement and a new class-imbalance-sensitive main-side loss function. The effectiveness of the proposed network extension is verified by comparing it with similarly shaped strong counterparts on an image classification dataset built from a set of traditional true-color aerial images with 0.13 m ground sampling distance, taken from a height of 1000 m by an imaging system composed of non-metric cameras. Experiments show equivalent or better performance, with the least parameter and memory overhead.

Highlights

  • For most sliding window-based vehicle detection methods involving localization and categorization, the two predictions are performed separately: the categories are estimated after the positional information is obtained

  • For these CNN-based methods, their underlying structures generally follow the regions with convolutional neural network features (R-CNN) [9] or its accelerated variants [10,11,12,13] with region of interest (ROI)-pooling [14]

  • Methods for joint vehicle localization and categorization in aerial images help with important applications such as traffic flow analysis and suspicious vehicle detection

Summary

Introduction

Remote Sens. 2017, 9, 494

For most sliding window-based vehicle detection methods involving localization and categorization, the two predictions are performed separately: the categories are estimated after the positional information is obtained. Features such as Haar [1], histogram of oriented gradients (HOG) [2,3], and local binary pattern (LBP) [3], although less robust and accurate than deep ones, can make a good compromise between speed and efficiency when the computational resources or the quantity of training samples are very limited. Once these limitations no longer exist, detection methods based on deep features are often superior, with strong resistance to disturbances in scale, lighting condition, and shadow, and their superior performance has been repeatedly verified in many studies [4,5,6,7,8]. For these CNN-based methods, the underlying structures generally follow the regions with convolutional neural network features (R-CNN) [9] or its accelerated variants [10,11,12,13] with region of interest (ROI)-pooling [14]. During forward propagation, every input vector x is processed by an affine transformation to get the output z, as in Equation (1).
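Equation (1) itself is not reproduced in this summary. As a rough illustration of what an affine transformation of an input vector x into an output z looks like, the sketch below assumes the common form z = Wx + b. The weight matrix W, the bias b, the function name affine, and the numbers are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def affine(x: Sequence[float],
           weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Return z = W x + b for a single input vector x.

    Assumed form only: the paper's Equation (1) is not shown in this
    summary, so W (weights) and b (bias) are illustrative names.
    """
    if len(weights) != len(bias):
        raise ValueError("weights and bias must have the same number of rows")
    z = []
    for row, b in zip(weights, bias):
        if len(row) != len(x):
            raise ValueError("each weight row must match the length of x")
        # One output component: dot product of a weight row with x, plus bias.
        z.append(sum(w * xi for w, xi in zip(row, x)) + b)
    return z


# Hypothetical example: a 2-dimensional input mapped to 3 outputs.
x = [1.0, 2.0]
W = [[1.0, 0.0],
     [0.5, 0.5],
     [-1.0, 2.0]]
b = [0.0, 1.0, -1.0]
z = affine(x, W, b)
print(z)  # -> [1.0, 2.5, 2.0]
```

In a CNN-based classifier, a transformation of this kind would typically sit in the fully connected layers, with a nonlinearity or a softmax applied to z afterwards.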
