Abstract

Distinct from conventional approaches to generic vehicle type recognition, this paper proposes an approach to fine-grained vehicle type recognition that leverages multiple cues jointly to convolutional neural networks (CNNs). Fine-grained vehicle recognition is inherently difficult with many challenges, including (i) how to incorporate multiple cues through joint optimization instead of separately handling, (ii) how to learn data-driven representation for vehicles instead of hand-crafted features and (iii) how to perceive and localize the vehicle region instead of using the whole image. Our multi-task CNNs localize vehicles in the first stage and recognize subclasses in the second stage, allowing us to handle samples with cluttered backgrounds and those where vehicles do not fill most of the image. Through collaborative feature learning from multiple tasks in each stage, our approach can handle subtle inter-class variations at the subordinate level. We also develop new vehicle-oriented data augmentation strategies in CNN training. To advance research on vehicle-centered tasks, we release a new vehicle dataset that provides both semantic labels and bounding boxes. This dataset can be used in vehicle localization, recognition, viewpoint estimation, and so on. Extensive experiments on two benchmark vehicle datasets demonstrate that our approach outperforms state-of-the-art algorithms.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.