Abstract

Accurate object detection on the road is the most important requirement for autonomous vehicles. Extensive work has been accomplished for car, pedestrian, and cyclist detection; however, comparatively little effort has been put into 2D object detection. In this article, a dynamic approach is investigated to design a unified neural network that achieves the best possible results on our available hardware. The proposed architecture uses CSPNet for end-to-end feature extraction: a backbone subnet extracts visual features, and visual object detection is performed by a feature pyramid network (FPN). To increase the net's flexibility, an auto-anchor generation method is applied to the detection layer, which makes the net adaptable to any dataset. To fine-tune the net, activation, optimization, and loss functions are considered along with multiple checkpoints. The proposed net is trained and tested on the KITTI benchmark dataset. Our extensive experiments show that the proposed model outperforms others for visual object detection: whereas other nets achieve very low accuracy for pedestrian and cyclist detection, our proposed model achieves a 99.3% recall rate on our dataset.
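The abstract does not specify which auto-anchor algorithm is used. As a point of reference, a common choice in YOLO-family detectors (from YOLOv2 onward) is to cluster the widths and heights of the ground-truth boxes with k-means, using the distance d = 1 − IoU so that large and small boxes are treated fairly. The following is a minimal sketch under that assumption, not the authors' method; the function names `iou_wh`, `kmeans_anchors`, and `mean_best_iou` are hypothetical.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) of a ground-truth box, e.g. in pixels


def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes assumed to share the same centre, so only width/height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes: List[Box], k: int, iters: int = 100, seed: int = 0) -> List[Box]:
    """Cluster box shapes with k-means, using d = 1 - IoU as the distance."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # initialise from k ground-truth boxes
    for _ in range(iters):
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for box in boxes:
            # highest IoU is the same as smallest 1 - IoU distance
            best = max(range(k), key=lambda j: iou_wh(box, anchors[j]))
            clusters[best].append(box)
        new_anchors: List[Box] = []
        for j, members in enumerate(clusters):
            if not members:  # keep an empty cluster's anchor unchanged
                new_anchors.append(anchors[j])
                continue
            w = sum(b[0] for b in members) / len(members)
            h = sum(b[1] for b in members) / len(members)
            new_anchors.append((w, h))
        if new_anchors == anchors:  # assignments no longer change: converged
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])  # smallest anchor first


def mean_best_iou(boxes: List[Box], anchors: List[Box]) -> float:
    """Average IoU between each box and its best-matching anchor (higher is better)."""
    return sum(max(iou_wh(b, a) for a in anchors) for b in boxes) / len(boxes)


# Usage: two obvious shape groups (small/tall and large/wide boxes).
boxes = [(10, 20), (12, 22), (11, 21), (100, 50), (98, 52), (102, 48)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors, round(mean_best_iou(boxes, anchors), 3))
```

Because the anchors are derived from the training labels rather than fixed by hand, rerunning the clustering on a new dataset adapts the detection layer to that dataset's box shapes, which is the kind of flexibility the abstract attributes to auto-anchoring.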

Highlights

  • Traditional computer vision approaches relied on the Histogram of Oriented Gradients (HOG) [7] for feature representation, combined with classifiers such as the Support Vector Machine (SVM)
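For readers unfamiliar with HOG, its core step is a per-cell histogram of gradient orientations weighted by gradient magnitude. The following is a simplified, hypothetical illustration (the name `cell_histogram` is ours): it uses 9 unsigned orientation bins over 0 to 180 degrees, as in the original HOG formulation, but assigns each vote to a single bin and omits the interpolation, block normalization, and SVM classifier that a full HOG pipeline would add.

```python
import math
from typing import List


def cell_histogram(cell: List[List[float]], n_bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg)."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(h):
        for x in range(w):
            # central differences, clamped at the cell border
            gx = cell[y][min(x + 1, w - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, h - 1)][x] - cell[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold into [0, 180)
            # the final % guards against float rounding producing exactly 180.0
            hist[int(angle // bin_width) % n_bins] += mag
    return hist


# Usage: a vertical edge (intensity changes along x) votes into the 0-degree bin.
vertical_edge = [[0, 0, 1, 1]] * 4
print(cell_histogram(vertical_edge))
```

In a complete detector, histograms from neighbouring cells are concatenated and normalized into a feature vector, which a linear SVM then scores as object or background.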

Introduction

In the field of autonomous vehicles, accurate road scene perception plays a vital role in avoiding accidents. Visual object detection is still a challenging task. It demands a great deal of effort, especially for vulnerable road users such as pedestrians and cyclists, who account for more than half of road traffic deaths as reported by the World Health Organization (WHO) [46]. ResNet [12] and DenseNet [16] focus on carrying information forward to avoid excessive compression of the underlying features, using skip connections and direct connections from each layer to all subsequent layers, respectively. These models have since been reused in other computer vision applications such as object segmentation [36].
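The difference between the two connection patterns can be written compactly. Using the notation of the DenseNet paper (an assumption here, since this article does not define it), $\mathbf{x}_\ell$ is the output of layer $\ell$, $H_\ell$ is that layer's transformation (e.g., BN, ReLU, convolution), and $[\cdot]$ denotes channel-wise concatenation:

```latex
% ResNet: the previous output is added back (identity skip connection)
\mathbf{x}_\ell = H_\ell(\mathbf{x}_{\ell-1}) + \mathbf{x}_{\ell-1}

% DenseNet: every earlier feature map is concatenated and fed forward
\mathbf{x}_\ell = H_\ell\big([\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_{\ell-1}]\big)
```

Addition keeps the channel count fixed but merges the signals, whereas concatenation preserves every earlier feature map unchanged at the cost of a channel count that grows with depth.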
