Abstract

Highly accurate, camera-based object detection is an essential component of autonomous navigation and assistive technologies. In particular, for on-road applications, the localization quality of objects in the image plane is important for accurate distance estimation, safe trajectory prediction, and motion planning. In this paper, we mathematically formulate and study a strategy for improving object localization with a deep convolutional neural network. An iterative region-of-interest pooling framework is proposed for predicting increasingly tight object boxes and addressing limitations in current state-of-the-art deep detection models. The method is shown to significantly improve detection performance on a variety of datasets, scene settings, and camera perspectives, producing high-quality object boxes at a minor additional computational expense. Specifically, the architecture achieves notable gains in performance (up to 6% improvement in detection accuracy) at fast run-time speed (0.22 s per frame on $1242 \times 375$ sized images). The iterative refinement is also shown to benefit subsequent vision tasks, such as object tracking in the image plane and in the ground plane.
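To make the idea of iterative region-of-interest pooling concrete, the following is a minimal sketch of an iterative box-refinement loop, assuming PyTorch and torchvision's `roi_align`. The backbone feature map, the `BoxRefinementHead`, the delta parameterization, and the number of iterations are illustrative placeholders, not the paper's exact architecture.

```python
# Sketch of iterative RoI pooling for box refinement (assumed PyTorch/torchvision).
import torch
import torch.nn as nn
from torchvision.ops import roi_align


class BoxRefinementHead(nn.Module):
    """Hypothetical head that predicts per-box offsets (dx, dy, dw, dh)."""

    def __init__(self, channels: int, pool_size: int = 7):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * pool_size * pool_size, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 4),
        )

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.fc(pooled)


def apply_deltas(boxes: torch.Tensor, deltas: torch.Tensor) -> torch.Tensor:
    """Shift and scale (x1, y1, x2, y2) boxes by predicted offsets."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    cx = boxes[:, 0] + 0.5 * w + deltas[:, 0] * w
    cy = boxes[:, 1] + 0.5 * h + deltas[:, 1] * h
    w = w * torch.exp(deltas[:, 2])
    h = h * torch.exp(deltas[:, 3])
    return torch.stack(
        [cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h], dim=1
    )


def iterative_refine(features, boxes, head, iterations=3, stride=16):
    """Repeatedly pool features inside each box and predict a tighter box."""
    for _ in range(iterations):
        # roi_align accepts a list of (N_i, 4) box tensors, one per image.
        pooled = roi_align(features, [boxes], output_size=7,
                           spatial_scale=1.0 / stride)
        deltas = head(pooled)
        boxes = apply_deltas(boxes, deltas)  # boxes tighten each pass
    return boxes


if __name__ == "__main__":
    # Toy feature map roughly matching a 1242x375 frame at stride 16.
    feats = torch.randn(1, 256, 24, 78)
    initial = torch.tensor([[100.0, 80.0, 300.0, 200.0]])  # coarse detection
    head = BoxRefinementHead(channels=256)
    refined = iterative_refine(feats, initial, head)
    print(refined.shape)  # (1, 4): one refined box
```

The loop captures the core strategy described above: each iteration re-pools features from the current box estimate and regresses a correction, so localization tightens progressively at the cost of a few extra pooling-and-regression passes.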
