Abstract

Object recognition and depth perception are two tightly coupled tasks that are indispensable for situational awareness. Most autonomous systems perform these tasks by processing and integrating data streamed from a variety of sensors. The multiple hardware components and sophisticated software architectures these systems require make them expensive to scale and operate. This paper implements a fast, monocular vision system for simultaneous object recognition and depth perception. We borrow the architecture of a state-of-the-art object recognition system, YOLOv3, and extend it by incorporating distances and modifying its loss functions and prediction vectors so that it can perform both tasks. The vision system is trained on a large, high-fidelity labeled dataset generated by coupling LiDAR measurements with a complementary 360-degree camera. The performance of the multipurpose network is evaluated on a test dataset of 7,634 objects collected on a different road network. When compared with ground-truth LiDAR data, the proposed network achieves a mean absolute percentage error of 11% for passenger cars within 10 m, and a mean error rate of 7% and 9% for trucks within 10 m and beyond 10 m, respectively. Adding the second task (depth perception) to the network also improved object detection accuracy by about 3%. The proposed multipurpose model can be used to develop automated alert systems, traffic monitoring, and safety monitoring.
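To make the reported distance metric concrete, the following is a minimal sketch of how a mean absolute percentage error could be computed against LiDAR ground truth and split at the 10 m boundary. The function names (`mape`, `mape_by_range`), the `threshold_m` parameter, and the choice to bin objects by their ground-truth distance are illustrative assumptions; the abstract does not describe the authors' evaluation code.

```python
from typing import Dict, Optional, Sequence


def mape(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute percentage error, returned in percent.

    Assumes every ground-truth distance in `actual` is strictly positive.
    """
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / a for p, a in zip(predicted, actual))
    return 100.0 * total / len(actual)


def mape_by_range(
    predicted: Sequence[float],
    actual: Sequence[float],
    threshold_m: float = 10.0,  # assumed split point, taken from the "10 m" in the text
) -> Dict[str, Optional[float]]:
    """MAPE for objects within and beyond `threshold_m`, binned by ground truth."""
    pairs = list(zip(predicted, actual))
    near = [(p, a) for p, a in pairs if a <= threshold_m]
    far = [(p, a) for p, a in pairs if a > threshold_m]

    def score(group):
        # Return None when a bin is empty rather than raising.
        if not group:
            return None
        preds, acts = zip(*group)
        return mape(preds, acts)

    return {"within": score(near), "beyond": score(far)}


if __name__ == "__main__":
    # Hypothetical distances in meters, for illustration only.
    predicted_m = [4.5, 8.8, 13.2, 22.0]
    lidar_m = [5.0, 8.0, 12.0, 20.0]
    print(mape_by_range(predicted_m, lidar_m))
```

Binning by the ground-truth distance rather than the predicted one keeps the two groups fixed regardless of model error, which is one reasonable convention among several.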
