Abstract

Three-dimensional (3D) object detection is an important task in machine vision, and detecting 3D objects from monocular images is particularly challenging. We observe that most existing monocular methods focus on designing the feature extraction framework or embedding geometric constraints, but ignore errors that may arise in the intermediate stages of the detection pipeline. These errors can be further amplified in subsequent stages. After examining existing keypoint-based detection frameworks, we find that the accuracy of keypoint prediction strongly affects the solution for the 3D object position. We therefore propose a novel keypoints uncertainty prediction network (KUP-Net) for monocular 3D object detection. In this work, we design an uncertainty prediction module to characterize the uncertainty in keypoint prediction, and then use this uncertainty for joint optimization with the object position. In addition, we adopt position encoding to assist the uncertainty prediction and use a timing coefficient to optimize the learning process. We evaluate our detector on the KITTI benchmark. On the easy and moderate difficulty levels, it achieves an AP3D of 17.26 and 11.78 and an APBEV of 23.59 and 16.63, respectively, which are higher than those of the recent method KM3D.
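The abstract's central observation is that keypoint errors propagate into the recovered 3D position, and that a predicted per-keypoint uncertainty can limit this. The sketch below illustrates both ideas under stated assumptions. It recovers the box center from projected corner keypoints, given known dimensions and yaw, using linear least squares. This is a simplification, because the solver is not specified here. Each keypoint is then down-weighted by an assumed uncertainty sigma. The function names, the use of the eight box corners as keypoints, the 1/sigma^2 weighting, and the intrinsics are all hypothetical and do not describe KUP-Net's actual formulation.

```python
import numpy as np

# Hypothetical sketch: uncertainty-weighted position recovery from keypoints.
# Not the KUP-Net implementation; names and weighting scheme are assumptions.

def box_corners(h, w, l, ry):
    """Eight box corners relative to the box center (camera coords, y down)."""
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2.0
    y = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * h / 2.0
    z = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2.0
    c, s = np.cos(ry), np.sin(ry)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])  # yaw about y
    return (R @ np.vstack([x, y, z])).T  # shape (8, 3)

def project(points, K):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    uvw = (K @ points.T).T
    return uvw[:, :2] / uvw[:, 2:3]

def solve_position(kpts, corners, K, sigma=None):
    """Weighted linear least squares for the box center T = (tx, ty, tz).

    Each keypoint gives two equations that are linear in T:
      fx*tx - (u - cx)*tz = (u - cx)*Z - fx*X
      fy*ty - (v - cy)*tz = (v - cy)*Z - fy*Y
    Rows are scaled by 1/sigma, so squared residuals are weighted by 1/sigma**2.
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    if sigma is None:
        sigma = np.ones(len(kpts))
    rows, rhs, scale = [], [], []
    for (u, v), (X, Y, Z), s in zip(kpts, corners, sigma):
        du, dv = u - cx, v - cy
        rows += [[fx, 0.0, -du], [0.0, fy, -dv]]
        rhs += [du * Z - fx * X, dv * Z - fy * Y]
        scale += [1.0 / s, 1.0 / s]
    scale = np.array(scale)
    A = np.array(rows) * scale[:, None]
    b = np.array(rhs) * scale
    T, *_ = np.linalg.lstsq(A, b, rcond=None)
    return T

# Usage with KITTI-like intrinsics (illustrative values only).
K = np.array([[721.5, 0.0, 609.6], [0.0, 721.5, 172.9], [0.0, 0.0, 1.0]])
T_true = np.array([2.0, 1.5, 20.0])
corners = box_corners(1.5, 1.6, 3.9, 0.3)
kpts = project(corners + T_true, K)

noisy = kpts.copy()
noisy[0] += [20.0, 0.0]  # one keypoint is off by 20 px
sigma = np.ones(8)
sigma[0] = 100.0         # ...and is flagged as highly uncertain

err_plain = np.linalg.norm(solve_position(noisy, corners, K) - T_true)
err_weighted = np.linalg.norm(solve_position(noisy, corners, K, sigma) - T_true)
```

With exact keypoints, the solve recovers the true center. When a single keypoint is shifted, the unweighted solution absorbs that error into T, whereas the weighted solve largely ignores the flagged keypoint. In a learned detector, sigma would be predicted by a network head rather than set by hand.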

Highlights

  • The understanding of 3D properties of objects in the real world is critical for vision-based autonomous driving and traffic surveillance systems [1,2,3,4,5]

  • Keypoint labels for supervision are obtained by projecting the 3D ground-truth values onto the left and right images, and we use image flipping, image scaling and other techniques to augment the dataset

  • Since most current work on monocular 3D object detection is devoted to detecting cars, we first conduct qualitative and quantitative analyses on this category
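The highlight on supervision describes building keypoint labels by projecting 3D ground truth into the left and right images, then augmenting the data by flipping and scaling. The following is a minimal sketch of that labeling. It relies on assumptions that the source does not state:

- The keypoints are the eight box corners plus the box center. Here the center is the geometric center; KITTI labels store the bottom-face center, so a real pipeline would shift it by h/2.
- The right camera is modeled as the left camera shifted 0.54 m along +x, roughly the KITTI stereo baseline.
- `corners_3d`, `keypoint_labels`, and `augment` are hypothetical helper names.

```python
import numpy as np

# Hypothetical sketch of stereo keypoint labels and simple augmentation.

def corners_3d(center, dims, ry):
    """Eight box corners in camera coordinates; dims = (h, w, l), y down."""
    h, w, l = dims
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2.0
    y = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * h / 2.0
    z = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2.0
    c, s = np.cos(ry), np.sin(ry)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return (R @ np.vstack([x, y, z])).T + np.asarray(center, dtype=float)

def keypoint_labels(points, K, baseline=0.0):
    """Project 3D points into a camera shifted `baseline` metres along +x."""
    shifted = points - np.array([baseline, 0.0, 0.0])
    uvw = (K @ shifted.T).T
    return uvw[:, :2] / uvw[:, 2:3]

def augment(kpts, image_width, flip=False, scale=1.0):
    """Horizontal flip (pixel-center convention u' = W - 1 - u) and rescale.

    A real pipeline must also swap left/right corner indices after a flip.
    """
    out = np.array(kpts, dtype=float)
    if flip:
        out[:, 0] = image_width - 1 - out[:, 0]
    return out * scale

# Usage with KITTI-like intrinsics and image width (illustrative values).
K = np.array([[721.5, 0.0, 609.6], [0.0, 721.5, 172.9], [0.0, 0.0, 1.0]])
center = np.array([1.0, 1.6, 15.0])
pts = np.vstack([corners_3d(center, (1.5, 1.6, 3.9), 0.2), center])  # 9 keypoints
left = keypoint_labels(pts, K)
right = keypoint_labels(pts, K, baseline=0.54)
flipped = augment(left, image_width=1242, flip=True)
```

The left and right labels differ only in u, by the disparity fx * baseline / Z, which is why stereo projections provide a depth-related supervision signal.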


Summary

Introduction

The understanding of 3D properties of objects in the real world is critical for vision-based autonomous driving and traffic surveillance systems [1,2,3,4,5]. There are three main approaches to 3D object detection: monocular, stereo-based, and LiDAR-based. LiDAR-based and stereo-based methods usually achieve higher detection accuracy because they have access to reliable depth information. LiDAR systems, however, suffer from high cost, high energy consumption, and a short service life. Monocular detection, which offers low cost and low energy consumption, has therefore received extensive attention from researchers. Our work focuses on improving monocular 3D object detection techniques.
