Abstract

Many approaches to 3D object detection from a single camera have been studied as alternatives to high-precision 3D LiDAR sensors, which incur a prohibitive cost. We previously proposed GAC3D, a novel approach to 3D object detection that applies geometric constraints from a ground-plane model to refine the results of a deep-learning-based detector. GAC3D replaces traditional 2D convolution with a depth-adaptive convolution to handle the divergent context of image features, yielding a significant improvement in both training convergence and testing accuracy on the KITTI 3D object detection benchmark. This article presents an alternative architecture named eGAC3D, which adopts a revised depth-adaptive convolution with variant guidance to improve detection accuracy. Additionally, eGAC3D uses pixel-adaptive convolution to let the depth map guide the detection heads instead of relying on an external depth estimator as other methods do, which significantly reduces inference time. Experimental results on the KITTI benchmark show that eGAC3D outperforms not only our previous GAC3D but also many existing monocular methods in terms of both accuracy and inference time. Moreover, we deployed and optimized the proposed eGAC3D framework on an embedded platform with a low-cost GPU. To the best of the authors' knowledge, this is the first monocular 3D detection framework developed for embedded devices. Experimental results on the Jetson Xavier NX demonstrate that the proposed method achieves near real-time performance with reasonable accuracy even on modest hardware resources.
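To make the depth-guided filtering idea concrete, the following is a minimal NumPy sketch of a pixel-adaptive convolution in which the spatial kernel is modulated by a Gaussian on depth differences from a guidance map. This is an illustrative single-channel, loop-based simplification, not the authors' actual eGAC3D implementation; all function and parameter names here are hypothetical.

```python
import numpy as np

def pixel_adaptive_conv(x, depth, weights, sigma=1.0):
    """Naive pixel-adaptive convolution on a single-channel image.

    Each spatial filter weight is modulated by a Gaussian kernel on the
    difference between the centre pixel's depth and its neighbour's depth,
    so the filtering adapts to the depth guidance map.
    Illustrative sketch only -- not the authors' API.
    """
    k = weights.shape[0]          # odd kernel size, e.g. 3
    r = k // 2
    h, w = x.shape
    xp = np.pad(x, r, mode="edge")
    dp = np.pad(depth, r, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + k, j:j + k]
            dpatch = dp[i:i + k, j:j + k]
            # Depth-adaptive modulation: down-weight neighbours whose
            # depth differs from the centre pixel's depth.
            kern = np.exp(-0.5 * ((dpatch - depth[i, j]) / sigma) ** 2)
            out[i, j] = np.sum(patch * kern * weights)
    return out
```

On a region of constant depth the modulation term is 1 everywhere and this reduces to an ordinary convolution; across a depth discontinuity, neighbours lying on a different surface are suppressed, which is the intuition behind letting a depth map guide the detection heads.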
