Monocular 3-D Object Detection Based on Depth-Guided Local Convolution for Smart Payment in D2D Systems

Jun Li, Jun Zhang, Yongbin Gao, Wei Wang, Wei Song, Huixing Wang, Bo Huang, Yier Yan

doi:10.1109/jiot.2021.3128440

Abstract

3-D object detection on mobile phones in device-to-device (D2D) systems provides a new smart payment tool for the next generation of fintech, one that is more flexible and efficient than the traditional barcode. In this article, we propose a monocular 3-D object detection method based on depth-guided local convolution. The method fuses the RGB and depth modalities by using convolution kernels derived from the depth image and applying them locally to a single RGB image. The convolution kernels are adjusted adaptively according to the multiscale input information so that target objects of different scales are captured, which improves 3-D object detection performance. In addition, we use the soft non-maximum suppression (soft-NMS) algorithm instead of traditional non-maximum suppression to select the best prediction box. To further improve detection accuracy, the depth estimation network and the 3-D object detection network are trained jointly, so that the two networks constrain each other.
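
To illustrate the soft-NMS step mentioned in the abstract, the following is a minimal NumPy sketch of the general technique, not the authors' implementation. It assumes axis-aligned 2-D boxes in [x1, y1, x2, y2] form and the Gaussian score-decay variant; the abstract does not say which variant or which box representation is used. The names `soft_nms` and `iou_one_to_many` and the parameters `sigma` and `score_thresh` are hypothetical choices made for this example.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS (assumed variant).

    Instead of discarding every box that overlaps the current best box,
    each overlapping box has its score multiplied by exp(-IoU^2 / sigma).
    Boxes are dropped only once their score falls to score_thresh or below.
    Returns the kept indices in selection order and their final scores.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = np.arange(len(scores))
    keep = []
    while remaining.size > 0:
        # Select the highest-scoring box among those still in play.
        best = remaining[np.argmax(scores[remaining])]
        keep.append(int(best))
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        # Decay the scores of the remaining boxes according to their overlap.
        overlaps = iou_one_to_many(boxes[best], boxes[remaining])
        scores[remaining] *= np.exp(-(overlaps ** 2) / sigma)
        # Drop boxes whose decayed score is no longer meaningful.
        remaining = remaining[scores[remaining] > score_thresh]
    return keep, scores[keep]


if __name__ == "__main__":
    demo_boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
    demo_scores = [0.9, 0.8, 0.7]
    kept, kept_scores = soft_nms(demo_boxes, demo_scores)
    print(kept, kept_scores)
```

In this toy input the second box overlaps the first heavily, so its score is reduced rather than removed outright, and it is ranked below the non-overlapping third box. Hard NMS with a typical IoU threshold would have discarded it entirely, which is the behavior soft-NMS is meant to avoid when nearby objects produce overlapping boxes.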

Full Text