Abstract

Monocular 3D object detection aims to recover the 3D information of objects from a single image. Mainstream methods mostly use an L1 or L1-like loss to supervise instance depth prediction, but they have not achieved satisfactory results. One main reason is that an L1 or L1-like loss does not accurately reflect how well the predicted instance depth fits the corresponding ground truth. Another is that instance depth is hard for a network to learn directly from an RGB image. To address these problems, this paper proposes a novel thermodynamic loss based on the principle of free energy minimisation, together with a novel depth decoupling method. The resulting method is called the monocular 3D object detection network with thermodynamic loss and decoupled instance depth (TDN). In TDN, the optimisation of instance depth prediction is treated as a thermodynamic process, and the thermodynamic loss is designed according to the principle of free energy minimisation. TDN also decouples the instance depth into three different depths. Combining the thermodynamic loss with these decoupled depths yields the final instance depth.
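For orientation, the L1 baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of a plain L1 depth loss only; the function name `l1_depth_loss`, the list-based inputs, and the choice of metres as the unit are assumptions made for the example and do not come from the paper. The thermodynamic loss itself is not specified in the abstract, so it is not reproduced here.

```python
def l1_depth_loss(pred_depths, gt_depths):
    """Mean absolute error between predicted and ground-truth instance depths.

    Both arguments are sequences of floats (assumed here to be in metres),
    one entry per object instance. Hypothetical helper for illustration.
    """
    if len(pred_depths) != len(gt_depths):
        raise ValueError("pred_depths and gt_depths must have the same length")
    if len(pred_depths) == 0:
        return 0.0
    total = sum(abs(p - g) for p, g in zip(pred_depths, gt_depths))
    return total / len(pred_depths)


# A 1 m error on a near object and a 1 m error on a far object
# produce the same L1 penalty.
near = l1_depth_loss([6.0], [5.0])
far = l1_depth_loss([51.0], [50.0])
```

In this sketch `near` and `far` are equal: L1 depends only on the absolute difference, not on the depth at which the error occurs. Whether this is the specific shortcoming the authors have in mind is not stated in the abstract.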
