Joint estimation of pose, depth, and optical flow with a competition–cooperation transformer network

Xiaochen Liu,Tao Zhang,Mingming Liu

doi:10.1016/j.neunet.2023.12.020

Abstract

Estimating depth, ego-motion, and optical flow from consecutive frames is a critical task in robot navigation and has received significant attention in recent years. In this study, we propose PDF-Former, an unsupervised joint estimation network comprising a full transformer-based framework, as well as a competition and cooperation mechanism. The transformer framework captures global feature dependencies and is customized for different task types, thereby improving the performance of sequential tasks. The competition and cooperation mechanisms enable the network to obtain additional supervisory information at different training stages. Specifically, the competition mechanism is implemented early in training to achieve iterative optimization of 6 DOF poses (rotation and translation information from the target image to the two reference images), the depth of target image, and optical flow (from the target image to the two reference images) estimation in a competitive manner. In contrast, the cooperation mechanism is implemented later in training to facilitate the transmission of results among the three networks and mutually optimize the estimation results. We conducted experiments on the KITTI dataset, and the results indicate that PDF-Former has significant potential to enhance the accuracy and robustness of sequential tasks in robot navigation.

Full Text