Abstract

Estimating the position and three-dimensional (3-D) geometry of noncooperative spacecraft through 3-D reconstruction techniques is of great significance. Depth acquisition is an essential component of 3-D reconstruction, and monocular cameras are cheaper and more widely deployed than depth sensors. The relative position between a noncooperative spacecraft and the observing spacecraft can be computed from the depth map produced by monocular depth estimation together with the camera parameters, providing data support for subsequent tracking and capture missions. This paper proposes a monocular depth estimation network that combines a convolutional neural network (CNN) and a vision transformer (ViT) to improve prediction accuracy under few-shot conditions. Detail features and global features are extracted by the CNN and ViT encoders, respectively, and deep and shallow features are then fused by a skip-connected upsampling decoder. Compared with representative depth estimation algorithms of recent years on the NYU-Depth V2 dataset, the proposed network combines the advantages of the CNN and the ViT, estimating the global depth of the scene more accurately while preserving detail. To address the scarcity of spacecraft imagery, a new dataset is constructed from 3-D simulation models. Experiments on this self-made dataset demonstrate the feasibility of the method for aerospace engineering.
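As a minimal sketch of how a predicted depth map and camera parameters yield relative position, the snippet below back-projects pixels into the camera frame with the standard pinhole model. The function name, intrinsic values, and segmentation mask are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Convert a depth map (meters) into 3-D points in the camera frame
    via the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # (H, W, 3) point cloud

# Hypothetical usage: take the centroid of back-projected points on the
# target spacecraft's pixels (mask) as its relative position.
# points = backproject(pred_depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
# rel_pos = points[mask].mean(axis=0)
```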
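The following is a hedged sketch of the described architecture, assuming a two-stage CNN branch for local detail, a ViT branch for global context, and a decoder that upsamples the ViT features while fusing them with shallow CNN skips; all layer sizes and depths are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class HybridDepthNet(nn.Module):
    """Illustrative CNN + ViT encoder with a skip-connected upsampling decoder."""
    def __init__(self, img_size=224, patch=16, dim=256):
        super().__init__()
        # CNN encoder: detail features at 1/2 and 1/4 resolution
        self.cnn1 = nn.Sequential(nn.Conv2d(3, 64, 3, 2, 1), nn.ReLU())
        self.cnn2 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())
        # ViT encoder: patch embedding + transformer layers for global features
        self.patch_embed = nn.Conv2d(3, dim, patch, patch)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.vit = nn.TransformerEncoder(layer, num_layers=4)
        self.grid = img_size // patch
        # Decoder: fuse deep (ViT) and shallow (CNN) features, predict depth
        self.up1 = nn.Sequential(nn.Conv2d(dim + 128, 128, 3, 1, 1), nn.ReLU())
        self.up2 = nn.Sequential(nn.Conv2d(128 + 64, 64, 3, 1, 1), nn.ReLU())
        self.head = nn.Conv2d(64, 1, 3, 1, 1)

    def forward(self, x):
        s1 = self.cnn1(x)                                   # (B, 64, H/2, W/2)
        s2 = self.cnn2(s1)                                  # (B, 128, H/4, W/4)
        t = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        t = self.vit(t)
        g = t.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
        g = nn.functional.interpolate(g, size=s2.shape[2:], mode='bilinear',
                                      align_corners=False)
        d = self.up1(torch.cat([g, s2], dim=1))             # fuse deep + 1/4 skip
        d = nn.functional.interpolate(d, size=s1.shape[2:], mode='bilinear',
                                      align_corners=False)
        d = self.up2(torch.cat([d, s1], dim=1))             # fuse with 1/2 skip
        d = nn.functional.interpolate(d, scale_factor=2, mode='bilinear',
                                      align_corners=False)
        return self.head(d)                                 # (B, 1, H, W) depth

# depth = HybridDepthNet()(torch.randn(1, 3, 224, 224))  # -> (1, 1, 224, 224)
```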
