Abstract
Convolutional neural networks (CNNs) have been adopted in monocular satellite pose estimation and achieve superior performance over traditional methods. However, existing CNN-based methods suffer from a bias toward texture, an indirect description of absolute distance, and a lack of long-range dependency modeling. These factors limit the generalizability of CNN-based methods. Motivated by the striking achievements of transformer models, this article adopts transformer blocks and proposes an efficient method for estimating satellite pose from a single RGB image. First, we design an effective satellite representation model based on a set of keypoints. Then, considering the characteristics of monocular satellite pose estimation, we construct an end-to-end keypoint-set prediction network and build a bipartite loss function. Further, we improve the backbone structure for high-quality feature extraction. Experimental results on a public benchmark dataset indicate that the proposed method ranks second and third on the synthetic and real test sets, respectively, using only synthetic training data. We also demonstrate that our keypoint predictor takes half the time of the first-placed method in our comparison, and therefore achieves a better tradeoff between speed and accuracy than existing approaches.
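To make the idea of a bipartite loss over a keypoint set concrete, the following is a minimal sketch, not the authors' implementation. It assumes equally sized sets of 2D keypoints and an L1 matching cost, neither of which the abstract specifies. For brevity it finds the optimal one-to-one matching by brute force over permutations rather than with the Hungarian algorithm, so it is only practical for small sets. The function name `bipartite_keypoint_loss` is hypothetical.

```python
from itertools import permutations
import math


def bipartite_keypoint_loss(pred, target):
    """Return (loss, assignment) for the minimum-cost one-to-one matching.

    pred, target: equal-length lists of (x, y) keypoints (an assumption of
    this sketch). assignment[i] is the index of the target keypoint matched
    to prediction i. The loss is the mean L1 distance over matched pairs.
    """
    n = len(pred)
    if n != len(target):
        raise ValueError("this sketch assumes equally sized keypoint sets")
    if n == 0:
        return 0.0, []

    # cost[i][j]: L1 distance between prediction i and target j (assumed cost).
    cost = [[abs(px - tx) + abs(py - ty) for (tx, ty) in target]
            for (px, py) in pred]

    # Brute-force search over all one-to-one assignments (O(n!)).
    best_perm, best_total = None, math.inf
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total

    return best_total / n, list(best_perm)


# Predictions arrive in arbitrary order; the matching recovers the pairing.
pred = [(0.9, 0.1), (0.1, 0.8), (0.5, 0.5)]
target = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
loss, assignment = bipartite_keypoint_loss(pred, target)
print(assignment)  # [2, 0, 1]
```

Because the loss is computed after matching, it does not depend on the order in which the network emits its keypoints, which is what allows the prediction to be treated as a set.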
More From: IEEE Transactions on Aerospace and Electronic Systems