Abstract

Frequent and accurate object detection in remote sensing images is a promising approach for monitoring the dynamics of objects of interest on the Earth's surface. Transformer-based object detection was recently developed to resolve the trade-off between heavy computational load and reduced accuracy faced by region-proposal-based and regression-based detectors. Its self-attention mechanism also provides a global understanding of the scene, which could help reason about the spatial relationships among sparse, heterogeneously distributed geospatial objects. However, Transformer-based detectors are inherently weak at modeling the local feature hierarchy needed to cope with the large scale variation of geospatial objects, and they are difficult to train because they lack inductive bias, which results in slow convergence. To overcome these problems, this study proposes a Dual network structure with InterweAved Global-local feature hierarchy based on the TRansformer architecture (DIAG-TR), which alleviates the incompatibility between global and local feature forms and hierarchically embeds local features into global representations. In addition, a learnable anchor box is incorporated into the positional query of the decoder to provide a spatial prior that accelerates convergence. DIAG-TR is validated on DIOR, a widely used optical remote sensing image dataset. The results show that the global-local feature hierarchy contributes a 3.4% gain in mean average precision (mAP) over the original Transformer-based method, and that convergence time is shortened 2.5-fold. State-of-the-art methods are also included as benchmarks for comparison, and DIAG-TR outperforms Faster-RCNN-FPN by 8.9%, indicating that DIAG-TR has great potential for the earth observation community.
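The idea of feeding a learnable anchor box into the decoder's positional query can be illustrated with a minimal NumPy sketch. The abstract does not specify how DIAG-TR encodes the anchor, so this sketch assumes a DAB-DETR-style formulation: each query owns a learnable box (cx, cy, w, h) in normalized coordinates, each coordinate is sinusoidally embedded, a small MLP maps the concatenated embedding to the positional query, and decoder layers refine the box by adding predicted offsets in logit space. The class `AnchorQuery`, the sizes (64 features per coordinate, `d_model=256`), and the random weights are all hypothetical placeholders, not the authors' implementation.

```python
import numpy as np


def sine_embed(coord: np.ndarray, num_feats: int = 64, temperature: float = 10000.0) -> np.ndarray:
    """Sinusoidal embedding of normalized scalars in [0, 1]; output shape (..., num_feats)."""
    dim_t = temperature ** (2 * (np.arange(num_feats) // 2) / num_feats)
    pos = coord[..., None] * 2 * np.pi / dim_t
    out = np.empty_like(pos)
    out[..., 0::2] = np.sin(pos[..., 0::2])  # even channels: sine
    out[..., 1::2] = np.cos(pos[..., 1::2])  # odd channels: cosine
    return out


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def inverse_sigmoid(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    x = np.clip(x, eps, 1 - eps)
    return np.log(x / (1 - x))


class AnchorQuery:
    """Hypothetical learnable anchor boxes (cx, cy, w, h) turned into positional queries."""

    def __init__(self, num_queries: int = 300, num_feats: int = 64, d_model: int = 256, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Learnable parameters: trained by backpropagation in a real model, random here.
        self.anchor_logits = rng.normal(size=(num_queries, 4))
        self.w1 = rng.normal(scale=0.02, size=(4 * num_feats, d_model))
        self.w2 = rng.normal(scale=0.02, size=(d_model, d_model))
        self.num_feats = num_feats

    def anchors(self) -> np.ndarray:
        # Sigmoid keeps every box coordinate in the normalized range (0, 1).
        return sigmoid(self.anchor_logits)

    def positional_query(self, anchors: np.ndarray) -> np.ndarray:
        # Embed cx, cy, w, h separately, then concatenate: (N, 4 * num_feats).
        emb = np.concatenate(
            [sine_embed(anchors[:, i], self.num_feats) for i in range(4)], axis=-1
        )
        hidden = np.maximum(emb @ self.w1, 0.0)  # 2-layer MLP with ReLU
        return hidden @ self.w2  # (N, d_model): the spatial prior added to each query

    @staticmethod
    def refine(anchors: np.ndarray, delta: np.ndarray) -> np.ndarray:
        # Layer-wise update: add predicted offsets in logit space, stay inside (0, 1).
        return sigmoid(inverse_sigmoid(anchors) + delta)


aq = AnchorQuery(num_queries=5)
boxes = aq.anchors()
query_pos = aq.positional_query(boxes)
refined = aq.refine(boxes, np.zeros_like(boxes))
print(query_pos.shape)  # (5, 256)
```

Because each query is tied to an explicit box, the decoder's cross-attention starts from a concrete location and size rather than an unstructured embedding. This spatial prior is the mechanism the abstract credits with faster convergence.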
