Abstract

This study proposes an end-to-end image description generation model based on word embedding technology to realise the classification and identification of Populus euphratica and Tamarix in complex remote sensing images by providing descriptions in precise and concise natural sentences. First, category ambiguity over large-scale regions in remote sensing images is addressed by introducing the co-occurrence matrix and global vectors for word representation to generate the word vector features of the object to be identified. Second, a new multi-level end-to-end model is employed to further describe the content of remote sensing images and to better advance the description tasks for P. euphratica and Tamarix in remote sensing images. Experimental results reveal that the natural language sentences generated using this method can better describe P. euphratica and Tamarix in remote sensing images compared with conventional deep learning methods.
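The co-occurrence step mentioned above can be illustrated with a minimal sketch. It counts how often pairs of words appear within a fixed window of each other, weighting each pair by the inverse of their distance, as the reference GloVe implementation does. GloVe then fits word vectors whose dot products approximate the logarithm of these counts. The caption tokens, the window size and the function name `cooccurrence_counts` are assumptions made for illustration; the study's actual corpus and settings are not given here.

```python
from collections import defaultdict


def cooccurrence_counts(sentences, window=2):
    """Build symmetric, distance-weighted co-occurrence counts.

    sentences: list of token lists (hypothetical captions).
    window: how many preceding tokens count as context (assumed value).
    Returns a dict mapping (word, context_word) -> weighted count.
    """
    counts = defaultdict(float)
    for tokens in sentences:
        for i, center in enumerate(tokens):
            # Look back at up to `window` preceding tokens.
            for j in range(max(0, i - window), i):
                context = tokens[j]
                weight = 1.0 / (i - j)  # closer words contribute more
                # Record the pair in both directions (symmetric context).
                counts[(center, context)] += weight
                counts[(context, center)] += weight
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical caption fragments, not taken from the study's data.
    captions = [
        ["tamarix", "near", "river"],
        ["populus", "near", "river"],
    ]
    for pair, value in sorted(cooccurrence_counts(captions).items()):
        print(pair, value)
```

Words that share contexts (here the two species names, which both occur before "near river") end up with similar rows in the matrix, which is what lets the resulting vectors carry category information.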

Highlights

  • This study proposes an end-to-end image description generation model based on word embedding technology to realise the classification and identification of Populus euphratica and Tamarix in complex remote sensing images by providing descriptions in precise and concise natural sentences

  • The retrieval performance of the algorithm was improved by combining a sparse autoencoder with convolutional neural networks (CNNs), which reduced the time required for labelling and improved the operational efficiency of the model

  • The validity of the approach was demonstrated, revealing that the model could effectively extract the semantic information of objects of interest and better describe the contents of remote sensing images. Scarpa [15] designed a very compact architecture using a CNN to achieve precise training of small-sized data sets; a good recognition effect was obtained for images derived from various multi-resolution sensors. Maggiori [16] proposed a spatially fine classification algorithm based on the pixel semantics of images obtained from aeronautical satellites in conjunction with a deep CNN


Summary


The optimum F-score of 0.9069 in Table 3 is obtained when the forward propagation of the coding layer is the IndRNN and the backward propagation is the Bi-LSTM network. This is because the neurons in the IndRNN are independent and facilitate the cross-layer transmission of information, which allows the network to better learn hidden details. The IndRNN-F + LSTM-B combination provides smaller P, R and F values than the IndRNN-F + BiRNN-B combination: although a single LSTM network can learn long sequences effectively, it ignores semantic information between some of the pixels in the fixed window and the global image. As such, it is not well suited for describing image contents.

The proposed annotation strategy provides superior P, R and F values to the conventional scheme because the conventional method adopts single pixels for labelling, which ignores the correlation between adjacent pixels and cannot mine the overall semantic information of an image. Although the resolution of the QuickBird image data is lower than that of the UAV images, its spectral coverage is greater; this enhances the recognition effect for QuickBird images.
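The independence of IndRNN neurons can be made concrete with a minimal sketch of one recurrence step, following the commonly published IndRNN formulation h_t = ReLU(W x_t + u ⊙ h_{t-1} + b). The recurrent weight `u` is a vector, so each neuron sees only its own previous state rather than a full hidden-to-hidden matrix. The function name, the toy weights and the choice of ReLU are assumptions for illustration; this is not the study's coding layer.

```python
def relu(value):
    """Rectified linear unit for a single float."""
    return value if value > 0.0 else 0.0


def indrnn_step(x_t, h_prev, W, u, b):
    """One IndRNN step: h_t[n] = relu(W[n] . x_t + u[n] * h_prev[n] + b[n]).

    x_t:    input vector at time t (list of floats)
    h_prev: previous hidden state, one value per neuron
    W:      input weights, one row per neuron
    u:      recurrent weights, ONE scalar per neuron (element-wise, no matrix)
    b:      biases, one per neuron
    """
    h_t = []
    for n in range(len(h_prev)):
        input_term = sum(w * x for w, x in zip(W[n], x_t))
        # Neuron n reads only its own previous state h_prev[n].
        h_t.append(relu(input_term + u[n] * h_prev[n] + b[n]))
    return h_t


if __name__ == "__main__":
    # Toy values chosen for illustration only.
    W = [[1.0, 0.0], [0.0, 1.0]]
    u = [0.5, 2.0]
    b = [0.0, 0.0]
    print(indrnn_step([1.0, 1.0], [2.0, 0.0], W, u, b))
```

Because neuron n depends only on `h_prev[n]`, changing another neuron's previous state leaves its output unchanged; interaction between neurons happens only through the input weights of the next layer.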

