Abstract

An image caption generation model with an adaptive attention mechanism is proposed to address the weakness of image description models that rely only on local image features. Within an encoder-decoder framework, the local and global features of an image are extracted at the encoder using the Inception V3 and VGG19 networks, respectively. Because the proposed adaptive attention mechanism automatically identifies and weighs the importance of local and global image information, the decoder can generate sentences that describe the image more intuitively and accurately. The proposed model is trained and tested on the Microsoft COCO dataset. The experimental results show that, compared with image caption models based only on local features, the proposed method extracts richer and more complete information from the image and generates more accurate sentences.
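The abstract does not give implementation details, so the following is only a minimal PyTorch-style sketch of one way an adaptive attention step could fuse local features (e.g. an Inception V3 spatial grid) with a global feature (e.g. a VGG19 fully connected vector), conditioned on the decoder state. All names and dimensions (AdaptiveAttention, attn_dim, etc.) are hypothetical assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAttention(nn.Module):
    """Hypothetical sketch: attend over local regions plus one global
    feature, letting a single softmax decide their relative importance."""

    def __init__(self, local_dim, global_dim, hidden_dim, attn_dim):
        super().__init__()
        self.local_proj = nn.Linear(local_dim, attn_dim)    # project local regions
        self.global_proj = nn.Linear(global_dim, attn_dim)  # project global vector
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)  # project decoder state
        self.score = nn.Linear(attn_dim, 1)                 # scalar attention score

    def forward(self, local_feats, global_feat, hidden):
        # local_feats: (batch, num_regions, local_dim), e.g. Inception V3 grid
        # global_feat: (batch, global_dim), e.g. VGG19 fc-layer feature
        # hidden:      (batch, hidden_dim), decoder's current hidden state
        # Append the global feature as one extra "region" so the softmax
        # adaptively divides attention between local and global information.
        feats = torch.cat([self.local_proj(local_feats),
                           self.global_proj(global_feat).unsqueeze(1)], dim=1)
        scores = self.score(torch.tanh(feats +
                                       self.hidden_proj(hidden).unsqueeze(1)))
        alpha = F.softmax(scores, dim=1)       # (batch, num_regions + 1, 1)
        context = (alpha * feats).sum(dim=1)   # fused context for word prediction
        return context, alpha.squeeze(-1)
```

In a full captioning model, the returned context vector would typically be combined with the current word embedding as input to the decoder at each time step; the attention weights alpha indicate how much the model drew on each local region versus the global view.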
