Abstract

Image captioning is a technology that enables machines to understand the content of images and generate descriptive text. With the development of deep learning, applying it to this task has become an active research topic. This paper proposes a multilayer dense attention model for image captioning. A Faster Region-based Convolutional Neural Network (Faster R-CNN) is employed as the encoding layer to extract image features, a long short-term memory (LSTM) network with attention is used as the decoder of the multilayer dense attention model, and the description text is generated. The model parameters are optimized using policy gradient optimization from reinforcement learning. The dense attention mechanism in the encoding layer effectively avoids interference from non-salient information and allows the decoding process to selectively output the corresponding description text. Experimental results on general-domain images validate the model's ability to understand images and generate text.
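The abstract outlines an encoder-decoder pipeline (Faster R-CNN region features feeding an attention-equipped LSTM decoder) trained with a policy gradient. Below is a minimal PyTorch sketch of one decoding step with additive attention over region features, plus a REINFORCE-style loss. All layer sizes, names, and the additive-attention form are illustrative assumptions, not the paper's exact multilayer dense attention architecture.

```python
# Minimal sketch of attention-based caption decoding and a policy-gradient
# loss. Layer sizes and the attention form are assumptions for illustration.
import torch
import torch.nn as nn

class AttentionDecoder(nn.Module):
    def __init__(self, feat_dim=2048, embed_dim=512, hidden_dim=512,
                 vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Additive attention over detector region features
        self.att_feat = nn.Linear(feat_dim, hidden_dim)
        self.att_hid = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, word_ids, feats, h, c):
        # feats: (batch, num_regions, feat_dim) region features, e.g. from
        # a Faster R-CNN encoder; h, c: previous LSTM state
        scores = self.att_score(torch.tanh(
            self.att_feat(feats) + self.att_hid(h).unsqueeze(1)))  # (B, R, 1)
        alpha = torch.softmax(scores, dim=1)
        # Weighted sum focuses on salient regions, suppressing the rest
        context = (alpha * feats).sum(dim=1)                        # (B, feat_dim)
        h, c = self.lstm(
            torch.cat([self.embed(word_ids), context], dim=1), (h, c))
        return self.out(h), h, c                                    # word logits

def policy_gradient_loss(log_probs, reward, baseline):
    # REINFORCE-style objective: log-probs of a sampled caption, shape (B, T),
    # weighted by (reward - baseline), e.g. a CIDEr score per caption
    return -((reward - baseline) * log_probs.sum(dim=1)).mean()
```

In self-critical sequence training, one common instantiation of this objective, the baseline is the evaluation score of the greedy-decoded caption, so gradients favor sampled captions that outperform the model's own test-time output.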
