Abstract

In long short-term memory (LSTM) neural networks, the input and output gates control the information flowing into and out of the memory cells. In sequence-to-sequence learning problems, each element is input into the network only once; if the input gates are closed at a certain step, that information is lost and is never input again. The same problem exists for the output gates. The input and output gates therefore cannot fully perform their gating roles without discarding information. An LSTM network with external memories, in which separate memories are installed for the input and output gates, is proposed. Information blocked by the input gates is preserved in the input memories, enabling the cells to read from these memories when necessary. Similarly, information blocked by the output gates is preserved in the output memories and flows out to the hidden units of the network at an appropriate time. In addition, a dynamic attention model is proposed that takes the attention history into account, providing guidance when predicting the attention weights at each step. The proposed model uses an attention-based encoder–decoder architecture to generate image captions. Experiments were conducted on three benchmark datasets, namely Flickr8k, Flickr30k, and MSCOCO, to demonstrate the effectiveness of the proposed approach. Captions generated by the proposed method are longer and more informative than those obtained with the original LSTM network.
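The sketch below illustrates, in NumPy, one possible step of an LSTM cell augmented with external input and output memories in the spirit described above. The abstract does not give the update equations, so the read gates (r_in, r_out), the additive accumulation of blocked information into the memories, and all parameter shapes are illustrative assumptions rather than the paper's formulation.

```python
# Conceptual sketch of an LSTM step with external input/output memories.
# The read gates and memory updates are assumptions, not the paper's equations.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_em_step(x, h_prev, c_prev, m_in, m_out, W, U, b):
    """One step of an LSTM with external input/output memories (illustrative).

    W, U, b hold parameters for six pre-activations stacked along axis 0:
    input gate (i), forget gate (f), output gate (o), candidate (g),
    input-memory read gate (r_in), output-memory read gate (r_out).
    """
    z = W @ x + U @ h_prev + b
    i, f, o, g, r_in, r_out = np.split(z, 6)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    r_in, r_out = sigmoid(r_in), sigmoid(r_out)
    g = np.tanh(g)

    # Information blocked by the input gate is written to the input memory
    # instead of being discarded (assumed simple additive accumulation).
    m_in = m_in + (1.0 - i) * g

    # Cell update: the usual LSTM terms plus a gated read from the input memory.
    c = f * c_prev + i * g + r_in * np.tanh(m_in)

    # Information blocked by the output gate is preserved in the output memory
    # and can flow to the hidden state later through the read gate r_out.
    m_out = m_out + (1.0 - o) * np.tanh(c)
    h = o * np.tanh(c) + r_out * np.tanh(m_out)
    return h, c, m_in, m_out

# Toy usage: random parameters, one step on a random input.
d_x, d_h = 8, 16
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6 * d_h, d_x))
U = rng.normal(scale=0.1, size=(6 * d_h, d_h))
b = np.zeros(6 * d_h)
h = c = m_in = m_out = np.zeros(d_h)
x = rng.normal(size=d_x)
h, c, m_in, m_out = lstm_em_step(x, h, c, m_in, m_out, W, U, b)
print(h.shape, c.shape)
```

In this sketch the external memories simply accumulate whatever the gates block at each step; the paper's actual read/write mechanism, and its history-aware attention model, would replace these assumed update rules.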
