Evolutionary recurrent neural network for image captioning

Hanzhang Wang, Hanli Wang, Kaisheng Xu

doi:10.1016/j.neucom.2020.03.087

Abstract

Automatic architecture search is effective at discovering novel neural networks, but it has mostly been applied to pure vision or pure natural language tasks. Cross-modality tasks, however, depend heavily on the associative mechanisms between the visual and language models, rather than merely on a convolutional neural network (CNN) or recurrent neural network (RNN) with the best performance. In this work, the intermediary associative connection is approximated by the internal topological structure of the RNN cell, which is then evolved by an evolutionary algorithm using the image captioning task as a proxy. On the MSCOCO dataset, the proposed algorithm, starting from scratch, discovers more than 100 RNN variants that all score above 100 on CIDEr and above 31 on BLEU4; the best variant achieves 101.4 and 32.6, respectively. Additionally, several previously unknown, interesting patterns, as well as many existing powerful structures, are found in the generated RNNs. The operation and connection patterns in the generated architectures are analyzed to understand language modeling for cross-modality tasks in comparison with general RNNs.

Full Text