Abstract

To address low-quality image descriptions, insufficient use of image features, and the single-level recurrent neural network structure commonly used in image description generation, this paper proposes an image description generation method based on multi-scale features and computer vision. The algorithm uses a pre-trained object detection network to extract image features from different layers of a convolutional neural network, feeds these features layer by layer into a multi-attention structure, and connects each attention structure in turn to a multi-layer recurrent neural network, yielding a multi-level image description generation model. Adding residual connections to the multi-layer recurrent neural network improves performance and effectively avoids the degradation caused by deepening the network. Experimental results show that on the MSCOCO test set the proposed algorithm reaches BLEU-1 and CIDEr scores of 0.804 and 1.167, respectively, significantly better than the top-down image description generation algorithm based on a single attention structure. In conclusion, the image descriptions generated by the proposed algorithm capture image details better.
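
The data flow described above (one attention module per feature scale, each feeding one recurrent layer, with a residual connection around every layer) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`attend`, `toy_cell`, `decode_step`), the dot-product attention scoring, and the weight-free tanh cell standing in for a trained recurrent unit are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """Soft attention (assumed dot-product scoring).

    features: list of region vectors taken from one CNN layer (one scale).
    query:    the hidden state of the recurrent layer attending to them.
    Returns the attention-weighted average of the regions and the weights.
    """
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * f[i] for w, f in zip(weights, features))
               for i in range(dim)]
    return context, weights


def toy_cell(x, h, context):
    """Hypothetical stand-in for a trained recurrent cell (no learned weights)."""
    return [math.tanh(a + b + c) for a, b, c in zip(x, h, context)]


def decode_step(scales, word_vec, hiddens):
    """One decoding time step through the stacked layers.

    scales:  one feature set per recurrent layer (multi-scale features).
    hiddens: previous hidden state of each recurrent layer.
    Layer k attends to scales[k]; its output is added to its input
    (residual connection) before being passed to layer k + 1.
    """
    layer_input = word_vec
    new_hiddens = []
    for feats, h in zip(scales, hiddens):
        context, _ = attend(feats, h)
        h_new = toy_cell(layer_input, h, context)
        new_hiddens.append(h_new)
        # Residual connection: the layer's input skips around the cell.
        layer_input = [a + b for a, b in zip(h_new, layer_input)]
    return layer_input, new_hiddens


# Two scales with different numbers of regions, all vectors of size 2.
scales = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]],
]
output, hiddens = decode_step(scales, [0.1, 0.2], [[0.0, 0.0], [0.0, 0.0]])
```

In a real model the top-layer output would be projected onto the vocabulary to predict the next word, and the cell and attention scoring would carry learned parameters; the sketch only shows how the residual path lets the input of each layer reach the layers above it unchanged.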

