Abstract

Image captioning is the task of describing an image in natural language by combining computer vision and natural language processing. Recent advances in smartphone hardware and processing power have led to the development of many image captioning applications. In this study, a novel automatic image captioning system based on the encoder-decoder approach, suitable for deployment on smartphones, is proposed. In the encoder, high-level visual information is extracted with the ResNet152V2 convolutional neural network; the proposed decoder then transforms this visual information into natural-language descriptions of the images. The decoder's multilayer gated recurrent unit (GRU) structure generates more meaningful captions by drawing on the most relevant visual information. The proposed system has been evaluated with several performance metrics on the MSCOCO dataset and outperforms state-of-the-art approaches. It is also integrated with our custom-designed Android application, IMECA, which, unlike similar applications, generates captions in offline mode. Image captioning is thus made practical for more people.
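The encoder-decoder pipeline described above can be sketched in Keras. This is a minimal illustration, not the authors' implementation: the vocabulary size, embedding dimension, GRU width, and number of GRU layers are assumed, and `weights=None` is used so the sketch runs without downloading ImageNet weights (in practice a pretrained encoder would be used).

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical hyperparameters; the abstract does not specify them.
VOCAB_SIZE = 10000
EMBED_DIM = 256
GRU_UNITS = 512
FEAT_DIM = 2048  # ResNet152V2 global-average-pooled feature size

# Encoder: ResNet152V2 without its classifier head, pooled to one
# feature vector per image (weights=None keeps this sketch offline).
encoder = tf.keras.applications.ResNet152V2(
    include_top=False, weights=None, pooling="avg")

# Decoder: the image feature initializes a multilayer GRU that
# predicts the next caption word from the embedded word sequence.
feat_in = layers.Input(shape=(FEAT_DIM,))
words_in = layers.Input(shape=(None,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(words_in)
init_state = layers.Dense(GRU_UNITS, activation="tanh")(feat_in)
x = layers.GRU(GRU_UNITS, return_sequences=True)(x, initial_state=init_state)
x = layers.GRU(GRU_UNITS, return_sequences=True)(x)
logits = layers.Dense(VOCAB_SIZE)(x)  # per-step scores over the vocabulary
decoder = Model([feat_in, words_in], logits)
```

At inference time the decoder would be run step by step, feeding each predicted word back in until an end-of-caption token is produced.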
