Abstract

Image captioning is the task of generating natural-language captions for an image. Automatic image captioning is particularly challenging because it sits at the intersection of Computer Vision and Natural Language Processing. Its applications include text-based image retrieval, assistance for visually impaired users, and human-robot interaction. Most publications on the subject focus on English, an analytic language whose characteristics differ from those of Turkish, an agglutinative language. This work introduces the Turkish MS COCO dataset, which extends the original MS COCO collection with captions in Turkish; the reported experimental results surpass the current state of the art in Turkish image captioning. The newly introduced dataset is also applicable to the study of machine translation. On the Turkish MS COCO dataset, the best performance was achieved with the Meshed-Memory Transformer, with a BLEU-1 score of 0.72. The dataset is publicly available at https://github.com/BilgiAILAB/TurkishImageCaptioning. We hope that the Turkish MS COCO dataset, together with the proposed benchmark, will be a valuable resource for future studies on Turkish image captioning.

