Abstract

Automatic captioning of images not only enriches multimedia content with descriptive features, but also helps in detecting patterns, trends, and events of interest. Arabic image caption generation in particular is a challenging topic in the machine learning field. This paper presents AraCap, a hybrid object-based, attention-enriched image captioning architecture with a focus on the Arabic language. Three models are demonstrated; all are implemented and trained on the COCO and Flickr30k datasets, and then tested on an Arabic version of a subset of the COCO dataset built for this work. The first model is an object-based captioner that can handle one or multiple detected objects. The second is a combined pipeline that uses both an object detector and attention-based captioning, while the third is based on a pure soft attention mechanism. The models are evaluated using multilingual semantic sentence similarity techniques to assess the accuracy of the generated captions against the ground-truth captions. Results show that the similarity scores of the Arabic captions generated by all three proposed models outperform those of the basic captioning technique.
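For concreteness, the following minimal sketch illustrates how a multilingual semantic sentence similarity evaluation of this kind could be carried out, using a sentence-embedding model and cosine similarity. The model name, the example captions, and the best-match scoring against references are illustrative assumptions, not the paper's exact setup.

from sentence_transformers import SentenceTransformer, util

# Multilingual sentence-embedding model that covers Arabic (illustrative choice).
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

generated = "رجل يركب دراجة في الشارع"  # hypothetical generated Arabic caption
references = [  # hypothetical ground-truth captions
    "رجل يقود دراجة هوائية في الطريق",
    "شخص على دراجة في شارع المدينة",
]

# Embed the generated caption and the references, then take the best
# cosine-similarity match as the caption's score.
gen_emb = model.encode(generated, convert_to_tensor=True)
ref_embs = model.encode(references, convert_to_tensor=True)
score = util.cos_sim(gen_emb, ref_embs).max().item()
print(f"semantic similarity: {score:.3f}")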
