Abstract

Image captioning of fine art paintings aims to generate content descriptions for paintings. Because this task requires modeling both images and language, it usually demands abundant training data. However, unlike photographic image captioning, painting captioning has few satisfactory datasets. In this article, we introduce a painting captioning dataset (named the ArtCap dataset), which contains 3606 paintings with five descriptions for each painting. We present the carefully designed construction pipeline of our dataset and further evaluate the dataset from two aspects: annotation quality and application effectiveness. For annotation quality, we compare the global characteristics, annotation content, and annotation consistency of our dataset with those of other painting description datasets. For application effectiveness, we train image captioning models on our dataset and on other painting description datasets and analyze their captioning performance. The results demonstrate the promising annotation quality and application effectiveness of our dataset.
