Abstract

AIGC (Artificial Intelligence Generated Content) is an emerging family of AI techniques encompassing tasks such as text-to-image, text-to-text, and image-to-text generation. During language acquisition, some children face challenges such as delayed language development, limited vocabulary, and poor expressive ability. To address this issue, the "Look and Speak" method can be employed, in which children learn and express language by observing images. In this paper, we build CODP-1200, a benchmark dataset for assisting children's language acquisition, curated and augmented using AIGC techniques. The dataset consists of 1,200 children's cartoon images, each paired with five descriptive sentences (6,000 sentences in total). We first carefully curated twelve Chinese language textbooks from the primary compulsory education curriculum, spanning the first to the sixth grade, to construct the foundational corpus. Two widely used large language models, ChatGPT and SparkDesk, were then employed to augment the original data. Finally, ERNIE-ViLG was used to generate children's-style images corresponding to the textual descriptions. On top of the proposed dataset, we further propose a benchmark approach called DDMXCap, a diffusion-based image captioning (image-to-text) model. Experimental results demonstrate that our method achieves promising performance on children's image captioning and provides a standardized learning process for child language acquisition. The implementation code for our approach and the dataset are available at https://github.com/Leng-bingo/Chinese-Child-Captions.
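
As a rough illustration of the dataset-construction pipeline described above (not the authors' actual implementation), the Python sketch below uses the OpenAI chat API to stand in for the ChatGPT augmentation step, producing five paraphrases per seed sentence to match the 1,200-image / 6,000-sentence ratio; the ERNIE-ViLG text-to-image call is left as a hypothetical placeholder, and all function names, prompts, and the example sentence are our own assumptions.

# Sketch of the CODP-1200 construction pipeline; interfaces are illustrative only.
from openai import OpenAI  # assumes the official OpenAI Python client (>= 1.0)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def augment_sentence(seed: str, n: int = 5) -> list[str]:
    """Ask an LLM for n simple, child-friendly paraphrases of a textbook sentence."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (
                f"Rewrite the following Chinese sentence in {n} simple, "
                f"child-friendly ways, one per line:\n{seed}"
            ),
        }],
    )
    # Keep at most n non-empty lines from the model's reply.
    lines = [s.strip() for s in resp.choices[0].message.content.splitlines() if s.strip()]
    return lines[:n]

def generate_cartoon_image(caption: str) -> bytes:
    """Hypothetical placeholder for the ERNIE-ViLG text-to-image step."""
    raise NotImplementedError("substitute a call to the ERNIE-ViLG API here")

if __name__ == "__main__":
    seed = "小鸟在树上唱歌。"  # "A little bird sings in the tree."
    for caption in augment_sentence(seed):
        print(caption)

In the paper's pipeline a second model (SparkDesk) is also used for augmentation and each augmented description is then rendered as a children's-style cartoon image; both steps would slot into the same loop in place of the placeholder above.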
