Abstract

Recent advances in deep learning have enabled machines to see, hear, and even speak; in some cases, deep learning has allowed machines to outperform humans at these complex tasks. Such improvements have reignited interest in many fields. Image captioning, which lies at the intersection of computer vision and natural language processing, has recently received significant attention. Deep learning-based image captioning models represent a great improvement over traditional methods. However, most work on image captioning relies on supervised deep learning methods. Recently, unsupervised image captioning has started to gather momentum. This paper presents the first survey focused on unsupervised and semi-supervised image captioning techniques and methods. Additionally, the survey shows how such methods can be used under different data availability and data pairing settings, where some methods require paired data while others can work with unpaired data. Furthermore, special cases of unpaired data, such as cross-domain and cross-lingual image captioning, are also discussed. Finally, the survey presents a discussion of challenges and future research directions in image captioning.
