Abstract

Natural language generation from images, also referred to as image or visual captioning, is an emerging deep learning application at the intersection of computer vision and natural language processing. Image captioning also forms the technical foundation for many practical applications. Advances in deep learning have driven significant progress in this area in recent years. In this chapter, we review the key developments in image captioning and their impact on both research and industry deployment. Two major schemes developed for image captioning, both based on deep learning, are presented in detail. A number of examples of natural language descriptions of images produced by two state-of-the-art captioning systems are provided to illustrate the high quality of the systems’ outputs. Finally, recent research on generating stylistic natural language from images is reviewed.
