Automatic image description generation is an active research area in Computer Vision and Natural Language Processing. An image description generation system identifies the objects in an image, their attributes, their actions, and the spatial relationships among them. Early systems relied on classical machine learning approaches, whereas the majority of later work follows deep learning strategies. The essential goal of these systems is to produce syntactically and semantically correct sentences. This review synthesizes studies conducted from 2010 to 2023 to provide a deeper view of image description generation systems and their applications. Its principal contribution is that it covers the different aspects of image captioning systems, including the methods used, the domains of application, the evaluation measures, and the datasets. A single synthesized study gives scholars an overview of how image captioning systems have been developed to date using machine learning approaches, and it offers suggestions for future research in this area. Image captioning is applied to many kinds of data, including natural images, medical images, remote sensing images, and videos. This paper reviews the various taxonomies of image description generation systems and analyzes the methods used in their architectures. The datasets and evaluation metrics used by these systems are discussed, and the performance of the systems is examined under various circumstances.