A significant portion of multimedia data consists of digital images, and multimedia content analysis underpins many real-world computer vision applications. Multimedia information, especially images, has grown considerably more complex in recent years: millions of photos are uploaded every day to websites such as Instagram, Facebook, and Twitter. Finding a relevant image in such an archive is a challenging research problem in computer vision. Most search engines retrieve images with standard text-based techniques that rely on metadata and captions. Over the past two decades, a great deal of research has been devoted to content-based image retrieval (CBIR), image classification, and image analysis. In image classification models and CBIR, high-level image content is represented as feature vectors of numerical values. Empirical evidence indicates a considerable disparity between these feature representations and human visual understanding, and reducing this semantic gap is the aim of this study. To that end, we provide a thorough review of recent advances in content-based image retrieval and image representation, analyzing a wide range of retrieval and representation models, including the latest semantic deep-learning methods and feature extraction techniques. The paper examines the key concepts and influential studies in image representation and CBIR, and outlines promising directions for future work in order to stimulate further research in this field.
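As a minimal illustration of the feature-vector representation that CBIR systems rely on, the sketch below extracts a deep feature vector for each image with a pretrained CNN and ranks database images by cosine similarity to a query. The choice of torchvision's ResNet-50 backbone and all function and variable names are our own illustrative assumptions, not a method proposed in any surveyed work.

```python
# Minimal CBIR sketch: represent each image as a deep feature vector and
# rank database images by cosine similarity to a query image.
# Assumption: torchvision's pretrained ResNet-50 serves as the feature extractor.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Standard ImageNet preprocessing expected by the pretrained network.
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Drop the final classification layer so the network outputs a 2048-d feature vector.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

def extract_features(image_path: str) -> torch.Tensor:
    """Map an image file to a high-level numerical feature vector."""
    img = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        feat = backbone(preprocess(img).unsqueeze(0))  # shape: (1, 2048)
    return torch.nn.functional.normalize(feat, dim=1).squeeze(0)

def retrieve(query_path: str, database_paths: list[str], top_k: int = 5):
    """Return the top_k database images most similar to the query."""
    query_vec = extract_features(query_path)
    scored = []
    for path in database_paths:
        # Dot product of L2-normalized vectors equals cosine similarity.
        sim = torch.dot(query_vec, extract_features(path)).item()
        scored.append((sim, path))
    return sorted(scored, reverse=True)[:top_k]
```

The semantic gap discussed above arises because such numerical vectors capture visual patterns learned by the network, which need not coincide with the concepts a human uses to judge whether two images are related.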