Abstract
This work presents an image-to-text-to-speech system that improves the accessibility of images, applied here to Hindu and Christian divine images. Among other uses, the method can help visually impaired people gain a cultural understanding of these images. The proposed system uses object detection with YOLOv5 and caption generation with ensemble models. It identifies significant objects in images of deities and describes them in culturally relevant text, which a Google text-to-speech synthesis module then converts to speech. Generating text from images in this way adds a new perspective to the proposed work. The aim is to provide a more comprehensive understanding of the visual content and, through a multimodal approach, to let visually impaired individuals connect with the spiritual elements of deities through auditory perception, helping them feel included in the community. The work is also applicable to preserving cultural heritage, to tourism, and to integration with Virtual Reality (VR) and Augmented Reality (AR). Images of artistic cultural heritage, particularly those featuring idols, are rarely available in annotated databases. We therefore gathered, transcribed, and compiled a new database of religious idols to meet this need. In this paper, we investigate how to address religious idol recognition using deep neural networks. To this end, several pre-trained deep learning models are first evaluated, and the best-performing one is selected. The proposed model achieves an accuracy of 96.75% for idol detection and a BLEU score of approximately 97.06% for text generation.
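The abstract reports text-generation quality as a BLEU score. As a rough illustration of what that metric measures, the sketch below computes sentence-level BLEU from clipped n-gram precisions (n = 1 to 4, uniform weights) and a brevity penalty. The paper does not state its BLEU configuration (n-gram order, smoothing, sentence- versus corpus-level scoring), so these settings, the helper names, and the example captions are assumptions for illustration only, not the authors' evaluation code.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate: Sequence[str], references: List[Sequence[str]], n: int) -> float:
    """Modified n-gram precision: each candidate n-gram count is clipped by
    the maximum count of that n-gram in any single reference."""
    cand = ngram_counts(candidate, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    max_ref: Counter = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / total


def sentence_bleu(candidate: Sequence[str], references: List[Sequence[str]], max_n: int = 4) -> float:
    """Sentence-level BLEU with uniform weights and no smoothing (assumed settings)."""
    precisions = [clipped_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # log(0) is undefined; unsmoothed BLEU is 0 in this case
    c = len(candidate)
    # Effective reference length: the reference closest in length to the
    # candidate, preferring the shorter one on ties.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    return brevity_penalty * geo_mean


if __name__ == "__main__":
    # Hypothetical captions, not taken from the paper's dataset.
    reference = "the deity holds a trident in the right hand".split()
    candidate = "the deity holds a trident".split()
    print(f"BLEU = {100 * sentence_bleu(candidate, [reference]):.2f}%")  # prints: BLEU = 44.93%
```

Every n-gram of the short example caption appears in the reference, so the score is reduced only by the brevity penalty, exp(1 - 9/5). Because no smoothing is applied, a caption with no matching 4-gram scores 0; libraries such as NLTK provide smoothing functions for short captions.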