Abstract

This paper presents a project that integrates computer vision, natural language processing, and deep learning techniques to enhance object and speech recognition capabilities. The proposed system uses the OpenCV library to identify objects within an input image and then generates a new image annotated with the names of the recognized objects. The project also uses the pyttsx3 module to convert the identified object names into speech, providing an additional layer of accessibility. The system extends this functionality with a contextual summarization component: when the user supplies contextual information related to the recognized objects, the system uses a Large Language Model (LLM) to summarize that context. This summarization step contributes a Retrieval-Augmented Generation (RAG) element, offering a quick overview of the given information. Integrating object identification, speech synthesis, and contextual summarization makes the system versatile and accessible. The proposed solution has applications in domains such as assistive technology, image recognition, and natural language processing. The experimental results demonstrate the effectiveness and accuracy of the system, showcasing its potential contributions to machine learning and deep learning applications.
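The annotate-then-speak stage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which detector is used, so the `Detection` container, the `min_score` threshold, and the helper names `spoken_summary`, `annotate`, and `speak` are all hypothetical. Only the OpenCV drawing calls (`cv2.rectangle`, `cv2.putText`) and the pyttsx3 calls (`init`, `say`, `runAndWait`) are real library functions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One detected object (hypothetical container, not from the paper)."""
    label: str
    box: Tuple[int, int, int, int]  # x, y, width, height in pixels
    score: float                    # detector confidence in [0, 1]


def spoken_summary(detections: List[Detection], min_score: float = 0.5) -> str:
    """Build the sentence that is handed to the speech engine.

    Labels below the (assumed) confidence threshold are dropped, and
    repeated labels are spoken once, in first-seen order.
    """
    labels = list(dict.fromkeys(
        d.label for d in detections if d.score >= min_score
    ))
    if not labels:
        return "No objects recognized."
    return "Detected objects: " + ", ".join(labels) + "."


def annotate(image, detections: List[Detection], min_score: float = 0.5):
    """Return a copy of a NumPy image with boxes and names drawn on it."""
    import cv2  # imported lazily so the text helper works without OpenCV

    out = image.copy()
    for d in detections:
        if d.score < min_score:
            continue
        x, y, w, h = d.box
        cv2.rectangle(out, (x, y), (x + w, y + h), (0, 255, 0), 2)
        # Keep the label inside the frame when the box touches the top edge.
        cv2.putText(out, d.label, (x, max(y - 5, 15)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return out


def speak(text: str) -> None:
    """Read the text aloud with pyttsx3 (offline text-to-speech)."""
    import pyttsx3  # imported lazily; needs a system speech backend

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
```

Given a list of detections from any detector, `annotate(image, detections)` would produce the labeled image and `speak(spoken_summary(detections))` would read the object names aloud. The summarization step is left out because the abstract does not specify which LLM or interface is used.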

