Abstract
The interdisciplinary convergence of computer vision and object detection is pivotal for advancing intelligent image analysis. This research moves beyond conventional object recognition toward a more nuanced understanding of images, akin to human visual comprehension. It explores deep learning and established object detection systems such as convolutional neural networks (CNNs), Region-based CNNs (R-CNN), and You Only Look Once (YOLO). The proposed model excels at real-time object recognition, outperforming its predecessors, which typically detect only a limited number of objects in an image and are most effective only at distances of 5-6 meters. Uniquely, it employs Google Translate for the verbal identification of detected objects, a crucial accessibility feature for individuals with visual impairments. This study integrates computer vision, deep learning, and real-time object recognition to enhance visual perception, providing valuable assistance to those facing visual challenges. The proposed method uses the Common Objects in Context (COCO) dataset for image comprehension, performing object detection and object tracking with a deep neural network (DNN). The system's output is converted into spoken words through a text-to-speech feature, enabling visually impaired individuals to comprehend their surroundings effectively. The implementation relies on key technologies such as NumPy, OpenCV, pyttsx3, PyWin32, opencv-contrib-python, and winsound, which together form a comprehensive system for computer vision and audio processing. Results demonstrate successful execution, with the camera consistently detecting and labeling 5-6 objects in real time.
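To make the detect-and-announce pipeline described above concrete, the following is a minimal sketch, assuming OpenCV's DNN DetectionModel API with a pre-trained SSD MobileNet COCO model and pyttsx3 for speech; the model, config, and label file names (frozen_inference_graph.pb, ssd_mobilenet_v3_large_coco.pbtxt, coco.names) are illustrative placeholders, not the paper's exact artifacts.

    import cv2
    import pyttsx3

    # COCO class labels (file name is an assumption; any newline-separated
    # list of COCO labels will do).
    with open("coco.names") as f:
        class_names = f.read().strip().split("\n")

    # Pre-trained SSD MobileNet detector loaded through OpenCV's DNN module
    # (weights/config file names are placeholders).
    net = cv2.dnn_DetectionModel("frozen_inference_graph.pb",
                                 "ssd_mobilenet_v3_large_coco.pbtxt")
    net.setInputSize(320, 320)
    net.setInputScale(1.0 / 127.5)
    net.setInputMean((127.5, 127.5, 127.5))
    net.setInputSwapRB(True)

    tts = pyttsx3.init()       # offline text-to-speech engine
    cap = cv2.VideoCapture(0)  # default camera

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        class_ids, confidences, boxes = net.detect(frame, confThreshold=0.5)
        spoken = set()
        if len(class_ids) > 0:
            for cid, box in zip(class_ids.flatten(), boxes):
                label = class_names[int(cid) - 1]  # this model's ids are 1-based
                x, y, w, h = box
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
                cv2.putText(frame, label, (x, max(y - 8, 12)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
                if label not in spoken:  # announce each label once per frame
                    tts.say(label)
                    spoken.add(label)
            tts.runAndWait()  # blocking; a production system would speak asynchronously
        cv2.imshow("Detections", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()

Announcing each label at most once per frame keeps the audio channel from being flooded when the same 5-6 objects remain in view across consecutive frames.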