Abstract
The interdisciplinary convergence of computer vision and object detection is pivotal for advancing intelligent image analysis. This research moves beyond conventional object recognition toward a more nuanced understanding of images, akin to human visual comprehension. It explores deep learning and established object detection systems such as convolutional neural networks (CNNs), Region-based CNNs (R-CNN), and You Only Look Once (YOLO). The proposed model excels at real-time object recognition, outperforming its predecessors, which typically detect only a limited number of objects in an image and are most effective only at distances of 5-6 meters. Uniquely, it employs Google Translate for the verbal identification of detected objects, a crucial accessibility feature for individuals with visual impairments. This study integrates computer vision, deep learning, and real-time object recognition to enhance visual perception, providing valuable assistance to those facing visual challenges. The proposed method uses the Common Objects in Context (COCO) dataset for image comprehension, performing object detection and object tracking with a deep neural network (DNN). The system's output is converted into spoken words through a text-to-speech feature, enabling visually impaired individuals to comprehend their surroundings effectively. The implementation relies on key technologies such as NumPy, OpenCV, pyttsx3, PyWin32, opencv-contrib-python, and winsound, which together form a comprehensive system for computer vision and audio processing. Results demonstrate successful execution, with the camera consistently detecting and labeling 5-6 objects in real time.
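To make the detect-and-announce pipeline described above concrete, the following is a minimal sketch, assuming OpenCV's DNN DetectionModel API with a pre-trained SSD MobileNet COCO model and pyttsx3 for speech; the model, config, and label file names (frozen_inference_graph.pb, ssd_mobilenet_v3_large_coco.pbtxt, coco.names) are illustrative placeholders, not the paper's exact artifacts.

    import cv2
    import pyttsx3

    # COCO class labels (file name is an assumption; any newline-separated
    # list of COCO labels will do).
    with open("coco.names") as f:
        class_names = f.read().strip().split("\n")

    # Pre-trained SSD MobileNet detector loaded through OpenCV's DNN module
    # (weights/config file names are placeholders).
    net = cv2.dnn_DetectionModel("frozen_inference_graph.pb",
                                 "ssd_mobilenet_v3_large_coco.pbtxt")
    net.setInputSize(320, 320)
    net.setInputScale(1.0 / 127.5)
    net.setInputMean((127.5, 127.5, 127.5))
    net.setInputSwapRB(True)

    tts = pyttsx3.init()       # offline text-to-speech engine
    cap = cv2.VideoCapture(0)  # default camera

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        class_ids, confidences, boxes = net.detect(frame, confThreshold=0.5)
        spoken = set()
        if len(class_ids) > 0:
            for cid, box in zip(class_ids.flatten(), boxes):
                label = class_names[int(cid) - 1]  # this model's ids are 1-based
                x, y, w, h = box
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
                cv2.putText(frame, label, (x, max(y - 8, 12)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
                if label not in spoken:  # announce each label once per frame
                    tts.say(label)
                    spoken.add(label)
            tts.runAndWait()  # blocking; a production system would speak asynchronously
        cv2.imshow("Detections", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()

Announcing each label at most once per frame keeps the audio channel from being flooded when the same 5-6 objects remain in view across consecutive frames.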