Abstract
Visually impaired individuals often face significant challenges in navigating their environments due to limited access to visual information. To address this issue, a portable, cost-effective assistive tool is proposed that operates on a low-power embedded system such as the Jetson Nano. The novelty of this research lies in developing an efficient, lightweight video captioning model within constrained resources to ensure compatibility with embedded platforms. This research aims to enhance the autonomy and accessibility of visually impaired people by providing audio descriptions of their surroundings through the processing of live-streamed video. The proposed system comprises two lightweight deep learning modules: an object detection module based on the state-of-the-art YOLOv7 model, and a video captioning module that uses the Video Swin Transformer and a 2D-CNN for feature extraction, together with a Transformer network for caption generation. The object detection module provides real-time identification of multiple objects in the user's surroundings, while the video captioning module produces detailed descriptions of entire visual scenes and activities, including objects, actions, and the relationships between them. The user interacts with the system through headphones, issuing a specific audio command to trigger either the object detection or the video captioning module and receiving an audio description of the visual content in return. The system demonstrates satisfactory results, achieving inference times of 0.11 to 1.1 seconds for object detection and 0.91 to 1.85 seconds for video captioning, evaluated through both quantitative metrics and subjective assessments.
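For illustration, the sketch below outlines how such a command-driven dispatch between the two modules might be structured. All function names (listen_for_command, run_object_detection, run_video_captioning, speak) are hypothetical placeholders standing in for real components (speech recognition, YOLOv7 inference, the Swin/Transformer captioner, and text-to-speech); this is not the authors' implementation.

```python
# Minimal illustrative sketch of the audio-command dispatch loop described
# in the abstract. Every function is a hypothetical placeholder.

def listen_for_command() -> str:
    """Placeholder: a speech-recognition engine would return the user's
    spoken command, e.g. "detect" or "describe"."""
    return "detect"

def run_object_detection() -> str:
    """Placeholder: YOLOv7 inference on the current camera frame,
    summarized as a short spoken phrase."""
    return "a person and a chair ahead"

def run_video_captioning() -> str:
    """Placeholder: Video Swin Transformer + 2D-CNN features decoded by a
    Transformer into a natural-language caption of a short video clip."""
    return "a man is walking a dog across the street"

def speak(text: str) -> None:
    """Placeholder: text-to-speech output routed to the user's headphones."""
    print(f"[TTS] {text}")

def main() -> None:
    # Route the recognized command to the corresponding module and read
    # the resulting description back to the user.
    command = listen_for_command()
    if command == "detect":
        speak(run_object_detection())
    elif command == "describe":
        speak(run_video_captioning())

if __name__ == "__main__":
    main()
```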