Abstract

This paper aims to assist visually impaired people through Deep Learning (DL) by providing a system that can describe the user's surroundings and answer questions about them. The system consists mainly of two models: an Image Captioning (IC) model and a Visual Question Answering (VQA) model. The IC model is a Convolutional Neural Network and Recurrent Neural Network based architecture that incorporates a form of attention while captioning. For the VQA task, this paper proposes two models, one Multi-Layer Perceptron based and one Long Short-Term Memory (LSTM) based, that answer questions related to the input image. The IC model achieves an average BLEU-1 score of 0.46, and the LSTM-based VQA model gives an overall accuracy of 47 percent. These two models are integrated with Speech-to-Text and Text-to-Speech components to form a single system that works in real time.
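As a rough illustration of the integration described above, the sketch below wires hypothetical wrappers for the two models (CaptioningModel and VQAModel, stand-ins for the CNN-RNN attention captioner and the LSTM-based VQA model; their internals are not shown) together with off-the-shelf Speech-to-Text and Text-to-Speech libraries (speech_recognition and pyttsx3, assumed choices not taken from the paper) into a single assist loop. It is a minimal sketch of the pipeline shape, not the paper's implementation.

```python
"""Minimal sketch of the assistive pipeline: describe the scene aloud,
then listen for a spoken question and answer it. Model classes are
hypothetical placeholders for the paper's IC and VQA networks."""

import pyttsx3                    # Text-to-Speech engine (assumed library choice)
import speech_recognition as sr   # Speech-to-Text (assumed library choice)


class CaptioningModel:
    """Placeholder for the CNN-RNN image captioning model with attention."""
    def describe(self, image) -> str:
        return "a person walking across a street"  # stub caption


class VQAModel:
    """Placeholder for the LSTM-based visual question answering model."""
    def answer(self, image, question: str) -> str:
        return "yes"  # stub answer


def assist(image) -> None:
    captioner, vqa = CaptioningModel(), VQAModel()
    tts = pyttsx3.init()
    recognizer = sr.Recognizer()

    # 1. Describe the surroundings and speak the caption aloud.
    tts.say(captioner.describe(image))
    tts.runAndWait()

    # 2. Listen for a spoken question, transcribe it, and speak the answer.
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    question = recognizer.recognize_google(audio)
    tts.say(vqa.answer(image, question))
    tts.runAndWait()
```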
