Abstract

According to the WHO, approximately 253 million people worldwide are visually impaired. Visually impaired individuals find it difficult to carry out their daily lives, so it is important to take meaningful steps with current technologies so that they can experience the world around them with minimal difficulty. To support visually impaired people in society, this project proposes a system that recognizes an image, translates the image description into text, and then produces audio. This can help the individual read text, recognize an image, and receive the result in spoken form. Motivated by recent work in machine translation and object recognition, a CNN-RNN based attention model is presented in this project. In the proposed framework, an image is first converted into a textual description; then, using a basic text-to-speech API, the extracted caption is converted into speech, which further helps the visually impaired user understand the image or visuals they are looking at. The core effort is therefore centered on building the captioning model, while the second stage, converting text to speech, is relatively simple with a text-to-speech API. Once the model is built, it is deployed on the local system using a Flask-based application to produce an audio caption for any image fed to the model.
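As a minimal sketch of the pipeline the abstract describes (image to caption to speech, served through Flask), the Python example below assumes a single endpoint that accepts an uploaded image, captions it, and returns the spoken caption. The `generate_caption` function is a hypothetical placeholder for the trained CNN-RNN attention model, and gTTS is used only as one example of a simple text-to-speech API; the abstract does not name a specific API.

```python
# Sketch of the caption-to-audio service described in the abstract.
# Assumptions: `generate_caption` stands in for the trained CNN-RNN
# attention model, and gTTS is one possible text-to-speech API.
from flask import Flask, request, send_file
from gtts import gTTS
from PIL import Image

app = Flask(__name__)


def generate_caption(image):
    # Placeholder: a real implementation would encode `image` with a CNN
    # and decode a word sequence with an attention-based RNN decoder.
    return "a person riding a bicycle down a city street"


@app.route("/caption-audio", methods=["POST"])
def caption_audio():
    # Read the uploaded image from the multipart form field "image".
    image = Image.open(request.files["image"].stream).convert("RGB")

    # Step 1: image -> text description (CNN encoder + attention RNN decoder).
    caption = generate_caption(image)

    # Step 2: text -> speech via a simple text-to-speech API.
    gTTS(text=caption, lang="en").save("caption.mp3")

    # Return the spoken caption to the client as an audio file.
    return send_file("caption.mp3", mimetype="audio/mpeg")


if __name__ == "__main__":
    app.run()
```

With the server running locally, a client could request an audio caption with, for example, `curl -F image=@photo.jpg http://localhost:5000/caption-audio -o caption.mp3`.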
