Abstract
Image processing and speech synthesis are complex and growing fields of computer science, actively developed by many researchers. An image is a map of pixels whose values encode what the image represents. Images are captured through various methods, and most people can easily identify the characters they contain. For visually impaired and illiterate people, however, identifying the characters in an image is not possible. This paper therefore presents a model for a system that reads the characters/text in images aloud. The system is composed of two main subsystems: an Optical Character Recognition (OCR) model and a speech synthesis model. The OCR model recognizes the characters/text in an image and converts them into editable text through several stages, such as preprocessing, segmentation, and classification. The editable text is fed as input to the speech synthesis model, which generates a speech signal from the text. Both models are trained with machine learning and deep neural networks. The models achieve an accuracy of about 90%, and the generated speech sounds similar to the speech used to train the model.
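As a rough illustration of the two-stage pipeline the abstract describes (image, then OCR to editable text, then speech synthesis), the sketch below wires an OCR step to a speech-synthesis step. It is not the paper's implementation: the off-the-shelf libraries pytesseract and pyttsx3 stand in for the paper's trained OCR and speech models, and the input image path is hypothetical.

# Minimal sketch of the image-to-speech pipeline, assuming pytesseract and
# pyttsx3 as stand-ins for the paper's trained OCR and speech-synthesis models.
from PIL import Image
import pytesseract
import pyttsx3

def image_to_speech(image_path: str) -> str:
    # Stage 1: OCR -- recognize the characters/text and return editable text.
    image = Image.open(image_path)
    text = pytesseract.image_to_string(image)

    # Stage 2: speech synthesis -- generate a speech signal from the text.
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
    return text

if __name__ == "__main__":
    # "sign.png" is a hypothetical example input image.
    print(image_to_speech("sign.png"))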