OCR and Speech Recognition System Using Machine Learning

Thubten Jamtsho, Rakesh Bhujel, Krishna Powdyel, Reshan Kumar Powrel, Kazuhiro Muramatsu

doi:10.1109/i-pact52855.2021.9697030

Abstract

Image processing and speech synthesis are complex and growing fields of computer science under active development by many researchers. An image is a grid of pixels whose values encode what they represent. Images are captured through various methods, and most people can easily identify the characters in them. For visually impaired and illiterate people, however, identifying the characters in an image is impossible. This paper therefore presents a model for a system that reads the characters/text in images aloud. The system is composed of two main subsystems: a model for Optical Character Recognition and a model for speech synthesis. The Optical Character Recognition model recognizes the characters/text in the images and converts them into editable text through processes such as preprocessing, segmentation, and classification. The editable text is fed as input to the speech synthesis model, which generates a speech signal from the input text. Both models are trained with machine learning and deep learning neural networks. The models have an accuracy of about 90%, and the generated speech sounds similar to the speech used to train the model.
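
The preprocessing and segmentation stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed threshold of 128, the projection-profile segmentation, and the function names `binarize`, `segment_columns`, and `crop` are all assumptions chosen for illustration. The classification and speech synthesis stages rely on trained models and are not shown.

```python
from typing import List, Tuple

# Assumed representation: a grayscale image as rows of ints, 0 (black) to 255 (white).
Image = List[List[int]]


def binarize(image: Image, threshold: int = 128) -> List[List[int]]:
    """Preprocessing: mark dark pixels (ink) as 1 and light pixels as 0."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def segment_columns(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Segmentation: return (start, end) column ranges that contain ink.

    Uses a vertical projection profile: a column with no ink is treated
    as a gap between characters. `end` is exclusive.
    """
    if not binary:
        return []
    width = len(binary[0])
    # Count ink pixels in each column.
    profile = [sum(row[x] for row in binary) for x in range(width)]
    segments: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x  # a run of ink columns begins
        elif count == 0 and start is not None:
            segments.append((start, x))  # the run ended at an empty column
            start = None
    if start is not None:
        segments.append((start, width))  # the run reaches the right edge
    return segments


def crop(binary: List[List[int]], segment: Tuple[int, int]) -> List[List[int]]:
    """Cut one character candidate out of the binarized image."""
    start, end = segment
    return [row[start:end] for row in binary]


if __name__ == "__main__":
    # A tiny 2x6 image: one ink column at x=1 and an ink region at x=4..5.
    image = [
        [255, 0, 255, 255, 0, 0],
        [255, 0, 255, 255, 0, 255],
    ]
    binary = binarize(image)
    for seg in segment_columns(binary):
        print(seg, crop(binary, seg))
```

In a full pipeline of the kind the abstract describes, each cropped region would be passed to a trained classifier, and the recognized text would then be handed to the speech synthesis model.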

Similar Papers

Optical Character Reader & Text To Speech Conversion using Correlations & Speech Synthesis
Dr Avinash Rai* ... Ms Shivani Sonker
International Journal of Innovative Technology and Exploring Engineering | VOL. 9
30 Aug 2020

DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis
Yuki Saito ... Shinnosuke Takamichi
20 Sep 2019

Lightweight convolution-based Chinese Speech Synthesis Method
Ruotong Yang ... Jiajun Liu
20 May 2022

Code-Switching Speech Synthesis Based on Self-Supervised Learning and Domain Adaptive Speaker Encoder
Yi-Xing Lin ... Phuong Thi Le
04 Jun 2023
