Speech to multi-language text conversion

Dr M Upendra Kumar

doi:10.54660/.ijmrge.2021.2.6.349-363

Abstract

Since the beginning of the previous century, researchers have shown interest in areas such as Automatic Speech Recognition, Image Processing and Natural Language Processing. Automatic Speech Recognition (ASR) has received attention over the past five decades because of its applications in both commercial and military domains. In recent times, this attention can be attributed to advances in Artificial Intelligence and advanced algorithms. ASR takes speech as input and converts it into text. It is employed in electronic dictionaries, customer call centers, voice dictation and query-based information systems, speech transcription, avionics, smart houses and access systems, and many more areas. ASR can also support interaction with people with disabilities. It enables human beings to interact with computers using speech rather than a keyboard and mouse (Vimala and Radha V., 2012), and it aims to provide a natural machine interface in which speech acts as the input to the machine. Generally, ASR is based on two tasks, viz. identification of phonemes and whole-word decoding. A relationship between the speech signal and phones, the speech segments that have distinct physical or perceptual features, is established in two steps. The first step deals with dimensionality reduction and the second step deals with estimating the likelihood of each phoneme. In the dimensionality reduction phase, the volume of the speech signal is decreased by extracting the relevant information using task-specific knowledge. In the next phase, the system recognizes the word sequence using a discriminative program. Traditionally, ASR systems preferred Mel-frequency cepstral coefficients (MFCC) for the first phase and discriminative techniques for the second phase. Over the years, ASR systems have evolved from an integration of multiple trained components to "end-to-end" deep neural architectures that link speech to text directly.
The proposed work implements an MLP with an AdaBoost classifier. The MLP is used to extract discriminative features from the speech data. The AdaBoost classifier then maps these features to the relevant set of words.
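The two-stage idea described in the abstract (an MLP that produces discriminative features, followed by AdaBoost that maps those features to words) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes NumPy only, synthetic 13-dimensional vectors standing in for MFCC-like inputs, a toy vocabulary of two words encoded as +1 and -1, a single tanh hidden layer of 8 units, and decision stumps as the weak learners. All function names (train_mlp, mlp_features, fit_adaboost and so on) are hypothetical and chosen for illustration.

```python
import numpy as np

def train_mlp(X, y01, hidden=8, epochs=500, lr=0.5, seed=0):
    """Train a one-hidden-layer MLP by gradient descent; return hidden-layer weights."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                   # hidden activations
        p = 1.0 / (1.0 + np.exp(-(H @ w2 + b2)))   # sigmoid output
        g = (p - y01) / len(y01)                   # grad of mean cross-entropy w.r.t. logits
        gH = np.outer(g, w2) * (1.0 - H ** 2)      # backpropagate through tanh
        w2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return W1, b1

def mlp_features(X, W1, b1):
    """Hidden-layer activations, used here as the discriminative features."""
    return np.tanh(X @ W1 + b1)

def stump_predict(F, j, t, sign):
    return sign * np.where(F[:, j] > t, 1, -1)

def fit_stump(F, y, w):
    """Weighted decision stump: (error, feature, threshold, sign) with lowest error."""
    best = None
    for j in range(F.shape[1]):
        for t in np.unique(F[:, j]):
            err = w[np.where(F[:, j] > t, 1, -1) != y].sum()
            # Weights sum to 1, so flipping the sign gives error 1 - err.
            for sign, e in ((1, err), (-1, 1.0 - err)):
                if best is None or e < best[0]:
                    best = (e, j, t, sign)
    return best

def fit_adaboost(F, y, rounds=10):
    """Discrete AdaBoost over decision stumps; labels y must be +1 or -1."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        err, j, t, sign = fit_stump(F, y, w)
        alpha = 0.5 * np.log((1.0 - err + 1e-10) / (err + 1e-10))
        w = w * np.exp(-alpha * y * stump_predict(F, j, t, sign))
        w /= w.sum()                               # re-normalise sample weights
        ensemble.append((alpha, j, t, sign))
    return ensemble

def adaboost_predict(F, ensemble):
    score = sum(a * stump_predict(F, j, t, s) for a, j, t, s in ensemble)
    return np.where(score >= 0, 1, -1)

# Synthetic stand-in for per-utterance MFCC-like vectors of two words.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (100, 13)), rng.normal(-1.0, 1.0, (100, 13))])
y = np.concatenate([np.ones(100), -np.ones(100)])
order = rng.permutation(len(y))
X, y = X[order], y[order]
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

W1, b1 = train_mlp(X_train, (y_train + 1) / 2)     # stage 1: learn features
features_train = mlp_features(X_train, W1, b1)
features_test = mlp_features(X_test, W1, b1)
ensemble = fit_adaboost(features_train, y_train)   # stage 2: boost on features
pred_test = adaboost_predict(features_test, ensemble)
accuracy = float(np.mean(pred_test == y_test))
print("held-out accuracy:", accuracy)
```

On real speech the inputs would come from an acoustic front end rather than random draws, and a multi-word vocabulary would need a multi-class boosting variant; both are outside the scope of this sketch.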


