Abstract
This study explores a bilingual speech recognition system for Indian languages built with the Kaldi toolkit, using Directed Acyclic Word Graphs (DAWGs) as a novel component. In Large Vocabulary Continuous Speech Recognition (LVCSR), the crucial task is to establish a correspondence between sub-word acoustic units across the languages involved; this correspondence forms the core of any automatic speech recognition system that handles multiple languages. For acoustic modeling, a deep neural network with two hidden layers was used to build a Bidirectional Long Short-Term Memory (BLSTM) model. Standard Mel-Frequency Cepstral Coefficients (MFCCs) extracted from the audio were used with a Gaussian Mixture Model/Hidden Markov Model (GMM/HMM) system to align the reference transcripts with the audio. The final language models were statistically pruned trigram models. The study aimed to build a refined Telugu-English bilingual speech recognition system by using a DAWG to map phonetically similar words across the two languages.
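The abstract does not say how the DAWG is built or what phonetic keys it uses, so the following is only a minimal sketch under stated assumptions. It assumes each word already has a phone sequence in a shared Telugu-English phone set (the `LEXICON` entries, the `SEP` separator and all function names are hypothetical). Each entry is stored as `phones + SEP + word`, a common way to keep key/value pairs in a DAWG: a trie is built over these strings and then minimized bottom-up by merging nodes with identical signatures, which turns it into a DAWG. Looking up a phone key then returns every word, in either language, that shares it.

```python
from dataclasses import dataclass, field

SEP = "|"  # hypothetical separator between a phone key and a word

# Hypothetical lexicon: phone key in a shared phone set -> words in both languages.
LEXICON = {
    "b a s": ["bus", "బస్"],
    "d aa k t a r": ["doctor", "డాక్టర్"],
}


@dataclass(eq=False)
class Node:
    final: bool = False
    edges: dict = field(default_factory=dict)  # label (char) -> Node


def build_trie(entries):
    """Build a character trie over 'phones|word' strings."""
    root = Node()
    for s in entries:
        node = root
        for ch in s:
            node = node.edges.setdefault(ch, Node())
        node.final = True
    return root


def minimize(root):
    """Merge equivalent subtrees bottom-up, turning the trie into a DAWG."""
    registry = {}

    def canon(node):
        # Canonicalize children first so equal subtrees share one object.
        for label, child in list(node.edges.items()):
            node.edges[label] = canon(child)
        sig = (node.final, tuple(sorted((lab, id(ch)) for lab, ch in node.edges.items())))
        return registry.setdefault(sig, node)

    return canon(root)


def count_states(root):
    """Count distinct nodes reachable from root."""
    seen, stack = set(), [root]
    while stack:
        n = stack.pop()
        if id(n) not in seen:
            seen.add(id(n))
            stack.extend(n.edges.values())
    return len(seen)


def words_for(root, phones):
    """Return all words (either language) stored under an exact phone key."""
    node = root
    for ch in phones + SEP:
        node = node.edges.get(ch)
        if node is None:
            return []
    out = []

    def dfs(n, acc):
        if n.final:
            out.append(acc)
        for lab in sorted(n.edges):
            dfs(n.edges[lab], acc + lab)

    dfs(node, "")
    return out


if __name__ == "__main__":
    entries = sorted(f"{k}{SEP}{w}" for k, ws in LEXICON.items() for w in ws)
    trie = build_trie(entries)
    n_trie = count_states(trie)
    dawg = minimize(trie)
    print(n_trie, "trie states ->", count_states(dawg), "DAWG states")
    print(words_for(dawg, "b a s"))  # → ['bus', 'బస్']
```

Minimizing after construction keeps the sketch short; for a full pronunciation lexicon, an incremental construction over sorted input avoids holding the whole trie in memory at once.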