Abstract

This paper examines the feasibility of a Hidden Markov Model (HMM) based speech recognition system as a Bangla transcription device for doctors dictating patient case histories. The experiments are performed using the Hidden Markov Model Toolkit (HTK). The features are the Mel Frequency Cepstral Coefficients (MFCCs) of the audio signal, which yield a 39-dimensional feature vector. The audio data were collected from ten male speakers, with a 50–50 train-test split. The system consists of a word parser program followed by an isolated word recognizer. The word parser takes discretely spoken sentences and outputs one audio segment per word. Each word segment is passed to the word recognizer, and the recognized words are concatenated. Five experiments were run twice because some words performed poorly in the first run; in the second run, more training data was added for the low-accuracy words. The final sentence recognition accuracy was 80%, and for most words the recognition accuracy was above 90%. In conclusion, HMM-based recognition systems are feasible for transcription devices.
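
The parse-then-recognize pipeline described above can be illustrated with a minimal sketch. The abstract does not say how the word parser finds word boundaries, so the short-time energy thresholding below is an assumption chosen for illustration, not the authors' method. The names `split_words`, `transcribe`, and `recognize_word`, as well as the default frame length, threshold, and silence duration, are all hypothetical; `recognize_word` stands in for the HTK-based isolated word recognizer.

```python
from typing import Callable, List, Sequence, Tuple


def split_words(
    samples: Sequence[float],
    frame_len: int = 160,            # assumed: 10 ms frames at 16 kHz
    energy_threshold: float = 0.01,  # assumed: mean-square energy cutoff
    min_silence_frames: int = 3,     # assumed: pause length that ends a word
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges of word-like segments.

    A segment opens at the first frame whose energy reaches the threshold
    and closes once `min_silence_frames` consecutive quiet frames follow.
    """
    segments: List[Tuple[int, int]] = []
    start = None      # sample index where the current segment began
    silent_run = 0    # consecutive quiet frames seen inside a segment
    n_frames = len(samples) // frame_len

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= energy_threshold:
            if start is None:
                start = i * frame_len
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment at the first quiet frame of the run.
                end = (i - silent_run + 1) * frame_len
                segments.append((start, end))
                start = None
                silent_run = 0

    if start is not None:
        # Audio ended inside a segment: drop any trailing quiet frames.
        segments.append((start, (n_frames - silent_run) * frame_len))
    return segments


def transcribe(
    samples: Sequence[float],
    recognize_word: Callable[[Sequence[float]], str],
    **parser_options,
) -> str:
    """Split a discretely spoken sentence, recognize each word, and join."""
    ranges = split_words(samples, **parser_options)
    return " ".join(recognize_word(samples[s:e]) for s, e in ranges)
```

Because the sentences are spoken discretely, with a pause between words, a simple energy-based splitter of this kind is plausible for the parsing stage; continuous speech would need a different approach. In a real setup the threshold and pause length would have to be tuned to the recording conditions.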


Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.