Abstract
The first goal of this study was to investigate how changing several properties of a continuous speech recognizer (CSR) affects the automatic phonetic transcriptions generated by that CSR. Our results show that the quality of the automatic transcriptions can be improved by using ‘short’ hidden Markov models (HMMs) and by reducing the amount of contamination in the HMMs. Contamination can be reduced by training the HMMs on a transcription that better matches the actual pronunciation, e.g., by modeling pronunciation variation or by training the HMMs on read speech. Furthermore, we found that context-dependent HMMs should preferably not be trained on baseline transcriptions when those transcriptions do not match the realized pronunciation of the speech material. Finally, we found that combining these changes to the CSR can further improve the quality of the automatic transcriptions. The second goal of this study was to determine whether there is a relationship between word error rate (WER) and transcription quality. As no clear relationship was found, we conclude that the CSR with the lowest WER is not necessarily the best choice for obtaining optimal automatic transcriptions.