Abstract

This study examines the relationship between the word error rate (WER) of an automatic speech recognition system and human raters' perceptual judgments of foreign-accentedness, fluency, and comprehensibility. In previous work, Franco et al. (1997) used HMM-derived scores based on posterior probabilities of phone segments, and Deville et al. (1999) used an HMM/ANN recognition approach, to show how the output of automatic speech recognition can be used to estimate perceptual judgments. Park and Culnan (2019) showed that neural network models can approximate human raters' perceptual judgments from the speech signal alone, and suggested that their model worked better on accentedness judgments than on fluency judgments. In this study, we will examine whether the WERs of English sentences produced by three language groups (American, Korean, Chinese) differ significantly; if they do, we will analyze the correlations between WER and the perceptual judgments. The perceptual data from Park and Culnan (2019) will be used for the analysis. The preliminary results will be used to identify features that are important for building more accurate automatic proficiency judgment models.
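
For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring procedure used in this study. The function name `word_error_rate`, the whitespace tokenization, and the lowercasing step are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace and compared case-insensitively;
    a real evaluation would likely apply fuller text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.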
