Abstract

Automatically evaluating the pronunciation quality of non-native speech has seen tremendous success in both research and commercial settings, with applications in L2 learning. In this paper, submitted to the INTERSPEECH 2015 Degree of Nativeness Sub-Challenge, the problem is posed in a challenging cross-corpora setting, using speech from multiple speakers with a variety of native-language (L1) backgrounds reading different English sentences. Since the perception of non-nativeness arises at both the segmental and suprasegmental linguistic levels, we explore acoustic cues at multiple time scales. We experiment with both data-driven and knowledge-inspired features that capture the degree of nativeness through pauses in speech, speaking rate, rhythm/stress, and goodness of phone pronunciation. One promising finding is that highly accurate automated assessment can be attained with a small, diverse set of intuitive and interpretable features. Performance is further boosted by smoothing scores across utterances from the same speaker; our best system significantly outperforms the challenge baseline.
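As an illustration of the kind of interpretable fluency cues the abstract mentions (this is a hedged sketch, not the authors' actual feature extractor), two of the simplest utterance-level features, pause behaviour and speaking rate, can be computed from hypothetical word-level time alignments; the word tuples and the `min_pause` threshold below are assumptions for the example:

```python
# Illustrative sketch only: simple pause and speaking-rate features from
# hypothetical word-level alignments. Each word is (label, start_sec, end_sec),
# in time order; gaps between consecutive words count as pauses.

def fluency_features(words, min_pause=0.15):
    """Return a dict of simple pause and speaking-rate features.

    words: list of (label, start, end) tuples in seconds, sorted by time.
    min_pause: silences shorter than this are ignored (assumed threshold).
    """
    pauses = []
    for (_, _, prev_end), (_, start, _) in zip(words, words[1:]):
        gap = start - prev_end
        if gap >= min_pause:
            pauses.append(gap)
    speech_dur = sum(end - start for _, start, end in words)
    total_dur = words[-1][2] - words[0][1]
    return {
        "num_pauses": len(pauses),
        "mean_pause": sum(pauses) / len(pauses) if pauses else 0.0,
        "words_per_sec": len(words) / total_dur,    # crude speaking rate
        "phonation_ratio": speech_dur / total_dur,  # speech vs. silence
    }
```

Features of this flavour are intuitive and directly interpretable, which is in the spirit of the abstract's finding that a small, diverse set of interpretable cues can support accurate assessment.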
