Abstract

The computer-assisted learning of spoken language is closely tied to automatic speech recognition (ASR) technology which, as is well known, struggles with non-native speech. By focusing on specific phonological differences between the target and source languages of non-native speakers, pronunciation assessment can be made more reliable. The four-way contrast of Hindi stops, where voicing and aspiration are phonemic at each of five distinct places of articulation, is typically challenging for a learner from a different native language group. The improper production of the aspiration contrast is thus often the salient cue to non-native accents of spoken Hindi. In this work, acoustic–phonetic features, motivated by an understanding of the production of aspirated plosives, are evaluated for the classification of plosives along the aspiration dimension. Several new acoustic measures are proposed for the reliable detection of the aspiration contrast in unvoiced and voiced plosives. The acoustic–phonetic features are shown to perform well in the two-way classification task, and also appear robust to cross-language transfer, in which statistical models trained on Marathi speech were tested on native Hindi utterances. In experiments on native utterances of Hindi words and non-native utterances by Tamil-L1 speakers, the acoustic–phonetic features clearly separate the non-native speakers from the native speakers on the pronunciation quality of aspirated plosives. The acoustic–phonetic features also outperformed an ASR system based on more generic spectral features in providing phone-level feedback consistent with human judgement.

