ShefCE: A Cantonese-English bilingual speech corpus for pronunciation assessment

Raymond W. M. Ng, Thomas Hain, Alvin C. M. Kwan, Tan Lee

doi:10.1109/icassp.2017.7953273

Abstract

This paper introduces the development of ShefCE, a Cantonese-English bilingual speech corpus collected from L2 English speakers in Hong Kong. Bilingual parallel recording materials were chosen from TED online lectures. Script selection was carried out according to bilingual consistency (evaluated using a machine translation system) and the distribution balance of phonemes. Thirty-one undergraduate and postgraduate students in Hong Kong, aged 20–30, were recruited to record a 25-hour speech corpus (12 hours in Cantonese and 13 hours in English). Baseline phoneme/syllable recognition systems were trained on background data with and without the ShefCE training data. The final syllable error rate (SER) for Cantonese is 17.3% and the final phoneme error rate (PER) for English is 34.5%. Automatic speech recognition performance on English showed a significant mismatch when L1 models were applied to L2 data, suggesting the need for explicit accent adaptation. ShefCE and the corresponding baseline models will be made openly available for academic research.
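
For readers unfamiliar with the metrics, the following is a minimal sketch of how a syllable or phoneme error rate is conventionally computed: the total edit distance (substitutions, deletions, insertions) between reference and recognised unit sequences, divided by the total number of reference units. The abstract does not say which scoring tool the authors used, so the function names `edit_distance` and `error_rate` and the toy phoneme sequences are illustrative assumptions, not the paper's implementation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of units (phonemes or syllables)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis is i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference unit
                curr[j - 1] + 1,     # insertion of a hypothesis unit
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses):
    """Corpus-level error rate: total edits divided by total reference units."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_units = sum(len(r) for r in references)
    if total_units == 0:
        raise ValueError("references contain no units")
    return total_edits / total_units


# Hypothetical toy data: two utterances as phoneme sequences.
refs = [["k", "ae", "t"], ["d", "ao", "g"]]
hyps = [["k", "ah", "t"], ["d", "ao"]]  # one substitution, one deletion
print(f"PER = {error_rate(refs, hyps):.1%}")
```

The same function yields a syllable error rate when the sequences contain syllables instead of phonemes. Because insertions count as errors, the rate can exceed 100% for very poor hypotheses.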
