3M: An Effective Multi-view, Multi-granularity, and Multi-aspect Modeling Approach to English Pronunciation Assessment

Fu-An Chao, Berlin Chen, Tien-Hong Lo, Tzu-I Wu, Yao-Ting Sung

doi:10.23919/apsipaasc55919.2022.9979979

Abstract

As an indispensable ingredient of computer-assisted pronunciation training (CAPT), automatic pronunciation assessment (APA) plays a pivotal role in aiding self-directed language learners by providing multi-aspect and timely feedback. However, at least two obstacles might hinder its performance in practical use. On the one hand, most studies focus exclusively on segmental (phonetic)-level features such as goodness of pronunciation (GOP), which may cause a mismatch in feature granularity when performing suprasegmental (prosodic)-level pronunciation assessment. On the other hand, APA still suffers from the lack of large-scale labeled speech data from non-native speakers, which inevitably limits its performance. In this paper, we tackle these problems by integrating multiple prosodic and phonological features to provide multi-view, multi-granularity, and multi-aspect (3M) pronunciation modeling. Specifically, we augment GOP with prosodic and self-supervised learning (SSL) features, and also develop a vowel/consonant positional embedding for more phonology-aware automatic pronunciation assessment. A series of experiments conducted on the publicly available speechocean762 dataset shows that our approach obtains significant improvements on several assessment granularities in comparison with previous work, especially on the assessment of speaking fluency and speech prosody.
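
To make the idea of augmenting GOP with prosodic and SSL features concrete, here is a minimal illustrative sketch. It is not the authors' implementation, and every name in it is hypothetical. It assumes GOP is taken as the average log posterior of the canonical phone over its aligned frames (one common formulation), that the prosodic view is reduced to duration and mean energy, that SSL frame features are mean-pooled per phone, and that a simple binary vowel flag stands in for the vowel/consonant positional embedding mentioned in the abstract.

```python
import numpy as np


def gop_score(log_posteriors, start, end, phone_id):
    """Average log posterior of the canonical phone over its aligned frames.

    Assumption: this is one common GOP formulation, not necessarily the
    variant used in the paper.
    """
    segment = log_posteriors[start:end, phone_id]
    return float(segment.mean())


def phone_features(log_posteriors, ssl_frames, energy, segments, vowel_ids):
    """Build one feature vector per phone from several views.

    log_posteriors: (frames, phones) array of frame-level log posteriors.
    ssl_frames:     (frames, dim) array of frame-level SSL features.
    energy:         (frames,) array of frame energies (toy prosodic view).
    segments:       list of (start, end, phone_id) forced-alignment spans.
    vowel_ids:      set of phone ids treated as vowels (hypothetical).
    """
    rows = []
    for start, end, phone_id in segments:
        gop = gop_score(log_posteriors, start, end, phone_id)
        duration = float(end - start)                  # duration in frames
        mean_energy = float(energy[start:end].mean())  # crude prosodic cue
        is_vowel = 1.0 if phone_id in vowel_ids else 0.0
        ssl_mean = ssl_frames[start:end].mean(axis=0)  # mean-pooled SSL view
        rows.append(
            np.concatenate(([gop, duration, mean_energy, is_vowel], ssl_mean))
        )
    return np.stack(rows)
```

Each row of the returned matrix holds the four handcrafted values followed by the pooled SSL vector, so a downstream scoring model receives segmental and prosodic evidence for the same phone side by side.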


