Integration of multilayer regression analysis with structure-based pronunciation assessment

Masayuki Suzuki, Nobuaki Minematsu, Keikichi Hirose, Yu Qiao

doi:10.21437/interspeech.2010-229

Abstract

Automatic pronunciation assessment has several difficulties. Adequacy in controlling the vocal organs is often estimated from the spectral envelopes of input utterances, but the envelope patterns are also affected by other factors such as speaker identity. Recently, a new method of speech representation was proposed where these non-linguistic variations are effectively removed through modeling only the contrastive aspects of speech features. This speech representation is called speech structure. However, the often excessively high dimensionality of the speech structure can degrade the performance of structure-based pronunciation assessment. To deal with this problem, we integrate multilayer regression analysis with the structure-based assessment. The results show higher correlation between human and machine scores and also show much higher robustness to speaker differences compared to widely used GOP-based analysis.

Index Terms: CALL, speech structure, regression, GOP

1. Introduction

Automatic pronunciation assessment is a task used to evaluate only the linguistic aspect of utterances. However, speech features inevitably include acoustic variations caused by non-linguistic factors such as the speaker, communication channel and noise. The same pronunciation can lead to different acoustic observations due to different speakers and different environments. To deal with these variations, modern pronunciation assessment approaches mainly make use of statistical methods to model the distributions of the acoustic features [1]. These methods can achieve relatively high performance when there is a good match between training and testing conditions. But their performance always degrades significantly when these conditions are mismatched. In Automatic Speech Recognition (ASR), speaker adaptation techniques have proved effective at reducing mismatches. However, if the acoustic models used in pronunciation assessment are adapted to learners, incorrect pronunciations might be recognized as correct due to over-adaptation [2].

To solve the mismatch problem, the third author of this paper proposed a new speech representation, called speech structure, which aims at removing the non-linguistic factors in speech features [3]. In contrast to classical speech models, speech structures make use of
