Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning

Eesung Kim, Hyeji Seo, Hoon Kim, Jae-Jin Jeon

doi:10.21437/interspeech.2022-10245

Abstract

Self-supervised learning (SSL) models such as wav2vec 2.0 and HuBERT have shown promising results in various downstream tasks in the speech community. In particular, speech representations learned by SSL models have been shown to be effective for encoding various speech-related characteristics. In this context, we propose a novel automatic pronunciation assessment method based on SSL models. First, the proposed method fine-tunes the pre-trained SSL models with connectionist temporal classification (CTC) to adapt them to the English pronunciation of English-as-a-second-language (ESL) learners in a data environment. Then, layer-wise contextual representations are extracted from all the transformer layers of the SSL models. Finally, the automatic pronunciation score is estimated using a bidirectional long short-term memory (BiLSTM) network that takes the layer-wise contextual representations and the corresponding text as input. We show that the proposed SSL model-based methods outperform the baselines, in terms of the Pearson correlation coefficient, on datasets of Korean child ESL learners and on Speechocean762. Furthermore, we analyze how the representations from different transformer layers of the SSL model affect performance on the pronunciation assessment task.
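
The abstract reports results in terms of the Pearson correlation coefficient between predicted and reference pronunciation scores. As a minimal sketch of that metric only (not the authors' evaluation code), the following computes it in plain Python. The function name `pearson_correlation` and the example scores are hypothetical and chosen purely for illustration; they are not data from the paper.

```python
import math
from typing import Sequence


def pearson_correlation(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(predicted) != len(reference):
        raise ValueError("score lists must have the same length")
    if len(predicted) < 2:
        raise ValueError("at least two score pairs are required")

    mean_p = sum(predicted) / len(predicted)
    mean_r = sum(reference) / len(reference)

    # Covariance numerator and the two sums of squared deviations.
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)

    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_p * var_r)


# Hypothetical scores, for illustration only (not data from the paper).
model_scores = [2.0, 4.0, 6.0, 8.0]
human_scores = [1.0, 2.0, 3.0, 4.0]
pcc = pearson_correlation(model_scores, human_scores)
```

A value near 1 means the model's scores rise and fall with the human ratings. The coefficient is insensitive to a constant offset or positive scaling between the two score ranges.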
