Abstract

Automatic fluency assessment of spontaneous speech without a reference text is a challenging task that depends heavily on the accuracy of automatic speech recognition (ASR). In this scenario, it is worth exploring an assessment method that incorporates ASR: beyond the acoustic features that are essential for assessment, the text output by ASR may also carry fluency-related information. However, most existing studies on automatic fluency assessment of spontaneous speech rely solely on audio features, without exploiting textual information, which may limit the understanding of fluency. To address this, we propose a multimodal automatic speech fluency assessment method that incorporates ASR output. Specifically, we first examine the relevance of the fluency assessment task to the ASR task and fine-tune the Wav2Vec2.0 model with multi-task learning to jointly optimize the ASR and fluency assessment tasks, producing both the fluency assessment results and the ASR output. The text features and audio features obtained from the fine-tuned model are then fed into a multimodal fluency assessment model, which uses attention mechanisms to obtain more reliable assessment results. Finally, experiments on the PSCPSF and Speechocean762 datasets suggest that our proposed method performs well across different assessment scenarios.
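The abstract does not give implementation details, so the following is a minimal sketch of the two stages it describes, assuming a PyTorch/Hugging Face setup. The class names (MultiTaskWav2Vec2, AttentionFusion), the CTC-plus-MSE loss pairing, the loss weight lambda_fluency, and the single cross-attention fusion layer are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of the pipeline described in the abstract:
# (1) multi-task fine-tuning of Wav2Vec2.0 for ASR + fluency,
# (2) attention-based fusion of text and audio features.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model


class MultiTaskWav2Vec2(nn.Module):
    """Joint ASR (CTC) and fluency-scoring heads on a shared Wav2Vec2 encoder."""

    def __init__(self, vocab_size: int, lambda_fluency: float = 0.5):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
        hidden = self.encoder.config.hidden_size
        self.ctc_head = nn.Linear(hidden, vocab_size)   # ASR branch
        self.fluency_head = nn.Linear(hidden, 1)        # utterance-level score branch
        self.lambda_fluency = lambda_fluency            # assumed loss weighting
        self.ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
        self.mse = nn.MSELoss()

    def forward(self, input_values, targets=None, target_lengths=None,
                fluency_scores=None):
        hidden = self.encoder(input_values).last_hidden_state      # (B, T, H)
        ctc_logits = self.ctc_head(hidden)                         # per-frame tokens
        score = self.fluency_head(hidden.mean(dim=1)).squeeze(-1)  # pooled score

        loss = None
        if targets is not None and fluency_scores is not None:
            # CTCLoss expects (T, B, V) log-probabilities; for simplicity this
            # sketch assumes unpadded inputs, so every utterance spans all T frames.
            log_probs = ctc_logits.log_softmax(-1).transpose(0, 1)
            input_lengths = torch.full((input_values.size(0),), log_probs.size(0),
                                       dtype=torch.long)
            loss = (self.ctc_loss(log_probs, targets, input_lengths, target_lengths)
                    + self.lambda_fluency * self.mse(score, fluency_scores))
        return ctc_logits, score, loss


class AttentionFusion(nn.Module):
    """Cross-attention fusion of ASR text features and audio features (sketch).

    Assumes both feature streams have already been projected to a common
    dimension `dim`; text features attend over audio features, and the pooled
    result is mapped to a fluency score.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.scorer = nn.Linear(dim, 1)

    def forward(self, text_feats, audio_feats):
        fused, _ = self.attn(text_feats, audio_feats, audio_feats)
        return self.scorer(fused.mean(dim=1)).squeeze(-1)
```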