Abstract

In this paper, an error correction method is proposed to improve the performance of dysarthric automatic speech recognition (ASR). To this end, context-dependent pronunciation variations are modelled using a weighted Kullback-Leibler (KL) distance between the acoustic models of the ASR system. The context-dependent pronunciation variation model is then converted into a weighted finite-state transducer (WFST) and combined with a lexicon and a language model. ASR experiments show that the average word error rate (WER) of a WFST-based ASR system with the proposed error correction method is reduced by a relative 19.73% compared to an ASR system without error correction. Moreover, the error correction method using the weighted KL distance reduces the average WER by a relative 3.81% compared to the same method using an unweighted KL distance.
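
The abstract does not specify the exact weighting scheme or acoustic-model topology, so the following is only a minimal Python sketch of how a weighted, symmetrised KL distance between two acoustic models might be computed. It assumes single-Gaussian, diagonal-covariance states and uses hypothetical state-level weights (e.g. occupancy counts); the function names and the weighting choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kl_diag_gaussian(mu_p, var_p, mu_q, var_q):
    """KL divergence KL(p || q) between two diagonal-covariance Gaussians."""
    mu_p, var_p = np.asarray(mu_p, float), np.asarray(var_p, float)
    mu_q, var_q = np.asarray(mu_q, float), np.asarray(var_q, float)
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def weighted_kl_distance(states_a, states_b, weights):
    """Weighted, symmetrised KL distance between two acoustic models,
    each given as a list of per-state (mean, variance) pairs.

    `weights` (here assumed to be state-occupancy counts) scales each
    state's contribution; this weighting is an assumption for illustration.
    """
    total = 0.0
    for w, (mu_a, var_a), (mu_b, var_b) in zip(weights, states_a, states_b):
        # Symmetrise the KL divergence so the distance is order-independent.
        d = 0.5 * (kl_diag_gaussian(mu_a, var_a, mu_b, var_b)
                   + kl_diag_gaussian(mu_b, var_b, mu_a, var_a))
        total += w * d
    return total / max(sum(weights), 1e-12)
```

In a complete system of the kind the abstract describes, distances like these would typically be mapped to substitution-arc weights in the pronunciation-variation WFST, which is then composed with the lexicon and language-model transducers.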
