Abstract
In the literature, the task of dysarthric speech intelligibility assessment has been approached through development of different low-level feature representations, subspace modeling, phone confidence estimation or measurement of automatic speech recognition system accuracy. This paper proposes a novel approach where the intelligibility is estimated as the percentage of correct words uttered by a speaker with dysarthria by matching and verifying utterances of the speaker with dysarthria against control speakers’ utterances in phone posterior feature space and broad phonetic posterior feature space. Experimental validation of the proposed approach on the UA-Speech database, with posterior feature estimators trained on the data from auxiliary domain and language, obtained a best Pearson's correlation coefficient ( $r$ ) of 0.950 and Spearman's correlation coefficient ( $\rho$ ) of 0.957. Furthermore, replacing control speakers’ speech with speech synthesized by a neural text-to-speech system obtained a best $r$ of 0.931 and $\rho$ of 0.961.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.