Abstract

Application of automatic speech recognition (ASR) techniques for automatic speech assessment in people with aphasia

Ying Qin1, Tan Lee1, Anthony Pak Hin Kong2* and Sampo Law3

1 The Chinese University of Hong Kong, Department of Electronic Engineering, Hong Kong, SAR China
2 University of Central Florida, Department of Communication Sciences and Disorders, United States
3 University of Hong Kong, Division of Speech and Hearing Sciences, Hong Kong, SAR China

Introduction

Assessment of speech production is an important part of the comprehensive evaluation process for people with aphasia (PWA). Subjective assessment of the speech and language abilities of PWA is challenging because it requires not only clinical knowledge about aphasia but also a good understanding of the relevant linguistic and cultural background of the language(s) concerned. An effective and reliable approach to objective assessment of PWA speech is highly desirable. The present study investigates the application of state-of-the-art automatic speech recognition (ASR) technology to quantifying the linguistic and acoustic characteristics of PWA speech, and to the development of an automatic assessment system.

Methods

Speech recordings of 118 unimpaired participants and 82 PWA were extracted from the Cantonese AphasiaBank (Kong & Law, 2018). Each participant provided spoken narratives elicited through pre-defined discourse tasks. For each of the 82 PWA, the Aphasia Quotient (AQ) was obtained using the Cantonese version of the Western Aphasia Battery (Yiu, 1992). The PWA were divided into a high-AQ group (AQ > 90.0; n=35) and a low-AQ group (AQ < 90.0; n=47). A domain-matched ASR system trained on unimpaired speech from the Cantonese AphasiaBank was used to decode each speech utterance from PWA into a syllable sequence with time boundary information (i.e., the begin and end time of every syllable). Automatic assessment was based on the acoustic and text features extracted from the ASR output.
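
To make the ASR output concrete, the following is a minimal sketch of how a time-aligned syllable sequence might be represented and how simple duration measures could be derived from it. The Syllable type, the feature names, the 0.2 s pause threshold, and the example alignment are all illustrative assumptions; the abstract does not specify the exact feature set used in the study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Syllable:
    label: str    # decoded syllable (illustrative romanized label)
    start: float  # begin time in seconds
    end: float    # end time in seconds


def duration_features(syllables: List[Syllable], min_pause: float = 0.2) -> Dict[str, float]:
    """Compute simple duration measures from a time-aligned syllable sequence.

    min_pause is a hypothetical threshold: a gap between consecutive
    syllables that is at least this long is counted as a pause.
    """
    if not syllables:
        raise ValueError("empty syllable sequence")
    total_time = syllables[-1].end - syllables[0].start
    if total_time <= 0:
        raise ValueError("alignment must span a positive duration")

    syllable_durations = [s.end - s.start for s in syllables]
    gaps = [b.start - a.end for a, b in zip(syllables, syllables[1:])]
    pauses = [g for g in gaps if g >= min_pause]

    return {
        # syllables per second over the whole utterance
        "speaking_rate": len(syllables) / total_time,
        # longer values would reflect prolonged syllables
        "mean_syllable_duration": sum(syllable_durations) / len(syllable_durations),
        # number of inter-syllable gaps counted as pauses
        "pause_count": len(pauses),
        # share of the utterance spent in pauses
        "pause_ratio": sum(pauses) / total_time,
    }


# Made-up alignment for illustration only (not data from the study).
utterance = [
    Syllable("ngo5", 0.00, 0.25),
    Syllable("soeng2", 0.30, 0.70),
    Syllable("sik6", 1.50, 1.80),
    Syllable("faan6", 1.85, 2.00),
]
features = duration_features(utterance)
print(features)
```

In this toy utterance the single long gap between the second and third syllables is the only one counted as a pause, so the pause ratio reflects that one silence relative to the two-second span.
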
Specifically, supra-segmental duration features were computed from the time alignment to characterize the atypical prosody of PWA speech, such as frequent insertion of pauses or a lower speaking rate with prolongation of syllables. Robust text features were designed using syllable-level embedding methods, which are widely used in natural language processing. The text features were found to distinguish impaired from normal speech even when the ASR accuracy is low.

The proposed features were evaluated in two experiments: a two-class classification experiment (High-AQ vs. Low-AQ) and automatic prediction of AQ. Three classification algorithms, namely binary decision tree (BDT; Safavian & Landgrebe, 1991), random forest (RF; Liaw & Wiener, 2002), and support vector machine (SVM; Suykens & Vandewalle, 1999), were applied in the binary classification task. Prediction of AQ was formulated as a regression problem. Two regression models were constructed, one with linear regression and one with random forest (RF). Spearman correlation coefficients between each PWA's original AQ and predicted AQ (AQp) were computed to measure the strength of association between the two sets of AQ values.

Results and Discussion

The best classification performance was an average F1 score of 0.930, obtained with the RF classifier on the combined features. The classification accuracies for the Low-AQ and High-AQ groups were 89.4% (42/47) and 88.6% (31/35), respectively. In the AQ prediction experiment, the RF regression model yielded the highest AQ-AQp correlation (0.839, p<0.001). For 41 (50%) of the 82 PWA subjects, the prediction errors were smaller than 5%, and for 61 (74%) of them, the prediction errors were smaller than 10%. The experimental results suggest that the text features were more effective than the acoustic features in terms of both classification accuracy and prediction of AQ values.
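
The evaluation of AQ prediction described above (a Spearman correlation between AQ and AQp, plus the share of speakers whose prediction error falls below a tolerance) can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: the function names and the AQ values are made up, and the error is assumed to be the absolute difference in AQ points on the 0-100 scale, which the abstract does not state explicitly.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def share_within(actual: Sequence[float], predicted: Sequence[float], tol: float) -> float:
    """Fraction of speakers whose absolute prediction error is below tol."""
    hits = sum(abs(a - p) < tol for a, p in zip(actual, predicted))
    return hits / len(actual)


# Made-up AQ and predicted AQ values for six hypothetical speakers.
aq = [95.2, 88.0, 72.5, 60.1, 45.0, 30.4]
aqp = [89.0, 90.5, 65.0, 63.0, 50.2, 42.0]

rho = spearman(aq, aqp)
within_5 = share_within(aq, aqp, 5.0)
within_10 = share_within(aq, aqp, 10.0)
print(rho, within_5, within_10)
```

In practice a library routine such as scipy.stats.spearmanr would typically be used, since it also reports a p-value; the pure-Python version here only keeps the sketch free of dependencies.
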
Figure 1

Acknowledgements

This research is partially supported by a GRF project grant (Ref: 14204014) from Hong Kong Research Grants Council and by the Shenzhen Municipal Engineering Laboratory of Speech Rehabilitation Technology. The Cantonese AphasiaBank was supported by a grant funded by the National Institutes of Health (NIH-R01-DC010398).

References

Kong, A. P. H., & Law, S. P. (2018). Cantonese AphasiaBank: An annotated database of spoken discourse and co-verbal gestures by healthy and language-impaired native Cantonese speakers. Behavior Research Methods. doi: 10.3758/s13428-018-1043-6
Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22.
Safavian, S. R., & Landgrebe, D. (1991). A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics, 21(3), 660-674.
Suykens, J. A., & Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural Processing Letters, 9(3), 293-300.
Yiu, E. M. L. (1992). Linguistic assessment of Chinese-speaking aphasics: development of a Cantonese aphasia battery. Journal of Neurolinguistics, 7, 379-424.

Keywords: automatic speech recognition, discourse, Aphasia, Cantonese AphasiaBank, assessment

Conference: Academy of Aphasia 56th Annual Meeting, Montreal, Canada, 21 Oct - 23 Oct, 2018.

Presentation Type: poster presentation

Topic: Eligible for a student award

Citation: Qin Y, Lee T, Kong A and Law S (2019). Application of automatic speech recognition (ASR) techniques for automatic speech assessment in people with aphasia. Conference Abstract: Academy of Aphasia 56th Annual Meeting. doi: 10.3389/conf.fnhum.2018.228.00082

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.
The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated. Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed. For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 29 Apr 2018; Published Online: 22 Jan 2019.

* Correspondence: Prof. Anthony Pak Hin Kong, University of Central Florida, Department of Communication Sciences and Disorders, Orlando, FL, United States, akong@hku.hk
