Abstract

Automatic Speech Recognition of Scripted Productions from PWAs

Brian MacWhinney1*, Davida Fromm1, Margie Forbes1 and Florian Metze1

1 Carnegie Mellon University, United States

Detailed evaluation of speech productions from persons with aphasia (PWA) and apraxia of speech (AoS) requires painstaking transcription and analysis. Although systems such as Praat can produce excellent results, they cannot give PWAs immediate feedback on the correctness of their productions. Such feedback is important not only for training, but also for interaction with conversational agents and other computerized facilities. Systems such as NativeAccent (carnegiespeech.com) provide online assessment of phonological productions for individual words, but they do not provide detailed data for researchers, and they are tuned to the needs of second-language learners rather than PWAs.

To address this problem, the AphasiaBank Project (aphasia.talkbank.org) has made use of the SpeechKitchen methodology (speechkitchen.org). This system packages a wide variety of state-of-the-art speech recognition methods based on the Kaldi toolkit (kaldi-asr.org), deep-learning algorithms, and bidirectional alignment. These methods are made available to end users as a Vagrant virtual machine that includes a complete software configuration for specific research or practical applications. In our case, we used SpeechKitchen to process the script-based productions of 35 persons with aphasia and AoS contributed to AphasiaBank. These productions involve repetition of three different scripts, each with seven sentences. We included only productions from PWAs whose output could be understood at least to some degree by a human listener. Because we know the shape and sounds of the target words in the three scripts, the ASR task is markedly simplified. We have used two ASR approaches for these productions.
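Both approaches are scored by word error rate (WER): the word-level edit distance between the recognized output and the target script, divided by the number of script words. The sketch below is a minimal illustration of that metric and of a second-pass filter that maps a recognized phoneme string back onto script words; it is not the SpeechKitchen/Kaldi pipeline itself, and the toy IPA lexicon and the greedy segmentation strategy are assumptions made for the example.

```python
# Illustrative sketch only: not the actual SpeechKitchen/Kaldi pipeline.
# Demonstrates (1) WER against a known script and (2) aligning a recognized
# phoneme sequence to script words. The IPA lexicon below is hypothetical.

def edit_distance(a, b):
    """Levenshtein distance between two sequences (word lists or phoneme strings)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
    return d[len(a)][len(b)]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

# Toy IPA lexicon (hypothetical pronunciations, for illustration only).
LEXICON = {"hello": "hɛloʊ", "my": "maɪ", "name": "neɪm", "is": "ɪz"}

def align_phones_to_words(phones, script):
    """Second-pass filter sketch: segment a recognized phoneme string into the
    known script words, consuming for each word the prefix of the remaining
    phonemes that best matches its dictionary pronunciation."""
    segments, i = [], 0
    for word in script:
        target = LEXICON[word]
        best_j, best_cost = i + 1, float("inf")
        # consider candidate segment lengths around the pronunciation length
        for j in range(i + 1, min(len(phones), i + len(target) + 2) + 1):
            cost = edit_distance(phones[i:j], target)
            if cost < best_cost:
                best_cost, best_j = cost, j
        segments.append((word, phones[i:best_j]))
        i = best_j
    return segments
```

For example, scoring a hypothesis with one substitution and one deletion against a six-word script yields a WER of 2/6; the real pipeline additionally recovers start and end times for each unit, which this text-only sketch omits.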
The first relies on recognition limited to the words included in the script. This method produces a word error rate (WER) of .45, with nearly half of the words not being recognized correctly. Despite the low accuracy, this method achieves a good level of diarization: it produces the exact time values of the beginning and end of every word, thereby also yielding pause durations and word lengths. We have also used a second ASR method that aligns the productions to individual phonemes in the IPA alphabet, rather than directly to words. This method provides highly accurate start and stop times at the phoneme level, yielding even more precise information for detailed theoretical analysis of aphasic speech productions. We then apply a second filter to align sequences of phonemes with words. Using this method, we obtained a WER of .15, which we consider acceptable for further work.

These methods will allow us to develop automated online methods for evaluation and training of spoken language in aphasia and AoS. They can also greatly improve processing and analysis of data from common measures in which the target is known, such as confrontation naming tests, oral reading assessments, and repetition tasks. Furthermore, we can examine the detailed data produced by these systems to evaluate the success of training methods and to understand the problems that PWAs with different lesion types have in producing fluent speech.

Keywords: Aphasia, speech recognition, ASR, naming, Apraxias

Conference: Academy of Aphasia 55th Annual Meeting, Baltimore, United States, 5 Nov - 7 Nov, 2017.

Presentation Type: poster or oral

Topic: General Submission

Citation: MacWhinney B, Fromm D, Forbes M and Metze F (2019). Automatic Speech Recognition of Scripted Productions from PWAs. Conference Abstract: Academy of Aphasia 55th Annual Meeting.
doi: 10.3389/conf.fnhum.2017.223.00039

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters. The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated. Each abstract, as well as the collection of abstracts, is published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed. For Frontiers' terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 26 Apr 2017; Published Online: 25 Jan 2019.

* Correspondence: Prof. Brian MacWhinney, Carnegie Mellon University, Pittsburgh, United States, macw@cmu.edu
