Recent reports have investigated the use of automatic speech recognition (ASR) to analyze and score verbal responses in cognitive tests. ASR scoring is objective, permits efficient computerized administration of verbal tests, and generates timestamps that enable detailed temporal analysis of responses. However, ASR transcription accuracy varies by engine, task, and participant, and ASR can incorrectly score responses from participants with atypical speech patterns. Here we describe the speech-transcription pipeline of the California Cognitive Assessment Battery (CCAB), which incorporates consensus ASR (CASR) to produce more accurate transcripts than is possible with any single ASR engine. We also developed a Transcript Review Tool (TRT) that facilitates the manual correction of mis-transcribed words for problem participants.

Figure 1 shows the CCAB speech-transcription pipeline. Real-time ASR transcriptions are obtained during testing, and the digital recordings of responses are also transcribed with six cloud-based ASR engines (e.g., Google). The individual transcripts are then combined to produce a "consensus" transcript and a transcription confidence measure based primarily on agreement among the ASR engines (Figure 2). If needed, consensus transcripts can be manually corrected with the TRT, which enables review of all words or only of words below a predefined CASR confidence threshold (Figure 3).

ASR transcriptions were obtained from 442 healthy adults (mean age = 65.1 ± 14.4 years), each of whom underwent three days of cognitive testing that included 25 verbal tests. In all, approximately 276 hours of speech were transcribed. Preliminary analyses show that CASR transcription accuracy exceeded 99% for tests with limited response sets (e.g., digit span, verbal list learning, and face-name binding) and 95% for discursive speech tests (e.g., picture description and logical memory).
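The abstract does not specify how CASR aligns or combines the engine outputs. As a minimal sketch, assuming the six transcripts have already been aligned word by word (as in ROVER-style voting), a per-slot majority vote yields a consensus word plus a confidence equal to the fraction of engines that agree, and words below a threshold can then be flagged for review, mirroring the TRT's filtered view. The names `consensus` and `flag_for_review`, the 0.5 threshold, and the pure-agreement confidence are hypothetical illustrations, not the CCAB implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConsensusWord:
    word: str
    confidence: float  # fraction of engines that agree with `word`


def consensus(aligned: List[List[Optional[str]]]) -> List[ConsensusWord]:
    """Hypothetical per-slot majority vote over pre-aligned transcripts.

    ASSUMPTION: aligned[e][i] is engine e's word at alignment slot i,
    or None where engine e produced no word (a deletion). The alignment
    step itself is not shown.
    """
    if not aligned or len({len(t) for t in aligned}) != 1:
        raise ValueError("need one or more transcripts of equal length")
    n_engines = len(aligned)
    result: List[ConsensusWord] = []
    for i in range(len(aligned[0])):
        # Count case-normalized votes; a missing word counts as an empty vote.
        votes = Counter((t[i] or "").lower() for t in aligned)
        # Ties go to the word seen first, i.e. the earlier engine in the list.
        word, count = votes.most_common(1)[0]
        if word:  # drop slots where most engines heard nothing
            result.append(ConsensusWord(word, count / n_engines))
    return result


def flag_for_review(words: List[ConsensusWord], threshold: float = 0.5) -> List[int]:
    """Indices of consensus words whose confidence falls below `threshold`."""
    return [i for i, w in enumerate(words) if w.confidence < threshold]


# Hypothetical digit-span response transcribed by six engines.
engines = [
    ["seven", "four", "nine", "two"],
    ["seven", "four", "nine", "two"],
    ["seven", "for", "nine", "too"],
    ["seven", "four", "nine", "to"],
    ["seven", "four", "mine", None],
    ["seven", "for", "nine", "do"],
]
words = consensus(engines)
print([(w.word, round(w.confidence, 2)) for w in words])
# [('seven', 1.0), ('four', 0.67), ('nine', 0.83), ('two', 0.33)]
print(flag_for_review(words))  # [3]
```

Here only the final "two" (agreed on by two of six engines) would be sent for manual review. A production system would also need the alignment step and might weight engines by their per-task accuracy; both are omitted from this sketch.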
CASR transcription is more accurate than that of any single ASR engine. When combined with the TRT, "consensus" ASR can produce error-free, timestamped transcripts that enable the detailed analysis of verbal responses from older individuals at risk of cognitive decline.