Speech Transcription Research Articles

Introduction: Research on the automatic detection of Alzheimer's disease (AD) is important, given the high prevalence of AD and the high cost of traditional diagnostic methods. Since AD significantly affects the content and acoustics of spontaneous speech, natural language processing and machine learning provide promising techniques for reliably detecting AD. There has been a recent proliferation of classification models for AD, but these vary in the datasets used, the model types, and the training and testing paradigms. In this study, we compare the performance of two common approaches for automatic AD detection from speech on the same, well-matched dataset, to determine the advantages of using domain knowledge vs. pre-trained transfer models.

Methods: Audio recordings and corresponding manually transcribed speech transcripts of a picture description task administered to 156 demographically matched older adults, 78 with AD and 78 cognitively intact (healthy), were classified using machine learning and natural language processing as “AD” or “non-AD.” The audio was acoustically enhanced and post-processed to improve the quality of the speech recordings as well as to control for variation caused by recording conditions. Two approaches were used to classify these speech samples: (1) using domain knowledge: extracting an extensive set of clinically relevant linguistic and acoustic features derived from speech and transcripts based on prior literature, and (2) using transfer learning and leveraging large pre-trained machine learning models: using transcript representations that are automatically derived from state-of-the-art pre-trained language models, by fine-tuning Bidirectional Encoder Representations from Transformers (BERT)-based sequence classification models.

Results: We compared the utility of speech transcript representations obtained from recent natural language processing models (i.e., BERT) to more clinically interpretable language feature-based methods. Both the feature-based approaches and the fine-tuned BERT models significantly outperformed the baseline linguistic model, which uses a small set of linguistic features, demonstrating the importance of extensive linguistic information for detecting cognitive impairments relating to AD. Fine-tuned BERT models numerically outperformed feature-based approaches on the AD detection task, but the difference was not statistically significant. Our main contribution is the observation that, when evaluated on the same demographically balanced dataset and on independent, unseen data, both domain knowledge and pre-trained linguistic models have good predictive performance for detecting AD based on speech. Notably, linguistic information alone achieves comparable, and even numerically better, performance than models including both acoustic and linguistic features. We also try to shed light on the inner workings of the more black-box natural language processing model by performing an interpretability analysis, and find that attention weights reveal interesting patterns, such as higher attribution to more important information content units in the picture description task, as well as to pauses and filler words.

Conclusion: This approach supports the value of well-performing machine learning and linguistically focused processing techniques to detect AD from speech, and highlights the need to compare model performance on carefully balanced datasets, using consistent training parameters and independent test datasets, in order to determine the best-performing predictive model.
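As a rough illustration of what a feature-based representation of a transcript can look like, the sketch below computes three simple lexical features. It is not the study's feature set: the function name `transcript_features`, the filler inventory `FILLERS`, and the choice of features are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical filler inventory; the abstract does not list the fillers used.
FILLERS = {"uh", "um", "er", "ah"}


def transcript_features(transcript: str) -> dict:
    """Compute a few simple lexical features from one transcript (illustrative only)."""
    # Lowercase and keep alphabetic tokens (apostrophes allowed, e.g. "it's").
    tokens = re.findall(r"[a-z']+", transcript.lower())
    n_tokens = len(tokens)
    if n_tokens == 0:
        return {"n_tokens": 0, "type_token_ratio": 0.0, "filler_rate": 0.0}

    counts = Counter(tokens)
    n_fillers = sum(counts[f] for f in FILLERS)
    return {
        "n_tokens": n_tokens,
        # Lexical diversity: distinct word forms divided by total words.
        "type_token_ratio": len(counts) / n_tokens,
        # Share of tokens that are filler words.
        "filler_rate": n_fillers / n_tokens,
    }


if __name__ == "__main__":
    print(transcript_features("um the boy is uh taking the cookie"))
```

Feature vectors of this kind could then be passed to any standard classifier. The fine-tuned BERT approach instead learns its representation directly from the transcript text.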

This article contributes to the discourse on how contemporary computer and information technology may help improve foreign language learning, not only by supporting a better and more flexible workflow and digitizing study materials, but also by creating completely new use cases made possible by advances in signal processing algorithms. We discuss an approach and propose a holistic solution to teaching the phonological phenomena that are crucial for correct pronunciation: the phonemes; the energy and duration of syllables and pauses, which construct the phrasal rhythm; and the tone movement within an utterance, i.e., the phrasal intonation. The working prototype of the StudyIntonation Computer-Assisted Pronunciation Training (CAPT) system is a tool for mobile devices that offers a set of tasks based on a “listen and repeat” approach and gives audio-visual feedback in real time. The present work summarizes the efforts taken to enrich the current version of this CAPT tool with two new functions: the phonetic transcription and the rhythmic patterns of model and learner speech. Both are built on the third-party automatic speech recognition (ASR) library Kaldi, which was incorporated into the StudyIntonation signal processing software core. We also examine the scope of ASR applicability within the CAPT system workflow and evaluate the Levenshtein distance between the transcription made by human experts and that obtained automatically in our code. We developed an algorithm for rhythm reconstruction using acoustic and language ASR models. It is also shown that, even with sufficiently correct production of phonemes, learners do not produce a correct phrasal rhythm and intonation; therefore, joint training of sounds, rhythm, and intonation within a single learning environment is beneficial. To mitigate recording imperfections, voice activity detection (VAD) is applied to all the speech records processed. The try-outs showed that StudyIntonation can create transcriptions and process rhythmic patterns, but some specific problems with connected speech transcription were detected. Learner feedback in the form of pronunciation assessment was also updated: a conventional mechanism based on dynamic time warping (DTW) was combined with a cross-recurrence quantification analysis (CRQA) approach, which resulted in better discriminating ability. The CRQA metrics combined with those of DTW were shown to add to the accuracy of learner performance estimation. The major implications for computer-assisted English pronunciation teaching are discussed.
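Two of the measures named above can be sketched in a few lines. This is a minimal textbook version of each, not the StudyIntonation implementation: the function names, the unit edit costs, the absolute-difference local cost for DTW, and the sample phoneme labels are assumptions for illustration.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (insert, delete, substitute each cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # Hypothetical phoneme labels for an expert and an automatic transcription.
    expert = ["DH", "AH", "K", "AE", "T"]
    automatic = ["D", "AH", "K", "AE", "T"]
    print(levenshtein(expert, automatic))

    # Hypothetical pitch contours: the learner holds the middle value longer.
    print(dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))
```

Because `levenshtein` works on any sequence, it can compare phoneme lists as well as plain strings. `dtw_distance` tolerates differences in timing, which is why a contour that differs only in tempo can still score a cost of zero.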

Articles published on Speech Transcription

Modeling Topics in User Dialog for Interactive Tablet Media

UNINTELLIGIBLE SPEECH: LISTENERS' AWARENESS TO INDONESIAN-ACCENTED SPEECH WITH PRONUNCIATION ERRORS

Feasibility of using an automated analysis of formulation effort in patients’ spoken seizure descriptions in the differential diagnosis of epileptic and nonepileptic seizures

Leadership Sentiment and Price Fluctuations

Context-Based Quotation Recommendation

Unconventional Labour: Environmental Justice and Working-class Ecology in the New South Wales Green Bans

Automatic speech recognition in neurodegenerative disease

Comparing Pre-trained and Feature-Based Models for Prediction of Alzheimer's Disease Based on Speech.

THE CONVERSATION IMPLICATURE IN PRESIDENT JOKO WIDODO RHETORICAL AND DIPLOMATIC SPEECH

Boys-Specific Text-Comprehension Enhancement With Dual Visual-Auditory Text Presentation Among 12-14 Years-Old Students.

Animal listening

“If the gas pipeline would be built, we lose”: transcript of Reagan's speech at the US National Security Council Meeting on the Sanctions against Soviet Union

Attention reinforces human corticofugal system to aid speech perception in noise

Recognition of Alzheimer’s Dementia From the Transcriptions of Spontaneous Speech Using fastText and CNN Models

Paralinguistic and linguistic fluency features for Alzheimer's disease detection

Extending automatic transcripts in a unified data representation towards a prosodic-based metadata annotation and evaluation

Speech Processing for Language Learning: A Practical Approach to Computer-Assisted Pronunciation Teaching

Speech to multi-language text conversion

To Live and to Die: Opening Address by Academician I.T. Frolov at the Conference on the Problems of Life and Death (Moscow, 1993). Preface to the Publication

Convolutional recurrent neural network with attention for Vietnamese speech to text problem in the operating room
