Speech Duration Research Articles

Abstract

This descriptive study focuses on using voice activity detection (VAD) algorithms to extract student speech data in order to better understand the collaboration of small group work and the impact of teaching assistant (TA) interventions in undergraduate engineering discussion sections. Audio data were recorded from individual students wearing head‐mounted noise‐cancelling microphones. Video data of each student group were manually coded for collaborative behaviours (eg, group task relatedness, group verbal interaction and group talk content) of students and TA–student interactions. The analysis includes information about the turn taking, overall speech duration patterns and amounts of overlapping speech observed both when TAs were intervening with groups and when they were not. We found that TAs very rarely provided explicit support regarding collaboration. Key speech metrics, such as amount of turn overlap and maximum turn duration, revealed important information about the nature of student small group discussions and TA interventions. TA interactions during small group collaboration are complex and require nuanced treatments when considering the design of supportive tools.

Practitioner notes

What is already known about this topic
Student turn taking can provide information about the nature of student discussions and collaboration.
Real classroom audio data of small groups typically have lots of background noise and present challenges for audio analysis.
TAs have little training in how to productively intervene with students about collaborative skills.

What this paper adds
TA interaction with groups primarily focused on task progress and understanding of concepts with negligible explicit support on building collaborative skills.
TAs intervened with the groups often, which gave the students little time for uptake of their suggestions or deeper discussion.
Student turn overlap was higher without the presence of TAs.
Maximum turn duration can be an important real‐time turn metric to identify the least verbally active student participant in a group.

Implications for practice and/or policy
TA training should include information about how to monitor groups for collaborative behaviours and when and how they should intervene to provide feedback and support.
TA feedback systems should keep track of previous interventions by TAs (especially in contexts where there are multiple TAs facilitating) and the duration since previous intervention to ensure that TAs do not intervene with a group too frequently with little time for student uptake.
Maximum turn duration could be used as a real‐time metric to identify the least verbally active student in a group so that support could be provided to them by the TAs.
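The two turn metrics named in the abstract, maximum turn duration and turn overlap, can be illustrated with a minimal Python sketch. It assumes that a VAD step has already produced one (speaker, start, end) tuple per turn, with times in seconds; the segment values, the function names and the rule of picking the speaker with the shortest longest turn are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def max_turn_duration(segments):
    """Return the longest single turn per speaker, in seconds."""
    longest = defaultdict(float)
    for speaker, start, end in segments:
        longest[speaker] = max(longest[speaker], end - start)
    return dict(longest)


def total_overlap(segments):
    """Sum the time during which turns of two different speakers overlap.

    Overlap is counted pairwise, so three simultaneous speakers
    contribute three overlapping pairs.
    """
    overlap = 0.0
    for (spk_a, start_a, end_a), (spk_b, start_b, end_b) in combinations(segments, 2):
        if spk_a != spk_b:
            overlap += max(0.0, min(end_a, end_b) - max(start_a, start_b))
    return overlap


def least_active_speaker(segments):
    """Pick the speaker whose longest turn is the shortest (assumed rule)."""
    longest = max_turn_duration(segments)
    return min(longest, key=longest.get)


# Hypothetical VAD output for a three-student group.
segments = [
    ("A", 0.0, 4.0),
    ("B", 3.0, 5.0),
    ("C", 6.0, 6.5),
    ("A", 7.0, 9.0),
]

print(max_turn_duration(segments))
print(total_overlap(segments))
print(least_active_speaker(segments))
```

In a real-time setting the same functions could be run over a sliding window of recent segments, so that the least active speaker reflects the current stretch of discussion rather than the whole session.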

The work presented in this paper aims at enhancing the performance of end-to-end (E2E) speech recognition for children's speech under low-resource conditions. For the majority of languages, there is hardly any speech data from child speakers. Furthermore, even the available children's speech corpora are limited in terms of the number of hours of data. On the other hand, large amounts of adults' speech data are freely available for research as well as commercial purposes. As a consequence, developing an effective E2E automatic speech recognition (ASR) system for children becomes a very challenging task. One may develop an ASR system using adults' speech and then use it to transcribe children's data, but this leads to very poor recognition rates due to the stark differences in the acoustic attributes of adults' and children's speech. In order to overcome these hurdles and to develop a robust children's ASR system employing an E2E architecture, we have resorted to several out-of-domain and in-domain data augmentation techniques. For out-of-domain data augmentation, we have explicitly modified adults' speech to render it acoustically similar to children's speech before pooling it into training. In the case of in-domain data augmentation, we have slightly modified the pitch and duration of children's speech in order to create more data capturing greater diversity. These data augmentation approaches help, to a certain extent, in mitigating the ill effects resulting from the scarcity of data from the child domain. This, in turn, reduces the error rates by a large margin.

In addition to data augmentation, we have also studied the efficacy of Gammatone frequency cepstral coefficients (GFCC) and the frequency domain linear prediction (FDLP) technique along with the most commonly used Mel-frequency cepstral coefficients (MFCC) for front-end speech parameterization. Both MFCC and GFCC capture and model the spectral envelope of speech. On the other hand, applying linear prediction to the frequency domain representation of the speech signal helps to effectively capture the temporal envelope during front-end feature extraction. FDLP features that model the temporal envelope provide important cues for the perception and understanding of stop bursts and, at times, complete phonemes. This motivated us to perform a comparative experimental study of the effectiveness of the three aforementioned front-end acoustic features. In our experimental explorations, the use of the proposed data augmentation in combination with FDLP features has shown a relative improvement in character error rate of 67.6% over the baseline system. The combination of data augmentation with MFCC or GFCC features is observed to result in lower recognition performance.
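As a rough illustration of how a relative improvement in character error rate (CER) is usually computed, the Python sketch below defines CER as the character-level edit distance divided by the reference length, and the relative improvement as the reduction in CER expressed as a percentage of the baseline CER. The function names and the example error rates are hypothetical and are not taken from the paper, which reports only the relative figure.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hyp, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_improvement(baseline_cer, new_cer):
    """Reduction in CER as a percentage of the baseline CER."""
    return 100.0 * (baseline_cer - new_cer) / baseline_cer


# Hypothetical error rates, chosen only to show the arithmetic.
baseline, augmented = 0.40, 0.10
print(f"{relative_improvement(baseline, augmented):.1f}% relative improvement")
```

A relative figure says nothing about the absolute error rate: the same percentage could describe a drop from a high baseline or from an already low one.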

Articles published on Speech Duration

Enhancing analysis of diadochokinetic speech using deep neural networks

250. CHANGES IN AND CLINICAL UTILITY OF MAXIMUM PHONATION TIME AND REPETITIVE SALIVA SWALLOWING TEST SCORES AFTER ESOPHAGECTOMY

Iterative alignment discovery of speech-associated neural activity.

Natural Language Processing Applied to Spontaneous Recall of Famous Faces Reveals Memory Dysfunction in Temporal Lobe Epilepsy Patients.

Prosody in narratives: An exploratory study with children with sex chromosomes trisomies

Enhancing English Pronunciation Assessment in Computer-Assisted Language Learning for College Students

The Role of Prosody and Intonation in English Phonology: Implications for Speech Perception and Production

Auditory Event-related Potentials for Word Stimuli in Kannada Language Among Native Kannada Speakers with Dementia

Automatic speaking valve in tracheo-esophageal speech: treatment proposal for a widespread usage.

Whisper40: A Multi-Person Chinese Whisper Speaker Recognition Dataset Containing Same-Text Neutral Speech

Linguistic changes in neurodegenerative diseases relate to clinical symptoms.

Secondary language impairment in posterior cortical atrophy: insights from sentence repetition.

Speech Fluency Production and Perception in L1 (Slovak) and L2 (English) Read Speech.

Preceding word information for predicting speech errors in English as foreign language speech

Speech analysis of teaching assistant interventions in small group collaborative problem solving with undergraduate engineering students

Speech detection models for effective communicable disease risk assessment in air travel environments

Developing children's ASR system under low-resource conditions using end-to-end architecture

Comparing Machine Learning Models to Determine the Effect of Speech Duration on Speaker Identification within Kazakh Speech Corpus

BEYOND WORDS: HARNESSING SPEECH SOUND FOR SPEAKER AGE AND GENDER DETECTION USING 1D CNN ARCHITECTURE WITH SELF-ATTENTION MECHANISM

Virtual Reality on Public Speaking Phobia mitigation
