Absolute Word Error Rate Research Articles

The importance of online handwriting recognition has been rapidly increasing over recent years due to the rapid technological advances in handheld devices and communication software with handwriting interfaces. Deep learning end-to-end (E2E) models have provided high recognition rates as part of online handwriting recognition systems. However, attaining even higher performance levels requires supplementing these models with adaptation techniques that cater to individual penmanship. This study proposes a writer adaptation technique for Arabic online handwriting recognition systems that employs adversarial Multi-Task Learning (MTL). Adversarial training and MTL modify the deep-features distribution of the Writer Dependent (WD) model, leading its output to closely resemble that of the Writer Independent (WI) model. The design of the proposed method entails two tasks: label classification (primary task) and model features discrimination (secondary task). Our method was designed to jointly optimize both sub-networks. The proposed technique was tested against the E2E Connectionist Temporal Classification (CTC) based model, a combination of both Convolutional Neural Networks (CNNs) and Bidirectional Long Short-term Memory (BiLSTM). The proposed models were trained and evaluated against two large datasets (the Online-KHATT and CHAW). In supervised adaptation, it achieved an absolute Character Error Rate (CER) of up to 1.83% and an absolute Word Error Rate (WER) reduction of 11.71% over the WI model. Additionally, supervised adaptation achieved an absolute CER of up to 0.84% and an absolute WER reduction of 6.77% over the fine-tuned model. In unsupervised adaptation, the proposed method achieved an absolute CER of up to 0.5% absolute and an absolute WER reduction of 1.74% absolute (WER) reduction over the WI. Our experimental results indicate that our proposed supervised writer adaptation can achieve significant improvements in recognition accuracy compared with the baseline models: WI and fine-tuned models.

Automatic speech recognition for child speech has been long considered a more challenging problem than for adult speech. Various contributing factors have been identified such as larger acoustic speech variability including mispronunciations due to continuing biological changes in growth, developing vocabulary and linguistic skills, and scarcity of training corpora. A further challenge arises when dealing with spontaneous speech of children involved in a conversational interaction, and especially when the child may have limited or impaired communication ability. This includes health applications, one of the motivating domains of this paper, that involve goal-oriented dyadic interactions between a child and clinician/adult social partner as a part of behavioral assessment. In this work, we use linguistic context information from the interaction to adapt speech recognition models for children speech. Specifically, spoken language from the interacting adult speech provides the context for the child’s speech. We propose two methods to exploit this context: lexical repetitions and semantic response generation. For the latter, we make use of sequence-to-sequence models that learn to predict the target child utterance given context adult utterances. Long-term context is incorporated in the model by propagating the cell-state across the duration of conversation. We use interpolation techniques to adapt language models at the utterance level, and analyze the effect of length and direction of context (forward and backward). Two different domains are used in our experiments to demonstrate the generalized nature of our methods - interactions between a child with ASD and an adult social partner in a play-based, naturalistic setting, and in forensic interviews between a child and a trained interviewer. In both cases, context-adapted models yield significant improvement (upto 10.71% in absolute word error rate) over the baseline and perform consistently across context windows and directions. Using statistical analysis, we investigate the effect of source-based (adult) and target-based (child) factors on adaptation methods. Our results demonstrate the applicability of our modeling approach in improving child speech recognition by employing information transfer from the adult interlocutor.

Absolute Word Error Rate Research Articles

Related Topics

Articles published on Absolute Word Error Rate

OPTIMIZE WAV2VEC2S ARCHITECTURE FOR SMALL TRAINING SET THROUGH ANALYZING ITS PRE-TRAINED MODELS ATTENTION PATTERN.

Low-resource automatic speech recognition and error analyses of oral cancer speech

Automatic Speech Recognition Performance Improvement for Mandarin Based on Optimizing Gain Control Strategy.

Writer adaptation for E2E Arabic online handwriting recognition via adversarial multi task learning

Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition

Leveraging Linguistic Context in Dyadic Interactions to Improve Automatic Speech Recognition for Children

Subspace Gaussian mixture based language modeling for large vocabulary continuous speech recognition

Semantic Features Based N-Best Rescoring Methods for Automatic Speech Recognition

Shared-hidden-layer Deep Neural Network for Under-resourced Language the Content

Multi-Channel Speech Enhancement and Amplitude Modulation Analysis for Noise Robust Automatic Speech Recognition

Regression-Based Context-Dependent Modeling of Deep Neural Networks for Speech Recognition

Using different acoustic, lexical and language modeling units for ASR of an under-resourced language – Amharic

Unsupervised Equalization of Lombard Effect for Speech Recognition in Noisy Adverse Environments

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Absolute Word Error Rate Research Articles

Related Topics

Articles published on Absolute Word Error Rate

OPTIMIZE WAV2VEC2S ARCHITECTURE FOR SMALL TRAINING SET THROUGH ANALYZING ITS PRE-TRAINED MODELS ATTENTION PATTERN.

Low-resource automatic speech recognition and error analyses of oral cancer speech

Automatic Speech Recognition Performance Improvement for Mandarin Based on Optimizing Gain Control Strategy.

Writer adaptation for E2E Arabic online handwriting recognition via adversarial multi task learning

Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition

Leveraging Linguistic Context in Dyadic Interactions to Improve Automatic Speech Recognition for Children

Subspace Gaussian mixture based language modeling for large vocabulary continuous speech recognition

Semantic Features Based N-Best Rescoring Methods for Automatic Speech Recognition

Shared-hidden-layer Deep Neural Network for Under-resourced Language the Content

Multi-Channel Speech Enhancement and Amplitude Modulation Analysis for Noise Robust Automatic Speech Recognition

Regression-Based Context-Dependent Modeling of Deep Neural Networks for Speech Recognition

Using different acoustic, lexical and language modeling units for ASR of an under-resourced language – Amharic

Unsupervised Equalization of Lombard Effect for Speech Recognition in Noisy Adverse Environments