Absolute Word Error Rate Reduction Research Articles

The importance of online handwriting recognition has increased rapidly in recent years, owing to technological advances in handheld devices and in communication software with handwriting interfaces. Deep learning end-to-end (E2E) models have achieved high recognition rates in online handwriting recognition systems. However, attaining even higher performance requires supplementing these models with adaptation techniques that cater to individual penmanship. This study proposes a writer adaptation technique for Arabic online handwriting recognition systems that employs adversarial Multi-Task Learning (MTL). Adversarial training and MTL modify the deep-feature distribution of the Writer Dependent (WD) model so that its output closely resembles that of the Writer Independent (WI) model. The proposed method comprises two tasks, label classification (the primary task) and model-feature discrimination (the secondary task), and the two sub-networks are optimized jointly. The technique was evaluated on an E2E Connectionist Temporal Classification (CTC) based model that combines Convolutional Neural Networks (CNNs) and Bidirectional Long Short-Term Memory (BiLSTM). The models were trained and evaluated on two large datasets, Online-KHATT and CHAW. In supervised adaptation, the method achieved an absolute Character Error Rate (CER) reduction of up to 1.83% and an absolute Word Error Rate (WER) reduction of 11.71% over the WI model. Supervised adaptation also achieved an absolute CER reduction of up to 0.84% and an absolute WER reduction of 6.77% over the fine-tuned model. In unsupervised adaptation, the method achieved an absolute CER reduction of up to 0.5% and an absolute WER reduction of 1.74% over the WI model.
Our experimental results indicate that the proposed supervised writer adaptation can achieve significant improvements in recognition accuracy over both baselines, the WI and fine-tuned models.
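
The figures above are absolute reductions in CER and WER. As a minimal sketch of how such numbers are commonly computed (not necessarily the authors' exact scoring setup), an error rate is the Levenshtein edit distance between reference and hypothesis divided by the reference length. An absolute reduction is the difference in percentage points, and a relative reduction is that difference divided by the baseline error rate. Assumptions: whitespace tokenization for words, spaces counted as characters for CER, and a non-empty reference. The function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)]


def wer(ref: str, hyp: str) -> float:
    """Word error rate in percent (assumes a non-empty reference)."""
    ref_words = ref.split()
    return 100.0 * edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent; spaces count as characters here."""
    return 100.0 * edit_distance(list(ref), list(hyp)) / len(ref)


def absolute_reduction(baseline: float, adapted: float) -> float:
    """Absolute reduction in percentage points."""
    return baseline - adapted


def relative_reduction(baseline: float, adapted: float) -> float:
    """Relative reduction as a percentage of the baseline error rate."""
    return 100.0 * (baseline - adapted) / baseline
```

For example, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words, giving about 33.3%. With hypothetical error rates of 20.0% (baseline) and 15.0% (adapted), the absolute reduction is 5.0 points while the relative reduction is 25.0%, which is why papers often report both.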

Discriminative training techniques define state-of-the-art performance for automatic speech recognition systems. However, they are inherently prone to overfitting, which leads to poor generalization when training data is limited. To address this issue, this paper presents a full Bayesian framework that accounts for model uncertainty in sequence discriminative training of factored TDNN acoustic models. Several Bayesian learning based TDNN variant systems are proposed to model the uncertainty over weight parameters and choices of hidden activation functions, or over hidden layer outputs. Efficient variational inference approaches using as few as a single parameter sample keep their computational cost, in both training and evaluation, comparable to that of the baseline TDNN systems. Statistically significant word error rate (WER) reductions of 0.4%-1.8% absolute (5%-11% relative) were obtained over a state-of-the-art LF-MMI factored TDNN baseline trained on 900 h of speed-perturbed Switchboard data. This baseline uses multiple regularization methods, including F-smoothing, L2 norm penalty, natural gradient, model averaging and dropout, in addition to i-Vector plus learning hidden unit contributions (LHUC) based speaker adaptation and RNNLM rescoring. The efficacy of the proposed Bayesian techniques is further demonstrated by comparison with the state-of-the-art performance reported in the literature on the same task for the most recent hybrid and end-to-end systems. Consistent performance improvements were also obtained on a 450 h HKUST conversational Mandarin telephone speech recognition task. On a third, cross-domain adaptation task that requires rapidly porting a system trained on 1000 h of LibriSpeech data to a small DementiaBank elderly speech corpus, the proposed Bayesian TDNN LF-MMI systems outperformed the baseline system using direct weight fine-tuning by up to 2.5% absolute WER reduction.
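
The abstract credits efficient variational inference with as few as a single parameter sample for keeping cost close to the baseline. The sketch below illustrates only that generic idea under stated assumptions: a mean-field Gaussian posterior over weights, a zero-mean Gaussian prior with standard deviation `prior_std`, and a toy linear-Gaussian model standing in for the TDNN. The negative evidence lower bound (ELBO) is estimated from one reparameterised weight sample plus a closed-form KL term. It does not reproduce LF-MMI training or the activation-function and hidden-output variants, and all function names are hypothetical. In practice, gradients with respect to `mu` and `rho` would come from an autodiff framework.

```python
import math
import random
from typing import List, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); keeps sigma strictly positive."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def sample_weights(mu: Sequence[float], rho: Sequence[float],
                   rng: random.Random) -> List[float]:
    """One reparameterised draw per weight: w = mu + softplus(rho) * eps, eps ~ N(0, 1)."""
    return [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu, rho)]


def kl_to_prior(mu: Sequence[float], rho: Sequence[float],
                prior_std: float = 1.0) -> float:
    """Closed-form KL(N(mu, sigma^2) || N(0, prior_std^2)), summed over independent weights."""
    total = 0.0
    for m, r in zip(mu, rho):
        s = softplus(r)
        total += math.log(prior_std / s) + (s * s + m * m) / (2.0 * prior_std ** 2) - 0.5
    return total


def neg_elbo(mu: Sequence[float], rho: Sequence[float],
             xs: Sequence[Sequence[float]], ys: Sequence[float],
             rng: random.Random, prior_std: float = 1.0) -> float:
    """Single-sample estimate of the negative ELBO for a toy model y ~ N(w . x, 1).

    The expected negative log-likelihood is approximated with ONE weight
    sample, then the analytic KL term is added.
    """
    w = sample_weights(mu, rho, rng)
    nll = 0.0
    for x, y in zip(xs, ys):
        pred = sum(wi * xi for wi, xi in zip(w, x))
        nll += 0.5 * (y - pred) ** 2 + 0.5 * math.log(2.0 * math.pi)
    return nll + kl_to_prior(mu, rho, prior_std)
```

A call such as `neg_elbo([0.5, -0.2], [-3.0, -3.0], [[1.0, 2.0], [0.5, -1.0]], [0.1, 0.9], random.Random(0))` returns one stochastic loss value. Drawing a single sample per step keeps the forward cost close to that of a deterministic model, at the price of a noisier gradient estimate.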

Articles published on Absolute Word Error Rate Reduction

Low-resource automatic speech recognition and error analyses of oral cancer speech

Automatic Speech Recognition Performance Improvement for Mandarin Based on Optimizing Gain Control Strategy.

Writer adaptation for E2E Arabic online handwriting recognition via adversarial multi task learning

Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition

Subspace Gaussian mixture based language modeling for large vocabulary continuous speech recognition

Regression-Based Context-Dependent Modeling of Deep Neural Networks for Speech Recognition

Using different acoustic, lexical and language modeling units for ASR of an under-resourced language – Amharic

Discriminative n-gram language modeling
