Silent speech recognition is the task of recognising intended speech without audio information. It is useful in situations where sound waves are not produced or cannot be heard, for example for speakers with physical voice impairments or in environments where audio transmission is unreliable or insecure. A device that detects non-auditory signals and maps them to intended phonation could assist in such situations. In this work, we propose a graphene-based strain gauge sensor that is worn on the throat and detects small muscle movements and vibrations; machine learning algorithms then decode these non-audio signals and predict the intended speech. The proposed strain gauge sensor is highly wearable, utilising graphene’s strength, flexibility and high conductivity. A highly flexible and wearable sensor able to pick up small throat movements is fabricated by screen printing graphene onto Lycra fabric. We propose a framework for interpreting this information, exploring several machine learning techniques for predicting intended words from the signals. A dataset of 15 unique words and four movements, each with 20 repetitions, was collected and used to train the machine learning algorithms. The results demonstrate that such sensors can predict spoken words: we achieved a word accuracy of 55% on the word dataset and 85% on the movements dataset. This work is a proof of concept for the viability of combining a highly wearable graphene strain gauge with machine learning methods to automate silent speech recognition.
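The abstract does not specify which machine learning techniques were explored. As an illustrative sketch only, the snippet below shows one plausible classification pipeline under stated assumptions: fixed-length strain-gauge time series, hand-crafted summary features, and a scikit-learn SVM classifier. The array shapes, feature choices, and placeholder data are all hypothetical, not the authors' method.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def extract_features(signal):
    # Split each recording into windows and compute simple summary
    # statistics; a real system would likely use richer time- and
    # frequency-domain features of the strain waveform.
    windows = np.array_split(signal, 10)
    return np.concatenate([[w.mean(), w.std(), np.ptp(w)] for w in windows])

# X_raw: (n_samples, n_timesteps) strain-gauge recordings.
# y: word labels; 15 words x 20 repetitions = 300 samples, as in the paper.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(300, 1000))   # placeholder signals, not real data
y = np.repeat(np.arange(15), 20)       # placeholder labels

# Feature extraction followed by a standardised RBF-kernel SVM,
# evaluated with 5-fold cross-validation to estimate word accuracy.
X = np.vstack([extract_features(s) for s in X_raw])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated word accuracy: {scores.mean():.2f}")
```

On real signals, the cross-validated accuracy from such a pipeline would be the analogue of the 55% word accuracy reported above; the SVM here stands in for whichever classifiers the full paper evaluates.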