Automatic Speech Recognition Results Research Articles

This paper presents a novel method for improving the readability of automatic speech recognition (ASR) results for classroom lectures. Because classroom speech is spontaneous and contains many ill-formed utterances with various disfluencies, the ASR result should be edited before it is presented to users, by removing disfluencies, determining sentence boundaries, inserting punctuation marks, and repairing dropped words. Owing to the many domain-dependent words and casual speaking styles, even state-of-the-art recognizers achieve only a 30-50% word error rate on classroom lectures. Therefore, a method for improving the readability of ASR results must be robust to recognition errors. One way to achieve this robustness is to use multiple hypotheses instead of the single-best hypothesis. However, if the multiple hypotheses are represented by a lattice (or a confusion network), it is difficult to utilize sentence-level knowledge, such as chunking and dependency parsing, which is imperative for determining the discourse structure and therefore for improving readability. In this paper, we propose a novel algorithm that infers clean, readable transcripts from multiple hypotheses of spontaneous speech represented by a confusion network while integrating sentence-level knowledge. Automatic and manual evaluations showed that using multiple hypotheses and sentence-level knowledge is effective in improving the readability of ASR results while preserving understandability.
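As a rough illustration of the data structure mentioned in the abstract above, the sketch below models a confusion network as a sequence of slots, each holding competing words with posterior probabilities. It is not the paper's algorithm. The `<eps>` marker, the toy probabilities, and the function names `one_best` and `n_best` are all assumptions made for this example. Enumerating every path is exponential in the number of slots and is only workable on toy inputs.

```python
import heapq
from itertools import product
from math import prod

EPS = "<eps>"  # hypothetical marker meaning "no word spoken in this slot"

# Toy confusion network (invented values): one dict of word -> posterior per slot.
confusion_network = [
    {"the": 0.9, "a": 0.1},
    {"uh": 0.6, EPS: 0.4},
    {"lecture": 0.7, "lecturer": 0.3},
    {"starts": 0.55, "start": 0.45},
]

def one_best(cn):
    """Pick the most probable word in each slot and drop empty slots."""
    words = [max(slot, key=slot.get) for slot in cn]
    return [w for w in words if w != EPS]

def n_best(cn, n):
    """Return the n highest-scoring full paths as (score, words) pairs.

    Brute-force enumeration: each path is a complete word sequence, so
    sentence-level tools such as a parser could in principle be run on it.
    """
    paths = product(*(slot.items() for slot in cn))
    scored = (
        (prod(p for _, p in path), [w for w, _ in path if w != EPS])
        for path in paths
    )
    return heapq.nlargest(n, scored, key=lambda item: item[0])

if __name__ == "__main__":
    print(" ".join(one_best(confusion_network)))
    for score, words in n_best(confusion_network, 3):
        print(round(score, 4), " ".join(words))
```

The single-best path here keeps the filler "uh", while a lower-ranked path drops it. This is one reason to consider more than one hypothesis when cleaning a transcript.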

The performance of speech translation systems that combine automatic speech recognition (ASR) and machine translation (MT) is degraded by redundant and irrelevant information caused by speaker disfluencies and recognition errors. This paper proposes a new approach to translating speech recognition results through speech consolidation, which removes ASR errors and disfluencies and extracts meaningful phrases. The consolidation approach is spun off from speech summarization by word extraction from the ASR 1-best hypothesis. We extended the consolidation approach to confusion networks (CNs), tested it on TED speech, and confirmed that the consolidation results preserved more meaningful phrases than the original ASR results. We then applied the consolidation technique to speech translation. To test the performance of consolidation-based speech translation, Chinese broadcast news (BN) speech in RT04 was recognized, consolidated, and then translated. The speech translation results obtained via consolidation cannot be directly compared with gold standards in which all words in the speech are translated, because consolidation-based translations are partial translations. We propose a new evaluation framework for partial translations that compares them with the most similar set of words extracted from a word network created by merging gradual summarizations of the gold standard translation. The performance of consolidation-based MT results was evaluated using BLEU. We also propose Information Preservation Accuracy (IPAccy) and Meaning Preservation Accuracy (MPAccy) to evaluate consolidation and consolidation-based MT. We confirmed that consolidation contributed to the performance of speech translation.
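To make concrete why partial translations are hard to score against a full gold standard, the sketch below implements a simplified sentence-level BLEU (clipped n-gram precision with a brevity penalty). It does not implement the evaluation framework proposed in the abstract. The function names, the bigram limit `max_n=2`, and the example sentences are assumptions for illustration. Standard BLEU is normally computed at corpus level with n-grams up to length 4.

```python
from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Simplified single-reference, sentence-level BLEU (an illustrative variant)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        precisions.append(clipped / total)
    # Brevity penalty: candidates shorter than the reference are scaled down.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = exp(1 - len(reference) / len(candidate))
    return bp * exp(sum(log(p) for p in precisions) / max_n)

# Invented example: the partial output contains only correct n-grams but omits two words.
reference = "the government announced a new economic plan today".split()
partial = "government announced a new economic plan".split()

if __name__ == "__main__":
    print(bleu(reference, reference))
    print(bleu(partial, reference))
```

In this toy case every unigram and bigram of the partial output appears in the reference, yet the score drops below 1 purely because of the brevity penalty. This is the kind of mismatch that motivates comparing partial translations against a shortened version of the gold standard.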

Articles published on Automatic Speech Recognition Results

A Joint Approach for Single-Channel Speaker Identification and Speech Separation

Automatic speech recognition performance in different room acoustic environments with and without dereverberation preprocessing

Improving the Readability of ASR Results for Lectures Using Multiple Hypotheses and Sentence-Level Knowledge

Optimization of Cost Function Weights for Unit Selection Speech Synthesis Using Speech Recognition

Leveraging word confusion networks for named entity modeling and detection from conversational telephone speech

A novel framework for noise robust ASR using cochlear implant-like spectrally reduced speech

Evaluating Source Separation Algorithms With Reverberant Speech

Selecting Help Messages by Using Robust Grammar Verification for Handling Out-of-Grammar Utterances in Spoken Dialogue Systems

Named Entity Recognition from Speech Using Discriminative Models and Speech Recognition Confidence

Consolidation-Based Speech Translation and Evaluation Approach

Towards an efficient archive of spontaneous speech: Design of computer-assisted speech transcription system

A new approach for speech enhancement based on the adaptive thresholding of the wavelet packets

Improvement of recognition of simultaneous speech signals using AV integration and scattering theory for humanoid robots
