Generation of Correct Word Sequences from Multiple Outputs of a Conventional Automatic Speech Recognizer for Voice-Activated Information Appliances

Harksoo Kim, Geonwoo Park

doi:10.1109/icce.2019.8662096

Abstract

In information appliances based on speech recognition, users’ spoken queries are converted into text queries using automatic speech recognition (ASR) engines. If the top-1 results of the ASR engines are incorrect, these errors are propagated to the following natural language processing steps. To alleviate this error propagation problem, we propose a post-processing model for revising ASR errors. Based on a sequence-to-sequence neural network, the proposed model generates a correct sentence from multiple candidate sentences returned by an ASR engine. The proposed model does not require any external resources or feature engineering effort, because it uses only syllables as input features. In our experiments with a Korean spoken chatting and FAQ corpus, the proposed model outperformed the previous models.
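
As a rough illustration of the syllable-only input described above, the following sketch turns several ASR candidate sentences into a single syllable-level token sequence that a sequence-to-sequence encoder could consume. It is not the authors' implementation. The abstract does not say how the candidates are combined, so the concatenation scheme is an assumption, as are the `<sep>` and `<sp>` tokens and the function names `to_syllables` and `build_encoder_input`. The sketch also assumes each precomposed Hangul character corresponds to one syllable.

```python
import unicodedata
from typing import List

SEP = "<sep>"  # hypothetical token separating ASR candidates
SPACE = "<sp>"  # hypothetical token marking a word boundary


def to_syllables(sentence: str) -> List[str]:
    """Split a sentence into syllable tokens.

    NFC normalization composes Hangul jamo into syllable blocks, so each
    remaining character is treated as one syllable (an assumption).
    """
    normalized = unicodedata.normalize("NFC", sentence.strip())
    return [SPACE if ch == " " else ch for ch in normalized]


def build_encoder_input(candidates: List[str]) -> List[str]:
    """Concatenate the syllables of all candidates, separated by SEP.

    Plain concatenation is an illustrative choice, not the paper's
    confirmed method of combining multiple candidates.
    """
    tokens: List[str] = []
    for index, candidate in enumerate(candidates):
        if index > 0:
            tokens.append(SEP)
        tokens.extend(to_syllables(candidate))
    return tokens
```

Calling `build_encoder_input` on the candidate list returned by an ASR engine yields one flat token list. A decoder trained on pairs of such inputs and reference transcripts would then generate the corrected sentence syllable by syllable.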

Full Text