Abstract

Speech Keyword Retrieval (SKR) has received considerable attention. SKR aims to retrieve data from a speech repository given a spoken query, and retrieval accuracy largely depends on the performance of the acoustic model. In this paper, we propose a new speech keyword retrieval framework, DCNN-CTC, which combines a Deep Convolutional Neural Network (DCNN) with Connectionist Temporal Classification (CTC). The proposed method offers new insights into multimedia information retrieval. Features are extracted by the DCNN, and the pre-trained models are fine-tuned with a CTC loss to predict target keywords, yielding a fully end-to-end acoustic model. Because CTC directly outputs the probability of a predicted sequence, the training data do not need to be aligned and labeled frame by frame in advance, which greatly improves the processing performance of the speech retrieval system. Experimental results on benchmark datasets show that our approach yields stable and robust retrieval performance, with precision and recall rates substantially higher than those of the baseline system.
