Abstract
Acoustic word embeddings (AWEs) have become popular in low-resource query-by-example (QbE) speech search. They use vector distances to locate a spoken query in the search content, which requires far less computation than conventional dynamic time warping (DTW)-based approaches. AWE networks are usually trained on variable-length isolated spoken words, yet they are applied to fixed-length speech segments obtained by shifting an analysis window over the search content. There is thus an obvious mismatch between how AWEs are learned and how they are applied to the search content. To mitigate this mismatch, we propose to include temporal context information in the spoken word pairs used to learn recurrent neural AWEs. More specifically, the spoken word pairs are represented by multilingual bottleneck features (BNFs) and padded with the neighboring frames of the target spoken words to form fixed-length speech segment pairs. A deep bidirectional long short-term memory (BLSTM) network is then trained on these segment pairs with a triplet loss. Recurrent neural AWEs are obtained by concatenating the BLSTM forward and backward outputs. At the QbE search stage, both the spoken query and the search content are converted into recurrent neural AWEs, and cosine distances between them are measured to locate the spoken query. Experiments show that using temporal context is essential to alleviate the mismatch: the proposed recurrent neural AWEs trained with temporal context outperform previous state-of-the-art features by 12.5% relative mean average precision (MAP) on QbE speech search.
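The search stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window length, hop size, and the `embed` function (which stands in for the trained BLSTM embedder) are hypothetical placeholders, and only the window segmentation and cosine-distance matching from the abstract are shown.

```python
import numpy as np

def sliding_windows(frames, win_len, hop):
    """Slice a (T, D) frame sequence (e.g. BNFs of the search content)
    into fixed-length windows, mirroring how the search content is
    segmented at search time."""
    return [frames[s:s + win_len] for s in range(0, len(frames) - win_len + 1, hop)]

def cosine_distance(a, b):
    """Cosine distance between two embedding vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def search(query_emb, window_embs):
    """Compare the query embedding against the embedding of each
    fixed-length window of the search content; the window with the
    smallest cosine distance is the best-matching candidate."""
    dists = np.array([cosine_distance(query_emb, e) for e in window_embs])
    return dists, int(np.argmin(dists))
```

In practice `window_embs` would be produced by running each window (and the padded query segment) through the trained BLSTM and concatenating its forward and backward outputs; here any embedding function with that interface can be plugged in.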