Abstract

State-of-the-art query-by-example (QbE) speech search approaches usually use recurrent neural network (RNN) based acoustic word embeddings (AWEs) to represent variable-length speech segments as fixed-dimensional vectors, so that simple cosine distances can be measured between the embedded vectors of the spoken query and the search content. In this paper, we aim to improve search accuracy and speed for the AWE-based QbE approach in low-resource scenarios. First, a multi-head self-attentive mechanism is introduced to learn a sequence of attention weights over all time steps of the RNN outputs while attending to different positions of a speech segment. Second, since real-valued AWEs incur substantial computation during similarity measurement, a hashing layer is adopted to learn deep binary embeddings, so that binary pattern matching can be used directly for fast QbE speech search. The proposed self-attentive deep hashing network is trained effectively with three specifically designed objectives: a penalization term, a triplet loss, and a quantization loss. Experiments show that our approach improves search speed by 8 times and mean average precision (MAP) by 18.9%, relative to the previous best real-valued embedding approach.
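To make the described architecture concrete, below is a minimal PyTorch sketch of a self-attentive deep hashing network with the three stated objectives. It is an illustration under assumptions, not the authors' implementation: the class and function names (SelfAttentiveHashingNet, penalization, etc.), the GRU encoder, and all hyper-parameters (feature dimension, number of attention heads, code length, margin) are hypothetical choices; the self-attention follows the common structured formulation A = softmax(W2 tanh(W1 H^T)).

```python
# Hedged sketch of the abstract's architecture; all names and
# hyper-parameters here are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentiveHashingNet(nn.Module):
    def __init__(self, feat_dim=39, hidden=256, att_dim=128, heads=4, bits=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        # Multi-head self-attention over all RNN time steps:
        # A = softmax(W2 tanh(W1 H^T)), one weight per time step per head.
        self.w1 = nn.Linear(2 * hidden, att_dim, bias=False)
        self.w2 = nn.Linear(att_dim, heads, bias=False)
        # Hashing layer: tanh keeps outputs in (-1, 1) so sign() yields bits.
        self.hash = nn.Linear(heads * 2 * hidden, bits)

    def forward(self, x):                        # x: (B, T, feat_dim)
        h, _ = self.rnn(x)                       # (B, T, 2*hidden)
        a = F.softmax(self.w2(torch.tanh(self.w1(h))), dim=1)  # (B, T, heads)
        m = torch.einsum('bth,btd->bhd', a, h)   # per-head weighted sums
        b = torch.tanh(self.hash(m.flatten(1)))  # relaxed binary embedding
        return b, a

def penalization(a):
    # Encourage heads to attend to different positions: ||A^T A - I||_F^2.
    aa = torch.bmm(a.transpose(1, 2), a)         # (B, heads, heads)
    eye = torch.eye(aa.size(-1), device=a.device)
    return ((aa - eye) ** 2).sum(dim=(1, 2)).mean()

def quantization_loss(b):
    # Push relaxed codes toward {-1, +1} so binarization loses little.
    return ((b.abs() - 1.0) ** 2).mean()

def triplet_loss(anchor, pos, neg, margin=0.4):
    # Same-word segments should embed closer than different-word segments.
    d_ap = 1 - F.cosine_similarity(anchor, pos)
    d_an = 1 - F.cosine_similarity(anchor, neg)
    return F.relu(d_ap - d_an + margin).mean()
```

At search time, the relaxed codes would be binarized with sign() and compared by Hamming distance (bitwise XOR plus popcount), which is what makes binary pattern matching much faster than cosine similarity over real-valued embeddings.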
