Neural Network Based End-to-End Query by Example Spoken Term Detection

Dhananjay Ram, Lesly Miculicich, Herve Bourlard

doi:10.1109/taslp.2020.2988788

Abstract

This article addresses query by example spoken term detection (QbE-STD) in a zero-resource scenario. State-of-the-art approaches rely primarily on dynamic time warping (DTW) based template matching using phone posterior or bottleneck features extracted from a deep neural network (DNN). We use both monolingual and multilingual bottleneck features, and show that multilingual features perform better as the number of training languages increases. It has previously been shown that DTW based matching can be replaced with convolutional neural network (CNN) based matching when using posterior features. Here, we show that CNN based matching also outperforms DTW based matching when using bottleneck features. In this setup, the feature extraction and pattern matching stages of our QbE-STD system are optimized independently of each other. We propose to integrate these two stages in a fully neural network based end-to-end learning framework to enable their joint optimization. The proposed approaches are evaluated on two challenging multilingual datasets, Spoken Web Search 2013 and Query by Example Search on Speech Task 2014, and yield significant improvements on both.
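
For readers unfamiliar with the DTW based template matching that the abstract treats as the baseline, the following is a minimal sketch of subsequence DTW over frame-level feature vectors. It is a generic illustration, not the authors' system: the function names, the cosine distance, and the normalization by query length are all assumptions chosen for brevity, and the features here are toy vectors rather than real posterior or bottleneck features.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two feature vectors (assumed local cost)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an all-zero frame as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def subsequence_dtw_score(query, utterance):
    """Lowest length-normalized cost of aligning the whole query to any
    contiguous region of the utterance (lower means a better match).

    Hypothetical helper: the query may start and end at any utterance frame.
    """
    n, m = len(query), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("query and utterance must be non-empty")

    # First query frame: free starting point anywhere in the utterance.
    prev = [cosine_distance(query[0], utterance[j]) for j in range(m)]

    for i in range(1, n):
        cur = [float("inf")] * m
        for j in range(m):
            best = prev[j]  # query advances, utterance frame repeats
            if j > 0:
                # diagonal step, or utterance advances while query frame repeats
                best = min(best, prev[j - 1], cur[j - 1])
            cur[j] = cosine_distance(query[i], utterance[j]) + best
        prev = cur

    # Free end point: take the best cost over all utterance end frames.
    return min(prev) / n


# Toy example: two-frame query, one utterance that contains it and one that does not.
query = [[1.0, 0.0], [0.0, 1.0]]
utterance_hit = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
utterance_miss = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]

score_hit = subsequence_dtw_score(query, utterance_hit)
score_miss = subsequence_dtw_score(query, utterance_miss)
```

In this toy setup the utterance that contains the query frames in order receives a lower score than the one that does not. A detection system would compare such scores against a threshold; how the scores are normalized and thresholded in the paper is not stated in the abstract.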

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2020
Citations: 53

Similar Papers

Multilingual Bottleneck Features for Query by Example Spoken Term Detection
Dhananjay Ram ... Lesly Miculicich
01 Dec 2019

Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection
Yougen Yuan ... Haizhou Li
Control theory & applications
01 Mar 2017

Query-by-example spoken term detection using bottleneck feature and Hidden Markov model
Xue Liu ... Niansong Wang
01 Aug 2015

CNN-based bottleneck feature for noise robust query-by-example spoken term detection
Hyungjun Lim ... Younggwan Kim
01 Dec 2017
