Partial matching and search space reduction for QbE-STD

Maulik C Madhavi, Hemant A Patil

doi:10.1016/j.csl.2017.03.004

Abstract

The Query-by-Example approach to spoken content retrieval has gained much attention because it is feasible in the absence of speech recognition and applicable in multilingual matching scenarios. This approach is referred to as Query-by-Example Spoken Term Detection (QbE-STD). State-of-the-art QbE-STD systems match the frame sequence of the query against that of the test utterance using the Dynamic Time Warping (DTW) algorithm. In realistic scenarios, there is a need to retrieve queries that do not appear exactly in the spoken document: the instance that appears may have a different suffix, prefix, or word order. Because the DTW algorithm aligns the two sequences monotonically, it is not suitable for partial matching between the frame sequences of the query and the test utterance. In this paper, we propose a novel partial matching approach between the spoken query and the utterance using a modified DTW algorithm in which multiple warping paths are constructed for each query and test utterance pair. Next, we address the search complexity of DTW and suggest two approaches, namely, a feature reduction approach and a Bag-of-Acoustic-Words (BoAW) model. In the feature reduction approach, the number of feature vectors is reduced by averaging consecutive frames within phonetic boundaries. Fewer feature vectors require fewer comparisons, which speeds up the DTW search. The search computation time is reduced by 46–49% with a slight degradation in performance compared to the case without feature reduction. In the BoAW model, we construct term frequency-inverse document frequency (tf-idf) vectors at the segment level to retrieve audio documents. The proposed segment-level BoAW model matches a test utterance with a query using tf-idf vectors, and the resulting scores are used to rank the test utterances. The BoAW model gave more than 80% recall at 70% top retrieval. To re-score the detections, we further employ DTW search or modified DTW search to retrieve the spoken query from the utterances selected by the BoAW model. QbE-STD experiments are conducted on international benchmarks, namely, MediaEval Spoken Web Search (SWS) 2013 and MediaEval Query-by-Example Search on Speech (QUESST) 2014.
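To make the feature reduction idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy one-dimensional features, and the segment boundaries are all assumptions made for illustration, and the matcher is textbook subsequence DTW rather than the modified multi-path DTW proposed in the paper.

```python
import math


def average_within_segments(frames, boundaries):
    """Replace the frames of each segment by their mean vector.

    `boundaries` lists the start index of every segment. It is assumed to
    come from a phonetic segmentation step that is not shown here.
    """
    edges = list(boundaries) + [len(frames)]
    reduced = []
    for start, end in zip(edges[:-1], edges[1:]):
        segment = frames[start:end]
        dim = len(segment[0])
        reduced.append(
            [sum(f[d] for f in segment) / len(segment) for d in range(dim)]
        )
    return reduced


def euclidean(a, b):
    """Frame-to-frame distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, utterance):
    """Textbook subsequence DTW: the query may start and end anywhere in
    the utterance. Returns (best accumulated cost, end index)."""
    n, m = len(query), len(utterance)
    inf = float("inf")
    # acc[i][j]: cheapest cost of aligning query[0..i] on a path ending at utterance[j]
    acc = [[inf] * m for _ in range(n)]
    for j in range(m):
        acc[0][j] = euclidean(query[0], utterance[j])  # free starting point
    for i in range(1, n):
        for j in range(m):
            best_prev = acc[i - 1][j]
            if j > 0:
                best_prev = min(best_prev, acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = euclidean(query[i], utterance[j]) + best_prev
    end = min(range(m), key=lambda j: acc[n - 1][j])  # free end point
    return acc[n - 1][end], end


# Toy data (hypothetical): 8 utterance frames in 4 segments of 2 frames each.
utterance = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [5.0], [5.0]]
boundaries = [0, 2, 4, 6]
query = [[1.0], [2.0]]

reduced = average_within_segments(utterance, boundaries)
cost_full, _ = subsequence_dtw(query, utterance)
cost_reduced, end_reduced = subsequence_dtw(query, reduced)
```

In this toy case the reduced utterance has half as many vectors, so the DTW table has half as many cells to fill, which is the source of the speed-up the abstract describes. Real features would be multi-dimensional, and averaging would generally change the match cost slightly.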


Journal: Computer Speech & Language
Publication Date: Mar 28, 2017
Citations: 17

Similar Papers

Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection
Peng Yang ... Haizhou Li
14 Sep 2014

Combining evidences from detection sources for query-by-example spoken term detection
Maulik C Madhavi ... Hemant A Patil
01 Dec 2017

Speed improvements to Information Retrieval-based dynamic time warping using hierarchical K-Means clustering
Gautam Mantena ... Xavier Anguera
01 May 2013

A fast query-by-example spoken term detection for zero resource languages
Pandia D.S Karthik ... Hema A Murthy
01 Jun 2016
