Abstract

The Query-by-Example approach to spoken content retrieval has gained much attention because it is feasible in the absence of speech recognition and applicable in multilingual matching scenarios. This approach is referred to as Query-by-Example Spoken Term Detection (QbE-STD). State-of-the-art QbE-STD systems match the frame sequences of the query and the test utterance using the Dynamic Time Warping (DTW) algorithm. In realistic scenarios, there is a need to retrieve queries that do not appear exactly in the spoken document; the instance that does appear may have a different suffix, prefix, or word order. Because the DTW algorithm aligns the two sequences monotonically, it is not suitable for partial matching between the frame sequences of the query and the test utterance. In this paper, we propose a novel partial matching approach between a spoken query and an utterance using a modified DTW algorithm in which multiple warping paths are constructed for each query and test utterance pair. Next, we address the search complexity of DTW and suggest two approaches, namely, a feature reduction approach and a Bag-of-Acoustic-Words (BoAW) model. In the feature reduction approach, the number of feature vectors is reduced by averaging consecutive frames within phonetic boundaries. Fewer feature vectors require fewer comparisons, and hence the DTW search is faster. The search computation time is reduced by 46–49% with a slight degradation in performance compared to the case without feature reduction. In the BoAW model, we construct term frequency-inverse document frequency (tf-idf) vectors at the segment level to retrieve audio documents. The proposed segment-level BoAW model matches a test utterance with a query using tf-idf vectors, and the resulting scores are used to rank the test utterances. The BoAW model achieved more than 80% recall at 70% top retrieval. To re-score the detections, we further employ DTW search or modified DTW search to retrieve the spoken query from the utterances selected by the BoAW model. QbE-STD experiments are conducted on different international benchmarks, namely, MediaEval Spoken Web Search (SWS) 2013 and MediaEval Query-by-Example Search on Speech (QUESST) 2014.
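
The feature reduction idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `reduce_features` and `dtw_cost`, the Euclidean local distance, and the assumption that segment boundaries arrive as a list of frame indices are all hypothetical choices made for illustration. The DTW shown is the plain full-sequence variant, not the modified multi-path version proposed in the paper.

```python
import numpy as np


def reduce_features(frames, boundaries):
    """Average consecutive frames inside each segment.

    frames: (T, D) array of feature vectors.
    boundaries: frame indices where a new segment starts (assumed to come
    from some phonetic segmenter; hypothetical input format).
    """
    inner = sorted({b for b in boundaries if 0 < b < len(frames)})
    edges = [0] + inner + [len(frames)]
    # One averaged vector per segment [start, end).
    return np.stack(
        [frames[s:e].mean(axis=0) for s, e in zip(edges[:-1], edges[1:])]
    )


def dtw_cost(query, test):
    """Accumulated cost of a plain DTW alignment of two full sequences."""
    n, m = len(query), len(test)
    # Pairwise Euclidean distances: this is where the n * m comparisons occur.
    dist = np.linalg.norm(query[:, None, :] - test[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m]


# Toy data: 6 frames grouped into 3 segments of 2 frames each.
frames = np.arange(12, dtype=float).reshape(6, 2)
reduced = reduce_features(frames, boundaries=[2, 4])
print(frames.shape, "->", reduced.shape)
```

In this toy case the sequence shrinks from 6 vectors to 3, so a DTW comparison of two such reduced sequences fills a 3 x 3 distance matrix instead of a 6 x 6 one. The speed-up reported in the abstract comes from the authors' own experiments and is not reproduced by this sketch.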
