Abstract

This paper presents a novel two-pass dynamic time warping (DTW) approach for building a Query-by-Example Spoken Term Detection (QbE-STD) system for zero-resource languages. An unconstrained-endpoint dynamic time warping (UE-DTW) algorithm is used to locate occurrences of the query term in long conversational audio. The proposed approach uses a segmental DTW in which the search is carried out only at syllable boundaries. This reduces the search complexity by a factor of 9 compared with conventional sliding-window DTW. The first pass uses a minimal set of templates for a keyword to search through the segmented audio, and new templates are identified from its results. In the second pass, the initial templates together with the new templates identified in the first pass are used to search for keyword occurrences. A novel score normalization technique is also proposed, in which the syllables constituting the keyword are used for normalization. The two-pass system is shown to outperform single-pass systems, and the proposed score normalization further improves the overall detection results.
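
As an illustration of the search step described above, the following is a minimal sketch and not the authors' implementation. It assumes that "unconstrained endpoint" means the alignment starts at a given frame and may end at any frame of the candidate window, that frames are compared with Euclidean distance, and that syllable boundaries are already available as frame indices. The names `ue_dtw`, `search_at_boundaries`, and `max_stretch` are hypothetical, and the score is normalized simply by query length rather than by the syllable-based normalization the paper proposes.

```python
import math


def ue_dtw(query, segment):
    """Align the whole query to a prefix of `segment`; the end frame is free.

    Returns (cost divided by query length, index of the best end frame).
    Frames are equal-length tuples of floats (assumed feature vectors).
    """
    m, n = len(query), len(segment)
    inf = math.inf
    # acc[i][j]: best accumulated cost aligning query[:i+1] with segment[:j+1]
    acc = [[inf] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = math.dist(query[i], segment[j])
            if i == 0 and j == 0:
                acc[i][j] = d  # start point is fixed at the window start
                continue
            best_prev = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = d + best_prev
    # Unconstrained endpoint: pick the cheapest cell in the last query row.
    end = min(range(n), key=lambda j: acc[m - 1][j])
    return acc[m - 1][end] / m, end


def search_at_boundaries(query, audio, boundaries, max_stretch=2.0):
    """Run UE-DTW only from the given boundary frames (assumed syllable starts).

    Returns (score, start_frame, end_frame) tuples, best score first.
    """
    window_len = int(max_stretch * len(query))
    hits = []
    for b in boundaries:
        window = audio[b:b + window_len]
        if not window:
            continue
        score, end = ue_dtw(query, window)
        hits.append((score, b, b + end))
    return sorted(hits)


# Toy 1-D "features": the query pattern occurs at frames 2..4 of the audio.
audio = [(0.0,), (0.0,), (1.0,), (2.0,), (3.0,), (0.0,), (0.0,)]
query = [(1.0,), (2.0,), (3.0,)]
hits = search_at_boundaries(query, audio, boundaries=[0, 2, 5])
```

Restricting the start points to the boundary list is what replaces the frame-by-frame shifts of a sliding-window search; the size of the saving depends on how many boundaries there are relative to frames.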
