Abstract
We investigate the use of intrinsic spectral analysis (ISA) for query-by-example spoken term detection (QbE-STD). In the task, spoken queries and test utterances in an audio archive are converted to ISA features, and dynamic time warping is applied to match the feature sequence in each query with those in test utterances. Motivated by manifold learning, ISA has been proposed to recover from untranscribed utterances a set of nonlinear basis functions for the speech manifold, and shown with improved phonetic separability and inherent speaker independence. Due to the coarticulation phenomenon in speech, we propose to use temporal context information to obtain the ISA features. Gaussian posteriorgram, as an efficient acoustic representation usually used in QbE-STD, is considered a baseline feature. Experimental results on the TIMIT speech corpus show that the ISA features can provide a relative 13.5% improvement in mean average precision over the baseline features, when the temporal context information is used. Index Terms: spoken term detection, intrinsic spectral analysis, Gaussian posteriorgram, dynamic time warping
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.