Abstract

Modeling text queries as sequences of embeddings, and matching them against speech features by similarity search, has recently been shown to improve keyword search (KWS) performance, especially for out-of-vocabulary (OOV) terms. This technique uses dynamic time warping based search, converting the KWS problem into a pattern search problem by artificially modeling text queries as pronunciation-based embedding sequences. The query model is built by concatenating and repeating frame representations for each phoneme in the keyword's pronunciation. In this letter, we propose a query model that incorporates temporal context information using recurrent neural networks (RNNs) trained to generate realistic query representations. In experiments on the IARPA Babel Program's Turkish and Zulu datasets, we show that the proposed RNN-based query generation yields significant improvements over the statistical query models of earlier work and performs comparably to state-of-the-art techniques for OOV KWS.
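As a rough illustration of the earlier concatenate-and-repeat query model and the dynamic time warping search it feeds (not the proposed RNN-based generator), the following minimal sketch may help. The function names, the per-phoneme embedding and duration tables, and the Euclidean frame distance are all assumptions made for the example; the abstract does not specify them.

```python
import math


def build_query(phonemes, embeddings, durations):
    """Concatenate each phoneme's frame vector, repeated for its duration.

    `embeddings` and `durations` are hypothetical lookup tables: one
    representative frame vector and one frame count per phoneme.
    """
    frames = []
    for p in phonemes:
        frames.extend(list(embeddings[p]) for _ in range(durations[p]))
    return frames


def euclidean(a, b):
    # Assumed frame-level distance; the source does not name a metric.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, speech):
    """Find the best match of `query` anywhere inside `speech`.

    Returns (accumulated_cost, end_frame_index). The match may start at
    any speech frame, so the first query frame carries no start penalty.
    """
    m = len(speech)
    # prev[j]: cost of aligning the query prefix so far, ending at frame j.
    prev = [euclidean(query[0], speech[j]) for j in range(m)]
    for i in range(1, len(query)):
        cur = [float("inf")] * m
        for j in range(m):
            best = prev[j]  # stay on the same speech frame
            if j > 0:
                best = min(best, prev[j - 1], cur[j - 1])
            cur[j] = euclidean(query[i], speech[j]) + best
        prev = cur
    end = min(range(m), key=prev.__getitem__)
    return prev[end], end


# Toy example with two made-up phonemes and 2-dimensional frames.
embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
durations = {"a": 2, "b": 1}
query = build_query(["a", "b"], embeddings, durations)
speech = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
cost, end = subsequence_dtw(query, speech)
```

In this toy case the query frames appear verbatim in the speech sequence, so the best alignment has zero cost and ends at frame index 4. A query generator with temporal context, such as the RNN proposed here, would replace `build_query` while leaving the search step unchanged.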
