Abstract

This paper investigates a series of progressively more complex similarity measures for vocabulary-independent search in phone-based audio transcripts. English audio data is segmented and decoded to produce a sequence of phones that represents the data. These sequences are then parsed into N-grams, which are used to index the data. The audio segments define the documents to be retrieved and are thus localized in time. Search is performed by expanding text-based queries into phone sequences and N-grams, and then matching these against the index. The baseline similarity measure combines elements found in the literature and uses edit distance with a phonetic confusion matrix to determine the similarity of query and index N-grams. This baseline achieves performance comparable to other approaches in the literature. Extensions to the baseline are developed using a constrained form of the similarity measure, together with the ability to account for higher-order confusions, namely confusions of phone bigrams and trigrams. Results show improved performance across a variety of system configurations. We then generalize further and use the framework of conditional random fields (CRFs) to model confusions. Whereas others in the literature have used CRFs to model the parameters of an edit distance that incorporates deletions, substitutions, and insertions, our approach uses CRFs to model context-dependent phone-level confusions directly. The CRF is trained on parallel phonetic transcripts, which provides a general framework for modeling the errors that a recognition system may make while taking contextual effects into account. Results on both in-vocabulary and out-of-vocabulary (OOV) search tasks improve, most notably for OOV, with a 5%-6% relative improvement. Finally, we investigate the degree to which the information captured by the three approaches is complementary and show that system combination can further improve performance.
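The abstract describes the baseline only at a high level: phone sequences are split into N-grams, and query and index N-grams are compared with an edit distance whose costs come from a phonetic confusion matrix. The following is a minimal sketch of that idea, not the paper's exact formulation. It assumes a toy confusion table (`CONFUSION`), a substitution cost of one minus the confusion probability, unit insertion and deletion costs, and normalization by the longer N-gram; all of these names and choices are hypothetical illustrations. The constrained measure, bigram/trigram confusions, and CRF-based model are not covered.

```python
from typing import Dict, List, Sequence, Tuple

Phone = str
NGram = Tuple[Phone, ...]

# Hypothetical confusion probabilities P(decoded | reference) for a few phone
# pairs; a real system would estimate these from aligned transcripts.
CONFUSION: Dict[Tuple[Phone, Phone], float] = {
    ("m", "n"): 0.4,
    ("n", "m"): 0.4,
    ("iy", "ih"): 0.3,
    ("ih", "iy"): 0.3,
}


def phone_ngrams(phones: Sequence[Phone], n: int) -> List[NGram]:
    """Split a phone sequence into overlapping N-grams (empty if too short)."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def sub_cost(ref: Phone, hyp: Phone) -> float:
    """Assumed substitution cost: 0 for a match, else 1 - confusion probability."""
    if ref == hyp:
        return 0.0
    return 1.0 - CONFUSION.get((ref, hyp), 0.0)


def weighted_edit_distance(query: NGram, index: NGram,
                           ins: float = 1.0, dele: float = 1.0) -> float:
    """Levenshtein-style dynamic program with confusion-weighted substitutions."""
    rows, cols = len(query) + 1, len(index) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * dele
    for j in range(1, cols):
        d[0][j] = j * ins
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + dele,  # query phone missing from the index N-gram
                d[i][j - 1] + ins,   # extra phone present in the index N-gram
                d[i - 1][j - 1] + sub_cost(query[i - 1], index[j - 1]),
            )
    return d[-1][-1]


def similarity(query: NGram, index: NGram) -> float:
    """Map the distance into [0, 1], where 1 means the N-grams are identical."""
    longest = max(len(query), len(index), 1)
    return max(0.0, 1.0 - weighted_edit_distance(query, index) / longest)


if __name__ == "__main__":
    query = phone_ngrams(["s", "iy", "m"], 3)[0]
    print(similarity(query, ("s", "ih", "n")))  # partial credit for likely confusions
```

For the query trigram `("s", "iy", "m")` against the index trigram `("s", "ih", "n")`, the two substitutions cost 0.7 and 0.6 under the toy table, so the distance is 1.3 and the similarity is about 0.57. A pair of phones with no entry in the table costs a full unit, so frequently confused phones receive partial credit while unrelated phones do not.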
