Abstract

To work efficiently with collections of speech recordings, the ability to search for spoken terms in the speech stream is essential. Although Chinese spoken term detection (STD) does not suffer from the out-of-vocabulary (OOV) problem the way English does, it is still hard to retrieve long spoken terms, that is, terms containing four or more characters. In this paper, we detail our approach to long Mandarin spoken term detection, which combines a search on the inverted index produced by a speech recognizer with a linear scan on syllable confusion networks (SCNs). First, we split the long spoken terms into syllables and search for the syllables in the inverted index file to get the segments that may contain the long spoken terms. Then we run a linear scan algorithm on the SCNs. On two Mandarin conversational telephone speech sets, we compare the performance of the proposed method with that of baseline syllable-based systems, and our approach gives satisfactory performance gains over them.
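
The two-stage search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify data structures or scoring, so everything here is an assumption made for illustration. In particular, the index is assumed to map each syllable to a set of segment ids, an SCN is assumed to be a list of slots mapping candidate syllables to posterior probabilities, the match score is assumed to be the product of posteriors over consecutive slots, and the names `candidate_segments`, `linear_scan`, `detect` and the `threshold` parameter are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed layouts (not given in the abstract):
#   inverted index: syllable -> ids of segments whose recognizer output contains it
#   SCN: one dict per slot, mapping candidate syllables to posterior probabilities
InvertedIndex = Dict[str, Set[int]]
SCN = List[Dict[str, float]]


def candidate_segments(syllables: List[str], index: InvertedIndex) -> Set[int]:
    """Stage 1: keep segments whose index entries cover every query syllable."""
    postings = [index.get(s, set()) for s in syllables]
    if not postings:
        return set()
    return set.intersection(*postings)


def linear_scan(scn: SCN, syllables: List[str]) -> Optional[Tuple[int, float]]:
    """Stage 2: slide the syllable sequence over consecutive SCN slots.

    Returns (start_slot, score) for the best full match, or None.
    The score is an assumed product of slot posteriors.
    """
    best: Optional[Tuple[int, float]] = None
    n = len(syllables)
    for start in range(len(scn) - n + 1):
        score = 1.0
        for offset, syl in enumerate(syllables):
            p = scn[start + offset].get(syl, 0.0)
            if p == 0.0:  # syllable absent from this slot: no match here
                score = 0.0
                break
            score *= p
        if score > 0.0 and (best is None or score > best[1]):
            best = (start, score)
    return best


def detect(
    syllables: List[str],
    index: InvertedIndex,
    scns: Dict[int, SCN],
    threshold: float = 0.1,  # hypothetical decision threshold
) -> List[Tuple[int, int, float]]:
    """Run both stages; return (segment_id, start_slot, score) for each hit."""
    hits = []
    for seg in sorted(candidate_segments(syllables, index)):
        match = linear_scan(scns[seg], syllables)
        if match is not None and match[1] >= threshold:
            hits.append((seg, match[0], match[1]))
    return hits
```

In this sketch the index lookup only narrows the set of segments, so the more expensive scan over confusion-network slots runs on a small fraction of the data. A real system would likely also need to tolerate insertions and deletions in the SCN, which this exact-alignment scan does not handle.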
