Abstract

We propose a new integration method of multiple search results for improving search accuracy of Spoken Term Detection (STD). A usual STD system prepares two types of recognition results of spoken documents. If a query consists of in-vocabulary (IV) terms, the results using word-based recognizer are used, and if a query includes out-of-vocabulary (OOV) terms, the results using subword-based recognizer are used. The paper proposes an integration method of these two search results. Each utterance has a similarity score included in the search results. The scores of two results for an utterance has been integrated linearly using a constant weighting factor so far. Our preliminary experiments showed the search accuracy using the subword-based results was higher for some IV queries. In the same way, that using the word-based results was higher for some OOV queries. In the proposed method, the similarity scores of the two search results are compared for the same utterance and a higher weighing factor is given to the results that showed a higher similarity score. The proposed method is evaluated using open test sets, and experimental results demonstrated the search accuracy improved for all test sets.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call