Abstract
Information retrieval tasks such as document retrieval and topic detection and tracking (TDT) show little degradation when applied to speech recognizer output. We claim that this robustness stems from redundancy inherent in the problem: not only are words repeated, but semantically related words also provide support. We show how document and query expansion can enhance that redundancy and make document retrieval robust to speech recognition errors. We show that the same effect holds for TDT's tracking task, but that recognizer errors are more of an issue for new event detection and story link detection.
Highlights
The prevalence and success of search engines on the Web have broadly illustrated that information retrieval (IR) methods can successfully find documents relevant to many queries
The result of TREC-6 was a finding that automatic speech recognition (ASR) errors caused approximately a 10% drop in effectiveness, regardless of whether the queries are easy or are engineered to be “difficult” for an ASR system
We have shown in the link detection task that document expansion in topic detection and tracking (TDT) reduces the impact of ASR errors
Summary
The prevalence and success of search engines on the Web have broadly illustrated that information retrieval (IR) methods can successfully find documents relevant to many queries. We discuss the impact of speech recognition systems on document organization and retrieval. One problem with converting speech to text is that automatic speech recognition (transcription) systems are not perfect. They occasionally generate words that sound similar to what was said, sometimes drop words, and occasionally insert words that were not there. The ability of a system to provide accurate search results is not substantially affected by speech recognition errors. We support this claim by exploring two information retrieval tasks and the impact of automatic speech recognition (ASR) errors on their accuracy.
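The three kinds of recognizer error described above (substituted, dropped, and inserted words) are what the standard word error rate (WER) metric counts. The following is a minimal sketch of that metric using word-level edit distance. It is not taken from the paper; the function name `word_error_rate` and the whitespace tokenization are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace; no text normalization
    (case folding, punctuation stripping) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum number of edits needed to turn the
    # reference words seen so far into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,                      # reference word dropped
                curr[j - 1] + 1,                  # extra word inserted
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat", "the bat sat")` counts one substitution over three reference words, a WER of 1/3.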