Abstract

Information retrieval tasks such as document retrieval and topic detection and tracking (TDT) show little degradation when applied to speech recognizer output. We claim that this robustness stems from inherent redundancy in the problem: not only are words repeated, but semantically related words also provide support. We show how document and query expansion can enhance that redundancy and make document retrieval robust to speech recognition errors. We show that the same holds for TDT's tracking task, but that recognizer errors are more of an issue for new event detection and story link detection.
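To make "document expansion" concrete, the following is a minimal sketch, not the authors' method. It assumes a simple corpus-based variant: find the k side-corpus documents most similar to an errorful transcript (plain cosine similarity on term counts), then add their most frequent missing terms at a reduced weight. The function names `cosine` and `expand_document`, the parameters `k`, `n_new` and `weight`, and the toy corpus are all hypothetical illustrations.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def expand_document(doc_tokens, side_corpus, k=2, n_new=3, weight=0.5):
    """Hypothetical document expansion: add up to n_new terms, at a reduced
    weight, taken from the k side-corpus documents most similar to doc_tokens."""
    doc = Counter(doc_tokens)
    # Rank side-corpus documents by similarity to the (possibly errorful) transcript.
    neighbours = sorted(side_corpus, key=lambda d: cosine(doc, Counter(d)), reverse=True)[:k]
    pooled = Counter()
    for d in neighbours:
        pooled.update(d)
    # Keep the most frequent pooled terms that the transcript does not already contain.
    new_terms = [t for t, _ in pooled.most_common() if t not in doc][:n_new]
    expanded = Counter(doc)
    for t in new_terms:
        expanded[t] += weight
    return expanded

# Toy transcript in which "budget" was misrecognized as "bunch".
transcript = "senate passed the bunch debate".split()
side_corpus = [
    "senate budget vote senate budget deficit".split(),
    "budget deficit senate debate".split(),
    "football match goal".split(),
]
expanded = expand_document(transcript, side_corpus)
print(sorted(expanded.items()))
```

In this toy case the misrecognized word "budget" is restored at weight 0.5 because related stories in the side corpus contain it, while the unrelated sports story contributes nothing. Real systems would use a proper retrieval model and term weighting rather than raw counts.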

Highlights

  • The prevalence and success of search engines on the Web have broadly illustrated that information retrieval (IR) methods can successfully find documents relevant to many queries

  • TREC-6 found that automatic speech recognition (ASR) errors caused roughly a 10% drop in retrieval effectiveness, regardless of whether the queries were easy or were engineered to be “difficult” for an ASR system

  • We have shown in the link detection task that document expansion in topic detection and tracking (TDT) reduces the impact of ASR errors


Summary

INTRODUCTION

The prevalence and success of search engines on the Web have broadly illustrated that information retrieval (IR) methods can successfully find documents relevant to many queries. We discuss the impact of speech recognition systems on document organization and retrieval. One problem with converting speech to text and then searching the text is that automatic speech recognition (transcription) systems are not perfect. They occasionally generate words that sound similar to what was said (substitutions), sometimes drop words (deletions), and occasionally insert words that were not spoken (insertions). We claim that the ability of a system to provide accurate search results is nonetheless not substantially affected by speech recognition errors. We support this claim by exploring two information retrieval tasks and the impact of automatic speech recognition (ASR) errors on their accuracy.
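The redundancy argument can be illustrated with a deliberately simple example. The sketch below is an assumption-laden toy, not the retrieval model used in the paper: it scores a document by the raw count of query-term occurrences (the hypothetical `tf_score`) and compares a reference transcript with an invented ASR output containing one deletion, one substitution and one insertion.

```python
from collections import Counter

def tf_score(query_terms, doc_tokens):
    """Toy score: total number of occurrences of the query terms in the document."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in query_terms)

query = ["senate", "budget"]
reference = "the senate passed the budget after the budget debate in the senate".split()

# Hypothetical ASR output: "the" before the first "budget" is deleted,
# the second "budget" is substituted by "bunch", and "uh" is inserted.
asr_output = "the senate passed budget after the bunch debate uh in the senate".split()

print(tf_score(query, reference), tf_score(query, asr_output))
```

Because "budget" occurs twice in the reference, losing one occurrence to a substitution lowers the score from 4 to 3 but does not remove the match, and the deletion and insertion touch no query term at all. This is the simplest form of the redundancy that the expansion techniques aim to strengthen.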

WHY ASR LOOKS LIKE A PROBLEM
ADDRESSING ASR ISSUES
Recognizer-based expansion
Corpus-based expansion
Other approaches
SPOKEN DOCUMENT RETRIEVAL
TREC SDR evaluations
Query expansion
Document expansion
SDR summary
TOPIC DETECTION AND TRACKING
Tracking
New event detection
Link detection
TDT summary
ROBUSTNESS BREAKS DOWN
Findings
CONCLUSION
