Abstract

Speech processing of lectures recorded inside smart rooms has recently attracted much interest. In particular, the topic has been central to the Rich Transcription (RT) Meeting Recognition Evaluation campaign series, sponsored by NIST, with emphasis placed on benchmarking speech activity detection (SAD), speaker diarization (SPKR), speech-to-text (STT), and speaker-attributed STT (SASTT) technologies. In this paper, we present the IBM systems developed to address these tasks in preparation for the RT 2007 evaluation, focusing on the far-field condition of lecture data collected as part of the European project CHIL. During development, the systems are benchmarked on a subset of the RT Spring 2006 (RT06s) evaluation test set, where they yield significant improvements over RT06s results on the SAD, SPKR, and STT tasks; for example, a 16% relative reduction in word error rate is reported for STT, attributed to a number of system advances discussed here. Initial results are also presented on SASTT, a task newly introduced in 2007 in place of the discontinued SAD.

Index Terms: speech processing, speech recognition, speaker diarization, speech activity detection, lectures, smart rooms.
