Detection, diarization, and transcription of far-field lecture speech

Jing Huang, Gerasimos Potamianos, Etienne Marcheret, Vit Libal, Karthik Visweswariah

doi:10.21437/interspeech.2007-583

Abstract

Speech processing of lectures recorded inside smart rooms has recently attracted much interest. In particular, the topic has been central to the Rich Transcription (RT) Meeting Recognition Evaluation campaign series, sponsored by NIST, with emphasis placed on benchmarking speech activity detection (SAD), speaker diarization (SPKR), speech-to-text (STT), and speaker-attributed STT (SASTT) technologies. In this paper, we present the IBM systems developed to address these tasks in preparation for the RT 2007 evaluation, focusing on the far-field condition of lecture data collected as part of the European project CHIL. For their development, the systems are benchmarked on a subset of the RT Spring 2006 (RT06s) evaluation test set, where they yield significant improvements over RT06s results on the SAD, SPKR, and STT tasks; for example, a 16% relative reduction in word error rate is reported in STT, attributed to a number of system advances discussed here. Initial results are also presented on SASTT, a task newly introduced in 2007 in place of the discontinued SAD.

Index Terms: speech processing, speech recognition, speaker diarization, speech activity detection, lectures, smart rooms.
