Developing an automatic transcription and retrieval system for spoken lectures in Turkish

Ebru Arisoy

doi:10.1109/siu.2017.7960374

Abstract

With the growing number of online video lectures, speech and language processing technologies have become increasingly important for education. This paper presents an automatic transcription and retrieval system for spoken lectures in Turkish. The system has two main steps: Turkish video lectures are transcribed automatically with a large vocabulary continuous speech recognition (LVCSR) system, and a keyword-search-based speech retrieval system then locates keywords in the lattices produced by the LVCSR system. To build the system, a state-of-the-art Turkish LVCSR system was first developed using advanced acoustic modeling methods; keywords were then extracted automatically from word sequences in the reference transcriptions of the video lectures, and a speech retrieval system was developed to search for these keywords in the LVCSR lattice output. On the test data, the spoken lecture processing system yields a 14.2% word error rate and a maximum term weighted value of 0.86.
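The word error rate quoted in the abstract is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcription, divided by the number of reference words. The following is a minimal Python sketch of that standard definition, not the scoring tool used in the paper; the function name `word_error_rate` and the toy strings are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative helper (hypothetical name); tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Toy example (made-up strings): one deletion and one substitution
# against a four-word reference.
toy_wer = word_error_rate("a b c d", "a c e")
```

Because insertions are counted, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.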

Similar Papers

Exemplar-Based Sparse Representation Features: From TIMIT to LVCSR
Tara N Sainath ... David Nahamoo
IEEE Transactions on Audio, Speech, and Language Processing | VOL. 19
01 Nov 2011

Quantifying the value of pronunciation lexicons for keyword search in low-resource languages
Guoguo Chen ... Oguz Yilmaz
01 May 2013

A large-vocabulary continuous speech recognition system for Hindi
M Kumar ... N Rajput
IBM Journal of Research and Development | VOL. 48
01 Sep 2004