Abstract

This paper describes progress in the development of a multilingual speech-enabled interface using state-of-the-art deep learning techniques within the framework of the bilateral project “Deep Learning for Advanced Speech Enabled Applications”. Advances are expected especially in automatic subtitling of broadcast television and radio programs, database creation, indexing, and information retrieval. This requires investigating deep learning techniques in the following sub-tasks: a) multilingual large vocabulary continuous speech recognition, b) audio event detection, c) speaker clustering and diarization, d) spoken discourse, speech, paragraph, and sentence segmentation, e) emotion recognition, f) microphone array/multi-channel speech enhancement, g) data mining, h) multilingual speech synthesis, and i) spoken dialogue user interfaces. The paper describes the current work and the data available in the project, and reports the results achieved in the first task: a Kaldi-based Slovak speech recognition module using deep learning algorithms.
