Abstract

This paper describes progress in the development of a multilingual speech-enabled interface using state-of-the-art deep learning techniques within the framework of the bilateral project “Deep Learning for Advanced Speech Enabled Applications”. Advances are expected especially in automatic subtitling of broadcast television and radio programs, database creation, indexing, and information retrieval. This requires investigating deep learning techniques in the following sub-tasks: a) multilingual large vocabulary continuous speech recognition, b) audio event detection, c) speaker clustering and diarization, d) spoken discourse, speech, paragraph, and sentence segmentation, e) emotion recognition, f) microphone array/multi-channel speech enhancement, g) data mining, h) multilingual speech synthesis, and i) spoken dialogue user interfaces. The paper describes the current work and the data available in the project, and reports the results achieved in the first task: a Kaldi-based Slovak speech recognition module using deep learning algorithms.
