Spoken language identification based on the transcript analysis

Dmytro V Lande, Mykyta S Klymenko, Olegh O Dmytrenko, Maksym O Vakulenko, Anatolij I Shevchenko

doi:10.1093/llc/fqac052

Abstract

Language identification is a major challenge in language engineering that arises alongside speech recognition, machine translation, cross-language information retrieval, the creation of intelligent dialogue systems, and similar tasks. This article introduces a language identification technology based on speech recognition and statistical methods of transcript analysis. It puts forward an approach to automatically identifying the language of a spoken sample uploaded to the system, in particular from video streaming services such as YouTube. The approach draws on several speech recognition solutions, each of which may recognize the speech correctly or incorrectly and so convert it into correct or incorrect text. The resulting algorithm is demonstrated on Ukrainian and Russian. Identification quality is almost 100% for utterances longer than 30 s, 98% for 30-s utterances, and 89.6% for 5-s utterances. In addition, the system's processing speed is limited only by the streaming speed, so it operates in real time.
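To make the idea concrete, here is a minimal sketch of transcript-based identification. It is not the authors' algorithm: it assumes each candidate language has its own recognizer that has already produced a transcript, and it simply picks the language whose transcript contains the highest share of that language's common words. The names `COMMON_WORDS`, `score_transcript`, and `identify_language` are hypothetical, and the tiny word lists stand in for the much larger frequency dictionaries a real system would need.

```python
# Illustrative only: tiny lists of frequent words per language (assumption).
COMMON_WORDS = {
    "uk": {"і", "в", "на", "що", "це", "як", "та", "не", "до", "є"},
    "ru": {"и", "в", "на", "что", "это", "как", "не", "к", "он", "с"},
}

PUNCTUATION = ".,!?;:\"'()"


def score_transcript(text, lang):
    """Return the fraction of tokens found in the common-word list of `lang`."""
    tokens = [t.strip(PUNCTUATION) for t in text.lower().split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in COMMON_WORDS[lang])
    return hits / len(tokens)


def identify_language(transcripts):
    """Pick the language whose recognizer produced the most plausible text.

    `transcripts` maps a language code to the transcript that the
    (hypothetical) recognizer for that language produced from the same audio.
    """
    scores = {lang: score_transcript(text, lang)
              for lang, text in transcripts.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Same Ukrainian utterance as "heard" by two recognizers (made-up data).
    candidates = {
        "uk": "це є те, що ми знаємо",
        "ru": "це е те што ми знаемо",
    }
    print(identify_language(candidates))
```

The design choice here is that a recognizer for the wrong language tends to emit text that matches few of its own language's frequent words, so the per-language score separates the candidates; longer utterances give more tokens and therefore a more reliable score, which is consistent with the accuracy trend reported in the abstract.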

More From: Digital Scholarship in the Humanities

Similar Papers

End-to-end Oriental Language Speech Recognition with Integrated Language Identification
Anbin Qi ... Xiang Xie
01 Oct 2022

Method and system for collaborative speech recognition for small-area network
James Gordon Mclean
The Journal of the Acoustical Society of America | VOL. 118
01 Jan 2004

Measuring listening effort: driving simulator versus simple dual-task paradigm.
Yu-Hsiang Wu ... Elizabeth Stangl
Ear & Hearing | VOL. 35
01 Nov 2014

Studying machine translation technologies for large-data CLIR tasks: a patent prior-art search case study
Walid Magdy ... Gareth J F Jones
Information Retrieval | VOL. 17
21 Nov 2013
