Speech data collection system for KUI, a Low resourced tribal language

Subrat Kumar Nayak, Prithviraj Mohanty, Nrusingha Tripathy, Amrutanshu Panigrahi, Smitaprava Mishra, Ajit Kumar Nayak, Abhilash Pati

doi:10.32629/jai.v7i1.1121

Abstract

<p>A new generation of speech translation technology is being developed to enable natural cross-language communication. To meet the varying demands placed on speech recognition technologies, research efforts must address large vocabularies, spontaneous speech, and speaker variability. These issues need to be resolved before voice recognition can be applied generally in realistic settings. Most low-resource languages have no speech data at all, and creating speech corpora is extremely difficult and time-consuming. KUI is regarded as one such low-resource language. In this paper, we develop a speech dataset for the KUI language to document and preserve its speakers' culture, tradition, and history for future generations. We discuss the design, data collection procedures, and implementation, and outline the research possibilities opened up by our KUI dataset. The paper mainly describes a GUI application and a method for collecting KUI speech more quickly; the application is designed to capture the speech of numerous speakers in the KUI language. The method is novel and can be used to gather data for any speech dataset. Using this process, we collected 60 hours of speech data from 80 different speakers, sampled at 16 kHz on three different devices: a Zoom recorder, a mobile phone, and a laptop. Each speaker contributed 500 sentences in the KUI language. We also provide statistics on the people who helped and contributed to the collection of the dataset. Several guidelines are proposed and used for the collection of the KUI speech dataset, all of them based on the practical experience our team members gained during the data collection process.</p>
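The figures reported in the abstract can be cross-checked with some simple arithmetic. The sketch below is only illustrative: it assumes that the 60 hours is the total across all speakers and devices and that each of the 500 sentences per speaker is counted once, which the abstract does not state explicitly. The function name `corpus_summary` and the dictionary keys are hypothetical and are not taken from the paper.

```python
# Figures reported in the abstract.
TOTAL_HOURS = 60
NUM_SPEAKERS = 80
SENTENCES_PER_SPEAKER = 500
SAMPLE_RATE_HZ = 16_000


def corpus_summary(total_hours, num_speakers, sentences_per_speaker, sample_rate_hz):
    """Derive rough corpus statistics from the headline numbers.

    Assumption (not stated in the abstract): every sentence is counted once
    per speaker, and total_hours covers the whole corpus.
    """
    utterances = num_speakers * sentences_per_speaker
    total_seconds = total_hours * 3600
    return {
        "utterances": utterances,
        "mean_seconds_per_utterance": total_seconds / utterances,
        "hours_per_speaker": total_hours / num_speakers,
        "total_samples": total_seconds * sample_rate_hz,
    }


summary = corpus_summary(
    TOTAL_HOURS, NUM_SPEAKERS, SENTENCES_PER_SPEAKER, SAMPLE_RATE_HZ
)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions the corpus would contain 40,000 utterances averaging about 5.4 seconds each, or roughly 45 minutes of audio per speaker. If each sentence was in fact recorded on all three devices, the per-recording average would be correspondingly shorter.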


Journal: Journal of Autonomous Intelligence
Publication Date: Oct 8, 2023
Citations: 2
License type: CC BY-NC 4.0

