Abstract
This study offers a detailed evaluation of automatic speech recognition (ASR) systems for the Kazakh language, examining how well they recognize the phonetic and linguistic features unique to it. Kazakh presents specific challenges for ASR due to its complex phonology, vowel harmony, and multiple regional dialects. To address these challenges, a comparative analysis of three leading ASR systems (Kaldi, Mozilla DeepSpeech, and the Google Speech-to-Text API) was conducted using a dataset of 101 recordings of spoken Kazakh text. The study focuses on the systems' word error rates (WER) and identifies common misrecognitions, especially of Kazakh-specific phonemes such as "қ," "ң," and "ү." Kaldi and Mozilla DeepSpeech exhibited high WERs, struggling in particular with Kazakh’s vowel harmony and consonant distinctions, while Google Speech-to-Text achieved the lowest WER of the three. However, none of the systems reached accuracy levels sufficient for practical applications, as errors in recognizing Kazakh’s agglutinative morphology and case endings remained pervasive. To improve these outcomes, a series of enhancements is proposed, including adapting acoustic models to better reflect Kazakh’s phonetic and morphological traits, integrating dialect-specific data, and employing machine learning methods such as transfer learning and hybrid models. Additional steps include refining data preprocessing and increasing dataset diversity to capture Kazakh’s linguistic nuances more accurately. Addressing these limitations would allow ASR systems to better handle complex sentence structures and regional speech variations. This research thus provides a foundation for advancing Kazakh ASR technologies and contributes insights for developing inclusive, effective ASR systems capable of supporting linguistically diverse users.
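For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the evaluation code used in the study. The function name `word_error_rate` and the sample strings are hypothetical, and whitespace tokenization without text normalization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are whitespace-separated and compared exactly,
    with no case folding or punctuation stripping.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the recognizer outputs "к" where the reference has "қ",
# so one of the two reference words counts as a substitution.
print(word_error_rate("қала үлкен", "кала үлкен"))
```

Because the comparison is made on whole words, a single misrecognized phoneme such as "қ" counts as a full word error, which is one reason agglutinative languages with long word forms tend to show high WERs.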