Abstract

This paper describes ongoing work to augment morphological transducers for seven Turkic languages with support for multiple scripts each, as well as preliminary work on adding IPA transcription systems. Evaluation shows that our approach yields coverage equal to, or only slightly lower than, that of the base transducers.

Highlights

  • Existing free/open-source morphological transducers for Turkic languages (Washington et al., 2020) are each implemented in only one orthography, even though several of the languages are currently written in two or more orthographies or have a large body of text in an orthography that was in recent use

  • This paper presents the implementation of support for additional orthographies for seven Turkic-language transducers

  • Our approach is to hand-write finite-state models of the mapping between the orthography of the base transducer and each additional orthography, which was sufficient for most of the cases we dealt with
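
The idea of a hand-written mapping between orthographies can be illustrated with a minimal sketch. This is not the authors' implementation: it is a greedy longest-match rewriter in plain Python standing in for a finite-state transducer, and the table `LATIN_TO_CYRILLIC` and the function `transliterate` are hypothetical names. The table is a toy subset of Uzbek Latin-to-Cyrillic correspondences chosen for illustration only.

```python
# Toy subset of Uzbek Latin -> Cyrillic correspondences (illustrative
# assumption, not the rule set used in the paper).
LATIN_TO_CYRILLIC = {
    "sh": "ш", "ch": "ч",          # digraphs must win over single letters
    "a": "а", "b": "б", "h": "ҳ", "i": "и", "k": "к", "l": "л",
    "m": "м", "n": "н", "o": "о", "r": "р", "s": "с", "t": "т",
}


def transliterate(word, table):
    """Rewrite `word` left to right, always taking the longest matching key."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(word):
        # Try the longest candidate first so "sh" is preferred over "s" + "h".
        for n in range(max_len, 0, -1):
            chunk = word[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            # No rule applies: copy the character through unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


# Inverting the table gives the opposite direction for this toy subset.
CYRILLIC_TO_LATIN = {cyr: lat for lat, cyr in LATIN_TO_CYRILLIC.items()}

print(transliterate("shahar", LATIN_TO_CYRILLIC))
print(transliterate("китоб", CYRILLIC_TO_LATIN))
```

A context-free table like this one cannot tell a digraph from an accidental sequence of the same two letters, which is one reason a simple mapping covers most cases rather than all of them.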

Introduction

This paper builds on work in which Cyrillic support was added to a transducer for Crimean Tatar that had been implemented in the Latin script (Tyers et al., 2019). We leverage morphological transducers for Kazakh (implemented in the Cyrillic script), Kyrgyz (Cyrillic), Turkmen (Latin), Qaraqalpaq (Latin), Uzbek (Latin), and Uyghur (Perso-Arabic), and add support for analysis and generation in additional scripts that are currently used, or were recently used, for these languages. We add Cyrillic support to the Turkmen, Qaraqalpaq, Uzbek, and Uyghur transducers; Perso-Arabic support to the Kazakh and Kyrgyz transducers; Latin-script support to the Kazakh and Uyghur transducers; and additional Latin orthographies to the Qaraqalpaq transducer. Work on adding International Phonetic Alphabet (IPA) transcription support to these transducers has begun, and initial work on adding such support to the Kyrgyz transducer is discussed.
