Abstract

Accuracy of part-of-speech tagging is critical to downstream subtasks in the front-end text analysis model of a text-to-speech system. Uyghur is an agglutinative language in which many words are formed by attaching suffixes to a stem (or root). Because Uyghur has an unlimited number of newly formed and derived syntactic words, conventional Uyghur part-of-speech tagging methods, which train on and predict the part of speech of whole words directly, suffer from large tag sets and frequent out-of-vocabulary words. To address this problem, this paper proposes training on the part of speech of stems and predicting the part of speech of a word mainly from its stem. A bigram language model is used to segment each word at its stem-affix boundary, and a hidden Markov model is used to train and predict the part of speech of the stem. Finally, a rule-based adjustment corrects the part of speech of a word when an attached suffix changes it. Experimental results show that the proposed method clearly reduces the part-of-speech tagging error rate compared with the conventional method.
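The stem-based pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the bigram segmentation step has already split each word into a stem and a list of suffixes, and it uses a toy, hypothetical hidden Markov model (tags `N`, `V`, `ADJ`, made-up start, transition and emission probabilities) plus one hypothetical derivational rule in the table `RULES`. The Latin-script Uyghur words are for illustration only. The sketch runs Viterbi decoding over the stem sequence, then applies suffix rules to adjust each word's tag.

```python
import math
from typing import Dict, List, Tuple

# Toy, hypothetical HMM parameters over STEM part-of-speech tags.
TAGS = ["N", "V", "ADJ"]
START: Dict[str, float] = {"N": 0.5, "V": 0.2, "ADJ": 0.3}
TRANS: Dict[str, Dict[str, float]] = {
    "N": {"N": 0.3, "V": 0.5, "ADJ": 0.2},
    "V": {"N": 0.5, "V": 0.2, "ADJ": 0.3},
    "ADJ": {"N": 0.7, "V": 0.2, "ADJ": 0.1},
}
EMIT: Dict[str, Dict[str, float]] = {
    "N": {"kitab": 0.6, "mektep": 0.4},
    "V": {"oqu": 0.7, "bar": 0.3},
    "ADJ": {"yaxshi": 1.0},
}
UNK = 1e-6  # small floor probability for stems never emitted by a tag

# Hypothetical adjustment rules: (stem tag, suffix) -> resulting word tag.
# e.g. the agentive suffix "ghuchi" turns a verb stem into a noun.
RULES: Dict[Tuple[str, str], str] = {("V", "ghuchi"): "N"}


def viterbi(stems: List[str]) -> List[str]:
    """Return the most likely stem-tag sequence (log-domain Viterbi)."""
    scores = [{t: math.log(START[t]) + math.log(EMIT[t].get(stems[0], UNK))
               for t in TAGS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(stems)):
        scores.append({})
        back.append({})
        for t in TAGS:
            # Best previous tag leading into tag t at position i.
            prev, best = max(
                ((p, scores[i - 1][p] + math.log(TRANS[p][t])) for p in TAGS),
                key=lambda pair: pair[1],
            )
            scores[i][t] = best + math.log(EMIT[t].get(stems[i], UNK))
            back[i][t] = prev
    # Trace back from the best final tag.
    path = [max(scores[-1], key=scores[-1].get)]
    for i in range(len(stems) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def tag_words(segmented: List[Tuple[str, List[str]]]) -> List[Tuple[str, str]]:
    """Tag stems with the HMM, then adjust each word's tag by suffix rules."""
    stem_tags = viterbi([stem for stem, _ in segmented])
    result = []
    for (stem, suffixes), tag in zip(segmented, stem_tags):
        for suffix in suffixes:
            tag = RULES.get((tag, suffix), tag)  # keep tag if no rule applies
        result.append((stem + "".join(suffixes), tag))
    return result


if __name__ == "__main__":
    sentence = [("yaxshi", []), ("oqu", ["ghuchi"]),
                ("kitab", ["lar", "ni"]), ("oqu", ["ydu"])]
    print(tag_words(sentence))
```

With these toy parameters the stem sequence is decoded as `ADJ V N V`, and the rule for `ghuchi` then retags `oqughuchi` as a noun even though its stem `oqu` is a verb. Because the HMM only ever sees stems, inflected forms such as `kitablarni` need no entries of their own, which is the out-of-vocabulary benefit the abstract describes.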
