Abstract

Considering the scarcity of Uyghur sentiment resources, this paper proposes a new combined unsupervised sentiment classification method for Uyghur text that requires no labeled corpora. In the first stage, the Uyghur sentiment dictionary UYSentiDict is used to classify sentences. Sentiment words are matched on both the original word form and its stem, and the influence of sentence patterns, negation words, and degree adverbs is also taken into account. Using different thresholds, the sentences with the highest sentiment values are selected from the lexicon-based classification results to form a pseudo-labeled dataset. In the second stage, a machine learning classifier learns sentiment features from this pseudo-labeled dataset and then classifies the remaining data. Results on Uyghur sentiment corpora from four different domains show that the proposed method classifies effectively, and some of its results are better than those of a machine learning classifier. Moreover, the method is not restricted to a particular domain and does not require a manually annotated training corpus, so it can effectively alleviate the resource shortage in Uyghur sentiment classification.
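The two-stage pipeline described above can be illustrated with a minimal sketch. Everything concrete in it is an assumption: the tiny English lexicon stands in for UYSentiDict, the suffix stripper stands in for a real Uyghur stemmer, the negation and degree-adverb rules are one simple possibility, sentence-pattern handling is omitted, and multinomial naive Bayes is swapped in for the unspecified machine learning classifier. It shows only the shape of the method: score sentences with the lexicon, keep high-confidence ones as pseudo-labels, then train a classifier on them to label the rest.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (NOT UYSentiDict): word or stem -> polarity score.
LEXICON = {"good": 1.0, "bad": -1.0, "great": 2.0}
NEGATIONS = {"not"}                       # hypothetical negation words
DEGREE = {"very": 2.0, "slightly": 0.5}   # hypothetical degree-adverb weights

def stem(word):
    # Hypothetical suffix stripper standing in for a real Uyghur stemmer.
    for suffix in ("est", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def lexicon_score(tokens):
    """Sum lexicon polarities; negations flip and degree adverbs scale the next hit."""
    score, modifier = 0.0, 1.0
    for tok in tokens:
        if tok in NEGATIONS:
            modifier *= -1.0
        elif tok in DEGREE:
            modifier *= DEGREE[tok]
        else:
            # Try the original word form first, then fall back to its stem.
            value = LEXICON.get(tok, LEXICON.get(stem(tok)))
            if value is not None:
                score += modifier * value
                modifier = 1.0
    return score

def select_pseudo_labels(sentences, threshold):
    """Stage 1: keep sentences whose |score| >= threshold as pseudo-labeled data."""
    labeled, remaining = [], []
    for tokens in sentences:
        s = lexicon_score(tokens)
        if s >= threshold:
            labeled.append((tokens, "pos"))
        elif s <= -threshold:
            labeled.append((tokens, "neg"))
        else:
            remaining.append(tokens)
    return labeled, remaining

def train_nb(labeled):
    """Stage 2: count token frequencies per pseudo-label (multinomial naive Bayes)."""
    counts = {"pos": Counter(), "neg": Counter()}
    docs = Counter()
    for tokens, label in labeled:
        counts[label].update(tokens)
        docs[label] += 1
    vocab = set(counts["pos"]) | set(counts["neg"])
    return counts, docs, vocab

def predict_nb(model, tokens):
    """Return the label with the highest Laplace-smoothed log posterior."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    best, best_lp = None, -math.inf
    for label in ("pos", "neg"):
        lp = math.log((docs[label] + 1) / (total_docs + 2))
        denom = sum(counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # unseen tokens are ignored
                lp += math.log((counts[label][tok] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

sentences = [s.split() for s in (
    "the food is very good",
    "the service is not good",
    "a great place with great goods",
    "the room is bad",
    "the food was fresh",
    "the room was dirty",
)]
labeled, remaining = select_pseudo_labels(sentences, threshold=1.0)
model = train_nb(labeled)
for tokens in remaining:
    print(" ".join(tokens), "->", predict_nb(model, tokens))
# the food was fresh -> pos
# the room was dirty -> neg
```

The last two sentences contain no lexicon words, so stage 1 leaves them unlabeled; stage 2 labels them from co-occurrence with pseudo-labeled sentences ("food" with positive ones, "room" with negative ones). Raising the threshold yields fewer but cleaner pseudo-labels, which is the trade-off the abstract's "different thresholds" refers to.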
