Abstract

As a low-resource language, Tibetan lacks high-quality corpus resources, which to some extent hinders the development of Tibetan text sentiment analysis. Cross-language text sentiment classification can partly alleviate this shortage by using a small number of labeled Tibetan samples to mine the sentiment information in a large number of unlabeled samples. This paper introduces a typical semi-supervised algorithm, collaborative training (co-training), into the Tibetan-Chinese cross-language sentiment classification task. It constructs a cross-language sentiment classification model based on semi-supervised collaborative training, treats the two languages of a balanced Tibetan-Chinese bilingual data set as two different views for bilingual collaborative training, and uses abundant Chinese annotated data to address the lack of sentiment resources and labeled samples in Tibetan. Experimental results show that the collaborative training algorithm enhances the Tibetan sentiment classifier's ability to learn from unlabeled samples and improves the accuracy of Tibetan sentiment classification.
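
The two-view training loop summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's model: the nearest-centroid classifier, the distance-margin confidence score, the toy 2-D feature vectors, and every name (`fit_centroids`, `predict`, `co_train`, `rounds`, `k`) are assumptions chosen only to keep the example self-contained. The sketch assumes each sample has a Tibetan-view and a Chinese-view feature vector aligned by index; in each round, the classifier for each view pseudo-labels the unlabeled samples it is most confident about, and those samples join the shared labeled set for both views.

```python
from math import dist


def fit_centroids(X, y):
    """Return {label: centroid} for feature vectors X with labels y."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict(centroids, x):
    """Return (label, confidence); confidence is the distance margin
    between the nearest and second-nearest centroid."""
    ranked = sorted((dist(x, c), label) for label, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
    return ranked[0][1], margin


def co_train(view_a, view_b, labels, pool_a, pool_b, rounds=5, k=1):
    """Co-training over two aligned views (hypothetical helper).

    view_a / view_b: labeled feature vectors for each view.
    pool_a / pool_b: unlabeled feature vectors, aligned by index.
    Returns the final centroid classifier for each view.
    """
    la, lb, y = list(view_a), list(view_b), list(labels)
    pool = list(range(len(pool_a)))
    for _ in range(rounds):
        if not pool:
            break
        clf_a, clf_b = fit_centroids(la, y), fit_centroids(lb, y)
        chosen = {}
        # Each view's classifier nominates its k most confident samples.
        for clf, data in ((clf_a, pool_a), (clf_b, pool_b)):
            scored = sorted(pool, key=lambda i: predict(clf, data[i])[1],
                            reverse=True)
            for i in scored[:k]:
                chosen.setdefault(i, predict(clf, data[i])[0])
        # Pseudo-labeled samples are added to the labeled set of BOTH views.
        for i, label in chosen.items():
            la.append(pool_a[i])
            lb.append(pool_b[i])
            y.append(label)
            pool.remove(i)
    return fit_centroids(la, y), fit_centroids(lb, y)


# Toy data (assumed, not from the paper): one labeled sample per class.
tib_labeled = [[0.0, 0.0], [1.0, 1.0]]
zh_labeled = [[0.0, 0.1], [1.0, 0.9]]
labels = ["neg", "pos"]
tib_pool = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 1.0]]
zh_pool = [[0.2, 0.1], [0.8, 0.9], [0.1, 0.0], [1.0, 0.8]]

tib_clf, zh_clf = co_train(tib_labeled, zh_labeled, labels, tib_pool, zh_pool)
```

In a realistic setting the feature vectors would come from text representations of the Tibetan and Chinese sides of each sample, and the centroid classifier would be replaced by whatever sentiment classifier is actually used; the loop structure is the part this sketch is meant to show.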
