Abstract

Training Chinese sentiment analysis models shows that a model trained on one dataset predicts with markedly lower accuracy on other datasets. Because existing sentiment analysis work mainly uses single-domain corpora, and drawing on existing data processing methods in natural language processing, this paper designs an experiment that combines Chinese datasets from different fields into one large field-imbalanced dataset, in which the number of samples differs markedly from field to field. The new dataset is used to train a comprehensive Chinese sentiment analysis model, which achieves satisfactory training results. In the experiments, the model trained on the field-imbalanced dataset predicts samples from the various fields with high accuracy, and its accuracy on a given field rises as that field's share of the training corpus increases. The experiment offers some ideas for constructing large-scale cross-domain Chinese sentiment analysis datasets in the future.
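The dataset construction and per-field evaluation described above can be sketched roughly as follows. This is only an illustrative sketch, not the paper's actual pipeline: the function names `build_imbalanced_dataset` and `per_domain_accuracy`, the `(text, label)` sample format, and the idea of specifying each field's share as a fraction of a fixed total are all assumptions made for the example.

```python
import random
from collections import defaultdict


def build_imbalanced_dataset(domain_samples, proportions, total, seed=0):
    """Merge single-field corpora into one field-imbalanced dataset.

    domain_samples: dict mapping a field name to a list of (text, label) pairs.
    proportions:    dict mapping a field name to its share of the merged set
                    (hypothetical knob; the paper does not specify one).
    total:          target size of the merged dataset.
    """
    rng = random.Random(seed)
    combined = []
    for domain, fraction in proportions.items():
        pool = domain_samples[domain]
        # Cap at the pool size so sampling never asks for more than exists.
        n = min(int(round(total * fraction)), len(pool))
        picked = rng.sample(pool, n)
        # Keep the field tag so accuracy can be reported per field later.
        combined.extend((text, label, domain) for text, label in picked)
    rng.shuffle(combined)
    return combined


def per_domain_accuracy(examples, predict):
    """Return accuracy per field for a predict(text) -> label callable."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for text, label, domain in examples:
        seen[domain] += 1
        correct[domain] += int(predict(text) == label)
    return {domain: correct[domain] / seen[domain] for domain in seen}
```

Varying the values in `proportions` across runs and comparing the output of `per_domain_accuracy` on a held-out set is one way to observe how a field's share of the training corpus relates to accuracy on that field.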
