Abstract
In recent years, text sentiment analysis has attracted increasing attention and has gradually become a research hotspot in information extraction, data mining, Natural Language Processing (NLP), and other fields. With the growing reach of the Internet, sentiment analysis of Uyghur texts has great research and application value for monitoring online public opinion. Most state-of-the-art systems require tens of thousands of annotated sentences to achieve high performance, yet very little annotated data is available for Uyghur sentiment analysis tasks. Each task also has its own specificities: differences in vocabulary and word order across languages make this a challenging problem. In this paper, we present an effective solution that provides a meaningful and easy-to-use feature extractor for sentiment analysis tasks: a pre-trained language model combined with a BiLSTM layer. First, data augmentation is carried out with AEDA (An Easier Data Augmentation), and the augmented dataset is constructed to improve the performance of text classification. Then, the pre-trained LaBSE model is used to encode the input data, and a BiLSTM layer is used to learn additional context information. Finally, the validity of the model is verified on a two-class dataset for sentiment analysis and a five-class dataset for emotion analysis. On both datasets, our approach shows strong performance compared to several strong baselines. We close with an overview of resources for sentiment analysis tasks and some open research questions. In summary, we propose a model combining deep learning with cross-lingual pre-training for low-resource languages.
Highlights
Introduction
With the rapid development of the Internet and the rise of communication platforms such as social media, online forums, and e-commerce platforms, Natural Language Processing (NLP) technology plays a key role in processing, understanding, and applying the many unstructured text datasets generated on the Internet
We propose a method that adds a BiLSTM layer: the outputs of the cross-lingual pre-trained model on our datasets are fed into the BiLSTM layer to better learn context features
Sentiment analysis is an important task in NLP with a wide range of applications
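The BiLSTM-over-pretrained-encoder idea in the highlights can be sketched as a toy forward pass: a sequence of embeddings (as would come from a cross-lingual encoder such as LaBSE) is run through an LSTM in both directions and the per-step hidden states are concatenated. All weights, sizes, and function names below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step; gate order in z is [input, forget, cell, output]."""
    z = W @ x + U @ h + b                # shape (4H,)
    H = h.shape[0]
    i = 1 / (1 + np.exp(-z[:H]))         # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))      # forget gate
    g = np.tanh(z[2*H:3*H])              # candidate cell state
    o = 1 / (1 + np.exp(-z[3*H:]))       # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def bilstm(seq, params_fw, params_bw, hidden):
    """Run the sequence forward and backward, concatenate hidden states per step."""
    def run(s, params):
        h, c = np.zeros(hidden), np.zeros(hidden)
        outs = []
        for x in s:
            h, c = lstm_step(x, h, c, *params)
            outs.append(h)
        return outs
    fw = run(seq, params_fw)
    bw = run(seq[::-1], params_bw)[::-1]  # reverse outputs back to input order
    return [np.concatenate([f, b]) for f, b in zip(fw, bw)]

rng = np.random.default_rng(0)
D, H, T = 8, 4, 5                        # toy sizes; real LaBSE embeddings are 768-dim
make = lambda: (rng.normal(size=(4*H, D)), rng.normal(size=(4*H, H)), np.zeros(4*H))
seq = [rng.normal(size=D) for _ in range(T)]
out = bilstm(seq, make(), make(), H)
print(len(out), out[0].shape)            # → 5 (8,)
```

In practice one would use a framework implementation (e.g. a bidirectional LSTM layer in PyTorch or Keras) on top of the encoder's token embeddings; this sketch only shows the direction-wise computation and concatenation.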
Summary
Sentiment analysis is an important task in NLP with a wide range of applications. Paraphrasing makes changes to the words, phrases, and sentence structure of a sentence while keeping its original meaning; this method can make use of dictionaries, knowledge graphs, semantic vectors, BERT [11] models, rules, and machine translation. Data augmentation with noise is simple to use, but it affects sentence structure and semantics, offers limited diversity, and mainly improves robustness [20]. Can [25] proposes a limited-data model based on the RNN framework and a language with the largest dataset, applying it to languages with limited resources, which works well for sentiment analysis of minority languages. Sun [29] proposed an aspect-based sentiment analysis approach that fine-tunes BERT's pre-trained model and achieves good new results on SentiHood and SemEval-2014. SentiBERT's method outperforms the baselines in capturing negation and contrastive relations and in building compositional models
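The noise-based augmentation discussed above includes AEDA, the method adopted in the abstract: it inserts randomly chosen punctuation marks at random positions while leaving the original words and their order untouched. A minimal sketch follows; the function name, insertion ratio, and punctuation set are illustrative assumptions based on the general AEDA recipe, not this paper's exact configuration.

```python
import random

PUNCS = [".", ";", "?", ":", "!", ","]

def aeda(sentence, ratio=0.3, seed=None):
    """AEDA: insert 1..ratio*len(words) random punctuation marks at
    random positions, keeping all original words and their order."""
    rng = random.Random(seed)
    words = sentence.split()
    n_insert = rng.randint(1, max(1, int(ratio * len(words))))
    for _ in range(n_insert):
        pos = rng.randint(0, len(words))  # any slot, including the end
        words.insert(pos, rng.choice(PUNCS))
    return " ".join(words)

augmented = aeda("this film was a pleasant surprise", seed=1)
print(augmented)  # original words survive, with punctuation scattered in
```

Because the words themselves are never altered, stripping the inserted punctuation recovers the original sentence exactly, which is why this kind of noise mainly improves robustness rather than diversity.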