Abstract

Opinions play an essential role in human life. Because the web and social networks make it easy to share opinions, ideas, and feelings on almost any topic, the analysis of opinions and emotions has become increasingly important, and its importance will keep growing as social networks expand. Although much research has been conducted on sentiment analysis in Persian, its accuracy still falls short of the methods available for English, and it faces several challenges. One of the most significant is the lack of labeled datasets. Numerous datasets have been collected in various research projects, but the volume of labeled data remains small because labeling is a manual process and is therefore costly and time-consuming. This research presents a semi-automatic method for generating labeled datasets. The proposed method combines pre-trained deep learning models with a human agent, so that more labeled data can be obtained with less money, time, and manpower. Some unlabeled data were labeled with this method and added to the basic dataset to create a new dataset, called the "proposed dataset". To evaluate the effectiveness of the method, the ParsBERT language model was fine-tuned on the basic dataset and on the proposed dataset and evaluated on the same test dataset in both cases. The results showed a 4% improvement in ParsBERT's F1 score with the proposed dataset compared to the basic dataset. Notably, fine-tuning ParsBERT with the new dataset also made it generalize better and removed one of its weaknesses, namely overfitting.
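The abstract does not say how the pre-trained models and the human agent are combined. One common arrangement, shown here purely as an assumption, is confidence-based routing: a prediction the model is confident about is accepted as the label, and every other example is passed to a human annotator. The sketch below illustrates that idea only. The function names, the 0.9 threshold, and the toy keyword classifier standing in for a pre-trained model are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LabeledExample:
    text: str
    label: str
    source: str  # "model" if auto-accepted, "human" if sent to the annotator


def semi_automatic_label(
    texts: List[str],
    predict: Callable[[str], Dict[str, float]],
    ask_human: Callable[[str, str], str],
    threshold: float = 0.9,  # assumed cut-off, not a value from the paper
) -> List[LabeledExample]:
    """Accept confident model predictions; route the rest to a human."""
    labeled = []
    for text in texts:
        probs = predict(text)  # maps each label to a probability
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            labeled.append(LabeledExample(text, label, "model"))
        else:
            # The human sees the model's suggestion and may override it.
            labeled.append(LabeledExample(text, ask_human(text, label), "human"))
    return labeled


# Hypothetical stand-in for a pre-trained sentiment classifier.
def toy_predict(text: str) -> Dict[str, float]:
    if "great" in text:
        return {"positive": 0.97, "negative": 0.03}
    if "awful" in text:
        return {"positive": 0.05, "negative": 0.95}
    return {"positive": 0.55, "negative": 0.45}


# Hypothetical stand-in for the human annotator: canned answers.
HUMAN_ANSWERS = {"could be better": "negative"}


def toy_human(text: str, suggestion: str) -> str:
    return HUMAN_ANSWERS.get(text, suggestion)


texts = ["great phone", "awful battery", "could be better"]
for example in semi_automatic_label(texts, toy_predict, toy_human):
    print(example)
```

In a setup like this, the threshold controls the trade-off between annotation cost and label noise: raising it sends more examples to the human, lowering it accepts more model labels unchecked.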
