The Impact of Active Learning Algorithm on a Cross-lingual model in a Persian Sentiment Task

Monire Shirghasemi, Mohammad Hadi Bokaei, Mahmoud Bijankhan

doi:10.1109/icwr51868.2021.9443156

Abstract

One of the most challenging problems in natural language processing tasks is the lack of annotated training datasets. In this paper, we investigate the impact of an Active Learning algorithm on a cross-lingual model for sentiment analysis in Persian, which is known as a low-resource language. A cross-lingual model is trained on a rich-resource language such as English (the source language) and then applied to a low-resource language, which reduces the dependency on training data in the target language. In addition, an Active Learning strategy helps improve the model by selecting the most representative samples. Since labeling data is expensive and time-consuming, selecting the data that is most useful to the model reduces the amount of labeled data our tasks require. To do so, we can select the samples about which the classifier is least confident; once they are chosen, a user is asked to label them. There are many methods and factors for choosing appropriate data in an Active Learning strategy. Ultimately, these methods help the classifier gain more knowledge about the samples and perform better.
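The least-confidence criterion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the classifier outputs a probability for each sentiment class (here two classes, with made-up probabilities), and the function names `least_confidence_scores` and `select_least_confident` are hypothetical. Each unlabeled sample is scored as one minus its highest class probability, and the `k` highest-scoring samples are sent to the annotator.

```python
import numpy as np


def least_confidence_scores(probs: np.ndarray) -> np.ndarray:
    """Uncertainty of each sample: 1 - (probability of the predicted class)."""
    return 1.0 - probs.max(axis=1)


def select_least_confident(probs: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k samples the classifier is least confident about,
    ordered from most to least uncertain."""
    scores = least_confidence_scores(probs)
    # Sort by descending uncertainty; a stable sort keeps ties in input order.
    return np.argsort(-scores, kind="stable")[:k]


if __name__ == "__main__":
    # Hypothetical [negative, positive] probabilities for four unlabeled reviews.
    probs = np.array([
        [0.95, 0.05],
        [0.55, 0.45],
        [0.70, 0.30],
        [0.51, 0.49],
    ])
    # Ask the user to label the two most uncertain reviews (indices 3 and 1).
    print(select_least_confident(probs, k=2))
```

In a full Active Learning loop, the newly labeled samples would be added to the training set, the model retrained, and the selection repeated; least confidence is only one of several possible selection criteria.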
