Abstract

We present a semi-supervised annotation process for identifying and labelling explicit aspects in an initially unlabelled corpus. First, we use cross-domain learning to pre-annotate the data, deliberately excluding domain-related input features so that learning transfers effectively. We then apply an active learning strategy to improve the pre-annotation performance and enrich the training data, adapting the strategy to sequence labelling and addressing class imbalance. We evaluate the process on two unlabelled French datasets of user opinions, one on beauty products and one on electronic devices. The results show that the F1-score improves after increasing and correcting 30% of the training dataset.
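
As an illustration of the active learning step, the sketch below ranks unlabelled sentences for manual correction by token-level uncertainty. It is not the procedure used in this work: the abstract does not specify the query strategy or how class imbalance is handled. The entropy-based score, the down-weighting of tokens predicted as the majority "O" label, the BIO label names, the `outside_weight` value, and all function names are assumptions made for the example.

```python
import math


def token_entropy(probs):
    """Shannon entropy (natural log) of one token's label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sentence_score(token_probs, labels, outside_label="O", outside_weight=0.2):
    """Weighted mean token entropy for one sentence.

    Tokens whose most probable label is the majority "outside" label get a
    smaller weight (an assumed heuristic), so sentences that are uncertain
    on likely aspect tokens rank higher.
    """
    total, weight_sum = 0.0, 0.0
    for probs in token_probs:
        best = max(range(len(probs)), key=probs.__getitem__)
        weight = outside_weight if labels[best] == outside_label else 1.0
        total += weight * token_entropy(probs)
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def select_for_annotation(pool, labels, budget):
    """Return the ids of the `budget` most uncertain sentences.

    `pool` is a list of (sentence_id, token_probs) pairs, where token_probs
    holds one probability vector per token, ordered like `labels`.
    """
    ranked = sorted(
        pool,
        key=lambda item: sentence_score(item[1], labels),
        reverse=True,
    )
    return [sentence_id for sentence_id, _ in ranked[:budget]]
```

In a loop of this kind, the selected sentences would be corrected by an annotator, added to the training data, and the tagger retrained before the next selection round.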
