Abstract

Aspect-level sentiment analysis is a fine-grained sentiment analysis task that identifies product features in an opinionated piece of text and maps the sentiment expressed towards each of them. Supervised machine learning algorithms have reported comparatively higher performance on aspect-level sentiment analysis, but at the cost of substantial amounts of high-quality labelled data. Data labelling for such fine-grained tasks also demands domain knowledge and expertise. Hence, a mechanism for extracting a minimal informative subset that is almost representative of the entire data would substantially bring down annotation costs. The proposed methodology puts forward an active-learning-based sampling strategy for aspect term extraction, the subtask of aspect-level sentiment analysis that identifies product features. The sampling strategy is automated by reinforcement learning, which extracts an optimal sample from the entire unlabelled training data and thereby reduces the time and effort linked to the labelling process. This work is of high importance in a data-driven era in which companies invest heavily in collecting and annotating huge volumes of data. The model was evaluated on the laptop and restaurant domains of the SemEval (2014–2016) datasets. The experiments showed that a considerable reduction in training data size is achieved across the different datasets. The model trained on the data extracted by the proposed reinforced active learning model beats random sampling by 9 to 17 points on the F-measure of the extracted aspect terms, and is almost on par with the model trained on the entire training data while using only 9 to 13% of it across the datasets.
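To make the idea of selecting a small informative subset concrete, the following is a minimal sketch of pool-based active learning sampling. It is not the method of this work: the abstract describes a selection policy automated by reinforcement learning, whereas this sketch substitutes a simple least-confidence heuristic. The function names, the sentence ids, and the per-token confidence values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def least_confidence(token_probs: List[float]) -> float:
    """Uncertainty of one sentence: 1 minus its least confident token probability."""
    return 1.0 - min(token_probs)


def select_for_annotation(pool: Dict[str, List[float]], budget_fraction: float) -> List[str]:
    """Return the ids of the most uncertain sentences, up to a fraction of the pool.

    NOTE: least-confidence ranking is an assumed stand-in for the learned
    (reinforcement learning) selection policy described in the abstract.
    """
    budget = max(1, round(len(pool) * budget_fraction))
    ranked = sorted(pool, key=lambda sid: least_confidence(pool[sid]), reverse=True)
    return ranked[:budget]


# Hypothetical tagger confidences (probability of the predicted tag per token)
# for ten unlabelled sentences.
pool = {
    "s1": [0.99, 0.98, 0.97],
    "s2": [0.55, 0.90, 0.60],
    "s3": [0.80, 0.85, 0.95],
    "s4": [0.51, 0.52, 0.99],
    "s5": [0.90, 0.95, 0.92],
    "s6": [0.70, 0.91, 0.88],
    "s7": [0.88, 0.96, 0.94],
    "s8": [0.93, 0.97, 0.95],
    "s9": [0.65, 0.72, 0.99],
    "s10": [0.75, 0.78, 0.81],
}

# Send roughly a tenth of the pool to the annotators.
to_label = select_for_annotation(pool, budget_fraction=0.1)
print(to_label)
```

Only the selected sentences would be passed to human annotators; the aspect term extractor is then trained on that subset instead of on the fully labelled pool.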
