Enhanced Bootstrapping Algorithm for Automatic Annotation of Tweets

Mudasir Mohd,Nida Hakak,Rafiya Jan

doi:10.4018/ijcini.2020040103

Abstract

Annotations are critical in various text mining tasks such as opinion mining, sentiment analysis, word sense disambiguation. Supervised learning algorithms start with the training of the classifier and require manually annotated datasets. However, manual annotations are often subjective, biased, onerous, and burdensome to develop; therefore, there is a need for automatic annotation. Automatic annotators automatically annotate the data for creating the training set for the supervised classifier, but lack subjectivity and ignore semantics of underlying textual structures. The objective of this research is to develop scalable and semantically rich automatic annotation system while incorporating domain dependent characteristics of the annotation process. The authors devised an enhanced bootstrapping algorithm for the automatic annotation of Tweets and employed distributional semantic models (LSA and Word2Vec) to augment the novel Bootstrapping algorithm and tested the proposed algorithm on the 12,000 crowd-sourced annotated Tweets and achieved a 68.56% accuracy which is higher than the baseline accuracy.

Full Text