Abstract

In this paper, we present a study on identifying the national variety of English that authors use in social media texts. In data from Facebook and Twitter, information about each author’s social profile is annotated, and the national English variety (US, UK, AUS, CAN, NNS) that each author uses is attributed. We tested four feature types: formal linguistic features, POS features, lexicon-based features related to the different varieties, and data-based features from each English variety. We used various machine learning algorithms for the classification experiments, and we implemented a feature selection process. With the 31 highest-ranked features, classification accuracy reached up to 77.32%. The experimental results are evaluated, and the efficacy of the ranked features is discussed.
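As an illustration of ranking features and keeping only the highest-ranked ones, the following minimal sketch scores each feature by information gain and retains the top k. The abstract does not state which ranking criterion was used, so information gain is an assumption here, and the feature names (`uses_colour`, `has_hashtag`, `uses_gotten`), the toy data, and the helper `top_k_features` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_k_features(rows, labels, k):
    """Rank features by information gain and return the k best names.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    names = list(rows[0].keys())
    scores = {name: information_gain([row[name] for row in rows], labels)
              for name in names}
    ranked = sorted(names, key=lambda name: (-scores[name], name))
    return ranked[:k]


# Hypothetical toy data: one dict of binary features per author.
rows = [
    {"uses_colour": 1, "has_hashtag": 1, "uses_gotten": 0},
    {"uses_colour": 1, "has_hashtag": 0, "uses_gotten": 0},
    {"uses_colour": 0, "has_hashtag": 1, "uses_gotten": 1},
    {"uses_colour": 0, "has_hashtag": 0, "uses_gotten": 1},
]
labels = ["UK", "UK", "US", "US"]

selected = top_k_features(rows, labels, k=2)
print(selected)
```

In this toy data, `has_hashtag` is evenly split across both varieties and carries no information about the label, so it is ranked last and dropped. In a setting like the one described, k would be 31 and the selected columns would then be passed to a classifier.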
