Abstract

The literature has not fully and adequately explained why contextual (e.g., BERT-based) representations are so successful to improve the effectiveness of some Natural Language Processing tasks, especially Automatic Text Classifications (ATC). In this article, we evince that such representations, when properly tuned to a target domain, produce an extremely separable space that makes the classification task very effective, independently of the classifier employed for solving the ATC task. To demonstrate our hypothesis, we perform a thorough class separability analysis in order to visualize and measure how well BERT-based embeddings separate documents of different classes in comparison with other widely used representation approaches, e.g., TFIDF BoW, static embeddings (e.g., fastText) and zero-shot (non-tuned) contextual embeddings. We also analyze separability in the context of transfer learning and compare BERT-based representations with those obtained from other transformers (e.g., RoBERTa, XLNET). Our experiments covering sixteen datasets in topic and sentiment classification, eight classification methods and three class separability metrics show that the fine-tuned BERT embeddings are highly separable in the corresponding space (e.g., they are 67% more separable than the static embeddings). As a consequence, they allow the simplest classifiers to achieve similar effectiveness as the most complex methods. We also find moderate to high correlations between separability and effectiveness in all experimented scenarios. Overall, our main finding is that more discriminative (i.e., separable) textual representations constitute a critical part of the ATC solutions that, given the current state-of-the-art in classification algorithms, are more prominent than the algorithmic (classifier) method for solving the task.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call