Abstract
Neural text classification models typically treat output labels as categorical variables that lack description and semantics. This forces their parametrization to depend on the label set size, so they cannot scale to large label sets or generalize to unseen ones. Existing joint input-label text models overcome these issues by exploiting label descriptions, but they are unable to capture complex label relationships, have rigid parametrization, and their gains on unseen labels often come at the expense of weak performance on the labels seen during training. In this paper, we propose a new input-label model that generalizes over previous such models, addresses their limitations, and does not compromise performance on seen labels. The model consists of a joint nonlinear input-label embedding with controllable capacity and a joint-space-dependent classification unit that is trained with cross-entropy loss to optimize classification performance. We evaluate models on full-resource and low- or zero-resource text classification of multilingual news and biomedical text with a large label set. In both scenarios, our model outperforms monolingual and multilingual models that do not leverage label semantics, as well as previous joint input-label space models.
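As a rough illustration of the kind of output layer the abstract describes, the following sketch (PyTorch; the class and parameter names such as GILEOutputLayer, proj_input, and joint_dim are illustrative assumptions, not the paper's exact formulation) projects the input representation and each label-description embedding into a shared nonlinear joint space and scores every (input, label) pair with a small classification unit, so the number of output-layer parameters does not grow with the label set size.

```python
import torch
import torch.nn as nn

class GILEOutputLayer(nn.Module):
    """Hypothetical sketch of a generalized input-label embedding output layer.

    The input representation h and each label-description embedding are mapped
    into a shared joint space with a nonlinearity; a small joint-space-dependent
    classification unit then produces one score per label.
    """

    def __init__(self, input_dim, label_dim, joint_dim):
        super().__init__()
        self.proj_input = nn.Linear(input_dim, joint_dim)   # input -> joint space
        self.proj_label = nn.Linear(label_dim, joint_dim)   # label -> joint space
        self.scorer = nn.Linear(joint_dim, 1)               # classification unit

    def forward(self, h, label_embeddings):
        # h: (batch, input_dim); label_embeddings: (num_labels, label_dim)
        g_in = torch.tanh(self.proj_input(h))                  # (batch, joint_dim)
        g_lab = torch.tanh(self.proj_label(label_embeddings))  # (num_labels, joint_dim)
        # multiplicative interaction in the joint space, scored per label
        joint = g_in.unsqueeze(1) * g_lab.unsqueeze(0)          # (batch, num_labels, joint_dim)
        logits = self.scorer(joint).squeeze(-1)                 # (batch, num_labels)
        return logits  # feed to sigmoid/softmax and a cross-entropy loss
```

Because the label set only enters through the label embeddings passed at the forward call, such a layer can in principle score labels unseen during training, which is the property the abstract emphasizes.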
Highlights
Text classification is a fundamental NLP task with numerous real-world applications such as topic recognition (Tang et al., 2015; Yang et al., 2016), sentiment analysis (Pang and Lee, 2005; Yang et al., 2016), and question answering (Chen et al., 2015; Kumar et al., 2015).
To encode the input text, we focus on hierarchical attention networks (HANs), which are competitive for monolingual (Yang et al., 2016) and multilingual text classification (Pappas and Popescu-Belis, 2017).
GILE-word-level attention neural network (WAN) outperforms the WSABIE+ and AiTextML variants by a large margin in both cases, for example by +7.75 and +11.61 points on seen labels and by +12.58 and +10.29 points in terms of average precision on unseen labels, respectively.
Summary
Text classification is a fundamental NLP task with numerous real-world applications such as topic recognition (Tang et al., 2015; Yang et al., 2016), sentiment analysis (Pang and Lee, 2005; Yang et al., 2016), and question answering (Chen et al., 2015; Kumar et al., 2015). Previous work has leveraged knowledge from the label texts through a joint input-label space, initially for image classification (Weston et al., 2011; Mensink et al., 2012; Frome et al., 2013; Socher et al., 2013). Such models generalize to labels both seen and unseen during training and scale well to very large label sets. The input text is encoded with a hierarchical attention network: the word level consists of an encoder network g_w and an attention network a_w, while the sentence level includes its own encoder and attention network.
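To make that hierarchical structure concrete, here is a minimal sketch of a HAN-style document encoder (PyTorch; names such as AttentionPool and HierarchicalAttentionEncoder, and the choice of GRUs, are assumptions for illustration, not the authors' code): a word-level encoder and attention network build sentence vectors, and a sentence-level encoder and attention network pool them into a document vector for the classifier.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Additive attention pooling over a sequence of hidden states (illustrative helper)."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)
        self.context = nn.Linear(hidden_dim, 1, bias=False)  # attention context vector

    def forward(self, states):
        # states: (batch, seq_len, hidden_dim)
        weights = torch.softmax(self.context(torch.tanh(self.proj(states))), dim=1)
        return (weights * states).sum(dim=1)  # weighted sum -> (batch, hidden_dim)

class HierarchicalAttentionEncoder(nn.Module):
    """Sketch of a HAN document encoder: word-level encoder + attention produce
    sentence vectors; sentence-level encoder + attention produce the document vector."""

    def __init__(self, emb_dim, hidden_dim):
        super().__init__()
        self.word_encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.word_attention = AttentionPool(2 * hidden_dim)
        self.sent_encoder = nn.GRU(2 * hidden_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.sent_attention = AttentionPool(2 * hidden_dim)

    def forward(self, word_embeddings):
        # word_embeddings: (batch, num_sents, num_words, emb_dim)
        b, s, w, e = word_embeddings.shape
        words, _ = self.word_encoder(word_embeddings.view(b * s, w, e))
        sent_vectors = self.word_attention(words).view(b, s, -1)  # one vector per sentence
        sents, _ = self.sent_encoder(sent_vectors)
        return self.sent_attention(sents)  # document vector: (batch, 2 * hidden_dim)
```

The document vector produced here would then be scored against label embeddings by an output layer such as the joint input-label sketch given after the abstract.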