Abstract
Dual supervision for text classification and information retrieval, which involves training the machine with class labels augmented with text annotations that are indicative of the class, has been shown to provide significant improvements, both in and beyond active learning (AL) settings. Annotations in their simplest form are highlighted portions of the text that are indicative of the class. In this work, we aim to identify and realize the full potential of unsupervised pretrained word embeddings for text-related tasks in AL settings by training neural nets—specifically, convolutional and recurrent neural nets—through dual supervision. We propose an architecture-independent algorithm for training neural networks with human rationales for class assignments and show how unsupervised embeddings can be better leveraged in active learning settings using this algorithm. The proposed solution uses gradient-based feature attributions to constrain the machine to follow the user annotations; further, we discuss methods for overcoming the architecture-specific challenges in the optimization. Our results on the sentiment classification task show that one annotated and labeled document can be worth up to seven labeled documents, giving accuracies of up to 70% for as few as ten labeled and annotated documents, and show promise in significantly reducing user effort for the total-recall information retrieval task in systematic literature reviews.
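The core idea—penalizing gradient-based attributions that fall outside the annotator's highlighted spans—can be illustrated with a minimal sketch. The code below is a hypothetical toy instance, not the paper's algorithm: it trains a logistic classifier (standing in for the neural nets) with NumPy, uses gradient-times-input on the logit as the attribution, and adds a squared penalty on attribution mass assigned to features the rationale mask does not cover. All names (`train`, `lam`, the feature setup) are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, mask, lam=1.0, lr=0.5, epochs=500):
    """Logistic classifier trained with an attribution penalty:
    attribution a_j = w_j * x_j (gradient-times-input on the logit)
    is penalized wherever the rationale mask m_j = 0."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for xi, yi, mi in zip(X, y, mask):
            p = sigmoid(w @ xi + b)
            # cross-entropy gradients
            g_w = (p - yi) * xi
            g_b = p - yi
            # penalty gradient: d/dw_j of lam * (1 - m_j) * (w_j x_j)^2
            g_w += lam * 2.0 * (1.0 - mi) * w * xi ** 2
            w -= lr * g_w
            b -= lr * g_b
    return w, b

# Toy sentiment task: feature 0 is a genuine sentiment word, feature 1 is a
# spuriously correlated word; the rationale highlights only feature 0.
X = np.array([[1.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])
mask = np.array([[1.0, 0.0]] * 4)
w, b = train(X, y, mask)
```

After training, the weight on the rationale feature dominates the weight on the spurious one, even though both are perfectly correlated with the label in this toy data—the penalty steers the model toward the annotator's evidence.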