Abstract

• We propose an instance-based Zero-shot learning method for facilitating the semi-automatic indexing of biomedical articles.
• Novel labels of complex ontologies are ranked by exploiting similarity functions on an arbitrary semantic space.
• We design a hybrid criterion based on underlying label dependencies, semantic similarities and pattern matching rules.
• Robust ranking performance of the novel labels is obtained for ideal, artificially noisy and realistic oracles.
• We obtain promising multi-label ranking performance regarding the novel labels on the latest version of the PubMed dataset.

Zero-shot learning constitutes a variant of the broader category of weakly supervised learning algorithms. Its main asset is the ability to identify entities for which no training data are provided in advance. Under this extreme scenario, conventional supervised learning methods cannot operate properly, while the human resources available for obtaining even a limited number of instances may be highly restricted, especially when the label space is complex because of its cardinality and the underlying semantic dependencies. However, removing the human factor from the learning loop in complicated tasks cannot guarantee robust performance. Thus, semi-automated solutions are widely accepted by both the research and industrial communities, favoring cooperation between human and machine, mainly to reduce the effort spent by the former and to obtain safer predictions. In contrast with the majority of existing Zero-shot learning approaches, we propose a generalized instance-based method oriented towards tackling the Multi-label classification task without performing any transductive operations over the test instances. Instead, we aim to provide a label ranking of the unseen classes by exploiting sentence-based semantic embeddings and label similarities, through a dedicated fine-tuned language representation model. We also use a pattern matching rule to further boost the ranking of our method. Our approach relies on some realistic assumptions in order to produce this ranking. Results on a biomedical database with a semantically rich, fine-grained label space are promising, making the method a helpful and computationally inexpensive tool for facilitating semi-automated indexing.
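
To make the ranking idea concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes that cosine similarity is the similarity function, that the pattern matching rule is a case-insensitive check for the label name in the article text, and that a match adds a fixed bonus to the score. The function names, the `match_bonus` parameter, the toy three-dimensional vectors and the label names are all hypothetical; in practice the vectors would come from a fine-tuned sentence embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_unseen_labels(text, text_vec, label_vecs, match_bonus=0.5):
    """Return unseen label names sorted from most to least relevant.

    label_vecs maps each label name to its embedding vector.
    match_bonus is a hypothetical additive boost (an assumption of this
    sketch) applied when the label name occurs verbatim in the text.
    """
    lowered = text.lower()
    scores = {}
    for name, vec in label_vecs.items():
        score = cosine(text_vec, vec)      # semantic similarity term
        if name.lower() in lowered:        # simple pattern matching rule
            score += match_bonus
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: hand-written vectors standing in for real embeddings.
abstract_text = "Reports of a novel coronavirus infection in hospital patients"
abstract_vec = [0.9, 0.1, 0.2]
unseen = {
    "Coronavirus": [0.6, 0.5, 0.1],
    "Pneumonia": [0.8, 0.2, 0.3],
    "Telemedicine": [0.0, 0.1, 0.9],
}

ranking = rank_unseen_labels(abstract_text, abstract_vec, unseen)
print(ranking)
```

With these toy vectors, similarity alone would place "Pneumonia" first; the pattern match moves "Coronavirus" to the top, which illustrates how a matching rule can complement the embedding-based score. The ranked list is intended for review by a human indexer rather than for automatic assignment.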
