Abstract

Because the number of biomedical publications is increasing rapidly, it has become challenging to keep up with scientific articles and new developments. Keywords in scientific articles give readers a quick overview and summarize the main points of the text. When biomedical articles lack keywords, or their keywords do not adequately express the content, automatic keyword extraction systems are needed. This paper addresses keyword extraction as a sequence labeling task in which words are represented as deep contextual embeddings. We predict keyword tags by fine-tuning XLNet and BERT-based models, namely BERT, BioBERT, SciBERT, and RoBERTa. Unlike rule-based methods, our approach requires no additional dictionaries, and unlike traditional machine learning methods, it requires no manual feature extraction. Performance evaluation on the benchmark dataset for biomedical keyword extraction shows that domain-specific contextualized embeddings (BioBERT, SciBERT) achieve state-of-the-art results, outperforming general-domain embeddings (BERT, RoBERTa, XLNet) and unsupervised methods.
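Framing keyword extraction as sequence labeling means each word receives a tag, and keywords are read off as contiguous tagged spans. The abstract does not name the tag scheme, so the minimal sketch below assumes a BIO scheme with hypothetical labels `B-KEY`, `I-KEY`, and `O`, and hypothetical helpers `encode_keywords` and `decode_keywords`. It shows only how gold keywords become per-word training tags and how predicted tags become keyword phrases; the fine-tuned transformer that would predict one tag per word from its contextual embedding is not shown.

```python
from typing import List

# Assumed BIO tag set (hypothetical labels; the abstract does not specify them).
B, I, O = "B-KEY", "I-KEY", "O"


def encode_keywords(tokens: List[str], keywords: List[List[str]]) -> List[str]:
    """Tag each token B/I/O, marking every occurrence of each keyword phrase."""
    tags = [O] * len(tokens)
    for phrase in keywords:
        n = len(phrase)
        target = [p.lower() for p in phrase]
        for start in range(len(tokens) - n + 1):
            window = [t.lower() for t in tokens[start:start + n]]
            # Case-insensitive match; skip spans that already carry a keyword tag.
            if window == target and all(t == O for t in tags[start:start + n]):
                tags[start] = B
                for k in range(start + 1, start + n):
                    tags[k] = I
    return tags


def decode_keywords(tokens: List[str], tags: List[str]) -> List[str]:
    """Join contiguous B/I runs back into keyword strings.

    An I tag with no open span (no preceding B) is treated like O.
    """
    keywords: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == B:
            if current:
                keywords.append(" ".join(current))
            current = [token]
        elif tag == I and current:
            current.append(token)
        else:
            if current:
                keywords.append(" ".join(current))
            current = []
    if current:
        keywords.append(" ".join(current))
    return keywords


if __name__ == "__main__":
    tokens = "Deep contextual embeddings improve biomedical keyword extraction".split()
    gold = [["contextual", "embeddings"], ["keyword", "extraction"]]
    tags = encode_keywords(tokens, gold)
    print(tags)  # ['O', 'B-KEY', 'I-KEY', 'O', 'O', 'B-KEY', 'I-KEY']
    print(decode_keywords(tokens, tags))  # ['contextual embeddings', 'keyword extraction']
```

In practice, BERT-style tokenizers split words into subword pieces, so word-level tags like these would need to be aligned to subword tokens before fine-tuning; that alignment step is omitted here.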
