Abstract
Due to the rapidly increasing number of biomedical publications, it has become challenging to keep up with scientific articles and new developments. Keywords give readers a quick understanding of an article and summarize its main points. When a biomedical article has no keywords, or its keywords do not adequately express its content, automatic keyword extraction systems are needed. This paper addresses keyword extraction as a sequence labeling task in which words are represented as deep contextual embeddings. We predict the keyword tag of each word by fine-tuning XLNet and BERT-based models such as BERT, BioBERT, SciBERT, and RoBERTa. The proposed method needs neither the extra dictionaries required by rule-based methods nor the feature extraction required by traditional machine learning methods. Performance evaluation on the benchmark dataset for biomedical keyword extraction shows that domain-specific contextualized embeddings (BioBERT, SciBERT) achieve state-of-the-art results, outperforming general-domain embeddings (BERT, RoBERTa, XLNet) and unsupervised methods.
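To illustrate what treating keyword extraction as sequence labeling can look like, the following minimal sketch converts gold keyphrases into per-token tags and decodes tags back into keyphrases. The B/I/O tag scheme, the whitespace tokenization, and the function names `bio_tags` and `decode_keyphrases` are assumptions made for illustration; the abstract does not specify the tag set used. In the setting described above, a fine-tuned model would predict the tags rather than derive them from known keyphrases.

```python
from typing import List


def bio_tags(tokens: List[str], keyphrases: List[List[str]]) -> List[str]:
    """Tag each token as B (keyword start), I (inside keyword), or O (outside).

    Assumed scheme for illustration; matching is case-insensitive.
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for phrase in keyphrases:
        target = [w.lower() for w in phrase]
        n = len(target)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            span_is_free = all(t == "O" for t in tags[i:i + n])
            # Only tag spans that are still unlabeled, so matches never overlap.
            if lowered[i:i + n] == target and span_is_free:
                tags[i] = "B"
                for j in range(i + 1, i + n):
                    tags[j] = "I"
    return tags


def decode_keyphrases(tokens: List[str], tags: List[str]) -> List[str]:
    """Rebuild keyphrase strings from a token sequence and its B/I/O tags."""
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new keyphrase starts; close any phrase still open.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # O tag (or a stray I with no open phrase) ends the current phrase.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


tokens = "BioBERT improves keyword extraction from biomedical abstracts".split()
tags = bio_tags(tokens, [["keyword", "extraction"], ["BioBERT"]])
recovered = decode_keyphrases(tokens, tags)
```

Here `tags` holds one label per token and `recovered` lists the keyphrases in document order. In practice, subword tokenizers used by BERT-style models split words into pieces, so word-level tags would also need to be aligned to subword tokens before fine-tuning.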