Many associations described in the growing biomedical literature have yet to be discovered and integrated into biomedical databases. Biomedical Natural Language Processing (NLP) research aims to automate knowledge extraction and mining from this literature, with Relation Extraction (RE) as an essential task. Existing models, however, have limitations. Most are applicable to only a subset of RE datasets. Moreover, conventional models often treat RE as binary classification, which is suboptimal given the diverse relation types present in RE datasets, including intricate ones such as similarity and hierarchy. These limitations are compounded by the models’ inability to capture word-level positional features and sentence-level language features. To address these challenges, we present a novel RE model called BicapBert, which combines neural networks and capsule networks and enhances them with hybrid knowledge graph embeddings. BicapBert uses PubMedBERT and capsule networks to extract word-level positional and sentence-level language features. It also captures knowledge features from a biomedical knowledge graph and integrates them with these linguistic features. The combined representation is fed into a multi-layer perceptron, and the final prediction is produced by a softmax classifier. Experiments on three extensive RE datasets show that the proposed model achieves state-of-the-art performance. We further validate the model on three randomly selected biomedical datasets covering various tasks, which further supports its superiority.
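The final stage described above (fused features, then a multi-layer perceptron, then a softmax over relation types) can be illustrated with a minimal sketch. This is not the BicapBert implementation: the feature vectors are random placeholders standing in for PubMedBERT, capsule network, and knowledge graph outputs, fusion by simple concatenation is an assumption, and all names, dimensions, and the number of relation classes are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def random_layer(rng, n_out, n_in):
    """Hypothetical untrained layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def classify_relation(sentence_feat, capsule_feat, kg_feat, hidden, output):
    # Assumption: the three feature groups are fused by concatenation.
    fused = sentence_feat + capsule_feat + kg_feat
    h = relu(linear(fused, *hidden))
    # Softmax yields one probability per relation type (multi-class, not binary).
    return softmax(linear(h, *output))


rng = random.Random(0)
# Placeholder features; a real system would compute these from text and a KG.
sentence_feat = [rng.gauss(0, 1) for _ in range(8)]
capsule_feat = [rng.gauss(0, 1) for _ in range(4)]
kg_feat = [rng.gauss(0, 1) for _ in range(4)]

NUM_RELATIONS = 5  # hypothetical number of relation classes
hidden = random_layer(rng, 6, 16)
output = random_layer(rng, NUM_RELATIONS, 6)

probs = classify_relation(sentence_feat, capsule_feat, kg_feat, hidden, output)
predicted = max(range(NUM_RELATIONS), key=probs.__getitem__)
```

Here `probs` holds one probability per relation class and `predicted` is the index of the most likely class, which is what distinguishes this multi-class setup from a binary related/unrelated decision.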