Abstract

With the volume of biomedical information growing rapidly, estimating the similarity of biomedical sentence pairs is an important component of natural language processing (NLP) tasks such as text retrieval and text summarization. Deep learning-based approaches have been successfully applied to the task, but they often rely on traditional pre-trained, context-independent word embeddings. Bidirectional Encoder Representations from Transformers (BERT) has recently been employed to pre-train contextualized word/sentence representation models via bidirectional Transformers, outperforming the state of the art on many NLP tasks. However, the mutual semantic influence between sentences, which is important for estimating semantic textual similarity, is neglected in existing methods, including BERT. Moreover, biomedical corpora consist mainly of syntactically complex, long sentences. To address these issues, we propose a hybrid architecture that integrates pre-trained BERT with a downstream bidirectional recurrent neural network (bi-RNN). The proposed model enhances the sentence semantic representation by using self-attention, rather than global attention, to perform cross attention between sentences, while the bi-RNN reduces redundant information in the output of BERT. Experimental results show that the best fine-tuned models consistently outperform previous methods and advance the state of the art for clinical semantic textual similarity on OHNLP 2018 Task 2, with up to a 0.6% increase in Pearson correlation coefficient.
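As a rough illustration of the described hybrid, the sketch below stacks a bidirectional RNN on top of BERT's token-level outputs and scores a sentence pair with a regression head. It assumes a PyTorch/HuggingFace setup; the model name (bert-base-uncased), hidden size, pooling, and scoring head are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: BERT encoder + downstream bi-RNN for sentence-pair similarity.
# Assumes PyTorch and the HuggingFace `transformers` library are installed.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertBiRnnSimilarity(nn.Module):
    def __init__(self, bert_name="bert-base-uncased", rnn_hidden=256):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        # Bidirectional GRU over BERT's token states to condense the
        # (often redundant) token-level information.
        self.birnn = nn.GRU(
            input_size=self.bert.config.hidden_size,
            hidden_size=rnn_hidden,
            batch_first=True,
            bidirectional=True,
        )
        # Regression head producing a single similarity score (hypothetical choice).
        self.scorer = nn.Linear(2 * rnn_hidden, 1)

    def forward(self, input_ids, attention_mask):
        # Encoding the pair jointly lets BERT's self-attention act as
        # cross attention between the two sentences.
        token_states = self.bert(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        rnn_out, _ = self.birnn(token_states)
        # Mean-pool the bi-RNN states over non-padding tokens.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (rnn_out * mask).sum(dim=1) / mask.sum(dim=1)
        return self.scorer(pooled).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertBiRnnSimilarity()
batch = tokenizer(
    ["The patient denies chest pain."],
    ["Patient reports no chest pain."],
    padding=True, return_tensors="pt",
)
with torch.no_grad():
    print(model(batch["input_ids"], batch["attention_mask"]))
```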
