Abstract

Estimating the similarity of biomedical sentence pair is an important component in such natural language processing (NLP) tasks as text retrieval and text summarization with great amount of biomedical information growing. Deep learning-based approaches have been successfully applied to the task, but they often rely on traditional pre-trained context-independent word embedding. Bidirectional Encoder Representations from Transformers (BERT) is recently employed to pre-train contextualized word/sentence representation models via bidirectional Transformers, outperforming the state-of-the-art for many NLP tasks. The mutual semantic influence between sentences is important for estimating semantic textual similarity, which is neglected in existing methods including BERT. On the other hand, biomedical corpora mainly consist of syntactic complex and long sentences. Owing to the above-mentioned issues, we proposed a hybrid architecture, integrating the pre-trained BERT and downstream bidirectional recurrent neural network (bi-RNN). The proposed model enhanced the sentence semantic representation via employing the self-attention instead of global attention to perform cross attention between sentences. Meanwhile, bi-RNN reduced redundant information in the output of BERT. Experimental results show that the best fine-tuned models consistently outperform previous methods and advance the state-of-the-art for clinical semantic textual similarity in OHNLP 2018 task 2, with up to 0.6% increase in Pearson correlation coefficient.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.