Abstract

The determination of semantic similarity between sentences is an important component of natural language processing (NLP) tasks such as text retrieval and text summarization. Many approaches have been proposed for estimating sentence similarity, among which Siamese neural networks (SNN) provide a particularly effective approach. However, the sentence semantic representation generated by a weight-sharing SNN without any attention mechanism ignores the fact that different words contribute differently to the overall sentence semantics. Furthermore, applying attention within only a single sentence neglects the interactive semantic influence between the two sentences on similarity estimation. To address these issues, this paper proposes an interactive self-attention (ISA) mechanism and integrates it with an SNN. The resulting model, named the interactive self-attentive Siamese neural network (ISA-SNN), is used to verify the effectiveness of ISA. The proposed model obtains the weights of the words in each sentence by means of self-attention and extracts the inherent interactive semantic information between sentences via interactive attention, thereby enhancing the sentence semantic representation. Without any feature engineering, it outperforms existing methods on three biomedical benchmark datasets, achieving Pearson correlation coefficients of 0.656 on DBMI and 0.713 and 0.658 on CDD-ful and CDD-ref, respectively.
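To make the combination of self-attention, interactive attention and weight sharing more concrete, here is a minimal, parameter-free Python sketch of the general idea. It is not the authors' ISA-SNN, and every detail is an assumption for illustration. Words are given as precomputed vectors. Self-attention scores each word against a single context vector (`query`, a stand-in for a learned parameter). Interactive attention scores each word by its best dot-product match in the other sentence. The two weight distributions are mixed with a hypothetical coefficient `alpha`, and the pooled sentence vectors are compared with cosine similarity. Both sentences pass through the same `encode` function with the same `query` and `alpha`, which mirrors the weight sharing of a Siamese network. `pearson` shows how predicted scores would be compared against gold scores using the metric reported above.

```python
import math
from typing import List

Vector = List[float]
Sentence = List[Vector]  # one vector per word (assumed precomputed embeddings)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(words: Sentence, query: Vector) -> List[float]:
    # Score each word against a context vector (hypothetical stand-in for a
    # learned attention parameter), then normalize with softmax.
    return softmax([dot(w, query) for w in words])


def interactive_weights(words: Sentence, other: Sentence) -> List[float]:
    # Score each word by its best dot-product match in the other sentence
    # (a hypothetical choice of interaction function).
    return softmax([max(dot(w, o) for o in other) for w in words])


def encode(words: Sentence, other: Sentence, query: Vector, alpha: float = 0.5) -> Vector:
    # Mix the two weight distributions; a convex mix still sums to 1.
    s = self_attention_weights(words, query)
    i = interactive_weights(words, other)
    weights = [alpha * a + (1.0 - alpha) * b for a, b in zip(s, i)]
    # Attention-weighted sum of the word vectors gives the sentence vector.
    dim = len(words[0])
    return [sum(wt * w[d] for wt, w in zip(weights, words)) for d in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def similarity(s1: Sentence, s2: Sentence, query: Vector, alpha: float = 0.5) -> float:
    # Siamese setup: both sentences go through the same encode() with the
    # same query and alpha, i.e. shared weights.
    v1 = encode(s1, s2, query, alpha)
    v2 = encode(s2, s1, query, alpha)
    return cosine(v1, v2)


def pearson(xs: List[float], ys: List[float]) -> float:
    # Pearson correlation between predicted and gold similarity scores.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy usage with made-up 2-dimensional word vectors (illustrative only).
s1 = [[1.0, 0.0], [0.0, 1.0]]
s2 = [[1.0, 0.1], [0.2, 0.9]]
query = [0.5, 0.5]
print(similarity(s1, s2, query))
```

In a real model the word vectors would come from a trained encoder and the attention parameters would be learned end to end. The sketch only shows how per-word weights from within a sentence and from the other sentence can jointly shape the pooled representation.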

Highlights

  • Growing numbers of medical texts have accumulated, carrying an increasing amount of biomedical information

  • To demonstrate the effectiveness of our proposed model, we compare it against multiple baseline methods and state-of-the-art approaches for the sentence-pair similarity estimation task on other corpora

  • IA-SNN (Siamese neural network) outperforms the other four methods, which reveals the importance of interactive semantic information for estimating semantic similarity between sentences


Summary

Introduction

Growing numbers of medical texts have accumulated, carrying an increasing amount of biomedical information. In these large collections, many sentences convey similar meanings but are worded completely differently, which creates considerable unnecessary difficulty for medical research. Evaluating the textual similarity between biomedical texts is therefore an important step in extracting useful biomedical information, such as drug-drug interactions (DDIs) [1]. Some researchers have used biomedical resources [2] or corpora [3] to improve similarity evaluation, but these methods generalize poorly because of limitations in the resources and corpora. Machine learning-based methods [4] have also been proposed for this task.

