Text Similarity Measurement Method Based on BiLSTM-SECapsNet Model

Shanping Zhang, Qiuchen Wang, Xiaodong Wang, Ye Tao, Xiaowei Xu, Fangfang Chen

doi:10.1109/icivc52351.2021.9527010

Abstract

Text similarity measurement is a basic task in natural language processing and is widely used in information retrieval, automatic question answering, machine translation, and related applications. Because most traditional statistical methods for text similarity measurement cannot efficiently extract the semantic information of the text, we propose BiLSTM-SECapsNet, a hybrid model based on BiLSTM, CapsNet, and SENet. We employ a siamese BiLSTM network as the sequential inference model to extract global features. A co-attention mechanism generates attention weights between the features of the two texts, which collects local inference information over the sequences. A CapsNet network is also introduced to capture the local features of the text, and an SENet network then automatically calibrates the importance of each local feature to obtain the local feature matrix. After that, the feature matrix is fused again, and a BiLSTM network extracts the context information to obtain the similarity matrix of the two texts. Finally, the semantic similarity of the texts is measured through fusion, pooling, and fully connected layers. Experimental results on the Quora Question Pairs dataset show that the method achieves an accuracy of 87.31 and an F-measure of 87.35. Compared with other networks, the method is more effective.
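Two of the steps named in the abstract, co-attention between the two texts and SENet-style recalibration of local features, can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: dot-product scoring for the attention weights, a two-layer ReLU/sigmoid gate for the recalibration, and the hypothetical names `coattend` and `se_recalibrate`. It uses plain Python lists rather than the authors' framework.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coattend(a, b):
    """For each vector in `a`, return an attention-weighted sum of the vectors in `b`.

    Assumption: attention scores are plain dot products between encoded tokens.
    """
    dim = len(b[0])
    aligned = []
    for a_i in a:
        weights = softmax([dot(a_i, b_j) for b_j in b])
        aligned.append(
            [sum(w * b_j[k] for w, b_j in zip(weights, b)) for k in range(dim)]
        )
    return aligned


def se_recalibrate(features, w1, w2):
    """Squeeze-and-excitation style channel reweighting.

    features: list of positions, each a list of C channel values.
    w1: R x C reduction weights, w2: C x R expansion weights (both hypothetical).
    """
    n = len(features)
    channels = len(features[0])
    # Squeeze: average each channel over all positions.
    squeezed = [sum(f[c] for f in features) / n for c in range(channels)]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, dot(row, squeezed)) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-dot(row, hidden))) for row in w2]
    # Scale: multiply every channel by its gate.
    return [[f[c] * gates[c] for c in range(channels)] for f in features]


# Toy encoded sequences standing in for BiLSTM outputs of the two texts.
text_a = [[1.0, 0.0], [0.0, 1.0]]
text_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a_aligned = coattend(text_a, text_b)  # text_b summarized for each token of text_a
b_aligned = coattend(text_b, text_a)  # text_a summarized for each token of text_b
```

In a trained model the gate weights `w1` and `w2` would be learned; here they are simply passed in, so the sketch only shows how the data flows through each step.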

Full Text