Abstract
As a fundamental problem in natural language processing (NLP), semantic text similarity calculation plays a crucial role in a variety of big data applications. In text similarity modeling, however, the complexity and ambiguity of Chinese semantics mean that the semantic interaction characteristics of Chinese text cannot be captured effectively from a single angle. This study proposes a deep learning-based computational model for semantic text similarity called the SRU-based multi-angle enhanced network (SMAEN). Specifically, the authors first combine character-level and word-level embeddings obtained from a pre-trained model to represent the text. The text is then encoded using a bidirectional simple recurrent unit (Bi-SRU) network, and local text similarity is modeled using a soft-aligned attention technique. In addition, the authors integrate Bi-SRU with an improved convolutional neural network (CNN) for global similarity modeling, capturing the semantic, temporal, and spatial characteristics of short-text interaction. Finally, they employ a pooling layer to aggregate the results into a fixed-length vector and a multilayer perceptron (MLP) classifier to make the final prediction. Experimental results on the Chinese public datasets LCQMC and PAWS-X show that the proposed method captures semantic interaction features from multiple angles and achieves advanced performance. The method can produce better matching results and improve the accuracy of big data analysis. It is applicable to numerous big data scenarios, such as information retrieval and recommendation systems.
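The soft-aligned attention step mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so this assumes the common dot-product form in which each token of one sentence is re-expressed as a softmax-weighted sum of the other sentence's token vectors. The function names (`soft_align`, `weighted_sum`) and the toy vectors are hypothetical; in the described model the inputs would be Bi-SRU hidden states rather than hand-written lists.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def weighted_sum(weights, vectors):
    """Combine vectors using the given attention weights."""
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]

def soft_align(a, b):
    """Soft-align two encoded sentences (assumed dot-product scoring).

    a, b: lists of token vectors (stand-ins for encoder outputs).
    Returns (a_aligned, b_aligned), where a_aligned[i] summarizes b
    from the viewpoint of token a[i], and vice versa.
    """
    # Pairwise similarity scores e[i][j] = a_i . b_j
    scores = [[dot(ai, bj) for bj in b] for ai in a]
    # For each token in a, attend over all tokens in b
    a_aligned = [weighted_sum(softmax(row), b) for row in scores]
    # For each token in b, attend over all tokens in a (column-wise)
    cols = [[scores[i][j] for i in range(len(a))] for j in range(len(b))]
    b_aligned = [weighted_sum(softmax(col), a) for col in cols]
    return a_aligned, b_aligned

# Toy example: two "sentences" with 2 and 3 tokens, embedding size 2
a = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a_aligned, b_aligned = soft_align(a, b)
```

Each aligned vector has the same dimension as the inputs, so it can be compared with the original token vector (for example by difference or element-wise product) before pooling. Whether SMAEN uses these particular comparison operations is not stated in the abstract.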