Abstract
The textual similarity task, which measures the similarity between two pieces of text, has recently received much attention in natural language processing (NLP). However, because language expression is vague and diverse, considering only semantic features or only syntactic features may lose critical textual knowledge. This paper proposes a new type of structure tree for sentence representation, the weight vector dependency tree (WVD-tree), which exploits both syntactic (structural) and semantic information. A WVD-tree combines a structure tree carrying syntactic information with word vectors representing the semantic information of the sentence. Further, a Gaussian attention weight is proposed to better capture the important semantic features of sentences. We also design an enhanced tree kernel that calculates the common parts of two structures for similarity judgment. Finally, WVD-tree is tested on widely used semantic textual similarity tasks. The experimental results show that WVD-tree can effectively improve the accuracy of sentence similarity judgments.
Highlights
We design a novel attention weight that is more sensitive to “distance” to improve the performance of the attention mechanism. Another line of work, on tree kernels, is related to ours because our model encodes syntactic-semantic information represented by tree structures.
Building on existing tree structures, tree kernel calculation methods, and attention mechanisms, this paper develops a novel sentence modeling approach that incorporates semantic and syntactic information.
For similarity judgment, an enhanced tree kernel is proposed; it builds on the traditional tree kernel and calculates the common fragments of the tree structures.
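To make "common fragments" concrete, here is a minimal sketch of a traditional subset-tree kernel in the style of Collins and Duffy, which is the kind of baseline the highlight refers to. It is not the paper's enhanced kernel: it compares node labels exactly and uses no word vectors or attention weights. The names `Node`, `common_fragments`, `tree_kernel`, `normalized_kernel`, and the `decay` parameter are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A labeled, ordered tree node (hypothetical helper type)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def production(node: Node) -> Tuple[str, Tuple[str, ...]]:
    # A node's "production" is its label plus the ordered labels of its children.
    return node.label, tuple(child.label for child in node.children)


def all_nodes(tree: Node) -> List[Node]:
    # Collect every node of the tree in pre-order.
    found = [tree]
    for child in tree.children:
        found.extend(all_nodes(child))
    return found


def common_fragments(n1: Node, n2: Node, decay: float = 1.0) -> float:
    """Weighted count of common fragments rooted at both n1 and n2."""
    # Leaves root no fragment; differing productions share none.
    if not n1.children or not n2.children:
        return 0.0
    if production(n1) != production(n2):
        return 0.0
    score = decay
    # Each child pair either stops here (the 1) or extends with a shared fragment.
    for c1, c2 in zip(n1.children, n2.children):
        score *= 1.0 + common_fragments(c1, c2, decay)
    return score


def tree_kernel(t1: Node, t2: Node, decay: float = 1.0) -> float:
    # Sum the common-fragment counts over all node pairs.
    return sum(common_fragments(a, b, decay)
               for a in all_nodes(t1) for b in all_nodes(t2))


def normalized_kernel(t1: Node, t2: Node, decay: float = 1.0) -> float:
    # Scale to [0, 1] so trees of different sizes are comparable.
    denom = (tree_kernel(t1, t1, decay) * tree_kernel(t2, t2, decay)) ** 0.5
    return tree_kernel(t1, t2, decay) / denom if denom else 0.0


def sentence(noun: str) -> Node:
    # Toy tree for "the <noun> sleeps" (illustrative only).
    return Node("S", [
        Node("NP", [Node("D", [Node("the")]), Node("N", [Node(noun)])]),
        Node("VP", [Node("V", [Node("sleeps")])]),
    ])


similarity = normalized_kernel(sentence("cat"), sentence("dog"))
```

In this sketch a single differing word removes every fragment that contains it, since labels must match exactly. An enhanced kernel of the kind the highlight describes would presumably soften that by comparing word vectors rather than surface labels, but the details are in the paper, not here.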
Summary
The experiments are conducted on the semantic textual similarity (STS) datasets (2012–2015). These datasets cover various domains, for example, news, web forums, images, glosses, and Twitter. Statistics of the STS datasets, sorted by year, are shown in Table 2 (datasets with the same name contain different data in different years).