An unsupervised semantic text similarity measurement model in resource-limited scenes

Qi Xiao, Yunchuan Qin, Kenli Li, Zhuo Tang, Fan Wu, Zhizhong Liu

doi:10.1016/j.ins.2022.10.127

Abstract

As the basis of many artificial intelligence tasks, text similarity measurement has received extensive attention in recent studies. However, few of these studies focus on resource-limited scenes (i.e., scenes with limited computational resources and few training datasets), which are becoming increasingly prevalent and challenging with the development of the Internet of Things. Worse still, popular approaches such as deep-neural-network-based methods may lose their effectiveness in such scenes, since they typically require considerable computational resources. Most current traditional methods, in turn, fail to effectively exploit the semantic information in sentences. As an alternative, this paper proposes a lightweight and semantically rich text similarity measurement model named the TES-TK model. In this model, a sentence is first transformed into a tree structure called a TES-Tree, which integrates syntactic information, semantic knowledge, and topic distribution in order to comprehensively represent the multidimensional semantics of the sentence. A modified tree kernel model is then designed to calculate the similarity between each pair of TES-Trees, which yields the similarity score between the two corresponding sentences. Experiments on 19 public benchmark datasets (STS2012–2015) demonstrate that the proposed approach performs significantly better than the eight compared peer methods on most datasets. In resource-limited scenes in particular, our approach achieves highly competitive results compared with the latest methods, such as BERT.
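To illustrate the general idea of scoring two trees with a tree kernel, the sketch below implements the classic Collins–Duffy subset-tree kernel, which counts shared tree fragments with a decay factor, and normalizes it so identical trees score 1.0. This is an assumption-laden stand-in, not the TES-TK model: the abstract does not specify how TES-Trees are built or how the kernel is modified, so the node structure, exact label matching, and the names `Node`, `subset_tree_kernel`, `tree_similarity`, and `lam` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical labeled tree node; leaves have no children."""
    label: str
    children: tuple = ()


def _production(n: Node) -> tuple:
    # A node's production: its label plus the ordered labels of its children.
    return (n.label, tuple(c.label for c in n.children))


def _all_nodes(t: Node):
    # Pre-order traversal over every node of the tree.
    yield t
    for c in t.children:
        yield from _all_nodes(c)


def subset_tree_kernel(t1: Node, t2: Node, lam: float = 0.5) -> float:
    """Collins-Duffy subset-tree kernel with decay factor lam (0 < lam <= 1)."""
    memo: dict = {}

    def delta(n1: Node, n2: Node) -> float:
        # Weighted count of common tree fragments rooted at n1 and n2.
        key = (id(n1), id(n2))
        if key in memo:
            return memo[key]
        if not n1.children or not n2.children or _production(n1) != _production(n2):
            val = 0.0  # leaves, or nodes with different productions, share no fragment
        else:
            val = lam
            for c1, c2 in zip(n1.children, n2.children):
                val *= 1.0 + delta(c1, c2)
        memo[key] = val
        return val

    # Sum delta over every pair of nodes drawn from the two trees.
    return sum(delta(a, b) for a in _all_nodes(t1) for b in _all_nodes(t2))


def tree_similarity(t1: Node, t2: Node, lam: float = 0.5) -> float:
    # Cosine-style normalization so that identical trees score 1.0.
    denom = math.sqrt(subset_tree_kernel(t1, t1, lam) * subset_tree_kernel(t2, t2, lam))
    return subset_tree_kernel(t1, t2, lam) / denom if denom > 0 else 0.0


# Usage: two toy parse trees that differ only in the noun.
s1 = Node("S", (Node("NP", (Node("N", (Node("dog"),)),)),
                Node("VP", (Node("V", (Node("runs"),)),))))
s2 = Node("S", (Node("NP", (Node("N", (Node("cat"),)),)),
                Node("VP", (Node("V", (Node("runs"),)),))))
print(round(tree_similarity(s1, s2), 4))  # → 0.7597
```

Because the kernel only rewards fragments whose productions match exactly, "dog" and "cat" contribute nothing here; enriching the tree with semantic knowledge and topic information, as the abstract describes for TES-Trees, is one way such lexical mismatches could be softened.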

Full Text