Abstract

In the era of information explosion, people want to find content that matches their needs and interests within massive amounts of information. Understanding the needs of Internet users correctly and efficiently is therefore an urgent problem, and semantic text similarity is useful in many such application scenarios. To measure semantic text similarity with a text matching model, this paper constructs several Siamese networks. Specifically, we first take GloVe, BERT, and DistilBERT as initial models and add deep neural network layers on top, training and fine-tuning them on the STSbenchmark dataset to make full use of the strengths of the existing models. Next, we test several similarity calculation methods to quantify the semantic similarity of sentence pairs. The Pearson and Spearman correlation coefficients are used as evaluation metrics to compare the sentence embedding quality of the different models. The experimental results show that the Siamese network based on BERT performs best among all models, reaching an accuracy of up to 84.5%, and that among the similarity calculation methods, cosine similarity usually achieves the best accuracy. In the future, this model can be applied to semantic text similarity tasks by matching texts that express users' needs against a knowledge base. In this way, we can improve machines' language understanding and meet the diverse needs of users.
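
The scoring and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's code: it assumes the two branches of a Siamese encoder have already produced fixed-length sentence embeddings, and the function names (`cosine_similarity`, `pearson`, `spearman`, `ranks`) and the toy vectors and gold scores are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention assumed here for all-zero vectors
    return dot / (norm_u * norm_v)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of std devs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical embeddings for three sentence pairs (not real model output).
pairs = [
    ([0.9, 0.1, 0.0], [0.8, 0.2, 0.1]),  # near-paraphrases
    ([0.5, 0.5, 0.0], [0.1, 0.6, 0.6]),  # partially related
    ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]),  # unrelated
]
gold = [4.8, 2.5, 0.2]  # hypothetical human scores on a 0-5 scale

predicted = [cosine_similarity(u, v) for u, v in pairs]
print("Pearson:", pearson(predicted, gold))
print("Spearman:", spearman(predicted, gold))
```

Pearson rewards predictions that are linearly related to the human scores, while Spearman only checks that sentence pairs are ranked in the same order, which is why the two are commonly reported together.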

