Abstract
With the rapid development and widespread adoption of the Internet, more and more data is stored as text on online platforms. This mass of data makes much text information redundant, so text similarity techniques are important for removing duplicate data. How to effectively improve the accuracy and precision of text similarity calculation is therefore a pressing problem. In this paper, we propose an improved method for calculating sentence semantic similarity. The method uses a word2vec model to obtain the semantic information of the text, clusters the resulting representations with the k-means algorithm, retrains a word2vec model, and finally computes the sentence similarity. Experimental results indicate that our algorithm performs better than the traditional word2vec algorithm.
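As an illustration of two of the building blocks named above, the following minimal sketch clusters word vectors with k-means and scores a sentence pair by cosine similarity. It is not the paper's implementation. The toy three-dimensional vectors in `TOY_VECTORS` are hypothetical stand-ins for word2vec output, averaging word vectors into a sentence vector is an assumed choice that the abstract does not specify, and the retraining step is omitted because the abstract does not describe how the clusters feed into it.

```python
import math
import random

# Hypothetical toy embeddings standing in for word2vec output (assumption).
TOY_VECTORS = {
    "cat": [0.90, 0.10, 0.00],
    "dog": [0.80, 0.20, 0.10],
    "pet": [0.85, 0.15, 0.05],
    "car": [0.10, 0.90, 0.20],
    "truck": [0.00, 0.80, 0.30],
    "road": [0.20, 0.85, 0.25],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


def sentence_vector(sentence, vectors):
    """Average the vectors of known words (assumed aggregation); None if no word is known."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    return mean_vector(known) if known else None


def sentence_similarity(s1, s2, vectors):
    """Cosine similarity of the two averaged sentence vectors."""
    v1 = sentence_vector(s1, vectors)
    v2 = sentence_vector(s2, vectors)
    if v1 is None or v2 is None:
        return 0.0
    return cosine(v1, v2)


if __name__ == "__main__":
    words = list(TOY_VECTORS)
    labels, _ = kmeans([TOY_VECTORS[w] for w in words], k=2)
    print(dict(zip(words, labels)))
    print(sentence_similarity("cat dog", "pet dog", TOY_VECTORS))
    print(sentence_similarity("cat dog", "car road", TOY_VECTORS))
```

In practice the toy dictionary would be replaced by vectors from a trained word2vec model, and the cluster labels would be used however the full method prescribes before retraining.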