Abstract

In view of the fact that traditional vector space model for text similarity calculation which does not take word order into consideration leads to bias, this paper puts forward a longest common subsequence and the traditional vector space model of combining text similarity calculation. This method takes the word order and word frequency information into account, using the texts of the longest common subsequence and substring of their information from all public records and the use of word order and word frequency in the text. The importance of similarity calculation is acknowledged, and the traditional vector space model in the calculation of the weight is used on the word frequency information. Some of the dataset collected through the web crawler are used in the proposed text similarity calculation method for testing, and the results proved the effectivity of the method.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call