A Novel Method for Text Similarity Calculation

Cai Rui, Li Fei, Quan Cong, Chen Bin

doi:10.4028/www.scientific.net/amr.660.202

Abstract

The traditional vector space model (VSM) ignores word order, so text similarity computed with it can be biased. This paper proposes a text similarity calculation method that combines the longest common subsequence (LCS) with the traditional vector space model. The method accounts for both word order and word frequency. Word order information comes from the longest common subsequence of the two texts and from their common substrings. Word frequency information comes from the term weights of the traditional vector space model. Both kinds of information matter for similarity calculation. The proposed method was tested on part of a dataset collected with a web crawler, and the results demonstrate its effectiveness.
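The abstract does not give the exact combination formula. The following is a minimal sketch of the general idea under stated assumptions: whitespace tokenization, raw term-frequency weights (no TF-IDF), an order score equal to the LCS length divided by the length of the longer text, and a linear blend with a hypothetical weight `alpha`. The function names and the normalization are illustrative choices, not the authors' method. The sketch also omits the common-substring information mentioned above.

```python
from collections import Counter
import math


def cosine_tf(a, b):
    """VSM part: cosine similarity of raw term-frequency vectors (order-insensitive)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    sq_a = sum(v * v for v in ca.values())
    sq_b = sum(v * v for v in cb.values())
    if sq_a == 0 or sq_b == 0:
        return 0.0
    return dot / math.sqrt(sq_a * sq_b)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def order_similarity(a, b):
    """Hypothetical word-order score: LCS length over the longer text's length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def combined_similarity(a, b, alpha=0.5):
    """Hypothetical linear blend; alpha is an assumed weight, not taken from the paper."""
    return alpha * cosine_tf(a, b) + (1 - alpha) * order_similarity(a, b)


if __name__ == "__main__":
    s1 = "the cat sat on the mat".split()
    s2 = "the mat sat on the cat".split()
    print(round(cosine_tf(s1, s2), 3))
    print(lcs_length(s1, s2))
    print(round(combined_similarity(s1, s2), 3))
```

For the two example sentences, the bag-of-words cosine is 1, even though the word order differs. The LCS has length 4 ("the sat on the"), so the blended score drops to about 0.83. This illustrates the bias the abstract attributes to the plain vector space model. The linear blend is used here only because it is the simplest way to combine two scores in [0, 1].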

Full Text