Abstract
Text similarity calculation is a fundamental task in Chinese information processing. A high-quality text similarity method must be both accurate and efficient. To be accurate, it must compare texts at the level of natural-language meaning and, based on a full understanding of the semantics of the author or text source, reach similarity judgments close to those of a human reader. To be efficient, it must keep processing time low when large amounts of text have to be processed. Drawing on a survey of domestic and foreign literature and an analysis of the current state of similarity calculation, this paper presents a new method intended to improve the performance of similarity calculation: a Chinese text similarity algorithm based on word-number difference. The algorithm combines the traditional statistics-based approach with the narrow semantic approach, aiming to pair the efficiency of the former with the accuracy of the latter. Combining the advantages of the two kinds of methods also means facing and overcoming their respective disadvantages. This paper takes the difference in word-number as its point of departure and exploits the diversity of Chinese word-number, combining it with word frequency, number, and meaning, in order to extend word similarity calculation to text similarity calculation. Finally, a self-built small text set is used as the test object to compare the similarity results of different methods in a laboratory environment. The results show that the similarity calculation method based on word-number difference performs better than the traditional statistics-based and semantics-based methods. Through manual comparison of the test results in terms of accuracy and segmentation speed, this research provides a new approach for Chinese text similarity calculation.