Abstract

Conventional approaches to text analysis and information retrieval which measured document similarity by using considering all of the information in texts are a relatively inefficiency for processing large text collections in heterogeneous subject areas. This paper outlined a new text manipulation system FA-Sim that is useful for retrieving information in large heterogeneous texts and for recognizing content similarity in text excerpts. FA-Sim is based on flexible text matching procedures carried out in various contexts and various field ranks. FA-Sim measures texts similarity by using specific field association (FA) terms instead of by comparing all text information. Similarity between texts is faster and higher by using FA-Sim than other two analysis methods. Therefore, Recall and Precision significantly improved by 39% and 37% over these two traditional methods.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call