Abstract

Computing the degree of closeness (similarity) between two sets of text documents is a core operation in many text mining applications, such as text classification, clustering, and sentiment analysis. The efficiency of such applications depends mainly on three factors: the choice of representation model, the choice of similarity metric, and the choice of learning algorithm. Among these, the choice of similarity measure is particularly important because it contributes to the efficiency of most text mining applications. This article proposes an efficient similarity measure for computing the closeness between two sets of text documents. The proposed measure accounts for different real-time situations, such as the presence or absence of a feature, when computing the degree of similarity between documents. A compression-modeling similarity measure for text documents is also proposed. Two sets of experiments are conducted to validate the efficacy of the proposed similarity measures. The experimental results show that the f-measure obtained with the proposed similarity metric is higher than that of existing state-of-the-art techniques.
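The abstract does not give the formulas of the proposed measures, so the following is only a generic sketch of the two ideas it names, not the authors' method. It assumes a Jaccard coefficient over term sets as a stand-in for a presence/absence-based measure, and the normalized compression distance (computed with `zlib`) as a stand-in for a compression-based measure. The function names `presence_absence_similarity`, `compressed_size`, and `ncd` are hypothetical.

```python
import zlib


def presence_absence_similarity(doc_a: str, doc_b: str) -> float:
    """Jaccard coefficient over term sets (an assumed stand-in measure).

    A term present in both documents raises the score; a term present in
    only one document lowers it. Terms absent from both are ignored.
    """
    terms_a = set(doc_a.lower().split())
    terms_b = set(doc_b.lower().split())
    if not terms_a and not terms_b:
        return 1.0  # two empty documents are treated as identical
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of the text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(doc_a: str, doc_b: str) -> float:
    """Normalized compression distance: lower values mean more similar."""
    size_a = compressed_size(doc_a)
    size_b = compressed_size(doc_b)
    size_ab = compressed_size(doc_a + " " + doc_b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


if __name__ == "__main__":
    x = "the cat sat on the mat " * 20
    y = "quantum flux regulators hum beneath violet skies " * 20
    print(presence_absence_similarity("a b c", "b c d"))
    print(ncd(x, x), ncd(x, y))
```

In this sketch the term-set measure returns a similarity in [0, 1], while `ncd` returns a distance, so a document compared with itself should score lower under `ncd` than two unrelated documents.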
