Abstract

Arabic document clustering is an important task for obtaining good results with traditional Information Retrieval (IR) systems, especially given the rapid growth in the number of online documents written in Arabic. Document clustering aims to automatically group similar documents into one cluster using different similarity/distance measures. In this paper, we evaluate the impact of stemming on Arabic text document clustering with five similarity/distance measures: Euclidean Distance, Cosine Similarity, Jaccard Coefficient, Pearson Correlation Coefficient, and Averaged Kullback-Leibler Divergence. Our experiments on the testing dataset show that stemming does not yield good results, but it makes the document representation smaller and the clustering faster.
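To make the five measures concrete, the following is a minimal sketch of how each could be computed on two term-weight vectors of equal length. The abstract does not give the formulas, so these are common textbook definitions and not necessarily the exact variants used in the paper. In particular, the Jaccard coefficient is assumed to be the extended (Tanimoto) form for real-valued vectors, and the averaged Kullback-Leibler divergence is assumed to use equal weights (the Jensen-Shannon form) after normalising each vector to sum to 1. All function names are illustrative.

```python
import math


def euclidean_distance(a, b):
    # Straight-line distance between the two term-weight vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    # Cosine of the angle between the vectors; assumes neither is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def jaccard_coefficient(a, b):
    # Extended (Tanimoto) Jaccard for real-valued vectors (assumed variant).
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) + sum(y * y for y in b) - dot)


def pearson_correlation(a, b):
    # Linear correlation of the term weights; assumes non-constant vectors.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def averaged_kl_divergence(a, b):
    # Equal-weight averaged KL divergence (Jensen-Shannon form, assumed).
    # Vectors are first normalised so that each sums to 1.
    sum_a, sum_b = sum(a), sum(b)
    total = 0.0
    for x, y in zip(a, b):
        p, q = x / sum_a, y / sum_b
        m = 0.5 * (p + q)  # averaged distribution for this term
        if p > 0:
            total += 0.5 * p * math.log(p / m)
        if q > 0:
            total += 0.5 * q * math.log(q / m)
    return total


if __name__ == "__main__":
    # Hypothetical term-frequency vectors over a shared three-term vocabulary.
    doc_a = [3, 0, 1]
    doc_b = [1, 2, 0]
    for measure in (euclidean_distance, cosine_similarity, jaccard_coefficient,
                    pearson_correlation, averaged_kl_divergence):
        print(measure.__name__, measure(doc_a, doc_b))
```

Euclidean distance and the averaged divergence are distances (smaller means more similar), while the other three are similarities (larger means more similar), so a clustering algorithm has to treat them accordingly. Stemming shortens these vectors by merging word forms that share a stem, which is consistent with the smaller representation and faster clustering reported above.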
