As data volumes grow day by day, large collections must be converted into a form from which useful information can be extracted, a task addressed by text mining. In this paper, text clustering is used to partition large data into clusters so that meaningful information can be extracted and summarised for analysis. Three partitioning-based clustering techniques, namely k-means, k-means fast and k-medoids, are compared, and a new algorithm named shift k-medoid is proposed, which is a hybrid of the k-medoids and mean shift clustering algorithms. The cosine similarity, correlation coefficient and Jaccard similarity measures are used to check the performance of the algorithms, and two measures, i.e., randomised feature and normalised mutual information (NMI) feature, are used to test their accuracy. The results show that the proposed algorithm achieves the best performance.
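The three similarity measures named above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names are hypothetical, the inputs are assumed to be equal-length term-frequency vectors, the correlation coefficient is assumed to be Pearson's, and Jaccard similarity is computed on term presence (one common variant among several).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation coefficient between two vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: constant vectors are treated as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def jaccard_similarity(a, b):
    """Jaccard similarity on the sets of terms present in each document."""
    terms_a = {i for i, x in enumerate(a) if x > 0}
    terms_b = {i for i, y in enumerate(b) if y > 0}
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


# Toy term-frequency vectors over a four-term vocabulary (illustrative only).
doc_a = [1, 2, 0, 0]
doc_b = [2, 4, 0, 1]

print("cosine     :", cosine_similarity(doc_a, doc_b))
print("correlation:", pearson_correlation(doc_a, doc_b))
print("jaccard    :", jaccard_similarity(doc_a, doc_b))
```

In a clustering setting, any of these functions could serve as the pairwise similarity between documents, with a distance typically derived as one minus the similarity.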