Abstract

The World Wide Web, the largest shared information source, holds an enormous number of text documents, which makes document clustering an active area of research. Document clustering supports automatic document organization, making it easier to navigate, summarize, and retrieve information. Its basic objective is that documents within a cluster should be highly similar to one another and dissimilar to documents in other clusters, and several techniques have been proposed to achieve this. These techniques fall into two broad categories: partitional and hierarchical. Partitional techniques are especially popular for document clustering. K-means-inspired algorithms are among the most efficient and fastest partitional clustering algorithms; they divide a document collection into separate groups while optimizing a clustering criterion. Clustering techniques frequently suffer from poor scalability, high dimensionality, and inaccurate cluster labels. This paper modifies the clustering scheme by using a Universal Networking Language (UNL) generative feature vector and a Subtractive Clustering approach combined with the Boundary Restricted Particle Swarm Optimization (BR-APSO) algorithm for efficient document clustering. The proposed method is compared with and analysed against existing document clustering methodologies, and it improves entropy and purity rates.
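For readers unfamiliar with the two evaluation measures named above, the following is a minimal sketch of how purity and entropy are commonly computed for a clustering when ground-truth class labels are available. It is not taken from the paper: the function name `purity_and_entropy`, its arguments, and the choice of an unnormalized, size-weighted base-2 entropy are assumptions made for illustration, and the paper may define or normalize these measures differently.

```python
import math
from collections import Counter


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) for a clustering against known class labels.

    Hypothetical helper for illustration only; the definitions used here are
    the common textbook ones and are an assumption, not the paper's own.
    """
    n = len(class_labels)

    # Group the true class labels by the cluster each document was assigned to.
    clusters = {}
    for cluster_id, label in zip(cluster_ids, class_labels):
        clusters.setdefault(cluster_id, []).append(label)

    purity = 0.0
    entropy = 0.0
    for members in clusters.values():
        counts = Counter(members)
        size = len(members)
        # Purity: fraction of all documents that belong to the majority class
        # of their own cluster (higher is better).
        purity += max(counts.values()) / n
        # Entropy: class entropy inside the cluster, weighted by cluster size
        # (lower is better).
        cluster_entropy = -sum(
            (k / size) * math.log2(k / size) for k in counts.values()
        )
        entropy += (size / n) * cluster_entropy
    return purity, entropy


# Toy data: six documents, two clusters, two true classes.
cluster_ids = [0, 0, 0, 1, 1, 1]
class_labels = ["a", "a", "b", "b", "b", "b"]
purity, entropy = purity_and_entropy(cluster_ids, class_labels)
```

On this toy input, cluster 0 mixes two classes while cluster 1 contains a single class, so purity falls below 1 and entropy rises above 0. A clustering that matches the true classes exactly would give a purity of 1 and an entropy of 0.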
