Abstract

Incremental clustering has been proposed to handle large datasets that cannot fit entirely into memory. Single-pass fuzzy c-means (SpFCM) and online fuzzy c-means (OFCM) are two representative incremental fuzzy clustering methods; both extend the scalability of fuzzy c-means (FCM) by processing the dataset chunk by chunk. However, because document data are sparse and high-dimensional, SpFCM and OFCM fail to produce reasonable results on them. In this study, we develop clustering approaches that address both the large-scale and the high-dimensionality issues. Specifically, we propose two methods for incremental clustering of document data. The first modifies existing FCM-based incremental clustering by adding a step that normalizes the centroids in each iteration; the second combines incremental clustering (single-pass or online) with weighted fuzzy co-clustering. Experiments on several benchmark document datasets show that the proposed approaches achieve significant improvements over the existing SpFCM and OFCM in document clustering.
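To make the first method concrete, the following is a minimal sketch, not the authors' implementation, of a single-pass FCM with an added centroid-normalization step. It assumes several details the abstract does not state: document vectors are L2-normalized term vectors, distance is squared Euclidean (which for unit vectors equals 2 - 2·cos), the normalization step rescales each centroid to unit L2 length, each chunk is condensed into `c` weighted centroids carried into the next chunk (the usual SpFCM scheme), and initialization is a simple farthest-first choice. All function names are hypothetical. The second method (weighted fuzzy co-clustering) is not sketched here.

```python
import math

def normalize(v):
    # Scale a vector to unit L2 norm; an all-zero vector is returned unchanged.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def sqdist(a, b):
    # Squared Euclidean distance; for unit vectors this equals 2 - 2*cos(a, b).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, c):
    # Deterministic initialization (an assumption): start from the first point,
    # then repeatedly add the point farthest from all centers chosen so far.
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda p: min(sqdist(p, v) for v in centers))
        centers.append(list(far))
    return centers

def memberships(points, centers, m):
    # Standard FCM membership: u_ij = 1 / sum_k (d_ij / d_kj)^(1/(m-1)),
    # where d is the squared distance (clamped to avoid division by zero).
    c = len(centers)
    U = []
    for p in points:
        d = [max(sqdist(p, v), 1e-12) for v in centers]
        U.append([1.0 / sum((d[i] / d[k]) ** (1.0 / (m - 1)) for k in range(c))
                  for i in range(c)])
    return U

def weighted_fcm(points, weights, c, m=2.0, iters=50, init=None):
    """Weighted FCM whose centroids are renormalized to unit length after every update."""
    centers = [list(v) for v in init] if init is not None else farthest_first(points, c)
    n, dim = len(points), len(points[0])
    for _ in range(iters):
        U = memberships(points, centers, m)
        # Centroid update weighted by w_j * u_ij^m, then the extra normalization step.
        new_centers = []
        for i in range(c):
            coef = [weights[j] * U[j][i] ** m for j in range(n)]
            total = sum(coef)
            v = [sum(coef[j] * points[j][t] for j in range(n)) / total for t in range(dim)]
            new_centers.append(normalize(v))
        centers = new_centers
    return centers, memberships(points, centers, m)

def single_pass_fcm(chunks, c, m=2.0, iters=50):
    """SpFCM-style pass: each chunk is clustered together with the weighted
    centroids carried over from the previous chunk."""
    carried_pts, carried_w, centers = [], [], None
    for chunk in chunks:
        pts = [normalize(x) for x in chunk] + carried_pts
        w = [1.0] * len(chunk) + carried_w
        centers, U = weighted_fcm(pts, w, c, m, iters, init=centers)
        # Condense the chunk: centroid i carries weight sum_j u_ij * w_j.
        carried_w = [sum(U[j][i] * w[j] for j in range(len(pts))) for i in range(c)]
        carried_pts = centers
    return centers

# Toy "documents" over 4 terms, streamed in two chunks.
chunks = [
    [[3, 1, 0, 0], [1, 3, 0, 0], [0, 0, 2, 2], [0, 0, 1, 3]],
    [[2, 2, 0, 0], [0, 0, 3, 1], [1, 1, 0, 0], [0, 0, 0, 1]],
]
centers = single_pass_fcm(chunks, c=2)
```

Because every centroid is rescaled to unit length, no centroid can drift toward the origin and become close to all sparse vectors at once; the squared Euclidean distance then ranks documents the same way cosine similarity does.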
