Abstract
Nowadays, the rapid advance of internet technology has brought two challenges: the explosion of information, and the rapid appearance of new information. Clustering is a good way to help users analyze information automatically, but traditional clustering algorithms are only suitable for small-scale, stable text collections. To address this problem, this paper proposes a novel clustering algorithm based on vector compression, designed specifically for large-scale text collections (LDVC), together with its incremental version (I-LDVC). LDVC selects related features to compress feature sets, and it adopts the iterative training idea of the self-organizing map (SOM) to optimize the selection approach. When new texts appear, I-LDVC selects small samples from the original texts to adjust the neuron model and perform incremental clustering. To prevent the model from overfitting to newly added texts, I-LDVC adjusts the weights of the samples during training. Experimental results demonstrate that LDVC achieves better performance and lower time complexity on large-scale text collections, and that I-LDVC clusters unstable text collections well.
DOI: http://dx.doi.org/10.5755/j01.itc.45.2.8666
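The abstract only names the ingredients of LDVC, so what follows is not the authors' algorithm. It is a minimal, generic sketch of the SOM iterative training the abstract refers to, assuming that texts have already been turned into numeric feature vectors (for example, compressed term-frequency vectors), a 1-D chain of neurons, a Gaussian neighborhood, and a linearly decaying learning rate and radius. The `weights` argument is a hypothetical stand-in for the per-sample weighting that I-LDVC uses, and all function names (`train_som`, `assign`, `best_matching_unit`) are illustrative.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(neurons, v):
    """Index of the neuron whose weight vector is closest to v."""
    return min(range(len(neurons)), key=lambda i: sq_dist(neurons[i], v))


def train_som(data, n_neurons, epochs=50, lr0=0.5, radius0=None,
              weights=None, neurons=None, seed=0):
    """Train a generic 1-D self-organizing map on `data` (lists of floats).

    `weights` (hypothetical) scales each sample's update. If `neurons` is
    given, that existing map is refined in place instead of being
    re-initialized, and `n_neurons` is ignored.
    """
    rng = random.Random(seed)
    if neurons is None:
        # Initialize neurons from randomly chosen samples.
        neurons = [list(v) for v in rng.sample(data, n_neurons)]
    n = len(neurons)
    if weights is None:
        weights = [1.0] * len(data)
    if radius0 is None:
        radius0 = max(1.0, n / 2)
    order = list(range(len(data)))
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighborhood
        rng.shuffle(order)
        for idx in order:
            v = data[idx]
            bmu = best_matching_unit(neurons, v)
            for j, w in enumerate(neurons):
                d = j - bmu  # distance along the 1-D neuron chain
                h = math.exp(-(d * d) / (2.0 * radius * radius))
                step = lr * weights[idx] * h
                for k in range(len(w)):
                    w[k] += step * (v[k] - w[k])
    return neurons


def assign(neurons, data):
    """Cluster label of each vector = index of its best matching unit."""
    return [best_matching_unit(neurons, v) for v in data]


if __name__ == "__main__":
    docs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    som = train_som(docs, n_neurons=2, epochs=100)
    print(assign(som, docs))
```

For an incremental step loosely in the spirit of I-LDVC, one could pass the previously trained `neurons` back in, together with a small sample of old vectors plus the new ones, and give the new vectors lower `weights`. The abstract does not describe how the paper actually chooses these samples or schedules the weights.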