A vector reconstruction based clustering algorithm particularly for large-scale text collection

Ming Liu, Chong Wu, Lei Chen

doi:10.1016/j.neunet.2014.10.012

Abstract

With the rapid evolution of internet technology, internet users face a large amount of textual data every day. Organizing texts into categories can help users extract useful information from a large-scale text collection. Clustering is one of the most promising tools for categorizing texts because it is unsupervised. Unfortunately, most traditional clustering algorithms lose quality on large-scale text collections, mainly because of the high-dimensional vector space and the semantic similarity among texts. To cluster large-scale text collections effectively and efficiently, this paper puts forward a vector reconstruction based clustering algorithm. Only the features that can represent a cluster are preserved in the cluster’s representative vector. The algorithm alternates between two sub-processes until it converges. The first is the partial tuning sub-process, in which each feature’s weight is fine-tuned by an iterative process similar to the self-organizing map (SOM) algorithm. To accelerate clustering, an intersection based similarity measurement and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The second is the overall tuning sub-process, in which features are reallocated among different clusters and the features that are useless for representing a cluster are removed from its representative vector. Experimental results on three text collections (two small-scale and one large-scale) demonstrate that the algorithm achieves high-quality results on both small-scale and large-scale text collections.
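
The abstract does not give the formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes sparse, non-negative feature weights stored in dictionaries. The function names (`intersection_similarity`, `partial_tune`, `overall_tune`), the overlap score, the learning rate `lr`, and the pruning threshold `min_weight` are all hypothetical choices made for illustration. In particular, `overall_tune` only prunes weak features; the reallocation of features among clusters described in the abstract is not modeled.

```python
def intersection_similarity(doc, rep):
    """Hypothetical overlap score that touches only features shared by both vectors.

    Assumes non-negative weights. The paper's actual measure is not
    specified in the abstract; this is a stand-in.
    """
    total = sum(doc.values())
    if total == 0:
        return 0.0
    shared = doc.keys() & rep.keys()
    return sum(min(doc[f], rep[f]) for f in shared) / total


def partial_tune(rep, doc, lr=0.5):
    """SOM-like step: move the representative's shared weights toward the document.

    Only features already present in the representative vector are adjusted,
    so the vector does not grow. `lr` is an assumed learning rate.
    """
    updated = dict(rep)
    for f in doc.keys() & rep.keys():
        updated[f] += lr * (doc[f] - updated[f])
    return updated


def overall_tune(rep, min_weight=0.1):
    """Simplified pruning: drop features whose weight fell below an assumed threshold."""
    return {f: w for f, w in rep.items() if w >= min_weight}


# Toy data (invented for illustration only).
doc = {"a": 1.0, "b": 1.0}
rep = {"a": 0.5, "c": 2.0, "d": 0.01}

score = intersection_similarity(doc, rep)
rep_after_partial = partial_tune(rep, doc, lr=0.5)
rep_after_overall = overall_tune(rep_after_partial, min_weight=0.1)
```

Restricting both the score and the update to shared features keeps the cost of each step proportional to the size of the intersection rather than the full vocabulary, which is the kind of saving the abstract points to for high-dimensional text vectors.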

Journal: Neural Networks
Publication Date: Dec 9, 2014
Citations: 4

Similar Papers

A novel clustering algorithm for large-scale text collection and its incremental version
Lei Chen
Information Technology And Control | VOL. 45
27 Jun 2016

Artificial Intelligence Tools for Business Applications: Objective Map of Science and Analysis of Texts
Mikhail G Kreines ... Elena M Kreines
01 Jul 2019

Distributional Models for Lexical Semantics: An Investigation of Different Representations for Natural Language Learning
Danilo Croce ... Simone Filice
01 Jan 2015

Distributional Models and Lexical Semantics in Convolution Kernels
Danilo Croce ... Roberto Basili
01 Jan 2012
