Abstract

Text clustering is the process of partitioning a given collection of texts into subgroups, each containing texts with similar content. This study proposes a new neural network, called NTSO (neural text self organizer), which is suited to text clustering. NTSO uses string vectors instead of numerical vectors as its input vectors, and its weight vectors differ from those of other unsupervised neural networks such as Kohonen networks and ART (adaptive resonance theory), although it resembles Kohonen networks in its architecture and learning process. Intuitively, a text is represented more naturally by a string vector than by a numerical vector. Representing texts as numerical vectors leads to two main problems: sparse distribution and huge dimensionality of the feature vectors. To address these problems, this study proposes an unsupervised neural network that uses string vectors for text clustering.
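The two problems named above can be illustrated with a small sketch. The abstract does not define a string vector, so the code below assumes one plausible form: a short, ordered list of a text's most frequent words. The sample documents and the helper names (`numerical_vector`, `string_vector`, `sparsity`) are hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical toy corpus, used only for illustration.
docs = [
    "neural network clustering of text documents",
    "text clustering with string vectors",
    "adaptive resonance theory and kohonen networks",
]

def tokenize(text):
    """Split a text into lowercase word tokens."""
    return text.lower().split()

# Numerical representation: one dimension per distinct word in the corpus,
# so the dimensionality grows with the vocabulary.
vocabulary = sorted({word for doc in docs for word in tokenize(doc)})

def numerical_vector(text):
    """Word-count vector over the whole vocabulary (mostly zeros)."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

def string_vector(text, k=3):
    """Assumed string-vector form: the k most frequent words of the text,
    ordered by descending frequency, ties broken alphabetically."""
    counts = Counter(tokenize(text))
    ranked = sorted(counts, key=lambda word: (-counts[word], word))
    return ranked[:k]

def sparsity(vector):
    """Fraction of zero entries in a numerical vector."""
    return sum(1 for value in vector if value == 0) / len(vector)

if __name__ == "__main__":
    vec = numerical_vector(docs[1])
    print(len(vocabulary), "dimensions, sparsity", round(sparsity(vec), 2))
    print(string_vector(docs[1]))
```

Even for three short texts, the numerical vector has one dimension per vocabulary word and most of its entries are zero, while the assumed string vector has a fixed, small length that does not depend on the vocabulary size.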
