Abstract

The Growing Self-Organizing Map (GSOM) has proven benefits in text clustering. Latent Semantic Analysis (LSA) has also been used in text clustering to capture the latent concepts in text. This paper presents a novel combination of GSOM and LSA that improves text clustering results compared to using GSOM on its own. LSA is an inherently global algorithm that looks at trends and patterns across the whole corpus, whereas GSOM is a nearest-neighborhood-based algorithm that looks at local patterns. Combining the two makes it possible to discover both global and local patterns. In the proposed model, the initial text corpus is converted into its vector space representation using the traditional Term Frequency - Inverse Document Frequency (TF-IDF) technique. Singular Value Decomposition (SVD), followed by the Frobenius norm, is then applied to the resulting high-dimensional vectors to obtain new vectors with an optimal number of dimensions. Experiments using the proposed model were conducted and compared with the original GSOM under the same conditions. The experimental results demonstrate that the new combination of these well-known techniques improves both the accuracy of the clustering results and the computation time compared with GSOM alone.
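
The preprocessing stage described above (TF-IDF, then SVD, then a Frobenius-norm criterion for choosing the number of dimensions) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the smoothed IDF formula, and the rule of keeping the smallest rank that retains 90% of the Frobenius norm are all assumptions made for the example, since the abstract does not specify them. The GSOM clustering step itself is not shown.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Build a dense documents-by-terms TF-IDF matrix (whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    matrix = np.zeros((n_docs, len(vocab)))
    for i, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            tf = count / len(toks)
            # Assumption: IDF shifted by 1 so terms in every document keep weight.
            idf = math.log(n_docs / df[term]) + 1.0
            matrix[i, index[term]] = tf * idf
    return matrix, vocab


def choose_rank(singular_values, energy=0.9):
    """Smallest k whose rank-k truncation keeps `energy` of the Frobenius norm.

    Assumed criterion: ||A_k||_F / ||A||_F >= energy, using the identity
    ||A||_F^2 = sum of squared singular values.
    """
    squared = singular_values ** 2
    ratio = np.sqrt(np.cumsum(squared) / squared.sum())
    k = int(np.searchsorted(ratio, energy)) + 1
    return min(k, len(singular_values))


def lsa_reduce(docs, energy=0.9):
    """Project documents into a latent space whose size is set by choose_rank."""
    matrix, _ = tfidf_matrix(docs)
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    k = choose_rank(s, energy)
    # Rows are document coordinates in the k-dimensional latent space.
    return u[:, :k] * s[:k]


if __name__ == "__main__":
    corpus = [
        "neural maps cluster text documents",
        "latent semantic analysis finds concepts in text",
        "self organizing maps grow new nodes",
        "singular value decomposition reduces dimensions",
    ]
    reduced = lsa_reduce(corpus)
    print(reduced.shape)
```

The reduced rows returned by the hypothetical `lsa_reduce` would then serve as the input vectors to a GSOM in place of the raw TF-IDF vectors.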
