A scalable and dynamic self-organizing map for clustering large volumes of text data

Sumith Matharage, Damminda Alahakoon, Hiran Ganegedara

doi:10.1109/ijcnn.2013.6706733

Abstract

The Self-Organizing Map (SOM) and the Growing Self-Organizing Map (GSOM) are widely used techniques for text mining. Mining large text datasets is significantly processor intensive [1]. Recently, the Fast Growing Self-Organizing Map (FastGSOM) was proposed as an improvement to the GSOM for clustering text data more efficiently [2]. For text corpora with thousands of documents, the time requirement can still be a bottleneck, leading to high turnaround times for the analysis process. We propose a new scalable parallel algorithm for text analysis using FastGSOM that can harness the power of parallel and distributed computing for efficient analysis of large-scale text datasets. We demonstrate that the proposed algorithm has similar or better accuracy compared to GSOM and is several orders of magnitude more efficient when operating in parallel.

Full Text