Abstract

The Self Organizing Map (SOM) and the Growing Self Organizing Map (GSOM) are widely used techniques for text mining. Mining large text data sets is significantly processor intensive [1]. The Fast Growing Self Organizing Map (FastGSOM) was recently proposed as an improvement to the GSOM for clustering text data more efficiently [2]. For text corpora with thousands of documents, however, processing time can still be a bottleneck, leading to high turnaround times for the analysis process. We propose a new scalable parallel algorithm for text analysis using FastGSOM, which harnesses parallel and distributed computing to analyze large-scale text datasets efficiently. We demonstrate that the proposed algorithm has similar or better accuracy compared to GSOM and is several orders of magnitude more efficient when operating in parallel.
