Somoclu: An Efficient Parallel Library for Self-Organizing Maps

Peter Wittek, Ik Soo Lim, Shi Chao Gao, Li Zhao

doi:10.18637/jss.v078.i09

Open Access

Abstract

Somoclu is a massively parallel tool, written in C++, for training self-organizing maps on large data sets. It builds on OpenMP for multicore execution and on MPI for distributing the workload across the nodes in a cluster. It can also accelerate training with CUDA if graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. Python, R, and MATLAB interfaces facilitate interactive use. Apart from fast execution, memory use is highly optimized, enabling the training of large emergent maps even on a single computer.
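
To illustrate why a sparse kernel helps with high-dimensional but sparse data, the sketch below computes the squared Euclidean distance between a sparse instance and a dense node weight vector while touching only the non-zero entries of the instance. This is an assumption about one common approach, not code taken from Somoclu; the function name `sparse_sq_distance` and the dictionary representation of the sparse vector are hypothetical.

```python
def sparse_sq_distance(x_sparse, w, w_sq_norm):
    """Squared Euclidean distance between a sparse vector and a dense vector.

    x_sparse  -- dict mapping dimension index to non-zero value (hypothetical format)
    w         -- dense weight vector of a map node, as a list of floats
    w_sq_norm -- precomputed squared norm of w, reusable for every instance
    """
    # Expand ||x - w||^2 = ||x||^2 - 2 x.w + ||w||^2 so that the loops
    # run over the non-zeros of x only, not over every dimension.
    x_sq_norm = sum(v * v for v in x_sparse.values())
    dot = sum(v * w[i] for i, v in x_sparse.items())
    return x_sq_norm - 2.0 * dot + w_sq_norm


# A five-dimensional node weight vector and an instance with two non-zeros.
w = [0.5, 0.0, 0.25, 0.0, 1.0]
w_sq = sum(v * v for v in w)
x = {0: 1.0, 4: 2.0}
d = sparse_sq_distance(x, w, w_sq)

# Dense reference computation for comparison.
dense_x = [1.0, 0.0, 0.0, 0.0, 2.0]
reference = sum((a - b) ** 2 for a, b in zip(dense_x, w))
```

The cost per distance then scales with the number of non-zero entries of the instance rather than with the full dimensionality, provided the squared norm of each weight vector is computed once and reused.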

Highlights

Visual inspection of data is crucial to gain an intuition of the underlying structures
We tested an emergent map of 200 × 200 nodes, with the number of training instances ranging from 1,250 to 10,000
Emergent maps in the package kohonen are not possible, as the map is initialized with a sample from the data instances

Summary

Introduction

Visual inspection of data is crucial to gain an intuition of the underlying structures. Self-organizing maps (SOMs) are a widespread visualization tool that embed high-dimensional data on a two-dimensional surface (typically a section of a plane or a torus) while preserving the local topological layout of the original data [9]. Tools exist that scale to large data sets using cluster resources [18] and by combining GPU-accelerated nodes in clusters [27]. Popular languages used in data analytics all have SOM modules, including MATLAB [24], Python [6], and R [25]. Common to these tools is that they seldom make use of parallel computing capabilities, although the batch formulation of SOM training invites such implementations. Distributing the workload across multiple nodes is an extension of the parallel formulation (Section 3.2).
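
The reason the batch formulation invites parallel implementations can be seen in a minimal sketch of one training epoch. The code below is an illustration in plain Python under stated assumptions (a planar grid, a Gaussian neighborhood, and a linearly shrinking radius); it is not Somoclu's implementation, and the names `best_matching_unit` and `batch_som_epoch` are hypothetical.

```python
import math
import random


def best_matching_unit(codebook, x):
    """Return the index of the node whose weight vector is closest to x."""
    return min(range(len(codebook)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(codebook[j], x)))


def batch_som_epoch(codebook, data, n_columns, radius):
    """Run one batch SOM epoch on a planar grid and return the new codebook."""
    dim = len(data[0])
    n_nodes = len(codebook)

    # Step 1: best matching unit search. Each instance is handled
    # independently, so this loop can be split across threads or nodes.
    bmus = [best_matching_unit(codebook, x) for x in data]

    # Step 2: accumulate neighborhood-weighted sums for every node.
    numer = [[0.0] * dim for _ in range(n_nodes)]
    denom = [0.0] * n_nodes
    for x, b in zip(data, bmus):
        b_row, b_col = divmod(b, n_columns)
        for j in range(n_nodes):
            row, col = divmod(j, n_columns)
            grid_d2 = (row - b_row) ** 2 + (col - b_col) ** 2
            h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # Gaussian neighborhood
            denom[j] += h
            for k in range(dim):
                numer[j][k] += h * x[k]

    # Step 3: each node becomes the weighted mean of the data; a node keeps
    # its old weights if its accumulated neighborhood weight is zero.
    return [[numer[j][k] / denom[j] for k in range(dim)] if denom[j] > 0.0
            else list(codebook[j])
            for j in range(n_nodes)]


# Toy data: two Gaussian blobs in two dimensions, trained on a 3 x 3 map.
random.seed(0)
data = [[random.gauss(c, 0.1), random.gauss(c, 0.1)]
        for c in (0.0, 1.0) for _ in range(50)]
n_rows, n_columns = 3, 3
codebook = [[random.random(), random.random()]
            for _ in range(n_rows * n_columns)]
n_epochs = 10
for epoch in range(n_epochs):
    radius = max(0.5, 2.0 * (1.0 - epoch / n_epochs))  # assumed cooling schedule
    codebook = batch_som_epoch(codebook, data, n_columns, radius)
```

Because the codebook is only read during steps 1 and 2 and replaced once at the end of the epoch, the per-instance work has no ordering dependency. In a distributed setting, each worker could compute partial sums for its share of the data, and the partial sums would then be added together before step 3.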

Parallelism

Workload in distributed environment

Command-line interface

As an application programming interface

Visualization

Experimental results

Single-node performance

Multi-node scaling

Visualization on real data

Limitations

Conclusions

Journal: Journal of Statistical Software
Publication Date: Jan 1, 2017
Citations: 47
License type: CC BY
