GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets.

Miroslav Kratochvíl, Venkata P Satagopam, Jiří Vondrášek, Laurent Heirendt, Reinhard Schneider, Christophe Trefois, Oliver Hunewald, Vasco Verissimo, Markus Ollert

doi:10.1093/gigascience/giaa127

Abstract

Background: The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing. Recent technological advances allow the easy generation of data with hundreds of millions of single-cell data points with >40 parameters, originating from thousands of individual samples. Analyzing that amount of high-dimensional data places heavy demands on both the hardware and the software of high-performance computational resources. Current software tools often do not scale to datasets of this size; users are thus forced to downsample the data to manageable sizes, in turn losing accuracy and the ability to detect many underlying complex phenomena.

Results: We present GigaSOM.jl, a fast and scalable implementation of clustering and dimensionality reduction for flow and mass cytometry data. The implementation of GigaSOM.jl in the high-level and high-performance programming language Julia makes it accessible to the scientific community and allows for efficient handling and processing of datasets with billions of data points using distributed computing infrastructures. We describe the design of GigaSOM.jl, measure its performance and horizontal scaling capability, and showcase the functionality on a large dataset from a recent study.

Conclusions: GigaSOM.jl facilitates the use of commonly available high-performance computing resources to process the largest available datasets within minutes, while producing results of the same quality as the current state-of-the-art software. Measurements indicate that the performance scales to much larger datasets. The example use on the data from a massive mouse phenotyping effort confirms the applicability of GigaSOM.jl to huge-scale studies.

Highlights

The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing
We show that construction of a self-organizing map (SOM) from 10^9 cells with 40 parameters can be performed in minutes, even on relatively small compute clusters with less than hundreds of central processing unit (CPU) cores
We presented the functionality of GigaSOM.jl, a new, highly scalable toolkit for analyzing cytometry data with algorithms derived from SOMs
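
The highlights above mention building a SOM on a compute cluster. The following is a minimal sketch of why SOM training can be spread over many workers, assuming the generic batch SOM formulation; the text above does not state which SOM variant GigaSOM.jl uses, and this is not GigaSOM.jl's API. It is written in Python with the standard library only, purely for illustration, and every function name in it is hypothetical. Each data chunk is reduced to per-node sums and counts, the partial results are added together, and the codebook is updated once per epoch from the merged totals.

```python
import math


def nearest_node(codebook, point):
    """Index of the codebook vector closest to `point` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, node in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(node, point))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def chunk_statistics(codebook, chunk):
    """Work done independently per chunk: per-node sum and count of the points it wins."""
    dim = len(codebook[0])
    sums = [[0.0] * dim for _ in codebook]
    counts = [0] * len(codebook)
    for point in chunk:
        winner = nearest_node(codebook, point)
        counts[winner] += 1
        for k, value in enumerate(point):
            sums[winner][k] += value
    return sums, counts


def merge_statistics(left, right):
    """Combine two partial results; plain addition, so the merge order does not matter."""
    sums = [[a + b for a, b in zip(row_l, row_r)]
            for row_l, row_r in zip(left[0], right[0])]
    counts = [a + b for a, b in zip(left[1], right[1])]
    return sums, counts


def update_codebook(codebook, grid, sums, counts, radius):
    """Batch SOM update: each node becomes a neighborhood-weighted mean of the data."""
    dim = len(codebook[0])
    updated = []
    for j, node in enumerate(codebook):
        numerator = [0.0] * dim
        denominator = 0.0
        for i in range(len(codebook)):
            # Gaussian neighborhood weight from the distance on the 2D SOM grid.
            grid_dist2 = (grid[i][0] - grid[j][0]) ** 2 + (grid[i][1] - grid[j][1]) ** 2
            weight = math.exp(-grid_dist2 / (2.0 * radius ** 2))
            denominator += weight * counts[i]
            for k in range(dim):
                numerator[k] += weight * sums[i][k]
        # Keep the old vector if no data reached this node's neighborhood.
        updated.append([v / denominator for v in numerator] if denominator > 0 else list(node))
    return updated


def train_epoch(codebook, grid, chunks, radius):
    """One epoch over chunked data; in a cluster, each chunk could live on its own worker."""
    stats = None
    for chunk in chunks:
        partial = chunk_statistics(codebook, chunk)
        stats = partial if stats is None else merge_statistics(stats, partial)
    if stats is None:
        return [list(node) for node in codebook]
    sums, counts = stats
    return update_codebook(codebook, grid, sums, counts, radius)
```

Because the merged statistics do not depend on how the data were split, only the small sums and counts need to travel between workers in such a scheme, never the cells themselves.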

Summary

Introduction

The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing. The samples collected in recent studies often contain millions of measured cells (events), resulting in large and high-dimensional datasets. The computational performance of the algorithms, which is necessary for scaling to larger datasets, is often neglected, and the available analysis software often relies on simplifications (such as downsampling, which impairs the quality and precision of the result) to process large datasets in reasonable time without disproportionate hardware requirements. We present GigaSOM.jl, a fast and scalable implementation of clustering and dimensionality reduction for flow and mass cytometry data. We describe the design of GigaSOM.jl, measure its performance and horizontal scaling capability, and showcase the functionality on a large dataset from a recent study. GigaSOM.jl facilitates the use of commonly available high-performance computing resources to process the largest available datasets within minutes, while producing results of the same quality as the current state-of-the-art software.


Journal: GigaScience
Publication Date: Nov 18, 2020
Citations: 11
License type: CC BY 4.0


Similar Papers

CytoTree: an R/Bioconductor package for analysis and visualization of flow and mass cytometry data
Yuting Dai ... Jinyan Huang
BMC bioinformatics | VOL. 22
22 Mar 2021

Algorithmic Clustering Of Single-Cell Cytometry Data-How Unsupervised Are These Analyses Really?
Christina Bligaard Pedersen ... Lars Rønn Olsen
Cytometry Part A | VOL. 97
05 Nov 2019

Neuroscience Gateway - Cyberinfrastructure Providing Supercomputing Resources for Large Scale Computational Neuroscience Research
Majumdar Amitava ... Quintana Adrian
Frontiers in Neuroinformatics | VOL. 10
01 Jan 2015

CytoFA: Automated Gating of Mass Cytometry Data via Robust Skew Factor Analyzers
Sharon X Lee
01 Jan 2019
