CHICKN: extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis

Olga Permiakova, Thomas Burger, Romain Guibert, Alexandra Kraut, Anne-Marie Hesse, Thomas Fortin

doi:10.1186/s12859-021-03969-0


Open Access


Abstract

Background: The clustering of data produced by liquid chromatography coupled to mass spectrometry analyses (LC-MS data) has recently gained interest as a way to extract meaningful chemical or biological patterns. However, recent instrumental pipelines deliver data whose size, dimensionality and expected number of clusters are too large to be processed by classical machine learning algorithms, so that most state-of-the-art methods rely on single-pass linkage-based algorithms.

Results: We propose a clustering algorithm that optimizes the powerful but computationally demanding kernel k-means objective function in a scalable way. As a result, it can process LC-MS data in an acceptable time on a multicore machine. To do so, we combine three essential features: a compressive data representation, the Nyström approximation and a hierarchical strategy. In addition, we propose new kernels based on optimal transport, which can be interpreted as intuitive similarity measures between chromatographic elution profiles.

Conclusions: Our method, referred to as CHICKN, is evaluated on proteomics data produced in our lab, as well as on benchmark data from the literature. From a computational viewpoint, it is particularly efficient on raw LC-MS data. From a data analysis viewpoint, it provides clusters that differ from those produced by state-of-the-art methods, while achieving similar performance. This highlights the complementarity of algorithms built on different principles for extracting the best from complex LC-MS data.
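To make the idea of an optimal-transport similarity between elution profiles concrete, here is a minimal sketch. It is not the CHICKN implementation: the exponential kernel form, the parameter `gamma`, the function names and the toy profiles are all assumptions chosen for illustration. It relies only on the standard fact that, in one dimension, the Wasserstein-1 distance between two distributions on the same grid is the L1 distance between their cumulative distribution functions.

```python
import numpy as np

def wasserstein1(p, q, dx=1.0):
    """Wasserstein-1 distance between two nonnegative signals on the same regular grid."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalise each profile to unit mass so it can be treated as a distribution.
    p = p / p.sum()
    q = q / q.sum()
    # In 1D, W1 equals the L1 distance between the cumulative distributions.
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * dx)

def wasserstein_kernel(p, q, gamma=1.0, dx=1.0):
    """Hypothetical similarity: exp(-gamma * W1). The form and gamma are assumptions."""
    return float(np.exp(-gamma * wasserstein1(p, q, dx)))

# Two toy elution profiles: the same peak shape, shifted by two time bins.
profile_a = np.array([0, 1, 4, 1, 0, 0, 0, 0], dtype=float)
profile_b = np.array([0, 0, 0, 1, 4, 1, 0, 0], dtype=float)

distance = wasserstein1(profile_a, profile_b)        # equals the shift, 2 bins
similarity = wasserstein_kernel(profile_a, profile_b)
```

In this toy case the distance equals the retention-time shift between the two peaks, which is one way to read the "intuitive similarity" mentioned in the abstract: profiles with the same shape that elute close together score as similar, even when they do not overlap bin by bin.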

Highlights

The clustering of data produced by liquid chromatography coupled to mass spectrometry analyses (LC-MS data) has recently gained interest as a way to extract meaningful chemical or biological patterns
Liquid chromatography coupled to mass spectrometry (LC-MS) constitutes a technological pipeline that has become ubiquitous in various omics investigations, such as proteomics, lipidomics and metabolomics
We focused on the k-means objective function for two reasons: first, until recently, it was considered by the proteomics community as not applicable to data as big as LC-MS data [7], while recent theoretical progress has made this scaling-up possible [44]; second, k-means can be reformulated to fit the reproducing kernel Hilbert space theory [45], which provides new opportunities to define similarity measures that capture the biochemical specificities of LC-MS data
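One way kernel k-means can be made tractable on large data is the Nyström approximation named in the abstract: the full n x n kernel matrix K is replaced by C W⁺ Cᵀ, where C holds kernel values between all points and a small set of m landmarks and W holds kernel values among the landmarks. The sketch below illustrates that general technique only. The RBF kernel, the landmark choice and the function names are assumptions made for the example, and it does not reproduce the compressive or hierarchical parts of CHICKN.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian kernel matrix between rows of X and rows of Y (stand-in kernel)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_features(X, landmarks, kernel):
    """Return Phi such that Phi @ Phi.T approximates the full kernel matrix."""
    C = kernel(X, landmarks)           # n x m block of the kernel matrix
    W = kernel(landmarks, landmarks)   # m x m block among the landmarks
    # Pseudo-inverse square root of W through its eigendecomposition.
    vals, vecs = np.linalg.eigh(W)
    keep = vals > 1e-10                # drop numerically null directions
    W_inv_sqrt = vecs[:, keep] / np.sqrt(vals[keep])
    return C @ W_inv_sqrt              # n x r, with r <= m

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))           # toy data, not LC-MS signals
Phi = nystrom_features(X, X[:5], rbf_kernel)   # first 5 points as landmarks (assumption)
K_approx = Phi @ Phi.T
```

Ordinary k-means run on the rows of `Phi` then approximates kernel k-means, at a cost that grows with the number of landmarks rather than with the square of the number of data points.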

Summary

Introduction

The clustering of data produced by liquid chromatography coupled to mass spectrometry analyses (LC-MS data) has recently gained interest as a way to extract meaningful chemical or biological patterns. The MS throughput has continuously improved, leading to unprecedented data volume production. To date, processing these gigabytes of low-level MS signals has become a challenge of its own, because it requires a trade-off between contradictory objectives. On the one hand, one needs to save memory and computational time with efficient encoding, compression and signal cleaning methods [1]. On the other hand, one needs to avoid excessive preprocessing that systematically smooths out signals of lower magnitude, as it is well established that interesting biological patterns can be found near the noise level [2]. To face this challenge, a recent and efficient investigation path has been to apply cluster analysis to LC-MS data. As each cluster contains similar data elements, it facilitates the extraction of repetitive but small biological patterns.



Journal: BMC Bioinformatics	Publication Date: Feb 12, 2021
Citations: 2	License type: open-access


Similar Papers

BatMass: a Java Software Platform for LC-MS Data Visualization in Proteomics and Metabolomics.
Dmitry M Avtonomov ... Alexander Raskind
Journal of Proteome Research | VOL. 15
28 Jun 2016

MassUntangler: A novel alignment tool for label-free liquid chromatography–mass spectrometry proteomic data
R Ballardini ... M Benevento
Journal of Chromatography A | VOL. 1218
22 Jun 2011

Plant metabolomics: Resolution and quantification of elusive peaks in liquid chromatography–mass spectrometry profiles of complex plant extracts using multi-way decomposition methods
Bekzod Khakimov ... Søren Balling Engelsen
Journal of Chromatography A | VOL. 1266
15 Oct 2012

Analysis and Quantification of Diagnostic Serum Markers and Protein Signatures for Gaucher Disease
Johannes P.C Vissers ... Johannes M.F.G Aerts
Molecular & Cellular Proteomics | VOL. 6
01 May 2007
