On the feasibility of deep learning applications using raw mass spectrometry data.

Joris Cadow, María Rodríguez Martínez, Tiannan Guo, Matteo Manica, Roland Mathis, Ruedi Aebersold

doi:10.1093/bioinformatics/btab311

Open Access

Abstract

Summary: In recent years, SWATH-MS has become the proteomic method of choice for data-independent acquisition, as it enables high proteome coverage, accuracy and reproducibility. However, data analysis is convoluted and requires prior information and expert curation. Furthermore, as quantification is limited to a small set of peptides, potentially important biological information may be discarded. Here we demonstrate that deep learning can be used to learn discriminative features directly from raw mass spectrometry (MS) data, thereby eliminating the need for elaborate data processing pipelines. Using transfer learning to overcome sample sparsity, we exploit a collection of publicly available deep learning models already trained for the task of natural image classification. These models are used to produce feature vectors from each raw MS image, which are then used as input for a classifier trained to distinguish tumor from normal prostate biopsies. Although the deep learning models were originally trained for a completely different classification task and no additional fine-tuning is performed on them, we achieve a remarkable classification performance of 0.876 AUC. We investigate different types of image preprocessing and encoding, as well as whether the inclusion of the secondary MS2 spectra improves the classification performance. For all tested models, we use standard protein expression vectors as the gold standard. Even with our naïve implementation, our results suggest that the application of deep learning and transfer learning techniques might pave the way to the broader usage of raw mass spectrometry data in real-time diagnosis.

Availability and implementation: The open source code used to generate the results from MS images is available on GitHub: https://ibm.biz/mstransc. The raw MS data underlying this article cannot be shared publicly, to protect the privacy of the individuals who participated in the study. Processed data including the MS images, their encodings, classification labels and results can be accessed at the following link: https://ibm.box.com/v/mstc-supplementary.

Supplementary information: Supplementary data are available at Bioinformatics online.
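The transfer-learning setup described in the Summary (a frozen image model produces feature vectors, a separate classifier is trained on them, and performance is reported as AUC) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the "pretrained" encoder is replaced by a fixed random projection with ReLU, the images are small synthetic grids, logistic regression is one possible choice of classifier, and all function names are hypothetical. Only the structure matters: the encoder weights are never updated, and only the classifier is fitted.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def frozen_encoder(image, weights):
    """Hypothetical stand-in for a pretrained image network.

    Applies a fixed linear projection followed by ReLU. The weights are
    never updated, mirroring the "no fine-tuning" setting.
    """
    flat = [px for row in image for px in row]
    return [max(0.0, sum(w * x for w, x in zip(row, flat))) for row in weights]


def train_logistic(features, labels, epochs=100, lr=0.05):
    """Fit a logistic-regression classifier on feature vectors with plain SGD."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def score(x, w, b):
    """Predicted probability of the positive (here: toy 'tumor') class."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_toy_image(rng, label, size=8):
    """Synthetic image: uniform noise, plus a brighter corner patch for label 1."""
    img = [[rng.random() for _ in range(size)] for _ in range(size)]
    if label == 1:
        for r in range(3):
            for c in range(3):
                img[r][c] += 1.0
    return img


def run_demo(seed=0, n_samples=60, n_features=16, size=8):
    """Encode toy images with the frozen encoder, train, and return held-out AUC."""
    rng = random.Random(seed)
    # Encoder weights are drawn once and then kept fixed (the "pretrained" part).
    weights = [[rng.gauss(0.0, 0.2) for _ in range(size * size)]
               for _ in range(n_features)]
    labels = [i % 2 for i in range(n_samples)]
    feats = [frozen_encoder(make_toy_image(rng, y, size), weights) for y in labels]
    split = n_samples * 2 // 3
    w, b = train_logistic(feats[:split], labels[:split])
    test_scores = [score(x, w, b) for x in feats[split:]]
    return roc_auc(test_scores, labels[split:])


if __name__ == "__main__":
    print(f"held-out AUC on toy data: {run_demo():.3f}")
```

In a realistic setting, `frozen_encoder` would be replaced by a network pretrained on natural images and `make_toy_image` by the rendered MS images. The AUC printed here describes only the synthetic data and says nothing about the 0.876 reported in the Summary.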

Introduction

Proteins participate in virtually every process in the cell and are directly responsible for its observed phenotype. Their accurate identification and quantification can enable the precise characterization of phenotypes. Proteins are most commonly analyzed by mass spectrometry (MS). Among the available mass spectrometry approaches, SWATH-MS (Sequential Window Acquisition of all THeoretical fragment ion spectra) has emerged as a technology that combines deep proteome coverage, high reproducibility, and quantitative consistency and accuracy (Gillet et al., 2012a). In a SWATH-MS measurement, all ionized peptides falling within a specified mass range are fragmented in a systematic and unbiased fashion using large precursor isolation windows (Ludwig et al., 2018). Spectral profiles are recorded for all ionized peptides and fragment ions thereof