An evolving computational platform for biological mass spectrometry: workflows, statistics and data mining with MASSyPup64.

Robert Winkler

doi:10.7717/peerj.1401

Abstract

In biological mass spectrometry, crude instrumental data need to be converted into meaningful theoretical models. Several data processing and data evaluation steps are required to reach the final results. These operations are often difficult to reproduce because they depend on overly specific computing platforms. This effect, known as ‘workflow decay’, can be diminished by using a standardized informatics infrastructure. Thus, we compiled an integrated platform, which contains ready-to-use tools and workflows for mass spectrometry data analysis. Apart from general unit operations, such as peak picking and identification of proteins and metabolites, we put a strong emphasis on the statistical validation of results and Data Mining. MASSyPup64 includes, for example, the OpenMS/TOPPAS framework, the Trans-Proteomic-Pipeline programs, the ProteoWizard tools, X!Tandem, Comet and SpiderMass. The statistical computing language R is installed with packages for MS data analyses, such as XCMS/metaXCMS and MetabR. The R package Rattle provides user-friendly access to multiple Data Mining methods. Further, we added the non-conventional spreadsheet program teapot for editing large data sets and a command line tool for transposing large matrices. Individual programs, console commands and modules can be integrated using the Workflow Management System (WMS) taverna. We illustrate how the tools can be combined with three practical examples: (1) a workflow for protein identification and validation, with subsequent Association Analysis of peptides, (2) cluster analysis and Data Mining in targeted Metabolomics, and (3) raw data processing, Data Mining and identification of metabolites in untargeted Metabolomics. Association Analyses reveal relationships between variables across different sample sets. We present their application for finding co-occurring peptides, which can be used for targeted proteomics, the discovery of alternative biomarkers and protein–protein interactions.
Models derived by Data Mining displayed higher robustness and accuracy for classifying sample groups in targeted Metabolomics than cluster analyses. Random Forest modeling provides not only predictive models, which can be deployed on new data sets, but also the variable importance. We demonstrate that the latter is especially useful for tracking down significant signals and affected pathways in untargeted Metabolomics. Thus, Random Forest modeling supports the unbiased search for relevant biological features in Metabolomics. Our results clearly demonstrate the importance of Data Mining methods for disclosing non-obvious information in biological mass spectrometry. The application of a Workflow Management System and the integration of all required programs and data in a consistent platform make the presented data analysis strategies reproducible for non-expert users. The simple remastering process and the Open Source licenses of MASSyPup64 (http://www.bioprocess.org/massypup/) enable the continuous improvement of the system.
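The idea behind Association Analysis of co-occurring peptides can be illustrated with a minimal sketch. This is not the Rattle-based workflow shipped with MASSyPup64; it is a plain Python illustration under stated assumptions. The sample names, peptide labels, thresholds and the function name `association_rules` are all hypothetical. The sketch treats each sample as a set of identified peptides and reports pairwise rules with their support (fraction of samples containing both peptides) and confidence (fraction of samples with the first peptide that also contain the second).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each sample maps to the set of peptides identified in it.
samples = {
    "sample_1": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"},
    "sample_2": {"PEPTIDEA", "PEPTIDEB"},
    "sample_3": {"PEPTIDEA", "PEPTIDEC"},
    "sample_4": {"PEPTIDEB", "PEPTIDEC"},
    "sample_5": {"PEPTIDEA", "PEPTIDEB"},
}


def association_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return pairwise rules as (antecedent, consequent, support, confidence)."""
    n = len(transactions)
    item_counts = Counter()  # number of samples containing each peptide
    pair_counts = Counter()  # number of samples containing each peptide pair
    for items in transactions:
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Evaluate the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Highest confidence first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


rules = association_rules(list(samples.values()))
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence suggests that detecting one peptide is a good indicator for the presence of the other, which is the kind of relationship the abstract proposes for selecting alternative targets. Real analyses would consider larger item sets and additional measures such as lift.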

Highlights

Mass spectrometry provides qualitative and quantitative data about molecules.
Based on real datasets from proteomics and targeted and untargeted metabolomics, we demonstrate the creation of efficient data processing workflows.
Operating system and installed programs: based on the Linux platform Fatdog64, an analysis framework and programming environment for mass spectrometry data was created.

Summary

Introduction

Mass spectrometry provides qualitative and quantitative data about molecules. Since complex mixtures can be analyzed with high sensitivity and selectivity, mass spectrometry plays a central role in high-throughput biology (Jemal, 2000; Nilsson et al., 2010). The study of the actual state of proteins and metabolites, which reflects the physiological condition of an organism, still relies mainly on mass spectrometry data. A combination of biochemical and instrumental techniques is used to obtain comprehensive, quantitative information about the expression, modification and degradation of proteins at a certain physiological state (Wilkins et al., 1996; Anderson & Anderson, 1998). Immuno-precipitation and other separation strategies are used as first focusing steps, whereas the identification of proteins usually relies on mass spectrometry methods (Shevchenko et al., 2006).

Journal: PeerJ	Publication Date: Nov 17, 2015
Citations: 60	License type: CC BY 4.0
