PeptideShaker Online: A User-Friendly Web-Based Framework for the Identification of Mass Spectrometry-Based Proteomics Data.

Yehia Mokhtar Farag, Marc Vaudel, Harald Barsnes, Carlos Horro

doi:10.1021/acs.jproteome.1c00678

Open Access

Journal: Journal of Proteome Research	Publication Date: Oct 28, 2021
Citations: 9	License type: CC BY 4.0

Affiliation: University of Bergen

Abstract

Mass spectrometry-based proteomics is a high-throughput technology generating ever-larger amounts of data per project. However, storing, processing, and interpreting these data can be a challenge. A key element in simplifying this process is the development of interactive frameworks focusing on visualization that can greatly simplify both the interpretation of data and the generation of new knowledge. Here we present PeptideShaker Online, a user-friendly web-based framework for the identification of mass spectrometry-based proteomics data, from raw file conversion to interactive visualization of the resulting data. Storage and processing of the data are performed via the versatile Galaxy platform (through SearchGUI, PeptideShaker, and moFF), while the interaction with the results happens via a locally installed web server, thus enabling researchers to process and interpret their own data without requiring advanced bioinformatics skills or direct access to compute-intensive infrastructures. The source code, additional documentation, and a fully functional demo are available at https://github.com/barsnes-group/peptide-shaker-online.

Highlights

Mass spectrometry-based proteomics generates large amounts of data,[1] and it is essential that the data can be processed and analyzed in such a way that the researcher generating the data can interpret its biological meaning correctly.
Interactive visualization can greatly reduce the complexity of interpretation by providing direct interaction with the data and by dividing it into distinct levels, enabling the biological researcher to focus on interpreting the data and extracting biological knowledge.[5]
Processing is done via the Galaxy platform using (i) ThermoRawFileParser[10] for converting Thermo raw files into mzML[11] or mgf; (ii) SearchGUI for protein identification based on ten proteomics search and de novo engines, namely OMSSA,[12] X! Tandem,[13] MyriMatch,[14] MS Amanda,[15] MSGF+,[16] Comet,[17] Tide,[18] MetaMorpheus,[19] DirectTag,[20] and Novor;[21] (iii) PeptideShaker for interpretation of the peptide identification data from SearchGUI; and (iv) moFF for extracting MS1 intensities from the mass spectra.[22]
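The four processing steps listed above can be summarized as an ordered chain. The following is a minimal, purely illustrative sketch: the names Stage, SEARCH_ENGINES, PIPELINE, and describe are hypothetical and are not part of Galaxy, SearchGUI, PeptideShaker, moFF, or PeptideShaker Online. It records only the tools and purposes named in the text and does not run any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of the processing chain (hypothetical helper type)."""
    tool: str     # tool name as given in the text
    purpose: str  # what the step does, paraphrased from the text


# Search and de novo engines the text lists as available through SearchGUI.
SEARCH_ENGINES = (
    "OMSSA", "X! Tandem", "MyriMatch", "MS Amanda", "MSGF+",
    "Comet", "Tide", "MetaMorpheus", "DirectTag", "Novor",
)

# Stages in the order the text enumerates them, (i) through (iv).
PIPELINE = (
    Stage("ThermoRawFileParser", "convert Thermo raw files into mzML or mgf"),
    Stage("SearchGUI", "protein identification via search and de novo engines"),
    Stage("PeptideShaker", "interpret the peptide identification data from SearchGUI"),
    Stage("moFF", "extract MS1 intensities from the mass spectra"),
)


def describe(pipeline=PIPELINE):
    """Return one human-readable line per stage, numbered from 1."""
    return [f"{i}. {s.tool}: {s.purpose}" for i, s in enumerate(pipeline, start=1)]


if __name__ == "__main__":
    for line in describe():
        print(line)
```

Keeping the stages in an ordered, immutable tuple mirrors the (i) to (iv) enumeration in the text; how the real Galaxy workflow wires these tools together is not specified here.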

Summary

Introduction

Mass spectrometry-based proteomics generates large amounts of data,[1] and it is essential that the data can be processed and analyzed in such a way that the researcher generating the data can interpret its biological meaning correctly. In addition to biological knowledge, this often requires direct access to significant computational resources and advanced computational skills. The overall challenge can be split into three main categories: (i) access to computational resources; (ii) availability of user-friendly bioinformatics software; and (iii) having the biological understanding to translate the data into useful knowledge. The first category can be addressed by high-performance computing environments that provide the required resources through powerful servers instead of the more limited personal computers,[2] while at the same time making the stored data more portable and accessible; i.e., there is no need to download or move the data.[3] Adding interactive visualization to such setups can help address the second category, the need for user-friendly bioinformatics software, and can play a key role in data processing while simplifying the interpretation of the results.[4] Interactive visualization can greatly reduce the complexity of interpretation by providing direct interaction with the data and by dividing it into distinct levels, enabling the biological researcher to focus on interpreting the data and extracting biological knowledge.[5]

Methods

Results

Conclusion